Great Papers

Aggregators

CS

What Papers everyone should read

Math

A single paper everyone should read

General

Big Data

Computer Vision

Machine Learning

  • Breiman, L. 2001. “Statistical Modeling: The Two Cultures (with Comments and a Rejoinder by the Author).” Statistical Science 16:199–231.

###Supervised Learning

  • Regression: Panik, M. J. 2009. Regression Modeling: Methods, Theory, and Computation with SAS. Boca Raton, FL: CRC Press. (Disclosure: my favorite regression book.)
  • Decision tree: Breiman, L., Friedman, J., Olshen, R., and Stone, C. 1984. Classification and Regression Trees. Belmont, CA: Wadsworth.
  • Random forest: Breiman, L. 2001. “Random Forests.” Machine Learning 45:5–32.
  • Gradient boosting: Friedman, J. H. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” Annals of Statistics 29:1189–1232.
  • Neural network: Rumelhart, D. E., Hinton, G. E., and Williams, R. J. 1986. “Learning Representations by Back-Propagating Errors.” Nature 323:533–536.
  • Support vector machine: Cortes, C. and Vapnik, V. 1995. “Support-Vector Networks.” Machine Learning 20:273–297.
  • Naïve Bayes: Friedman, N., Geiger, D., and Goldszmidt, M. 1997. “Bayesian Network Classifiers.” Machine Learning 29:131–163.
  • Neighbors: Cover, T. and Hart, P. 1967. “Nearest Neighbor Pattern Classification.” IEEE Transactions on Information Theory 13:21–27.
  • Gaussian processes: Seeger, M. 2004. “Gaussian Processes for Machine Learning.” International Journal of Neural Systems 14:69–106.

###Unsupervised Learning

  • A priori rules: Agrawal, R., Imieliński, T., and Swami, A. 1993. “Mining Association Rules between Sets of Items in Large Databases.” ACM SIGMOD Record 22:207–216.
  • k-means clustering: Hartigan, J. A. and Wong, M. A. 1979. “Algorithm AS 136: A k-Means Clustering Algorithm.” Journal of the Royal Statistical Society, Series C 28:100–108.
  • Mean shift clustering: Cheng, Y. 1995. “Mean Shift, Mode Seeking, and Clustering.” IEEE Transactions on Pattern Analysis and Machine Intelligence 17:790–799.
  • Spectral clustering: Von Luxburg, U. 2007. “A Tutorial on Spectral Clustering.” Statistics and Computing 17:395–416.
  • Kernel density estimation: Silverman, B. W. 1986. Density Estimation for Statistics and Data Analysis. Vol. 26. Boca Raton, FL: CRC Press.
  • Non-negative matrix factorization: Lee, D. D. and Seung, H. S. 1999. “Learning the Parts of Objects by Non-negative Matrix Factorization.” Nature 401:788–791.
  • Kernel PCA: Schölkopf, B., Smola, A., and Müller, K.-R. 1997. “Kernel Principal Component Analysis.” In Artificial Neural Networks—ICANN’97, 583–588. Berlin: Springer.
  • Sparse PCA: Zou, H., Hastie, T., and Tibshirani, R. 2006. “Sparse Principal Component Analysis.” Journal of Computational and Graphical Statistics 15:265–286.
  • Singular value decomposition: Golub, G. H. and Reinsch, C. 1970. “Singular Value Decomposition and Least Squares Solutions.” Numerische Mathematik 14:403–420.

###Semi-Supervised Learning

  • Denoising autoencoders: Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.A. 2008. “Extracting and Composing Robust Features with Denoising Autoencoders.” Proceedings of the 25th International Conference on Machine Learning. New York: ACM.
  • Expectation maximization: Nigam, K., McCallum, A.K., Thrun, S. and Mitchell, T. 2000. “Text Classification from Labeled and Unlabeled Documents using EM.” Machine Learning 39:103-134.
  • Manifold regularization: Belkin, M., Niyogi, P., and Sindhwani, V. 2006. “Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples.” The Journal of Machine Learning Research 7:2399-2434.
  • Transductive support vector machines: Joachims, T. 1999. “Transductive Inference for Text Classification Using Support Vector Machines.” Proceedings of the 16th International Conference on Machine Learning. New York: ACM. (source (for ML) : http://qr.ae/LPuHs )