## Aggregators

- Short Science (summaries) http://www.shortscience.org/venue?key=conf/nips
- VK - Deep Learning community (papers, code, articles) https://vk.com/deeplearning
- Arxiv-Sanity - better Arxiv search http://www.arxiv-sanity.com/
- GitXiv (Arxiv and Git code) http://gitxiv.com/
- Arxiv-Vanity - HTML rendering of Arxiv pages https://www.arxiv-vanity.com/

### General

- CS: What papers should everyone read? http://cstheory.stackexchange.com/questions/1168/what-papers-should-everyone-read
- MATH: A single paper everyone should read http://mathoverflow.net/questions/2144/a-single-paper-everyone-should-read
- How to read a paper - S. Keshav http://blizzard.cs.uwaterloo.ca/keshav/home/Papers/data/07/paper-reading.pdf
- How to write a proof - Leslie Lamport http://research.microsoft.com/en-us/um/people/lamport/pubs/lamport-how-to-write.pdf
- How to write papers - Oded Goldreich http://www.wisdom.weizmann.ac.il/~oded/PS/re-writing.pdf
- How not to write papers - Oded Goldreich http://www.cs.iastate.edu/~honavar/write-not.pdf

### Big Data

- The PageRank Citation Ranking: Bringing Order to the Web
- MapReduce: Simplified Data Processing on Large Clusters
- The Google File System
- Dynamo: Amazon’s Highly Available Key-value Store
- Bigtable: A Distributed Storage System for Structured Data
- A Few Useful Things to Know about Machine Learning
- Random Forests
- A Relational Model of Data for Large Shared Data Banks
- Map-Reduce for Machine Learning on Multicore
- Pasting Small Votes for Classification in Large Databases and On-Line
- Amazon.com Recommendations: Item-to-Item Collaborative Filtering
- Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
- Spanner: Google’s Globally-Distributed Database
- Megastore: Providing Scalable, Highly Available Storage for Interactive Services
- F1: A Distributed SQL Database That Scales
- Apache Drill: Interactive Ad-Hoc Analysis at Scale
- A New Approach to Linear Filtering and Prediction Problems
- Top 10 Algorithms in Data Mining
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing (Spark)

### Computer Vision

- Building High-level Features Using Large Scale Unsupervised Learning
- Distinctive image features from scale-invariant keypoints
- A theory for multiresolution signal decomposition: The wavelet representation
- A computational approach to edge detection
- Snakes: Active contour models
- Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images
- Eigenfaces for Recognition
- Determining optical flow
- Scale-space and edge detection using anisotropic diffusion
- Rapid object detection using a boosted cascade of simple features
- An iterative image registration technique with an application to stereo vision
- Normalized cuts and image segmentation
- Histograms of oriented gradients for human detection
- Mean shift: A robust approach toward feature space analysis
- The Laplacian pyramid as a compact image code
- Condensation—conditional density propagation for visual tracking
- Good features to track
- A model of saliency-based visual attention for rapid scene analysis
- A performance evaluation of local descriptors
- Fast approximate energy minimization via graph cuts
- SURF: Speeded up robust features
- Neural network-based face detection
- Emergence of simple-cell receptive field properties by learning a sparse code for natural images
- Shape matching and object recognition using shape contexts
- Shape modeling with front propagation: A level set approach
- The structure of images
- Shape and motion from image streams under orthography: a factorization method
- Active appearance models
- Scale & affine invariant interest point detectors
- Modeling and rendering architecture from photographs: A hybrid geometry-and image-based approach
- Feature extraction from faces using deformable templates
- Region competition: Unifying snakes, region growing, and Bayes/MDL for multiband image segmentation
- Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories
- Face detection in color images
- Efficient graph-based image segmentation
- Visual categorization with bags of keypoints
- Object class recognition by unsupervised scale-invariant learning
- Recovering high dynamic range radiance maps from photographs
- A comparison of affine region detectors
- A bayesian hierarchical model for learning natural scene categories

### Machine Learning

- Breiman, L. 2001. “Statistical Modeling: The Two Cultures (with Comments and a Rejoinder by the Author).” Statistical Science 16:199–231.

### Supervised Learning

- Regression: Panik, M. J. 2009. Regression Modeling: Methods, Theory, and Computation with SAS. Boca Raton, FL: CRC Press. (Disclosure: my favorite regression book.)
- Decision tree: Breiman, L., Friedman, J., Olshen, R., and Stone, C. 1984. Classification and Regression Trees. Belmont, CA: Wadsworth.
- Random forest: Breiman, L. 2001. “Random Forests.” Machine Learning 45:5–32.
- Gradient boosting: Friedman, J. H. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” Annals of Statistics 29:1189–1232.
- Neural network: Rumelhart, D. E., Hinton, G. E., and Williams, R. J. 1986. “Learning Representations by Back-Propagating Errors.” Nature 323:533–536.
- Support vector machine: Cortes, C. and Vapnik, V. 1995. “Support-Vector Networks.” Machine Learning 20:273–297.
- Naïve Bayes: Friedman, N., Geiger, D., and Goldszmidt, M. 1997. “Bayesian Network Classifiers.” Machine Learning 29:131–163.
- Neighbors: Cover, T. and Hart, P. 1967. “Nearest Neighbor Pattern Classification.” IEEE Transactions on Information Theory 13:21–27.
- Gaussian processes: Seeger, M. 2004. “Gaussian Processes for Machine Learning.” International Journal of Neural Systems 14:69–106.

### Unsupervised Learning

- Association rules (Apriori): Agrawal, R., Imieliński, T., and Swami, A. 1993. “Mining Association Rules between Sets of Items in Large Databases.” ACM SIGMOD Record 22:207–216.
- k-means clustering: Hartigan, J. A. and Wong, M. A. 1979. “Algorithm AS 136: A k-Means Clustering Algorithm.” Journal of the Royal Statistical Society, Series C 28:100–108.
- Mean shift clustering: Cheng, Y. 1995. “Mean Shift, Mode Seeking, and Clustering.” IEEE Transactions on Pattern Analysis and Machine Intelligence 17:790–799.
- Spectral clustering: Von Luxburg, U. 2007. “A Tutorial on Spectral Clustering.” Statistics and Computing 17:395–416.
- Kernel density estimation: Silverman, B. W. 1986. Density Estimation for Statistics and Data Analysis. Vol. 26. Boca Raton, FL: CRC Press.
- Non-negative matrix factorization: Lee, D. D. and Seung, H. S. 1999. “Learning the Parts of Objects by Non-negative Matrix Factorization.” Nature 401:788–791.
- Kernel PCA: Schölkopf, B., Smola, A., and Müller, K.-R. 1997. “Kernel Principal Component Analysis.” In Artificial Neural Networks—ICANN’97, 583–588. Berlin: Springer.
- Sparse PCA: Zou, H., Hastie, T., and Tibshirani, R. 2006. “Sparse Principal Component Analysis.” Journal of Computational and Graphical Statistics 15:265–286.
- Singular value decomposition: Golub, G. H. and Reinsch, C. 1970. “Singular Value Decomposition and Least Squares Solutions.” Numerische Mathematik 14:403–420.

### Semi-Supervised Learning

- Denoising autoencoders: Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.A. 2008. “Extracting and Composing Robust Features with Denoising Autoencoders.” Proceedings of the 25th International Conference on Machine Learning. New York: ACM.
- Expectation maximization: Nigam, K., McCallum, A. K., Thrun, S., and Mitchell, T. 2000. “Text Classification from Labeled and Unlabeled Documents Using EM.” Machine Learning 39:103–134.
- Manifold regularization: Belkin, M., Niyogi, P., and Sindhwani, V. 2006. “Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples.” The Journal of Machine Learning Research 7:2399–2434.
- Transductive support vector machines: Joachims, T. 1999. “Transductive Inference for Text Classification Using Support Vector Machines.” Proceedings of the 16th International Conference on Machine Learning. New York: ACM.

Source (for the ML lists): http://qr.ae/LPuHs

### Other applications

- Character Recognition (SOTA): Accurate, Data-Efficient, Unconstrained Text Recognition with Convolutional Neural Networks