- Overview http://www.robots.ox.ac.uk/~vgg/research/very_deep/
- Paper http://arxiv.org/pdf/1409.1556.pdf
- Overview http://googleresearch.blogspot.com/2014/09/building-deeper-understanding-of-images.html
- Paper http://arxiv.org/pdf/1409.4842v1.pdf
- Excellent description http://colah.github.io/posts/2015-08-Understanding-LSTMs/
- Paper http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf
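The tutorial above walks through the LSTM gate equations; for intuition, here is a minimal scalar LSTM step in plain Python. The weights are toy values chosen for illustration only, not anything from the paper.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell(x, h_prev, c_prev, W):
    """One LSTM step for scalar input and state (the vector case is analogous).

    W maps gate name -> (w_x, w_h, b); all weights here are hypothetical.
    """
    f = sigmoid(W["f"][0] * x + W["f"][1] * h_prev + W["f"][2])    # forget gate
    i = sigmoid(W["i"][0] * x + W["i"][1] * h_prev + W["i"][2])    # input gate
    g = math.tanh(W["g"][0] * x + W["g"][1] * h_prev + W["g"][2])  # candidate values
    o = sigmoid(W["o"][0] * x + W["o"][1] * h_prev + W["o"][2])    # output gate
    c = f * c_prev + i * g   # cell state: keep part of the old memory, add new
    h = o * math.tanh(c)     # hidden state: gated view of the cell state
    return h, c

# Toy weights: every gate uses w_x = 1.0, w_h = 0.5, b = 0.0.
W = {k: (1.0, 0.5, 0.0) for k in "figo"}
h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_cell(x, h, c, W)
```

Because `h = o * tanh(c)` with `o` in (0, 1), the hidden state always stays in (-1, 1), which is part of what keeps gradients better behaved than in a vanilla RNN.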
- Project website: https://code.google.com/p/word2vec/
- Description: This tool provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words. These representations can be subsequently used in many natural language processing applications and for further research.
- Related Papers:
  - Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR, 2013.
  - Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS, 2013.
  - Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. Linguistic Regularities in Continuous Space Word Representations. In Proceedings of NAACL HLT, 2013.
- Nice tutorial on Word2Vec http://multithreaded.stitchfix.com/blog/2015/03/11/word-is-worth-a-thousand-vectors
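The description above mentions the two word2vec architectures, continuous bag-of-words (CBOW) and skip-gram. Their difference is easiest to see in how training examples are formed: skip-gram predicts each context word from the center word, CBOW predicts the center word from its context. A minimal illustrative sketch of the example generation (not the word2vec tool itself):

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: one (center, context) pair per context word."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """CBOW: one (context_words, target) example per position."""
    examples = []
    for i, target in enumerate(tokens):
        ctx = [tokens[j]
               for j in range(max(0, i - window), min(len(tokens), i + window + 1))
               if j != i]
        examples.append((ctx, target))
    return examples

pairs = skipgram_pairs(["the", "quick", "brown", "fox"], window=1)
```

The actual tool then trains a shallow network on millions of such examples (with tricks like negative sampling and subsampling of frequent words) to produce the word vectors.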
- Project website: http://nlp.stanford.edu/projects/glove/
- Description: GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.
- Paper: GloVe: Global Vectors for Word Representation http://nlp.stanford.edu/projects/glove/glove.pdf
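GloVe trains on aggregated global word-word co-occurrence statistics, so the first step of the pipeline is building a co-occurrence table over the whole corpus. A toy sketch of that accumulation step, using the paper's convention of weighting a context word by the inverse of its distance from the center word (illustrative only, not the project's code):

```python
from collections import defaultdict

def cooccurrence(tokens, window=2):
    """Accumulate weighted global co-occurrence counts X[(word, context)]."""
    X = defaultdict(float)
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                # A context word d positions away contributes 1/d to the count.
                X[(w, tokens[j])] += 1.0 / abs(i - j)
    return dict(X)

X = cooccurrence(["a", "b", "a"], window=2)
```

GloVe then fits word vectors so that their dot products approximate the logarithms of these counts, which is where the linear substructures mentioned above come from.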
- Description and nice tutorial https://spacy.io/blog/sense2vec-with-spacy
- Paper: http://arxiv.org/pdf/1511.06388v1.pdf
- Description and intuition about thought-vectors http://deeplearning4j.org/thoughtvectors
- Paper: http://arxiv.org/abs/1506.06726
- Code: https://github.com/ryankiros/skip-thoughts
Visual Question Answering (VQA) dataset: Based on images from the COCO dataset, it currently has 360K questions on 120K images. There are plans to release questions on the rest of the COCO images and an additional 50K abstract images. All the questions are human-generated and were specifically designed to stump a “smart robot”.
Visual Madlibs: Contains fill-in-the-blank questions along with standard question-answer pairs. It has 360K questions on 10K images from the COCO dataset. Many questions require high-level human cognition, such as describing what one might feel on seeing an image.
Toronto COCO-QA Dataset: Automatically generated questions from the captions of the MS COCO dataset. At 115K questions, it is smaller than the VQA dataset. Answers are all one word.
DAQUAR - DAtaset for QUestion Answering on Real-world images: A much smaller dataset, with about 12K questions. This was one of the earliest image question answering datasets.
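Since the VQA dataset collects ten human answers per question, its standard evaluation counts a predicted answer as fully correct when at least three annotators gave it, and partially correct otherwise. A simplified sketch of that consensus metric (the official evaluation script additionally averages over annotator subsets and normalizes answer strings):

```python
def vqa_accuracy(pred, human_answers):
    """Consensus accuracy: full credit if >= 3 of the 10 annotators agree."""
    matches = sum(1 for a in human_answers if a == pred)
    return min(matches / 3.0, 1.0)
```

For example, an answer matching only one of the ten annotators scores 1/3, while an answer matching three or more scores 1.0.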
Other relevant works
- VQA: Visual Question Answering
- Exploring Models and Data for Image Question Answering
- Learning to Answer Questions From Image Using Convolutional Neural Network
November 2015 (after the CVPR deadline)
- Deep Compositional Question Answering with Neural Module Networks
- An attention based convolutional neural network for visual question answering
- Are you talking to a machine? Dataset and methods for multilingual image question answering
- Image question answering using convolutional neural network with dynamic parameter prediction
- Where to look: Focus regions for visual question answering
- Ask me anything: Free-form visual question answering based on knowledge from external sources
- Exploring question-guided spatial attention for visual question answering
- Stacked attention networks for image question answering
- Simple Baseline for Visual Question Answering
- Dynamic Memory Networks for Visual and Textual Question Answering
- Baseline only, MIT https://github.com/metalbubble/VQAbaseline/
- VQA - VT vision, Virginia Tech https://github.com/VT-vision-lab/VQA_LSTM_CNN