Results
VQA Dataset
(As self-published by the authors; not verified.)
Results below are for test-dev 2015, except the final column, which is for test-standard.
Method | All | Y/N | Other | Num | Test-Std[All] |
---|---|---|---|---|---|
Image | 28.1 | 64.0 | 3.8 | 0.4 | - |
Question | 48.1 | 75.7 | 27.1 | 36.7 | - |
Q+I | 52.6 | 75.6 | 37.4 | 33.7 | - |
LSTM Q+I | 53.7 | 78.9 | 36.4 | 35.2 | 54.1 |
[16CMV] | 52.6 | 78.3 | 35.9 | 34.4 | - |
[09AMA] | 55.7 | 79.2 | 40.1 | 36.1 | 56.0 |
[13BOW] | 55.7 | 76.5 | 42.6 | 35.0 | 55.9 |
[07DPP] | 57.2 | 80.7 | 41.7 | 37.2 | 57.4 |
[17LCN] | 57.9 | 80.5 | 43.1 | 37.4 | 58.0 |
[11AAA] | 57.9 | 80.8 | 43.2 | 37.3 | 58.2 |
[12SAN] | 58.7 | 79.3 | 46.1 | 36.6 | 58.9 |
[15DMN] | 60.3 | 80.5 | 48.3 | 36.8 | 60.4 |
OUR | 60.4 | 81.5 | 47.6 | 37.2 | 60.7 |
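
To make the comparison in the table easier to read, the following minimal sketch computes, for each test-dev column, the gap between the `OUR` row and the best other method. The numbers are copied from the table above; the names `ROWS`, `COLUMNS`, and `margins` are illustrative assumptions and are not part of this repository.

```python
# Test-dev accuracies (All, Y/N, Other, Num), copied from the table above.
ROWS = {
    "Image": (28.1, 64.0, 3.8, 0.4),
    "Question": (48.1, 75.7, 27.1, 36.7),
    "Q+I": (52.6, 75.6, 37.4, 33.7),
    "LSTM Q+I": (53.7, 78.9, 36.4, 35.2),
    "[16CMV]": (52.6, 78.3, 35.9, 34.4),
    "[09AMA]": (55.7, 79.2, 40.1, 36.1),
    "[13BOW]": (55.7, 76.5, 42.6, 35.0),
    "[07DPP]": (57.2, 80.7, 41.7, 37.2),
    "[17LCN]": (57.9, 80.5, 43.1, 37.4),
    "[11AAA]": (57.9, 80.8, 43.2, 37.3),
    "[12SAN]": (58.7, 79.3, 46.1, 36.6),
    "[15DMN]": (60.3, 80.5, 48.3, 36.8),
    "OUR": (60.4, 81.5, 47.6, 37.2),
}
COLUMNS = ("All", "Y/N", "Other", "Num")


def margins(rows, ours="OUR"):
    """Map each column to (best other method, our score minus its score)."""
    result = {}
    for i, col in enumerate(COLUMNS):
        # Best-scoring method in this column, excluding our own row.
        best = max((m for m in rows if m != ours), key=lambda m: rows[m][i])
        result[col] = (best, round(rows[ours][i] - rows[best][i], 1))
    return result


if __name__ == "__main__":
    for col, (method, delta) in margins(ROWS).items():
        print(f"{col}: {delta:+.1f} vs {method}")
```

On these figures the `OUR` row leads on All (+0.1 over [15DMN]) and Y/N (+0.7 over [11AAA]), and trails on Other (-0.7 behind [15DMN]) and Num (-0.2 behind [17LCN]).
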
List of papers
- [01VQA] VQA: Visual Question Answering
- [02EMD] Exploring Models and Data for Image Question Answering
- [03LAQ] Learning to Answer Questions From Image Using Convolutional Neural Network
- [04DCQ] Deep Compositional Question Answering with Neural Module Networks
- [05ABC] An Attention Based Convolutional Neural Network for Visual Question Answering
- [06ATM] Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering
- [07DPP] Image Question Answering Using Convolutional Neural Network with Dynamic Parameter Prediction
- [08WTL] Where to Look: Focus Regions for Visual Question Answering
- [09AMA] Ask Me Anything: Free-Form Visual Question Answering Based on Knowledge from External Sources
- [10V7W] Visual7W: Grounded Question Answering in Images
- [11AAA] Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering
- [12SAN] Stacked Attention Networks for Image Question Answering
- [13BOW] Simple Baseline for Visual Question Answering
- [14ICV] Image Captioning & Visual Question Answering Based on Attributes & External Knowledge
- [15DMN] Dynamic Memory Networks for Visual and Textual Question Answering
- [16CMV] Compositional Memory for Visual Question Answering
- [17LCN] Learning to Compose Neural Networks for Question Answering