Tgif Not Working Tgif Never Working Again
The TGIF-QA dataset contains 165K QA pairs for the animated GIFs from the TGIF dataset [Li et al. CVPR 2016]. The question and answer pairs are collected via crowdsourcing with a carefully designed user interface to ensure quality. The dataset can be used to evaluate video-based Visual Question Answering techniques.
On this page, you can find the code and the dataset for our IJCV journal article.
- Yunseok Jang, Yale Song, Chris Dongjoo Kim, Youngjae Yu, Youngjin Kim and Gunhee Kim. Video Question Answering with Spatio-Temporal Reasoning. IJCV, 2019. [Journal Link]
Please check this tag if you are interested in our CVPR 2017 setting.
The code and the dataset are free to use for academic purposes only. If you use any of the material in this repository as part of your work, we ask you to cite:
@article{jang-IJCV-2019, author = {Yunseok Jang and Yale Song and Chris Dongjoo Kim and Youngjae Yu and Youngjin Kim and Gunhee Kim}, title = {{Video Question Answering with Spatio-Temporal Reasoning}}, journal = {IJCV}, year = {2019} }
Note: Since our CVPR 2017 paper, we extended our dataset by collecting more question and answer pairs (the total count has increased from 104K to 165K) and re-ran experiments with the new dataset. The journal article and the arXiv paper are the most up-to-date versions.
Have any questions? Please contact:
Yunseok Jang (yunseok.jang@snu.ac.kr), Chris Dongjoo Kim (cdjkim@vision.snu.ac.kr), and Yale Song (yalesong@microsoft.com)
Q&A Types and Examples
Q&A Type | Repetition Count | Repeating Action | State Transition | Frame QA |
---|---|---|---|---|
Visual Input (GIF) | ||||
Question | How many times does the cat lick? | What does the cat do 3 times? | What does the model do after lower coat? | What is the color of the bulldog? |
Answer | 7 times | Put head down | Pin around | Brown |
# Q&A Pairs
Task | Train | Test | Total |
---|---|---|---|
Repetition Count | 26,843 | 3,554 | 30,397 |
Repeating Action | 20,475 | 2,274 | 22,749 |
State Transition | 52,704 | 6,232 | 58,936 |
Frame QA | 39,392 | 13,691 | 53,083 |
Total | 139,414 | 25,751 | 165,165 |
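The split sizes in the table can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary `SPLITS` and the helper `summarize` are hypothetical names, not part of the released code, and the numbers are copied by hand from the table.

```python
# Train/test counts per task, copied by hand from the table above.
# SPLITS and summarize are hypothetical names used only for this sketch.
SPLITS = {
    "Repetition Count": (26_843, 3_554),
    "Repeating Action": (20_475, 2_274),
    "State Transition": (52_704, 6_232),
    "Frame QA": (39_392, 13_691),
}


def summarize(splits):
    """Return per-task totals plus overall train, test and grand totals."""
    per_task = {task: train + test for task, (train, test) in splits.items()}
    train_total = sum(train for train, _ in splits.values())
    test_total = sum(test for _, test in splits.values())
    return per_task, train_total, test_total, train_total + test_total


per_task, train_total, test_total, grand_total = summarize(SPLITS)
for task, total in per_task.items():
    print(f"{task}: {total:,}")
print(f"Train: {train_total:,}  Test: {test_total:,}  Total: {grand_total:,}")
```

The grand total is where the "165K QA pairs" figure quoted in the introduction comes from.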
Quantitative Results
Model | Repetition Count (L2 loss) | Repeating Action (Accuracy) | State Transition (Accuracy) | Frame QA (Accuracy) |
---|---|---|---|---|
Random Chance | 19.62 | 20.00 | 20.00 | 0.06 |
Most Frequent words | 7.78 | 31.40 | 30.05 | 17.49 |
VIS+LSTM (aggr) [NIPS 2015] | 5.09 | 46.84 | 56.85 | 34.59 |
VIS+LSTM (avg) [NIPS 2015] | 4.81 | 48.77 | 34.82 | 34.97 |
VQA-MCB (aggr) [EMNLP 2016] | 5.17 | 58.85 | 24.27 | 25.70 |
VQA-MCB (avg) [EMNLP 2016] | 5.54 | 29.13 | 32.96 | 15.49 |
CT-SAN [CVPR 2017] | 5.14 | 56.14 | 63.95 | 39.64 |
Co-Memory [CVPR 2018] | 4.10 | 68.20 | 74.30 | 51.50 |
ST-VQA (Ours) | 4.22 | 73.48 | 79.72 | 51.96 |
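The two metrics in the table can be sketched in a few lines. This is a minimal illustration, not the evaluation script from the repository: it assumes that "L2 loss" means the mean squared difference between predicted and ground-truth counts (lower is better) and that accuracy is the percentage of exact matches (higher is better). The function names `accuracy` and `mean_l2_loss` are hypothetical.

```python
from typing import Hashable, Sequence


def accuracy(predictions: Sequence[Hashable], answers: Sequence[Hashable]) -> float:
    """Percentage of predictions that exactly match the ground-truth answers."""
    if len(predictions) != len(answers) or not answers:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def mean_l2_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared difference between predicted and true counts.

    Assumption: this is what the table calls "L2 loss".
    """
    if len(predicted) != len(target) or not target:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


# Toy inputs, not taken from the dataset.
print(accuracy(["wave hands", "pet toy", "jump"], ["wave hands", "pet toy", "spin"]))
print(mean_l2_loss([2, 5], [3, 3]))
```

Under this reading, the 20.00 random-chance accuracy for the two multiple-choice tasks is what uniform guessing among five candidate answers would give.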
Qualitative Results
Spatial Attention
Temporal Attention
The red dotted boxes over heatmaps indicate segments in a video that include the ground-truth answers.
Attentions Visualized in Time
The yellow bar indicates the strength of temporal attention at the visualized time.
Q&A Type | Repetition Count | Repeating Action | State Transition | Frame QA |
---|---|---|---|---|
Animated Visual (GIF) | ||||
Question | How many times does the man shave chest ? | What does the boy do 3 times ? | What does the man do before kiss toy ? | What are the group of boys singing , dancing , and playing ? |
Answer | 2 times | Wave hands | Pet toy | Instruments |
Animated Visual (GIF) | ||||
Question | How many times does the man flip circle ? | What does the behind do 3 times ? | What does the woman do after raise leg ? | What is the color of the shirt ? |
Answer | 2 times | Shake butt | Kick a mug | White |
Notes
Last Edit: May 22, 2020
Source: https://github.com/YunseokJANG/tgif-qa