Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering

Summary

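Title: Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering

Summary:
– Causal video question answering (CVidQA) queries not only association or temporal relations but also causal relations in a video.
– Existing question synthesis methods pre-train question generation (QG) systems on reading comprehension datasets, with text descriptions as inputs.
– However, QG models only learn to ask association questions (e.g., "what is someone doing…"), and this association knowledge transfers poorly to CVidQA, which focuses on causal questions such as "why is someone doing…".
– Observing this, the authors propose CaKE-LM (Causal Knowledge Extraction from Language Models), a new framework that exploits causal relations to generate question-answer pairs by leveraging causal commonsense knowledge from language models.
– To extract knowledge from LMs, CaKE-LM prompts the LM with an action (soccer player kicking ball) to retrieve the intention (to score a goal), and uses the pair to generate causal questions containing two events.
– On the NExT-QA and Causal-VidQA datasets, CaKE-LM improves zero-shot CVidQA accuracy by 4% to 6% over conventional methods.
– The authors also conduct comprehensive analyses and provide key findings for future research.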
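
The prompting step in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt template, the "Why is the …?" question template, and the names `build_prompt`, `make_causal_qa`, and `stub_lm` are all hypothetical, and a canned stub stands in for a real language model so the example runs on its own.

```python
from typing import Callable, Dict

# Hypothetical prompt template; the paper's actual prompt is not given in this summary.
PROMPT_TEMPLATE = "The {action}. The intention is"


def build_prompt(action: str) -> str:
    """Turn an action description into a prompt that asks for its intention."""
    return PROMPT_TEMPLATE.format(action=action)


def make_causal_qa(action: str, lm: Callable[[str], str]) -> Dict[str, str]:
    """Build one causal question-answer pair from an action.

    `lm` is any callable mapping a prompt string to a completion string.
    The question template below is an assumption made for illustration.
    """
    completion = lm(build_prompt(action))
    intention = completion.strip().rstrip(".")  # light cleanup of the completion
    return {"question": f"Why is the {action}?", "answer": intention}


def stub_lm(prompt: str) -> str:
    """Canned stand-in for a language model (assumption, not a real model)."""
    canned = {"soccer player kicking ball": " to score a goal."}
    for action, completion in canned.items():
        if action in prompt:
            return completion
    return " unknown."


qa = make_causal_qa("soccer player kicking ball", stub_lm)
print(qa["question"])
print(qa["answer"])
```

In a real pipeline, `stub_lm` would be replaced by a call to an actual language model, and the action descriptions would presumably come from video captions or annotations; both are outside what this summary specifies.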

Abstract (original)

Causal Video Question Answering (CVidQA) queries not only association or temporal relations but also causal relations in a video. Existing question synthesis methods pre-trained question generation (QG) systems on reading comprehension datasets with text descriptions as inputs. However, QG models only learn to ask association questions (e.g., “what is someone doing…”) and result in inferior performance due to the poor transfer of association knowledge to CVidQA, which focuses on causal questions like “why is someone doing …”. Observing this, we proposed to exploit causal knowledge to generate question-answer pairs, and proposed a novel framework, Causal Knowledge Extraction from Language Models (CaKE-LM), leveraging causal commonsense knowledge from language models to tackle CVidQA. To extract knowledge from LMs, CaKE-LM generates causal questions containing two events with one triggering another (e.g., “score a goal” triggers “soccer player kicking ball”) by prompting LM with the action (soccer player kicking ball) to retrieve the intention (to score a goal). CaKE-LM significantly outperforms conventional methods by 4% to 6% of zero-shot CVidQA accuracy on NExT-QA and Causal-VidQA datasets. We also conduct comprehensive analyses and provide key findings for future research.

arXiv information

Authors	Hung-Ting Su, Yulei Niu, Xudong Lin, Winston H. Hsu, Shih-Fu Chang
Published	2023-04-07 17:45:49+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, OpenAI
