TraVLR: Now You See It, Now You Don’t! A Bimodal Dataset for Evaluating Visio-Linguistic Reasoning

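Summary

Title: TraVLR: Now You See It, Now You Don’t! A Bimodal Dataset for Evaluating Visio-Linguistic Reasoning

Summary:
-Numerous visio-linguistic (V+L) representation learning methods have been developed, but existing datasets do not adequately evaluate how well they represent visual and linguistic concepts in a unified space.
-The authors propose several novel evaluation settings for V+L models, including cross-modal transfer.
-Existing V+L benchmarks often report a global accuracy score over the entire dataset, which makes it hard to pinpoint the specific reasoning tasks at which models fail or succeed. The authors present TraVLR, a synthetic dataset comprising four V+L reasoning tasks.
-Because TraVLR is synthetic, its training and testing distributions can be constrained along task-relevant dimensions, enabling the evaluation of out-of-distribution generalisation. Each example redundantly encodes the scene in two modalities, so either one can be dropped or added during training or testing without losing relevant information. TraVLR is released as an open challenge for the research community.
-A comparison of four state-of-the-art V+L models finds that they perform well on test examples from the same modality, but all fail at cross-modal transfer and have limited success accommodating the addition or deletion of one modality.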

Summary (original)

Numerous visio-linguistic (V+L) representation learning methods have been developed, yet existing datasets do not adequately evaluate the extent to which they represent visual and linguistic concepts in a unified space. We propose several novel evaluation settings for V+L models, including cross-modal transfer. Furthermore, existing V+L benchmarks often report global accuracy scores on the entire dataset, making it difficult to pinpoint the specific reasoning tasks that models fail and succeed at. We present TraVLR, a synthetic dataset comprising four V+L reasoning tasks. TraVLR’s synthetic nature allows us to constrain its training and testing distributions along task-relevant dimensions, enabling the evaluation of out-of-distribution generalisation. Each example in TraVLR redundantly encodes the scene in two modalities, allowing either to be dropped or added during training or testing without losing relevant information. We compare the performance of four state-of-the-art V+L models, finding that while they perform well on test examples from the same modality, they all fail at cross-modal transfer and have limited success accommodating the addition or deletion of one modality. We release TraVLR as an open challenge for the research community.
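The evaluation settings named in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data format: the `BimodalExample` fields, the setting names in `SETTINGS`, and the helper functions are all hypothetical, chosen only to show how a redundantly encoded example lets one modality be dropped at training or test time.

```python
from dataclasses import dataclass, replace
from typing import Optional, Set, Tuple

MODALITIES = {"visual", "textual"}


@dataclass(frozen=True)
class BimodalExample:
    # Hypothetical layout: both fields describe the same scene redundantly.
    visual: Optional[str]   # e.g. an identifier for a rendered image
    textual: Optional[str]  # e.g. a text encoding of the same scene
    label: bool


def keep_modalities(example: BimodalExample, keep: Set[str]) -> BimodalExample:
    """Return a copy of the example with every modality outside `keep` dropped."""
    unknown = set(keep) - MODALITIES
    if unknown:
        raise ValueError(f"unknown modalities: {sorted(unknown)}")
    return replace(
        example,
        visual=example.visual if "visual" in keep else None,
        textual=example.textual if "textual" in keep else None,
    )


# Assumed names for the settings; each maps to (train modalities, test modalities).
SETTINGS = {
    "same_modality": ({"textual"}, {"textual"}),
    "cross_modal_transfer": ({"textual"}, {"visual"}),
    "modality_deletion": ({"visual", "textual"}, {"visual"}),
    "modality_addition": ({"visual"}, {"visual", "textual"}),
}


def make_split_views(
    example: BimodalExample, setting: str
) -> Tuple[BimodalExample, BimodalExample]:
    """Build the train-time and test-time views of one example for a setting."""
    train_keep, test_keep = SETTINGS[setting]
    return keep_modalities(example, train_keep), keep_modalities(example, test_keep)
```

Under these assumptions, `make_split_views(example, "cross_modal_transfer")` yields a text-only view for training and an image-only view for testing, which is the case the abstract reports all four models failing at.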

arXiv information

Authors	Keng Ji Chow,Samson Tan,Min-Yen Kan
Published	2023-04-15 09:48:44+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
