VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations

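Summary

Vision-language pretraining (VLP) models have recently facilitated many cross-modal downstream tasks with success.
Most existing work evaluates these systems by comparing fine-tuned performance on downstream tasks.
However, average downstream task accuracy alone says little about the strengths and weaknesses of each VLP method, let alone how the community can improve these systems in the future.
Inspired by CheckList for testing natural language processing, we propose VL-CheckList, a novel framework for understanding the capabilities of VLP models.
The proposed method divides the image-text ability of a VLP model into three categories (objects, attributes, and relations) and uses a novel taxonomy to break each of these three aspects down further.
Using the proposed framework, we conduct a comprehensive study of seven recently popular VLP models.
The results confirm the effectiveness of the proposed method by revealing fine-grained differences among the compared models that were not visible from downstream-task-only evaluation.
Further results point to promising research directions for building better VLP models.
Data and code are available at https://github.com/om-ai-lab/VL-CheckList.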

Summary (original)

Vision-Language Pretraining (VLP) models have recently successfully facilitated many cross-modal downstream tasks. Most existing works evaluated their systems by comparing the fine-tuned downstream task performance. However, only average downstream task accuracy provides little information about the pros and cons of each VLP method, let alone provides insights on how the community can improve the systems in the future. Inspired by the CheckList for testing natural language processing, we exploit VL-CheckList, a novel framework to understand the capabilities of VLP models. The proposed method divides the image-texting ability of a VLP model into three categories: objects, attributes, and relations, and uses a novel taxonomy to further break down these three aspects. We conduct comprehensive studies to analyze seven recently popular VLP models via the proposed framework. Results confirm the effectiveness of the proposed method by revealing fine-grained differences among the compared models that were not visible from downstream task-only evaluation. Further results show promising research direction in building better VLP models. Our data and code are available at: https://github.com/om-ai-lab/VL-CheckList.

arXiv information

Authors	Tiancheng Zhao, Tianqi Zhang, Mingwei Zhu, Haozhan Shen, Kyusong Lee, Xiaopeng Lu, Jianwei Yin
Published	2023-06-22 16:55:44+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
