Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models

Summary

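As general-purpose vision models become effective across a widening range of tasks, it is essential that they remain consistent across the tasks they support.
Human users regard inconsistent AI models as brittle and untrustworthy, and such models are harder to integrate into larger systems that depend on their outputs.
Measuring consistency across highly heterogeneous tasks, whose outputs may be in different modalities, is difficult because it is hard to determine whether the predictions agree with one another.
As a solution, we introduce COCOCON, a benchmark dataset built from contrast sets: test instances for multiple tasks are modified in small but semantically meaningful ways that change the gold label. We also outline metrics that measure whether a model is consistent by ranking the original and perturbed instances across tasks.
We find that state-of-the-art systems show a surprisingly high degree of inconsistent behavior across tasks, especially for more heterogeneous tasks.
Finally, we propose a rank-correlation-based auxiliary objective, computed over large, automatically created cross-task contrast sets, to improve the multi-task consistency of large unified models while retaining their original accuracy on downstream tasks.
Project website: https://adymaharana.github.io/cococon/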
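
The ranking-based consistency check described in the summary can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration, not the benchmark's actual code: the task names, the use of a per-task score (for example a log-likelihood) for the original and the contrast output, and the rule that a sample counts as consistent only when every task prefers the same output.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: for each task, a pair
# (score of the original gold output, score of the contrast output).
TaskScores = Dict[str, Tuple[float, float]]


def prefers_original(original_score: float, contrast_score: float) -> bool:
    """True when the model scores the original output above the contrast output."""
    return original_score > contrast_score


def is_consistent(task_scores: TaskScores) -> bool:
    """A sample counts as consistent when every task prefers the same output."""
    preferences = {prefers_original(o, c) for o, c in task_scores.values()}
    return len(preferences) == 1


def consistency_rate(samples: List[TaskScores]) -> float:
    """Fraction of samples on which all tasks agree."""
    if not samples:
        return 0.0
    return sum(is_consistent(s) for s in samples) / len(samples)


# Made-up scores, for illustration only.
samples = [
    {"captioning": (-1.2, -3.4), "vqa": (-0.5, -2.0)},  # both tasks prefer the original
    {"captioning": (-1.0, -2.5), "vqa": (-3.0, -0.7)},  # the tasks disagree
]
print(consistency_rate(samples))  # prints 0.5
```

Comparing preferences rather than raw outputs is what lets tasks with different output modalities be checked against each other in this sketch.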

Abstract (original)

As general purpose vision models get increasingly effective at a wide set of tasks, it is imperative that they be consistent across the tasks they support. Inconsistent AI models are considered brittle and untrustworthy by human users and are more challenging to incorporate into larger systems that take dependencies on their outputs. Measuring consistency between very heterogeneous tasks that might include outputs in different modalities is challenging since it is difficult to determine if the predictions are consistent with one another. As a solution, we introduce a benchmark dataset, COCOCON, where we use contrast sets created by modifying test instances for multiple tasks in small but semantically meaningful ways to change the gold label, and outline metrics for measuring if a model is consistent by ranking the original and perturbed instances across tasks. We find that state-of-the-art systems suffer from a surprisingly high degree of inconsistent behavior across tasks, especially for more heterogeneous tasks. Finally, we propose using a rank correlation-based auxiliary objective computed over large automatically created cross-task contrast sets to improve the multi-task consistency of large unified models, while retaining their original accuracy on downstream tasks. Project website available at https://adymaharana.github.io/cococon/

arXiv information

Authors	Adyasha Maharana, Amita Kamath, Christopher Clark, Mohit Bansal, Aniruddha Kembhavi
Published	2023-03-28 16:57:12+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
