MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception

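Summary

Recent advances in Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in visual perception and understanding.
However, these models also suffer from hallucinations, which limit their reliability as AI systems.
We believe these hallucinations stem in part from the models' difficulty in understanding what they can and cannot perceive from an image. We call this capability self-awareness in perception.
Despite its importance, this aspect of MLLMs has been overlooked in prior work.
This paper aims to define and evaluate the self-awareness of MLLMs in perception.
To that end, we first introduce the knowledge quadrant in perception, which helps define what an MLLM knows and does not know about an image.
Using this framework, we propose a new benchmark, Self-Awareness in Perception for MLLMs (MM-SAP), designed specifically to assess this capability.
We apply MM-SAP to a variety of popular MLLMs and provide a comprehensive analysis of their self-awareness along with detailed insights.
The experimental results show that current MLLMs have limited self-awareness capabilities, pointing to a crucial area for future progress in developing trustworthy MLLMs.
Code and data are available at https://github.com/YHWmz/MM-SAP.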

Abstract (original)

Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in visual perception and understanding. However, these models also suffer from hallucinations, which limit their reliability as AI systems. We believe that these hallucinations are partially due to the models’ struggle with understanding what they can and cannot perceive from images, a capability we refer to as self-awareness in perception. Despite its importance, this aspect of MLLMs has been overlooked in prior studies. In this paper, we aim to define and evaluate the self-awareness of MLLMs in perception. To do this, we first introduce the knowledge quadrant in perception, which helps define what MLLMs know and do not know about images. Using this framework, we propose a novel benchmark, the Self-Awareness in Perception for MLLMs (MM-SAP), specifically designed to assess this capability. We apply MM-SAP to a variety of popular MLLMs, offering a comprehensive analysis of their self-awareness and providing detailed insights. The experiment results reveal that current MLLMs possess limited self-awareness capabilities, pointing to a crucial area for future advancement in the development of trustworthy MLLMs. Code and data are available at https://github.com/YHWmz/MM-SAP.
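
The abstract names a knowledge quadrant in perception but does not spell out its cells. The sketch below is one plausible reading, not the authors' implementation: it assumes each benchmark question carries a hypothetical `perceivable` label (whether the answer can be read from the image) and that each model response is judged as `refused` or `correct`. The quadrant names, the `classify` function, and the `self_awareness_rate` helper are all illustrative assumptions.

```python
from enum import Enum


class Quadrant(Enum):
    # Hypothetical cell names; the abstract only mentions "knowledge quadrant".
    KNOWN_KNOWN = "known knowns"          # perceivable, answered correctly
    KNOWN_UNKNOWN = "known unknowns"      # not perceivable, model refuses
    UNKNOWN_KNOWN = "unknown knowns"      # perceivable, but wrong or refused
    UNKNOWN_UNKNOWN = "unknown unknowns"  # not perceivable, model answers anyway


def classify(perceivable: bool, refused: bool, correct: bool) -> Quadrant:
    """Map one (question, response) pair to a quadrant cell (assumed scheme)."""
    if perceivable:
        if correct and not refused:
            return Quadrant.KNOWN_KNOWN
        return Quadrant.UNKNOWN_KNOWN
    # The image does not contain the answer: refusing is the self-aware move.
    return Quadrant.KNOWN_UNKNOWN if refused else Quadrant.UNKNOWN_UNKNOWN


def self_awareness_rate(records) -> float:
    """Fraction of responses in the two 'aware' cells (an assumed metric)."""
    aware = {Quadrant.KNOWN_KNOWN, Quadrant.KNOWN_UNKNOWN}
    labels = [classify(*record) for record in records]
    return sum(label in aware for label in labels) / len(labels)


# Each record is (perceivable, refused, correct); values are made up for illustration.
records = [
    (True, False, True),    # answers a visible fact correctly
    (False, True, False),   # declines a question the image cannot answer
    (True, True, False),    # declines although the answer is visible
    (False, False, False),  # answers anyway: a likely hallucination
]
rate = self_awareness_rate(records)
```

Under this reading, a response in the last cell (answering when the image cannot support an answer) corresponds to the kind of hallucination the abstract attributes to limited self-awareness.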

arXiv information

Authors	Yuhao Wang, Yusheng Liao, Heyang Liu, Hongcheng Liu, Yu Wang, Yanfeng Wang
Published	2024-02-26 09:28:34+00:00

Source, services used

arxiv.jp, Google
