Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs

Abstract

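Many benchmarks have been proposed to evaluate the perception ability of Large Vision-Language Models (LVLMs).
However, most of them build their questions on images selected from existing datasets, which risks data leakage.
They also evaluate LVLMs only on realistic-style images in clean scenarios, leaving multi-stylized images and noisy scenarios unexplored.
To address these challenges, we propose Dysca, a dynamic and scalable benchmark that evaluates LVLMs using synthesized images.
Specifically, we use Stable Diffusion together with a rule-based method to dynamically generate novel images, questions, and the corresponding answers.
We consider 51 image styles and evaluate perception capability across 20 subtasks.
We further conduct evaluations under 4 scenarios (Clean, Corruption, Print Attacking, and Adversarial Attacking) and 3 question types (Multi-choice, True-or-false, and Free-form).
Thanks to its generative paradigm, Dysca is a scalable benchmark to which new subtasks and scenarios can easily be added.
In total, 24 advanced open-source LVLMs and 2 closed-source LVLMs are evaluated on Dysca, revealing the shortcomings of current LVLMs.
The benchmark is released at https://github.com/Robin-WZQ/Dysca.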
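
As a rough illustration of the rule-based idea, the sketch below draws attributes at random, composes a text-to-image prompt from them, and derives the question and ground-truth answer from the same attributes, so no manual labeling is needed. It is not the authors' implementation: the attribute pools, the `Sample` class, the `make_sample` function, and the question templates are all hypothetical, and the image generation step itself is left out.

```python
import random
from dataclasses import dataclass

# Hypothetical attribute pools for illustration only;
# the benchmark itself covers 51 styles and 20 subtasks.
STYLES = ["oil painting", "pixel art", "watercolor", "photorealistic"]
ANIMALS = ["cat", "dog", "horse", "owl"]


@dataclass
class Sample:
    prompt: str     # text that would be passed to a text-to-image model
    question: str
    options: list   # empty for free-form questions
    answer: str     # ground truth, known because we chose the attributes


def make_sample(rng: random.Random, question_type: str) -> Sample:
    """Draw attributes, build a prompt, and derive question and answer from them."""
    style = rng.choice(STYLES)
    animal = rng.choice(ANIMALS)
    prompt = f"{animal}, {style} style"

    if question_type == "multi-choice":
        # Distractors are the other entries of the same attribute pool.
        options = [a for a in ANIMALS if a != animal] + [animal]
        rng.shuffle(options)
        return Sample(prompt, "Which animal is in the image?", options, animal)

    if question_type == "true-or-false":
        claimed = rng.choice(ANIMALS)
        answer = "true" if claimed == animal else "false"
        question = f"The animal in the image is: {claimed}. True or false?"
        return Sample(prompt, question, ["true", "false"], answer)

    if question_type == "free-form":
        return Sample(prompt, "Describe the animal in the image.", [], animal)

    raise ValueError(f"unknown question type: {question_type}")
```

Because every sample is drawn fresh from the attribute pools, a new evaluation set can be regenerated at any time, and adding a subtask in this sketch amounts to adding a pool and a question template.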

Abstract (original)

Currently many benchmarks have been proposed to evaluate the perception ability of the Large Vision-Language Models (LVLMs). However, most benchmarks conduct questions by selecting images from existing datasets, resulting in the potential data leakage. Besides, these benchmarks merely focus on evaluating LVLMs on the realistic style images and clean scenarios, leaving the multi-stylized images and noisy scenarios unexplored. In response to these challenges, we propose a dynamic and scalable benchmark named Dysca for evaluating LVLMs by leveraging synthesis images. Specifically, we leverage Stable Diffusion and design a rule-based method to dynamically generate novel images, questions and the corresponding answers. We consider 51 kinds of image styles and evaluate the perception capability in 20 subtasks. Moreover, we conduct evaluations under 4 scenarios (i.e., Clean, Corruption, Print Attacking and Adversarial Attacking) and 3 question types (i.e., Multi-choices, True-or-false and Free-form). Thanks to the generative paradigm, Dysca serves as a scalable benchmark for easily adding new subtasks and scenarios. A total of 24 advanced open-source LVLMs and 2 close-source LVLMs are evaluated on Dysca, revealing the drawbacks of current LVLMs. The benchmark is released at \url{https://github.com/Robin-WZQ/Dysca}.

arXiv information

Authors	Jie Zhang, Zhongqi Wang, Mengqi Lei, Zheng Yuan, Bei Yan, Shiguang Shan, Xilin Chen
Published	2025-01-24 13:58:49+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
