Does Acceleration Cause Hidden Instability in Vision Language Models? Uncovering Instance-Level Divergence Through a Large-Scale Empirical Study

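Summary

Vision-Language Models (VLMs) are powerful but too computationally intensive for widespread practical deployment.
To address this without costly retraining, post-training acceleration techniques such as quantization and token reduction have been studied extensively.
However, current evaluations of acceleration focus mainly on keeping overall performance degradation small, and they overlook a crucial question: does the accelerated model still give the same answer to the same question as it did before acceleration?
This matters for stability-centered industrial applications, such as AI-based disease diagnosis, where consistently correct answers for specific, known situations are paramount.
The authors investigate this systematically for accelerated VLMs, testing four leading models (LLaVA-1.5, LLaVA-Next, Qwen2-VL, Qwen2.5-VL) with eight acceleration methods on ten multimodal benchmarks.
The findings are stark: despite minimal drops in aggregate performance, accelerated models changed their original answers up to 20% of the time.
Critically, up to 6.5% of these changes turned a correct answer into an incorrect one.
Input perturbations magnified these inconsistencies, and case studies with the medical VLM LLaVA-Med confirm the trend.
The study exposes a significant oversight in VLM acceleration and stresses the urgent need for instance-level stability checks to ensure trustworthy real-world deployment.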

Abstract (original)

Vision-Language Models (VLMs) are powerful yet computationally intensive for widespread practical deployments. To address such challenge without costly re-training, post-training acceleration techniques like quantization and token reduction are extensively explored. However, current acceleration evaluations primarily target minimal overall performance degradation, overlooking a crucial question: does the accelerated model still give the same answers to the same questions as it did before acceleration? This is vital for stability-centered industrial applications where consistently correct answers for specific, known situations are paramount, such as in AI-based disease diagnosis. We systematically investigate this for accelerated VLMs, testing four leading models (LLaVA-1.5, LLaVA-Next, Qwen2-VL, Qwen2.5-VL) with eight acceleration methods on ten multi-modal benchmarks. Our findings are stark: despite minimal aggregate performance drops, accelerated models changed original answers up to 20% of the time. Critically, up to 6.5% of these changes converted correct answers to incorrect. Input perturbations magnified these inconsistencies, and the trend is confirmed by case studies with the medical VLM LLaVA-Med. This research reveals a significant oversight in VLM acceleration, stressing an urgent need for instance-level stability checks to ensure trustworthy real-world deployment.
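The instance-level comparison described in the abstract can be illustrated with a small sketch. This is not the authors' evaluation code: the function name `stability_report`, the exact-match scoring, and the choice to normalize every rate by the total number of instances are all assumptions made for illustration (the abstract does not state the denominator behind its 6.5% figure).

```python
from dataclasses import dataclass


@dataclass
class StabilityReport:
    """Aggregate and instance-level metrics (hypothetical names)."""
    accuracy_before: float
    accuracy_after: float
    changed_rate: float            # answers that differ after acceleration
    correct_to_incorrect: float    # right before, wrong after
    incorrect_to_correct: float    # wrong before, right after


def stability_report(original, accelerated, gold):
    """Compare per-instance answers of a model before and after acceleration.

    Assumption: answers are scored by exact string match against `gold`,
    and all rates are fractions of the total number of instances.
    """
    if not (len(original) == len(accelerated) == len(gold)):
        raise ValueError("all answer lists must have the same length")
    n = len(gold)
    if n == 0:
        raise ValueError("at least one instance is required")

    changed = c2i = i2c = right_before = right_after = 0
    for before, after, truth in zip(original, accelerated, gold):
        ok_before = before == truth
        ok_after = after == truth
        right_before += ok_before
        right_after += ok_after
        changed += before != after
        c2i += ok_before and not ok_after
        i2c += (not ok_before) and ok_after

    return StabilityReport(
        accuracy_before=right_before / n,
        accuracy_after=right_after / n,
        changed_rate=changed / n,
        correct_to_incorrect=c2i / n,
        incorrect_to_correct=i2c / n,
    )


# Toy data: aggregate accuracy is identical, yet half the answers changed.
report = stability_report(
    original=["A", "B", "C", "D"],
    accelerated=["A", "C", "C", "A"],
    gold=["A", "B", "D", "A"],
)
print(report)
```

In this toy case the accuracy is 0.5 both before and after, so an aggregate-only evaluation would report no degradation, while the per-instance view shows that two of the four answers changed and one previously correct answer became wrong.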

arXiv information

Authors	Yizheng Sun, Hao Li, Chang Xu, Hongpeng Zhou, Chenghua Lin, Riza Batista-Navarro, Jingyuan Sun
Published	2025-05-20 14:31:45+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
