SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs

Summary

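Multimodal Large Language Models (MLLMs) are commonly derived by extending pre-trained Large Language Models (LLMs) with visual capabilities.
In this work, we investigate how MLLMs process visual inputs by analyzing their attention mechanisms.
We reveal a surprising sparsity phenomenon: only a small subset (roughly fewer than 5%) of the attention heads in the LLM actively contribute to visual understanding. We call these visual heads.
To identify these heads efficiently, we design a training-free framework that quantifies head-level visual relevance through targeted response analysis.
Building on this finding, we introduce SparseMM, a KV-cache optimization strategy that assigns asymmetric computation budgets to the LLM's heads according to their visual scores, exploiting the sparsity of visual heads to accelerate MLLM inference.
Unlike prior KV-cache acceleration methods that ignore the particular nature of visual information, SparseMM prioritizes emphasizing and retaining visual semantics during decoding.
Extensive evaluations across mainstream multimodal benchmarks show that SparseMM achieves a superior accuracy-efficiency trade-off.
Notably, SparseMM delivers a 1.38x real-time speedup and a 52% memory reduction during generation while maintaining performance parity in efficiency tests.
Our project is open-sourced at https://github.com/CR400AF-A/SparseMM.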
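The summary does not say how "targeted response analysis" scores a head. The Python sketch below shows one plausible reading under stated assumptions, and it is not taken from the SparseMM code. It uses a hypothetical hit-rate score. For each generated token, a head earns credit when its most-attended input position falls inside the image region assumed to depict that token. Scores are averaged over decoding steps, and the top-scoring heads are kept as visual heads. The default 5% cutoff echoes the under-5% figure in the summary. The input format, the function names `visual_scores` and `top_visual_heads`, and the scoring rule are all illustrative assumptions.

```python
from typing import Dict, List, Sequence, Set, Tuple

Head = Tuple[int, int]  # (layer index, head index)


def visual_scores(
    attentions: Sequence[Dict[Head, List[float]]],
    relevant_tokens: Sequence[Set[int]],
) -> Dict[Head, float]:
    """Hypothetical hit-rate score per head (an assumption, not SparseMM's rule).

    attentions[t][head] is the head's attention distribution over input
    positions while generating output token t; relevant_tokens[t] holds the
    image-token positions assumed to depict the content of output token t.
    """
    hits: Dict[Head, int] = {}
    for step_attn, relevant in zip(attentions, relevant_tokens):
        for head, weights in step_attn.items():
            # position this head attends to most strongly at this step
            top = max(range(len(weights)), key=weights.__getitem__)
            hits[head] = hits.get(head, 0) + (1 if top in relevant else 0)
    steps = len(attentions)
    # average over decoding steps so every score lies in [0, 1]
    return {head: count / steps for head, count in hits.items()}


def top_visual_heads(scores: Dict[Head, float], fraction: float = 0.05) -> List[Head]:
    """Return the highest-scoring heads, keeping roughly `fraction` of them."""
    k = max(1, round(len(scores) * fraction))
    return sorted(scores, key=scores.__getitem__, reverse=True)[:k]


# Toy example: 4 input positions; positions 0-2 are image tokens, 3 is text.
attentions = [
    {(0, 0): [0.7, 0.1, 0.1, 0.1], (0, 1): [0.1, 0.1, 0.1, 0.7],
     (1, 0): [0.2, 0.6, 0.1, 0.1], (1, 1): [0.25, 0.25, 0.2, 0.3]},
    {(0, 0): [0.1, 0.1, 0.7, 0.1], (0, 1): [0.1, 0.1, 0.1, 0.7],
     (1, 0): [0.1, 0.1, 0.1, 0.7], (1, 1): [0.1, 0.1, 0.1, 0.7]},
]
relevant = [{0, 1}, {2}]
scores = visual_scores(attentions, relevant)
print(scores)  # {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.5, (1, 1): 0.0}
print(top_visual_heads(scores, fraction=0.25))  # [(0, 0)]
```

Ranking by a single averaged score keeps the selection training-free. It only needs forward passes in which per-head attention weights are exposed.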

Summary (original)

Multimodal Large Language Models (MLLMs) are commonly derived by extending pre-trained Large Language Models (LLMs) with visual capabilities. In this work, we investigate how MLLMs process visual inputs by analyzing their attention mechanisms. We reveal a surprising sparsity phenomenon: only a small subset (approximately less than 5%) of attention heads in LLMs actively contribute to visual understanding, termed visual heads. To identify these heads efficiently, we design a training-free framework that quantifies head-level visual relevance through targeted response analysis. Building on this discovery, we introduce SparseMM, a KV-Cache optimization strategy that allocates asymmetric computation budgets to heads in LLMs based on their visual scores, leveraging the sparsity of visual heads for accelerating the inference of MLLMs. Compared with prior KV-Cache acceleration methods that ignore the particularity of visual information, SparseMM prioritizes stressing and retaining visual semantics during decoding. Extensive evaluations across mainstream multimodal benchmarks demonstrate that SparseMM achieves superior accuracy-efficiency trade-offs. Notably, SparseMM delivers 1.38x real-time acceleration and 52% memory reduction during generation while maintaining performance parity on efficiency tests. Our project is open sourced at https://github.com/CR400AF-A/SparseMM.
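The core idea in the abstract is that the KV-cache budget should not be split evenly across heads but weighted by visual score. The Python sketch below illustrates one way to do that, and several parts of it are assumptions. Each head first receives a fixed floor of cached tokens. The remaining budget is then shared in proportion to each head's visual score. The function names, this floor-plus-proportional rule, and all numbers are illustrative assumptions, not the paper's exact allocation scheme or configuration. Which tokens fill each head's budget, for example under an attention-based eviction rule, is outside this sketch. A small helper estimates KV-cache size as 2 (keys and values) times cached tokens times head dimension times bytes per element, summed over heads.

```python
from typing import Dict, Hashable, Iterable


def allocate_budgets(
    scores: Dict[Hashable, float], total_budget: int, min_per_head: int
) -> Dict[Hashable, int]:
    """Hypothetical asymmetric split of a KV-cache token budget across heads.

    Every head keeps `min_per_head` tokens; the rest of `total_budget` is
    shared in proportion to each head's (non-negative) visual score, using
    largest-remainder rounding so the budgets sum exactly to `total_budget`.
    """
    heads = list(scores)
    if not heads:
        return {}
    spare = total_budget - min_per_head * len(heads)
    if spare < 0:
        raise ValueError("total_budget is smaller than the per-head floor")
    score_sum = sum(scores.values())
    # fall back to an even split when no head has a positive score
    shares = {
        h: (scores[h] / score_sum if score_sum > 0 else 1 / len(heads))
        for h in heads
    }
    exact = {h: spare * shares[h] for h in heads}
    budgets = {h: min_per_head + int(exact[h]) for h in heads}
    leftover = total_budget - sum(budgets.values())
    # hand tokens lost to truncation to the heads with the largest remainders
    by_remainder = sorted(heads, key=lambda h: exact[h] - int(exact[h]), reverse=True)
    for h in by_remainder[:leftover]:
        budgets[h] += 1
    return budgets


def kv_cache_bytes(tokens_per_head: Iterable[int], head_dim: int, bytes_per_elem: int = 2) -> int:
    """Keys plus values for every cached token of every head (fp16 by default)."""
    return 2 * head_dim * bytes_per_elem * sum(tokens_per_head)


# Toy setting (illustrative numbers only): 4 heads, two carry visual signal.
scores = {"h0": 3.0, "h1": 1.0, "h2": 0.0, "h3": 0.0}
budgets = allocate_budgets(scores, total_budget=100, min_per_head=10)
print(budgets)  # {'h0': 55, 'h1': 25, 'h2': 10, 'h3': 10}

full = kv_cache_bytes([500] * 4, head_dim=128)  # every head caches 500 tokens
compressed = kv_cache_bytes(budgets.values(), head_dim=128)
print(full, compressed)  # 1024000 51200
```

The floor guarantees every head some cache. The score-weighted remainder then concentrates capacity on the few visual heads. Setting all scores equal recovers an even split, which is the uniform baseline the asymmetric scheme departs from.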

arXiv information

Authors	Jiahui Wang, Zuyan Liu, Yongming Rao, Jiwen Lu
Published	2025-06-05 17:59:55+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
