PAR: Prompt-Aware Token Reduction Method for Efficient Large Multimodal Models

Summary

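Multimodal large language models (MLLMs) perform well across visual tasks, but processing long contexts from multimodal inputs imposes heavy compute and memory demands that limit their efficiency.
To address this, we introduce PAR (Prompt-Aware Token Reduction), a novel plug-and-play approach that efficiently reduces visual tokens without compromising model performance.
Unlike previous methods, which rely heavily on attention mechanisms and overlook cross-modal interactions, PAR uses a prompt-aware strategy to adaptively identify and cluster the essential visual tokens.
PAR divides redundancy in the visual context into two types: external and internal.
External redundancy is minimized through semantic retrieval, and internal redundancy is handled by a token routing mechanism.
The method substantially reduces computational load without requiring additional training or complex architectural changes.
Experimental results show that, across a range of visual question answering tasks, PAR cuts FLOPs by 83% at a compression ratio of 89% while retaining 97% of baseline accuracy.
PAR's adaptive design achieves a 2x token reduction ratio compared with prior approaches, giving a better balance between performance and efficiency.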
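
The abstract names two stages, semantic retrieval for external redundancy and token routing for internal redundancy, but gives no implementation details. The following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that retrieval means ranking visual tokens by cosine similarity to a single prompt embedding, and that routing means averaging each discarded token into its most similar retained token. The function name `reduce_tokens`, the `keep` parameter, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def reduce_tokens(visual_tokens, prompt_embedding, keep):
    """Hypothetical two-stage reduction; an illustration, not PAR itself.

    Returns (kept_indices, merged_tokens), both in original token order.
    """
    if not 1 <= keep <= len(visual_tokens):
        raise ValueError("keep must be between 1 and the number of tokens")

    # Stage 1 (assumed stand-in for semantic retrieval):
    # rank visual tokens by similarity to the prompt and keep the top `keep`.
    scores = [cosine(t, prompt_embedding) for t in visual_tokens]
    ranked = sorted(range(len(visual_tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])
    dropped = ranked[keep:]

    # Stage 2 (assumed stand-in for token routing):
    # send each dropped token to its most similar kept token.
    groups = {i: [visual_tokens[i]] for i in kept}
    for j in dropped:
        target = max(kept, key=lambda i: cosine(visual_tokens[i], visual_tokens[j]))
        groups[target].append(visual_tokens[j])

    # Each output token is the mean of the tokens routed to it.
    merged = [[sum(col) / len(groups[i]) for col in zip(*groups[i])] for i in kept]
    return kept, merged


# Toy example: four 2-dimensional "visual tokens" and one prompt vector.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
kept, merged = reduce_tokens(tokens, [1.0, 0.0], keep=2)
print(kept, merged)
```

Because this sketch only needs embeddings and a similarity function, it requires no training, which matches the plug-and-play claim in spirit. How PAR actually scores, clusters, and routes tokens is not stated in the abstract.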

Abstract (original)

Multimodal large language models (MLLMs) demonstrate strong performance across visual tasks, but their efficiency is hindered by significant computational and memory demands from processing long contexts in multimodal inputs. To address this, we introduce PAR (Prompt-Aware Token Reduction), a novel plug-and-play approach that reduces visual tokens efficiently without compromising model performance. Unlike previous methods that rely heavily on attention mechanisms and overlook cross-modal interactions, PAR uses a prompt-aware strategy to adaptively identify and cluster essential visual tokens. PAR categorizes visual context redundancy into two types: external and internal. External redundancy is minimized through semantic retrieval, while internal redundancy is addressed using a token routing mechanism. This method substantially reduces computational load without requiring additional training or complex architectural modifications. Experimental results demonstrate that across various visual question answering tasks, PAR reduces FLOPs by 83% with a compression ratio of 89%, while retaining 97% of baseline accuracy. The adaptive design of PAR achieves a 2x token reduction ratio compared to prior approaches, enabling a better balance between performance and efficiency.

arXiv information

Authors	Yingen Liu, Fan Wu, Ruihui Li, Zhuo Tang, Kenli Li
Published	2024-12-02 08:43:33+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
