DYNAMAX: Dynamic computing for Transformers and Mamba based architectures

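Summary

Early exits (EEs) are a promising way to reduce computational cost and latency: inference is terminated dynamically once a data sample reaches a satisfactory prediction confidence.
Many works integrate EEs into encoder-only Transformers, but their application to decoder-only architectures and, more importantly, to Mamba models, a new family of state-space architectures in the LLM space, remains insufficiently explored.
This work introduces DYNAMAX, the first framework to exploit the unique properties of the Mamba architecture for early exit mechanisms.
Beyond integrating EEs into Mamba, we repurpose Mamba as an efficient EE classifier for both Mamba-based and Transformer-based LLMs, demonstrating its versatility.
Our experiments compare the Mistral 7B Transformer with the Codestral 7B Mamba model on datasets such as TruthfulQA, CoQA, and TriviaQA, evaluating computational savings, accuracy, and consistency.
The results highlight Mamba's adaptability as a powerful EE classifier and its efficiency in balancing computational cost against performance quality across NLP tasks.
By leveraging Mamba's inherent design for dynamic processing, we open a path to scalable, efficient inference in embedded applications and resource-constrained environments.
This study underscores Mamba's transformative potential for redefining dynamic computing paradigms for LLMs.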

Summary (original)

Early exits (EEs) offer a promising approach to reducing computational costs and latency by dynamically terminating inference once a satisfactory prediction confidence on a data sample is achieved. Although many works integrate EEs into encoder-only Transformers, their application to decoder-only architectures and, more importantly, Mamba models, a novel family of state-space architectures in the LLM realm, remains insufficiently explored. This work introduces DYNAMAX, the first framework to exploit the unique properties of Mamba architectures for early exit mechanisms. We not only integrate EEs into Mamba but also repurpose Mamba as an efficient EE classifier for both Mamba-based and transformer-based LLMs, showcasing its versatility. Our experiments employ the Mistral 7B transformer compared to the Codestral 7B Mamba model, using data sets such as TruthfulQA, CoQA, and TriviaQA to evaluate computational savings, accuracy, and consistency. The results highlight the adaptability of Mamba as a powerful EE classifier and its efficiency in balancing computational cost and performance quality across NLP tasks. By leveraging Mamba’s inherent design for dynamic processing, we open pathways for scalable and efficient inference in embedded applications and resource-constrained environments. This study underscores the transformative potential of Mamba in redefining dynamic computing paradigms for LLMs.
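
The early-exit idea described in the abstract can be illustrated with a minimal sketch. It assumes a generic confidence rule (stop at the first exit whose top softmax probability reaches a threshold); this is not DYNAMAX's Mamba-based EE classifier, whose details the abstract does not give. The names `early_exit_forward`, `layers`, `exit_heads`, and `threshold` are hypothetical, and the toy layers are stand-ins for real model blocks.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_forward(
    hidden: Vector,
    layers: Sequence[Callable[[Vector], Vector]],
    exit_heads: Sequence[Callable[[Vector], Vector]],
    threshold: float = 0.9,
) -> Tuple[int, int]:
    """Run layers in order and stop at the first exit head whose
    top softmax probability is >= threshold (an assumed rule).

    Returns (predicted class index, number of layers executed).
    If no exit is confident enough, the last layer's prediction is used.
    """
    if not layers or len(layers) != len(exit_heads):
        raise ValueError("need one exit head per layer, and at least one layer")
    probs: Vector = []
    depth = 0
    for depth, (layer, head) in enumerate(zip(layers, exit_heads), start=1):
        hidden = layer(hidden)
        probs = softmax(head(hidden))
        if max(probs) >= threshold:
            break  # confident enough: skip the remaining layers
    return probs.index(max(probs)), depth


# Toy stand-ins: each "layer" doubles the hidden state, so confidence
# grows with depth; each "exit head" reads the hidden state as logits.
toy_layers = [lambda h: [2.0 * x for x in h]] * 4
toy_heads = [lambda h: list(h)] * 4

prediction, layers_used = early_exit_forward([1.0, 0.0], toy_layers, toy_heads)
```

In this toy setup a lower `threshold` exits earlier and saves more computation, while a higher one runs more layers before committing to a prediction, which is the cost/quality trade-off the abstract refers to.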

arXiv information

Authors	Miguel Nogales, Matteo Gambella, Manuel Roveri
Published	2025-04-29 16:38:15+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
