LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts

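Summary

Large language models (LLMs) show impressive performance on complex tasks, yet they still struggle with understanding longer contexts and with high computational costs.
To balance efficiency and quality, we introduce LLMSteer, a fine-tuning-free framework that enhances LLMs through query-independent attention steering.
Tested on popular LLMs and datasets, LLMSteer narrows the performance gap with baselines by 65.9% and reduces runtime delay by up to 4.8x compared to recent attention steering methods.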

Summary (original)

As large language models (LLMs) show impressive performance on complex tasks, they still struggle with longer contextual understanding and high computational costs. To balance efficiency and quality, we introduce LLMSteer, a fine-tuning-free framework that enhances LLMs through query-independent attention steering. Tested on popular LLMs and datasets, LLMSteer narrows the performance gap with baselines by 65.9% and reduces the runtime delay by up to 4.8x compared to recent attention steering methods.

arXiv information

Authors	Zhuohan Gu, Jiayi Yao, Kuntai Du, Junchen Jiang
Published	2024-11-21 16:49:51+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
