Exploring Context Window of Large Language Models via Decomposed Positional Vectors

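Summary

Transformer-based large language models (LLMs) usually have a limited context window, so their performance drops sharply when they process text longer than that window.
Extensive research has been proposed to extend the context window and achieve length extrapolation in LLMs, but in-depth interpretation of these approaches is still lacking.
In this study, we investigate positional information within and beyond the context window to decipher the underlying mechanisms of LLMs.
Using a mean-based decomposition method, we disentangle positional vectors from the hidden states of LLMs and analyze their formation and their effect on attention.
Furthermore, when text exceeds the context window, we analyze how positional vectors change in two settings: direct extrapolation and context window extension.
Based on our findings, we design two training-free context window extension methods: positional vector replacement and attention window extension.
Experimental results show that our methods can effectively extend the context window length.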

Summary (original)

Transformer-based large language models (LLMs) typically have a limited context window, resulting in significant performance degradation when processing text beyond the length of the context window. Extensive studies have been proposed to extend the context window and achieve length extrapolation of LLMs, but there is still a lack of in-depth interpretation of these approaches. In this study, we explore the positional information within and beyond the context window for deciphering the underlying mechanism of LLMs. By using a mean-based decomposition method, we disentangle positional vectors from hidden states of LLMs and analyze their formation and effect on attention. Furthermore, when texts exceed the context window, we analyze the change of positional vectors in two settings, i.e., direct extrapolation and context window extension. Based on our findings, we design two training-free context window extension methods, positional vector replacement and attention window extension. Experimental results show that our methods can effectively extend the context window length.
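The abstract mentions a mean-based decomposition that separates positional vectors from hidden states. Below is a minimal NumPy sketch under an assumption the abstract does not spell out: the positional vector at position t is taken as the average hidden state at t across many different input texts, and the remainder is treated as a text-specific component. The function names `decompose_positional_vectors` and `cosine_similarity_matrix` are hypothetical, and the hidden states are synthetic here; in practice they would be collected from one layer of a model, which this sketch does not do.

```python
import numpy as np


def decompose_positional_vectors(hidden: np.ndarray):
    """Split one layer's hidden states into positional and residual parts.

    hidden: array of shape (num_samples, seq_len, dim), i.e. the hidden
    states of num_samples different texts, all of length seq_len.
    Assumption (not confirmed by the abstract): the positional vector is
    the mean over samples at each position.
    """
    if hidden.ndim != 3:
        raise ValueError("expected shape (num_samples, seq_len, dim)")
    # Mean over the sample axis: one vector per position, shape (seq_len, dim).
    positional = hidden.mean(axis=0)
    # What remains is treated as the text-specific (semantic) component.
    residual = hidden - positional[None, :, :]
    return positional, residual


def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows, e.g. positional vectors."""
    norms = np.linalg.norm(vectors, axis=-1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)  # avoid division by zero
    return unit @ unit.T


# Synthetic demo: a fixed per-position signal plus random per-text content.
rng = np.random.default_rng(0)
true_positional = rng.normal(size=(16, 8))    # (seq_len, dim)
content = rng.normal(size=(500, 16, 8))       # (num_samples, seq_len, dim)
hidden_states = true_positional[None] + content

positional, residual = decompose_positional_vectors(hidden_states)
similarity = cosine_similarity_matrix(positional)  # (16, 16)
```

With enough samples the estimated `positional` approaches the synthetic per-position signal, because the random content averages out; inspecting `similarity` is one simple way to see how positional vectors at nearby and distant positions relate.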

arXiv information

Authors	Zican Dong, Junyi Li, Xin Men, Wayne Xin Zhao, Bingbing Wang, Zhen Tian, Weipeng Chen, Ji-Rong Wen
Published	2024-11-18 11:15:56+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
