Hansel: Output Length Controlling Framework for Large Language Models

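Summary

Large language models (LLMs) have achieved great success, but efficiently controlling the length of their output sequences remains a challenge.
In this paper, we propose Hansel, an efficient framework for controlling output length without affecting the generation ability of the LLM.
Hansel uses hidden special tokens, output at regular intervals, to keep track of the remaining target length of the output sequence.
Combined with techniques that prevent the output from ending abruptly, this seemingly simple method proved efficient and versatile without compromising the coherence or fluency of the generated text.
The framework can be applied to any pre-trained LLM during its finetuning stage, regardless of the model's original positional encoding method.
We demonstrate this by finetuning four different LLMs with Hansel and show that, compared with prompt-based length-control finetuning, the mean absolute error of the output length decreases significantly for every model and dataset.
Moreover, the framework showed a substantially improved ability to extrapolate to target lengths not seen during finetuning, such as long dialog responses or extremely short summaries.
This indicates that the model learns a general means of length control, rather than learning to match output lengths to those seen during training.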
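The abstract describes the mechanism only at a high level. The following Python is a minimal sketch under stated assumptions, not the authors' implementation: the marker format `<remaining_N>`, the `period` parameter, and measuring length in tokens are hypothetical choices for illustration, and the function names `insert_length_markers`, `strip_markers`, and `mean_absolute_error` are likewise hypothetical. It shows how a finetuning target could be annotated with a periodic countdown of the remaining length, how such hidden markers would be removed before the text is shown, and how the length MAE used in the comparison is computed. The paper's techniques for avoiding abrupt termination are not modeled here.

```python
import re
from typing import List, Sequence

# Hypothetical marker format; the paper's actual special tokens may differ.
MARKER_RE = re.compile(r"^<remaining_(\d+)>$")


def marker(remaining: int) -> str:
    """Build a hypothetical hidden special token encoding the remaining length."""
    return f"<remaining_{remaining}>"


def insert_length_markers(tokens: Sequence[str], period: int) -> List[str]:
    """Place a remaining-length marker before each group of `period` content tokens.

    The first marker states the full target length; each later marker states
    how many content tokens are still to come. A final <remaining_0> marks the end.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    out: List[str] = []
    total = len(tokens)
    for i, tok in enumerate(tokens):
        if i % period == 0:
            out.append(marker(total - i))
        out.append(tok)
    out.append(marker(0))  # signals that the target length has been reached
    return out


def strip_markers(tokens: Sequence[str]) -> List[str]:
    """Remove the hidden markers so that only the content tokens remain visible."""
    return [t for t in tokens if not MARKER_RE.match(t)]


def mean_absolute_error(targets: Sequence[int], outputs: Sequence[int]) -> float:
    """Mean absolute error between target lengths and actual output lengths."""
    if len(targets) != len(outputs) or not targets:
        raise ValueError("need two equally sized, non-empty sequences")
    return sum(abs(t - o) for t, o in zip(targets, outputs)) / len(targets)


if __name__ == "__main__":
    words = "the model stops near the requested length".split()
    tagged = insert_length_markers(words, period=3)
    print(tagged)
    print(strip_markers(tagged) == words)
    print(mean_absolute_error([10, 20, 30], [12, 18, 30]))
```

With seven words and `period=3`, the markers count down 7, 4, 1 and finally 0, so a model trained on such targets would see at regular intervals how much of its length budget is left; stripping the markers recovers the original words exactly.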

Abstract (original)

Despite the great success of large language models (LLMs), efficiently controlling the length of the output sequence still remains a challenge. In this paper, we propose Hansel, an efficient framework for length control in LLMs without affecting its generation ability. Hansel utilizes periodically outputted hidden special tokens to keep track of the remaining target length of the output sequence. Together with techniques to avoid abrupt termination of the output, this seemingly simple method proved to be efficient and versatile, while not harming the coherency and fluency of the generated text. The framework can be applied to any pre-trained LLMs during the finetuning stage of the model, regardless of its original positional encoding method. We demonstrate this by finetuning four different LLMs with Hansel and show that the mean absolute error of the output sequence decreases significantly in every model and dataset compared to the prompt-based length control finetuning. Moreover, the framework showed a substantially improved ability to extrapolate to target lengths unseen during finetuning, such as long dialog responses or extremely short summaries. This indicates that the model learns the general means of length control, rather than learning to match output lengths to those seen during training.

arXiv information

Authors	Seoha Song, Junhyun Lee, Hyeonmok Ko
Published	2024-12-18 16:52:38+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
