Instruction Position Matters in Sequence Generation with Large Language Models

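Summary

Large language models (LLMs) can perform conditional sequence generation tasks, such as translation and summarization, through instruction fine-tuning.
The fine-tuning data is typically formed by concatenating, in order, a task instruction, an input sentence, and the corresponding response.
Given the locality modeled by the self-attention mechanism of LLMs, these models risk forgetting the instruction when generating responses for long input sentences.
To mitigate this problem, the authors propose strengthening the instruction-following ability of LLMs by moving the task instruction to a position after the input sentence.
Theoretical analysis suggests that this simple method changes the focus of the model's learning and thereby emphasizes the training of instruction-following ability.
Experimental results also show that the approach consistently outperforms the conventional setting across model scales (1B / 7B / 13B) and sequence generation tasks (translation and summarization), with no additional data or annotation cost.
In particular, the method substantially improves zero-shot performance on conditional sequence generation, for example by up to 9.7 BLEU points on WMT zero-shot translation tasks.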
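
The difference between the two data layouts can be illustrated with a minimal Python sketch. This is not the authors' code: the `Example` class, the function names `pre_instruction` and `post_instruction`, the newline separator, and the sample sentence are all assumptions made for illustration. The summary specifies only the ordering of the three parts.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One fine-tuning sample (hypothetical container, not from the paper)."""
    instruction: str  # task instruction
    source: str       # input sentence
    response: str     # target output


def pre_instruction(ex: Example, sep: str = "\n") -> str:
    # Conventional layout: instruction, then input, then response.
    return sep.join([ex.instruction, ex.source, ex.response])


def post_instruction(ex: Example, sep: str = "\n") -> str:
    # Proposed layout: input first, instruction moved after it, then response.
    return sep.join([ex.source, ex.instruction, ex.response])


if __name__ == "__main__":
    ex = Example(
        instruction="Translate the sentence from German to English.",
        source="Das Wetter ist heute schön.",
        response="The weather is nice today.",
    )
    print(pre_instruction(ex))
    print()
    print(post_instruction(ex))
```

In the second layout the instruction sits directly before the response, so it stays close to the tokens being generated however long the input sentence is.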

Summary (original)

Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization, through instruction fine-tuning. The fine-tuning data is generally sequentially concatenated from a specific task instruction, an input sentence, and the corresponding response. Considering the locality modeled by the self-attention mechanism of LLMs, these models face the risk of instruction forgetting when generating responses for long input sentences. To mitigate this issue, we propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences. Theoretical analysis suggests that our straightforward method can alter the model’s learning focus, thereby emphasizing the training of instruction-following capabilities. Concurrently, experimental results demonstrate that our approach consistently outperforms traditional settings across various model scales (1B / 7B / 13B) and different sequence generation tasks (translation and summarization), without any additional data or annotation costs. Notably, our method significantly improves the zero-shot performance on conditional sequence generation, e.g., up to 9.7 BLEU points on WMT zero-shot translation tasks.

arXiv information

Authors	Yijin Liu, Xianfeng Zeng, Fandong Meng, Jie Zhou
Published	2023-08-23 12:36:57+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
