LongForm: Effective Instruction Tuning with Reverse Instructions

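Summary

Instruction tuning enables language models to generalize more effectively and to follow user intent better.
However, obtaining instruction data is costly and difficult.
Prior work has relied on expensive human annotation, crowd-sourced datasets with alignment issues, or noisy examples generated by LLMs.
We introduce the LongForm-C dataset, which is created with reverse instructions.
Using reverse instructions, we generate instructions via LLMs for human-written corpus examples.
First, we select a diverse set of human-written documents from corpora such as C4 and Wikipedia.
Then, we generate instructions for these documents via LLMs.
This approach yields a cheaper and cleaner instruction-tuning dataset with natural outputs, one well suited to long text generation.
On tasks such as story/recipe generation and long-form question answering, our models outperform 10x larger language models that have not been instruction-tuned.
Moreover, LongForm models outperform prior instruction-tuned models such as FLAN-T5 and Alpaca by a large margin, and further improve language understanding capabilities.
Finally, our models can effectively follow and answer multilingual instructions.
We demonstrate this for news generation.
The data and models are publicly available at https://github.com/akoksal/LongForm.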
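
The two-step procedure in the summary (select human-written documents, then have an LLM generate an instruction for each) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the prompt wording, the `InstructionExample` type, the `build_reverse_instruction_dataset` helper, and the `stub_llm` stand-in are hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical prompt wording; the abstract does not give the actual template.
PROMPT_TEMPLATE = (
    "Instruction: X\n"
    "Output: {document}\n"
    "What kind of instruction could X be, given the output above?\n"
    "X:"
)


@dataclass
class InstructionExample:
    instruction: str  # generated by the LLM
    output: str       # the original human-written document, unchanged


def build_reverse_instruction_dataset(
    documents: Iterable[str],
    generate: Callable[[str], str],
) -> List[InstructionExample]:
    """Pair each human-written document with an LLM-generated instruction."""
    examples = []
    for doc in documents:
        prompt = PROMPT_TEMPLATE.format(document=doc)
        instruction = generate(prompt).strip()
        if instruction:  # skip documents for which the LLM returned nothing
            examples.append(InstructionExample(instruction, doc))
    return examples


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call, used only to keep the sketch runnable."""
    return " Write a short text on the following topic. "


corpus = ["Preheat the oven to 180 C, then mix flour and sugar."]
dataset = build_reverse_instruction_dataset(corpus, stub_llm)
print(dataset[0].instruction)
```

The point of the design is that the output side of every training pair is untouched human text, and only the instruction side comes from the LLM. In practice `generate` would wrap a real model call in place of `stub_llm`.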

Summary (original)

Instruction tuning enables language models to more effectively generalize and better follow user intent. However, obtaining instruction data is costly and challenging. Prior work employs methods such as expensive human annotation, crowd-sourced datasets with alignment issues, and generating noisy examples via LLMs. We introduce the LongForm-C dataset, which is created by reverse instructions. We generate instructions via LLMs for human-written corpus examples using reverse instructions. First we select a diverse set of human-written documents from corpora such as C4 and Wikipedia; then we generate instructions for these documents via LLMs. This approach provides a cheaper and cleaner instruction-tuning dataset with natural output and one suitable for long text generation. Our models outperform 10x larger language models without instruction tuning on tasks such as story/recipe generation and long-form question answering. Moreover, LongForm models outperform prior instruction-tuned models such as FLAN-T5 and Alpaca by a large margin, and improve language understanding capabilities further. Finally, our models can effectively follow and answer multilingual instructions; we demonstrate this for news generation. We publicly release our data and models: https://github.com/akoksal/LongForm.

arXiv information

Authors	Abdullatif Köksal, Timo Schick, Anna Korhonen, Hinrich Schütze
Published	2024-02-14 18:00:33+00:00

Source, services used

arxiv.jp, Google
