Stronger Models are NOT Stronger Teachers for Instruction Tuning

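Summary

Instruction tuning is widely used to make large language models (LLMs) follow user instructions effectively.
The instruction-following ability of the resulting LLM depends heavily on the instruction dataset used for tuning.
Recently, synthetic instruction datasets have emerged as an economically viable way to supply LLMs with diverse, high-quality instructions.
However, existing approaches generally assume that larger or stronger models are stronger teachers for instruction tuning, and therefore simply adopt those models as response generators for the synthetic instructions.
In this paper, we challenge that commonly adopted assumption.
Extensive experiments across five base models and twenty response generators show that larger and stronger models are not necessarily stronger teachers of smaller models.
We call this phenomenon the Larger Models' Paradox.
We find that existing metrics cannot accurately predict the effectiveness of a response generator, because they ignore the compatibility between the teacher and the base model being fine-tuned.
We therefore develop a new metric, Compatibility-Adjusted Reward (CAR), to measure the effectiveness of response generators.
Experiments across five base models show that CAR outperforms almost all baselines.

Abstract (original)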

Instruction tuning has been widely adopted to ensure large language models (LLMs) follow user instructions effectively. The resulting instruction-following capabilities of LLMs heavily rely on the instruction datasets used for tuning. Recently, synthetic instruction datasets have emerged as an economically viable solution to provide LLMs diverse and high-quality instructions. However, existing approaches typically assume that larger or stronger models are stronger teachers for instruction tuning, and hence simply adopt these models as response generators to the synthetic instructions. In this paper, we challenge this commonly-adopted assumption. Our extensive experiments across five base models and twenty response generators reveal that larger and stronger models are not necessarily stronger teachers of smaller models. We refer to this phenomenon as the Larger Models’ Paradox. We observe that existing metrics cannot precisely predict the effectiveness of response generators since they ignore the compatibility between teachers and base models being fine-tuned. We thus develop a novel metric, named as Compatibility-Adjusted Reward (CAR) to measure the effectiveness of response generators. Our experiments across five base models demonstrate that CAR outperforms almost all baselines.

arXiv information

Authors	Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Radha Poovendran
Published	2024-11-12 04:05:54+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
