SCAR: Efficient Instruction-Tuning for Large Language Models via Style Consistency-Aware Response Ranking

Summary

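Recent studies show that keeping the response style of human experts consistent and improving the data quality of training sets can substantially improve the performance of fine-tuned large language models (LLMs) while reducing the number of training examples required.
However, the precise definition of style, and the relationship between style, data quality, and LLM performance, remain unclear.
This work identifies two key stylistic elements of responses: linguistic form and semantic surprisal.
Among training data of comparable quality, higher consistency in these response elements leads to better LLM performance.
Motivated by this, we introduce Style Consistency-Aware Response Ranking (SCAR), which automatically prioritizes instruction-response pairs in a training set according to the stylistic consistency of their responses.
By selecting the most style-consistent examples, in some cases as little as 0.7% of the full dataset, fine-tuned LLMs can match or exceed the performance of models trained on the entire dataset on coding and open-ended question-answering benchmarks.
Code and data are available at https://github.com/zhuang-li/SCAR.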
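
The selection step described above (rank pairs, keep only the top fraction) can be illustrated with a minimal sketch. It assumes each instruction-response pair already has a style-consistency score; how SCAR computes that score is not covered in this summary, so the scores here are placeholder inputs. The function name `select_top_fraction` and its parameters are hypothetical and are not taken from the SCAR repository.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # (instruction, response)


def select_top_fraction(
    pairs: Sequence[Pair],
    scores: Sequence[float],
    fraction: float,
) -> List[Pair]:
    """Keep the highest-scoring `fraction` of pairs (hypothetical helper).

    `scores` stands in for a style-consistency score per pair; this sketch
    does not compute it and makes no claim about how SCAR does.
    """
    if len(pairs) != len(scores):
        raise ValueError("pairs and scores must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Always keep at least one example, rounding the budget up.
    k = max(1, math.ceil(len(pairs) * fraction))

    # Rank indices by score, highest first, and keep the first k.
    ranked = sorted(range(len(pairs)), key=lambda i: scores[i], reverse=True)
    return [pairs[i] for i in ranked[:k]]


if __name__ == "__main__":
    data = [("q1", "a1"), ("q2", "a2"), ("q3", "a3"), ("q4", "a4")]
    toy_scores = [0.1, 0.9, 0.5, 0.7]  # made-up values for illustration
    print(select_top_fraction(data, toy_scores, fraction=0.5))
```

The subset returned by such a helper would then be used as the fine-tuning set in place of the full dataset.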

Summary (original)

Recent studies have shown that maintaining a consistent response style by human experts and enhancing data quality in training sets can significantly improve the performance of fine-tuned Large Language Models (LLMs) while reducing the number of training examples needed. However, the precise definition of style and the relationship between style, data quality, and LLM performance remains unclear. This research identifies two key stylistic elements in responses: linguistic form and semantic surprisal. We find that, among training data of comparable quality, higher consistency in these response elements leads to better LLM performance. Inspired by this, we introduce Style Consistency-Aware Response Ranking (SCAR), which automatically prioritizes instruction-response pairs in the training set based on their response stylistic consistency. By selecting the most style-consistent examples, sometimes as few as 0.7% of the full dataset, the fine-tuned LLMs can match or even surpass the performance of models trained on the entire dataset in coding and open-ended question-answering benchmarks. Code and data are available at https://github.com/zhuang-li/SCAR .

arXiv information

Authors	Zhuang Li, Yuncheng Hua, Thuy-Trang Vu, Haolan Zhan, Lizhen Qu, Gholamreza Haffari
Published	2024-10-02 16:46:54+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
