SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection

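Summary

Instruction tuning (IT) is essential for adapting large language models (LLMs) to human-centric interaction.
Recent work has shown that carefully selecting a small, high-quality subset of IT data can significantly improve LLM performance.
Nevertheless, common approaches often depend on additional models or data, which raises costs and limits wide adoption.
In this work, we propose a new approach called SelectIT that leverages the foundational capabilities of the LLM itself.
Specifically, we use the intrinsic uncertainty present in LLMs to select high-quality IT data more effectively, without requiring additional resources.
We also introduce Selective Alpaca, a curated IT dataset created by applying SelectIT to the Alpaca-GPT4 dataset.
Empirical results show that IT with Selective Alpaca leads to substantial improvements in model ability.
The robustness of SelectIT has also been demonstrated across various foundation models and domain-specific tasks.
Our findings suggest that longer and more computationally intensive IT data may serve as a superior source for IT, offering valuable insights for future research in this area.
Data, code, and scripts are freely available at https://github.com/Blue-Raincoat/SelectIT.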

Abstract (original)

Instruction tuning (IT) is crucial to tailoring large language models (LLMs) towards human-centric interactions. Recent advancements have shown that the careful selection of a small, high-quality subset of IT data can significantly enhance the performance of LLMs. Despite this, common approaches often rely on additional models or data, which increases costs and limits widespread adoption. In this work, we propose a novel approach, termed SelectIT, that capitalizes on the foundational capabilities of the LLM itself. Specifically, we exploit the intrinsic uncertainty present in LLMs to more effectively select high-quality IT data, without the need for extra resources. Furthermore, we introduce a curated IT dataset, the Selective Alpaca, created by applying SelectIT to the Alpaca-GPT4 dataset. Empirical results demonstrate that IT using Selective Alpaca leads to substantial model ability enhancement. The robustness of SelectIT has also been corroborated in various foundation models and domain-specific tasks. Our findings suggest that longer and more computationally intensive IT data may serve as superior sources of IT, offering valuable insights for future research in this area. Data, code, and scripts are freely available at https://github.com/Blue-Raincoat/SelectIT.
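
The abstract does not spell out how uncertainty is turned into a selection rule, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each IT sample already carries several quality ratings (for example, obtained by asking the same LLM to rate the sample under different prompts), discounts the mean rating by the spread of those ratings, and keeps the top-scoring fraction. The field name `ratings`, the parameters `alpha` and `fraction`, and the scoring formula are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def uncertainty_adjusted_score(ratings, alpha=0.2):
    """Mean rating discounted by disagreement among the ratings.

    Hypothetical scoring rule: a larger spread (more uncertainty)
    lowers the score. `alpha` controls the size of the discount.
    """
    return mean(ratings) / (1.0 + alpha * pstdev(ratings))


def select_top_fraction(samples, fraction=0.2, alpha=0.2):
    """Return the highest-scoring fraction of samples, best first."""
    ranked = sorted(
        samples,
        key=lambda s: uncertainty_adjusted_score(s["ratings"], alpha),
        reverse=True,
    )
    keep = max(1, round(len(ranked) * fraction))  # always keep at least one
    return ranked[:keep]


# Toy data: ratings on an assumed 1-5 scale, three ratings per sample.
samples = [
    {"id": "a", "ratings": [5, 5, 5]},  # high and consistent
    {"id": "b", "ratings": [5, 1, 5]},  # high on average but uncertain
    {"id": "c", "ratings": [3, 3, 3]},  # mediocre but consistent
    {"id": "d", "ratings": [4, 4, 5]},  # good, slight disagreement
]

selected = select_top_fraction(samples, fraction=0.5)
print([s["id"] for s in selected])
```

In this toy run, sample `b` is ranked below the consistently mediocre sample `c` because its ratings disagree strongly, which is the intuition behind using uncertainty as a quality signal.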

arXiv information

Authors	Liangxin Liu, Xuebo Liu, Derek F. Wong, Dongfang Li, Ziyi Wang, Baotian Hu, Min Zhang
Publication date	2025-01-15 08:20:19+00:00

Source, services used

arxiv.jp, Google
