UltraIF: Advancing Instruction Following from the Wild

Summary

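Instruction following has made modern large language models (LLMs) helpful assistants.
However, the key to taming LLMs on complex instructions remains unclear, because there is a large gap between models trained by the open-source community and those trained by leading companies.
To bridge this gap, we propose UltraIF, a simple and scalable approach for building LLMs that can follow complex instructions using open-source data.
UltraIF first decomposes real-world user prompts into simpler queries, constraints, and corresponding evaluation questions for those constraints.
Next, we train an UltraComposer to compose constraint-associated prompts together with evaluation questions.
This prompt composer lets us synthesize complex instructions and filter responses using the evaluation questions.
In our experiments, for the first time, we align LLaMA-3.1-8B-Base so that it catches up with its instruct version on five instruction-following benchmarks without using any benchmark information.
The aligned model also achieves competitive scores on other benchmarks.
Furthermore, we show that UltraIF can further improve LLaMA-3.1-8B-Instruct through self-alignment, motivating broader use cases for the method.
Our code is available at https://github.com/kkk-an/UltraIF.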

Summary (original)

Instruction-following made modern large language models (LLMs) helpful assistants. However, the key to taming LLMs on complex instructions remains mysterious, for that there are huge gaps between models trained by open-source community and those trained by leading companies. To bridge the gap, we propose a simple and scalable approach UltraIF for building LLMs that can follow complex instructions with open-source data. UltraIF first decomposes real-world user prompts into simpler queries, constraints, and corresponding evaluation questions for the constraints. Then, we train an UltraComposer to compose constraint-associated prompts with evaluation questions. This prompt composer allows us to synthesize complicated instructions as well as filter responses with evaluation questions. In our experiment, for the first time, we successfully align LLaMA-3.1-8B-Base to catch up with its instruct version on 5 instruction-following benchmarks without any benchmark information, using only 8B model as response generator and evaluator. The aligned model also achieved competitive scores on other benchmarks. Moreover, we also show that UltraIF could further improve LLaMA-3.1-8B-Instruct through self-alignment, motivating broader use cases for the method. Our code will be available at https://github.com/kkk-an/UltraIF.
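
The filtering step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `ComposedPrompt` class, the `filter_responses` function, and the `toy_judge` rule are hypothetical names introduced here, and the judge is a simple word-count check standing in for the LLM evaluator that the abstract mentions. It only shows the general idea that a constraint is paired with an evaluation question and that responses failing the question are discarded.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ComposedPrompt:
    """A simple query paired with one constraint and its evaluation question.

    Hypothetical container; in the paper this pairing is produced by a
    trained UltraComposer, which is not modeled here.
    """
    query: str
    constraint: str
    eval_question: str

    @property
    def instruction(self) -> str:
        # The complex instruction is the query followed by the constraint.
        return f"{self.query} {self.constraint}"


def filter_responses(
    prompt: ComposedPrompt,
    responses: List[str],
    judge: Callable[[str, str], bool],
) -> List[str]:
    """Keep only the responses for which the judge answers the evaluation question with yes."""
    return [r for r in responses if judge(prompt.eval_question, r)]


def toy_judge(question: str, response: str) -> bool:
    # Assumption: a stand-in for an LLM evaluator. It ignores the question
    # text and hard-codes the single constraint used in this example.
    return len(response.split()) <= 5


prompt = ComposedPrompt(
    query="Describe the moon.",
    constraint="Answer in at most five words.",
    eval_question="Does the response contain at most five words?",
)
responses = [
    "The moon orbits Earth.",
    "The moon is Earth's only natural satellite and it orbits our planet.",
]
kept = filter_responses(prompt, responses, toy_judge)
print(prompt.instruction)
print(kept)
```

In a real pipeline the judge would be a model call that answers the evaluation question about each response; passing it in as a function keeps the filtering logic independent of that choice.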

arXiv information

Authors	Kaikai An, Li Sheng, Ganqu Cui, Shuzheng Si, Ning Ding, Yu Cheng, Baobao Chang
Published	2025-02-06 15:39:16+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
