Learning Composable Chains-of-Thought

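Abstract

A common approach to teaching large language models (LLMs) to reason is to train them on chain-of-thought (CoT) traces of in-distribution reasoning problems, but such annotated data is costly to obtain for every problem of interest.
We want reasoning models to generalize beyond their training distribution, and ideally to generalize compositionally: to combine atomic reasoning skills to solve harder, unseen reasoning tasks.
We take a step towards compositional generalization of reasoning skills for the setting where the target compositional task has no labeled CoT data.
We find that simply training models on CoT data of atomic tasks leads to limited generalization, but that minimally modifying the CoT formats of the constituent atomic tasks to make them composable can lead to improvements.
We can train "atomic CoT" models on the atomic tasks with Composable CoT data and combine them through multitask learning or model merging to improve zero-shot performance on the target compositional task.
Such a combined model can be further bootstrapped on a small amount of compositional data using rejection sampling fine-tuning (RFT).
Results on string operations and natural language skill compositions show that training LLMs on Composable CoT outperforms multitask learning and continued fine-tuning baselines within a given training data budget.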

Abstract (original)

A common approach for teaching large language models (LLMs) to reason is to train on chain-of-thought (CoT) traces of in-distribution reasoning problems, but such annotated data is costly to obtain for every problem of interest. We want reasoning models to generalize beyond their training distribution, and ideally to generalize compositionally: combine atomic reasoning skills to solve harder, unseen reasoning tasks. We take a step towards compositional generalization of reasoning skills when addressing a target compositional task that has no labeled CoT data. We find that simply training models on CoT data of atomic tasks leads to limited generalization, but minimally modifying CoT formats of constituent atomic tasks to be composable can lead to improvements. We can train ‘atomic CoT’ models on the atomic tasks with Composable CoT data and combine them with multitask learning or model merging for better zero-shot performance on the target compositional task. Such a combined model can be further bootstrapped on a small amount of compositional data using rejection sampling fine-tuning (RFT). Results on string operations and natural language skill compositions show that training LLMs on Composable CoT outperforms multitask learning and continued fine-tuning baselines within a given training data budget.
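
The rejection sampling fine-tuning (RFT) step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual RFT formulation: sample several CoT traces per problem, keep only those whose final answer matches the known answer, and fine-tune on the kept traces. The abstract does not specify the trace format, the sampler, or the filtering details, so the `Answer:` marker, the toy "reverse then uppercase" task, and every function name below are hypothetical stand-ins rather than the authors' implementation.

```python
import random
from typing import Callable, List, Tuple


def extract_answer(trace: str) -> str:
    """Return the text after the last 'Answer:' marker (assumed trace format)."""
    marker = "Answer:"
    idx = trace.rfind(marker)
    return trace[idx + len(marker):].strip() if idx != -1 else ""


def rejection_sample(
    problems: List[Tuple[str, str]],
    sample_fn: Callable[[str], str],
    num_samples: int = 4,
) -> List[Tuple[str, str]]:
    """Collect (question, trace) pairs whose final answer equals the gold answer.

    Only answers are needed as supervision, not gold CoT traces; duplicate
    traces for the same question are dropped.
    """
    kept: List[Tuple[str, str]] = []
    for question, gold in problems:
        seen = set()
        for _ in range(num_samples):
            trace = sample_fn(question)
            if extract_answer(trace) == gold and trace not in seen:
                seen.add(trace)
                kept.append((question, trace))
    return kept


def make_toy_sampler(seed: int = 0) -> Callable[[str], str]:
    """Stand-in for a combined model: composes 'reverse' and 'uppercase',
    but randomly skips the second skill about half of the time."""
    rng = random.Random(seed)

    def sample(question: str) -> str:
        reversed_s = question[::-1]
        final = reversed_s.upper() if rng.random() < 0.5 else reversed_s
        return f"Reverse: {reversed_s}\nUppercase: {final}\nAnswer: {final}"

    return sample


if __name__ == "__main__":
    # Compositional problems with answers only (no labeled CoT).
    data = [("abc", "CBA"), ("hello", "OLLEH")]
    accepted = rejection_sample(data, make_toy_sampler(), num_samples=4)
    for question, trace in accepted:
        print(question, "->", extract_answer(trace))
```

In a real pipeline, `sample_fn` would call the combined model with a sampling temperature above zero, and the accepted pairs would form the fine-tuning set for the next round.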

arXiv information

Authors	Fangcong Yin, Zeyu Leo Liu, Liu Leqi, Xi Ye, Greg Durrett
Published	2025-05-28 17:51:10+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
