Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning

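Abstract

Chain-of-Thought (CoT) prompting has emerged as a powerful technique for enhancing the reasoning capabilities of language models.
However, generating long and correct CoT trajectories is difficult.
Recent studies have shown that Looped Transformers have remarkable length generalization capabilities, but their limited generality and adaptability prevent them from serving as an alternative to auto-regressive solutions.
To better leverage the strengths of Looped Transformers, we propose RELAY (REasoning through Loop Alignment iterativelY).
Specifically, we align the steps of Chain-of-Thought (CoT) reasoning with loop iterations and apply intermediate supervision during the training of Looped Transformers.
This additional iteration-wise supervision not only preserves the Looped Transformer's capacity for length generalization but also enables it to predict CoT reasoning steps for unseen data.
We therefore use this Looped Transformer to generate accurate reasoning chains for complex problems that exceed the training length, and these chains are then used to fine-tune an auto-regressive model.
We conduct extensive experiments, and the results demonstrate the effectiveness of our approach, with significant improvements in the performance of the auto-regressive model.
Code will be released at https://github.com/qifanyu/RELAY.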

Abstract (original)

Chain-of-Thought (CoT) prompting has emerged as a powerful technique for enhancing language model’s reasoning capabilities. However, generating long and correct CoT trajectories is challenging. Recent studies have demonstrated that Looped Transformers possess remarkable length generalization capabilities, but their limited generality and adaptability prevent them from serving as an alternative to auto-regressive solutions. To better leverage the strengths of Looped Transformers, we propose RELAY (REasoning through Loop Alignment iterativelY). Specifically, we align the steps of Chain-of-Thought (CoT) reasoning with loop iterations and apply intermediate supervision during the training of Looped Transformers. This additional iteration-wise supervision not only preserves the Looped Transformer’s ability for length generalization but also enables it to predict CoT reasoning steps for unseen data. Therefore, we leverage this Looped Transformer to generate accurate reasoning chains for complex problems that exceed the training length, which will then be used to fine-tune an auto-regressive model. We conduct extensive experiments, and the results demonstrate the effectiveness of our approach, with significant improvements in the performance of the auto-regressive model. Code will be released at https://github.com/qifanyu/RELAY.
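The idea of aligning CoT steps with loop iterations can be illustrated with a minimal toy sketch. This is not the authors' implementation: the names `looped_forward`, `iteration_wise_loss`, and `squared_error`, the list-of-floats state, and the squared-error loss are all assumptions made for illustration. The abstract does not specify the model architecture details or the loss, and the `block` here is a plain function standing in for a weight-shared Transformer block.

```python
from typing import Callable, List, Sequence

State = List[float]


def looped_forward(block: Callable[[State], State],
                   state: State,
                   num_loops: int) -> List[State]:
    """Apply the same (weight-shared) block num_loops times.

    Every intermediate state is kept so that each loop iteration
    can be supervised separately.
    """
    outputs: List[State] = []
    for _ in range(num_loops):
        state = block(state)
        outputs.append(state)
    return outputs


def squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Toy per-step loss (an assumption; the abstract names no loss)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def iteration_wise_loss(block: Callable[[State], State],
                        init_state: State,
                        cot_steps: Sequence[State]) -> float:
    """Average loss in which loop iteration t is supervised by CoT step t."""
    outputs = looped_forward(block, init_state, len(cot_steps))
    total = sum(squared_error(o, c) for o, c in zip(outputs, cot_steps))
    return total / len(cot_steps)


# Hypothetical toy task: each reasoning step doubles the running value.
def double(state: State) -> State:
    return [2.0 * state[0]]


def add_one(state: State) -> State:
    return [state[0] + 1.0]


cot_steps = [[2.0], [4.0], [8.0]]  # intermediate steps for the input [1.0]
perfect_loss = iteration_wise_loss(double, [1.0], cot_steps)
imperfect_loss = iteration_wise_loss(add_one, [1.0], cot_steps)

# The same block can be run for more iterations than the supervised length,
# which is the sense in which a looped model can extend to longer problems.
longer_chain = looped_forward(double, [1.0], 5)
```

In this sketch a block that reproduces every intermediate step incurs zero loss, while a block that matches only some of the steps is penalized at each iteration where it deviates, instead of only at the final answer.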

arXiv information

Authors	Qifan Yu, Zhenyu He, Sijie Li, Xun Zhou, Jun Zhang, Jingjing Xu, Di He
Published	2025-02-12 15:17:04+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
