Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation

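Summary

Recent advances in large language models (LLMs) have demonstrated remarkable reasoning ability through long chain-of-thought (CoT) reasoning.
The R1 distillation scheme has emerged as a promising approach for training cost-effective models with enhanced reasoning ability.
However, the underlying mechanisms that drive its effectiveness remain unclear.
This study examines the universality of distillation data and identifies the key components that enable efficient transfer of long-chain reasoning ability in LLM distillation.
Our findings show that the effectiveness of long CoT reasoning distillation from teacher models such as Qwen-QwQ degrades significantly on nonhomologous models, which challenges the assumed universality of current distillation methods.
To gain deeper insight into the structure and patterns of long CoT reasoning, we propose DLCoT (Deconstructing Long Chain-of-Thought), a distillation data enhancement framework.
DLCoT consists of three key steps: (1) data segmentation to decompose complex long CoT structures, (2) simplification by eliminating unsolvable and redundant solutions, and (3) optimization of intermediate error states.
Our approach significantly improves model performance and token efficiency, facilitating the development of high-performance LLMs.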
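
The three steps named in the summary can be pictured with a toy sketch. This is not the authors' implementation: the abstract does not say how segments are detected or how error states are optimized, so the blank-line splitting, the keyword rules, the `Segment` type, and the `max_errors` cap below are all hypothetical stand-ins chosen only to make the pipeline shape concrete.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    kind: str  # "approach", "dead_end", or "error" (hypothetical labels)


def segment_trace(trace: str) -> list[Segment]:
    """Step 1 (segmentation): split on blank lines, tag with crude keyword rules.

    The keyword rules are an assumption for illustration only.
    """
    segments = []
    for chunk in trace.split("\n\n"):
        chunk = chunk.strip()
        if not chunk:
            continue
        lowered = chunk.lower()
        if "dead end" in lowered or "cannot be solved" in lowered:
            kind = "dead_end"
        elif "wait" in lowered or "mistake" in lowered:
            kind = "error"
        else:
            kind = "approach"
        segments.append(Segment(chunk, kind))
    return segments


def simplify(segments: list[Segment]) -> list[Segment]:
    """Step 2 (simplification): drop dead ends and exact-duplicate segments."""
    seen = set()
    kept = []
    for seg in segments:
        if seg.kind == "dead_end":
            continue
        key = " ".join(seg.text.lower().split())  # whitespace/case-normalized text
        if key in seen:
            continue
        seen.add(key)
        kept.append(seg)
    return kept


def limit_error_states(segments: list[Segment], max_errors: int = 1) -> list[Segment]:
    """Step 3 (stand-in): keep at most `max_errors` error segments.

    Capping the count is an assumed placeholder for "optimization of
    intermediate error states"; the source does not specify the method.
    """
    kept = []
    errors = 0
    for seg in segments:
        if seg.kind == "error":
            errors += 1
            if errors > max_errors:
                continue
        kept.append(seg)
    return kept


def refine(trace: str, max_errors: int = 1) -> str:
    """Run the three toy steps in order and rejoin the surviving segments."""
    segments = limit_error_states(simplify(segment_trace(trace)), max_errors)
    return "\n\n".join(seg.text for seg in segments)
```

Calling `refine` on a trace with a repeated approach, an abandoned branch, and two self-corrections returns a shorter trace that keeps one copy of each approach and a single self-correction, which is the kind of token saving the summary refers to.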

Abstract (original)

Recent advancements in large language models (LLMs) have demonstrated remarkable reasoning capabilities through long chain-of-thought (CoT) reasoning. The R1 distillation scheme has emerged as a promising approach for training cost-effective models with enhanced reasoning abilities. However, the underlying mechanisms driving its effectiveness remain unclear. This study examines the universality of distillation data and identifies key components that enable the efficient transfer of long-chain reasoning capabilities in LLM distillation. Our findings reveal that the effectiveness of long CoT reasoning distillation from teacher models like Qwen-QwQ degrades significantly on nonhomologous models, challenging the assumed universality of current distillation methods. To gain deeper insights into the structure and patterns of long CoT reasoning, we propose DLCoT (Deconstructing Long Chain-of-Thought), a distillation data enhancement framework. DLCoT consists of three key steps: (1) data segmentation to decompose complex long CoT structures, (2) simplification by eliminating unsolvable and redundant solutions, and (3) optimization of intermediate error states. Our approach significantly improves model performance and token efficiency, facilitating the development of high-performance LLMs.

arXiv information

Authors	Yijia Luo, Yulin Song, Xingyao Zhang, Jiaheng Liu, Weixun Wang, GengRu Chen, Wenbo Su, Bo Zheng
Published	2025-03-20 17:46:38+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
