Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models

Summary

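Recent advances in large language models have shown that supervised fine-tuning (SFT) on chain-of-thought (CoT) reasoning data distilled from large reasoning models (e.g., DeepSeek R1) can effectively transfer reasoning capabilities to non-reasoning models.
However, models fine-tuned this way inherit the "overthinking" problem from their teacher models and generate verbose, redundant reasoning chains at inference time.
To address this challenge, we propose Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning (LS-Mixture SFT), which combines a long CoT reasoning dataset with short counterparts obtained through structure-preserved rewriting.
Our experiments show that, compared with models trained with direct SFT, models trained with LS-Mixture SFT achieve an average accuracy improvement of 2.3% across various benchmarks while reducing response length by approximately 47.61%.
This work offers a way to give non-reasoning models reasoning capabilities through supervised fine-tuning while avoiding the overthinking problem inherited from teacher models, thereby enabling efficient reasoning in the fine-tuned models.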

Summary (original)

Recent advances in large language models have demonstrated that Supervised Fine-Tuning (SFT) with Chain-of-Thought (CoT) reasoning data distilled from large reasoning models (e.g., DeepSeek R1) can effectively transfer reasoning capabilities to non-reasoning models. However, models fine-tuned with this approach inherit the ‘overthinking’ problem from teacher models, producing verbose and redundant reasoning chains during inference. To address this challenge, we propose Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning (LS-Mixture SFT), which combines long CoT reasoning dataset with their short counterparts obtained through structure-preserved rewriting. Our experiments demonstrate that models trained using the LS-Mixture SFT method, compared to those trained with direct SFT, achieved an average accuracy improvement of 2.3% across various benchmarks while substantially reducing model response length by approximately 47.61%. This work offers an approach to endow non-reasoning models with reasoning capabilities through supervised fine-tuning while avoiding the inherent overthinking problems inherited from teacher models, thereby enabling efficient reasoning in the fine-tuned models.
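
The abstract describes the data recipe only at a high level: long CoT examples are combined with shortened rewrites of the same reasoning. The sketch below is one possible reading of that idea, not the authors' implementation. The one-to-one pairing, the "style" label, the shuffling, and the toy_rewrite function (a hypothetical stand-in for the structure-preserved rewriting step, whose actual procedure is not given here) are all assumptions made for illustration. The length_reduction helper only shows how a relative reduction such as the reported 47.61% would be computed.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class SFTExample:
    question: str
    reasoning: str  # chain-of-thought text
    answer: str
    style: str      # "long" or "short" (label is an assumption of this sketch)


def build_mixture(
    long_examples: List[SFTExample],
    rewrite_short: Callable[[str], str],
    seed: int = 0,
) -> List[SFTExample]:
    """Pair every long-CoT example with a shortened rewrite, then shuffle."""
    short_examples = [
        SFTExample(ex.question, rewrite_short(ex.reasoning), ex.answer, "short")
        for ex in long_examples
    ]
    mixture = list(long_examples) + short_examples
    random.Random(seed).shuffle(mixture)  # deterministic shuffle for a given seed
    return mixture


def toy_rewrite(reasoning: str) -> str:
    """Hypothetical stand-in for rewriting: keep step order, drop 'Wait...' re-checks."""
    steps = [s.strip() for s in reasoning.split("\n") if s.strip()]
    kept = [s for s in steps if not s.lower().startswith("wait")]
    return "\n".join(kept)


def length_reduction(baseline_len: float, new_len: float) -> float:
    """Relative reduction in response length, as a percentage."""
    return 100.0 * (baseline_len - new_len) / baseline_len
```

In practice the rewriting step would likely be done by a language model rather than a string filter, and the mixing ratio is a design choice this sketch fixes at one short example per long example.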

arXiv information

Authors	Bin Yu, Hang Yuan, Yuliang Wei, Bailing Wang, Weizhen Qi, Kai Chen
Published	2025-05-06 12:18:11+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
