AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization

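Summary

Long-thought reasoning models have recently achieved strong performance on complex reasoning tasks, but they often incur substantial inference overhead, which makes efficiency a critical concern.
Our empirical analysis shows that the benefit of Long-CoT varies from problem to problem: some problems require elaborate reasoning, while others show no improvement or even lose accuracy.
This motivates adaptive reasoning strategies that tailor reasoning depth to the input.
Prior work, however, mainly reduces redundancy within long reasoning paths, which limits the exploration of more efficient strategies beyond the Long-CoT paradigm.
To address this, we propose a novel two-stage framework for adaptive and efficient reasoning.
First, we construct a hybrid reasoning model by merging long and short CoT models, which enables diverse reasoning styles.
Second, we apply bi-level preference training to guide the model to select a suitable reasoning style (group level) and to prefer concise, correct reasoning within each style group (instance level).
Experiments show that our method significantly reduces inference cost compared with other baseline approaches while maintaining performance.
Notably, on five mathematical datasets the average reasoning length is reduced by more than 50%, highlighting the potential of adaptive strategies to optimize reasoning efficiency in large language models.
Our code is coming soon at https://github.com/StarDewXXX/AdaR1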
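
The first stage (merging a long-CoT and a short-CoT model) can be illustrated with a minimal sketch. The summary does not say which merging method is used, so plain linear interpolation of parameters is an assumption made here for illustration only. The names `merge_weights` and `alpha` are hypothetical, and flat lists of floats stand in for real weight tensors.

```python
from typing import Dict, List

# Assumption: a checkpoint is modeled as {parameter name: flat list of floats}.
Weights = Dict[str, List[float]]


def merge_weights(long_cot: Weights, short_cot: Weights, alpha: float = 0.5) -> Weights:
    """Linearly interpolate two checkpoints that share names and shapes.

    alpha = 1.0 returns the long-CoT weights, alpha = 0.0 the short-CoT weights.
    Linear interpolation is an illustrative choice, not the paper's stated method.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if long_cot.keys() != short_cot.keys():
        raise ValueError("checkpoints must share parameter names")

    merged: Weights = {}
    for name, long_values in long_cot.items():
        short_values = short_cot[name]
        if len(long_values) != len(short_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise convex combination of the two parameter vectors.
        merged[name] = [
            alpha * l + (1.0 - alpha) * s
            for l, s in zip(long_values, short_values)
        ]
    return merged
```

With real models the same loop would run over the two models' parameter dictionaries, and the merged model would then be able to produce both long and short reasoning traces before any preference training.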

Abstract (original)

Recently, long-thought reasoning models achieve strong performance on complex reasoning tasks, but often incur substantial inference overhead, making efficiency a critical concern. Our empirical analysis reveals that the benefit of using Long-CoT varies across problems: while some problems require elaborate reasoning, others show no improvement, or even degraded accuracy. This motivates adaptive reasoning strategies that tailor reasoning depth to the input. However, prior work primarily reduces redundancy within long reasoning paths, limiting exploration of more efficient strategies beyond the Long-CoT paradigm. To address this, we propose a novel two-stage framework for adaptive and efficient reasoning. First, we construct a hybrid reasoning model by merging long and short CoT models to enable diverse reasoning styles. Second, we apply bi-level preference training to guide the model to select suitable reasoning styles (group-level), and prefer concise and correct reasoning within each style group (instance-level). Experiments demonstrate that our method significantly reduces inference costs compared to other baseline approaches, while maintaining performance. Notably, on five mathematical datasets, the average length of reasoning is reduced by more than 50%, highlighting the potential of adaptive strategies to optimize reasoning efficiency in large language models. Our code is coming soon at https://github.com/StarDewXXX/AdaR1
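
The bi-level preference idea in the abstract can be sketched as follows, assuming several sampled responses per problem in each style. Everything here is hypothetical scaffolding rather than the authors' procedure: the `Sample` record, the accuracy `margin`, the tie-break toward the short style, and the rule for picking the rejected response are assumptions chosen to make the two levels concrete.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Sample:
    style: str     # "long" or "short" (hypothetical labels)
    text: str      # the sampled reasoning trace
    correct: bool  # whether the final answer is right
    length: int    # e.g. token count


def accuracy(samples: List[Sample]) -> float:
    """Fraction of correct samples; 0.0 for an empty group."""
    return sum(s.correct for s in samples) / len(samples) if samples else 0.0


def group_level_preference(samples: List[Sample], margin: float = 0.0) -> str:
    """Group level: prefer the long style only if it is more accurate by > margin."""
    long_acc = accuracy([s for s in samples if s.style == "long"])
    short_acc = accuracy([s for s in samples if s.style == "short"])
    return "long" if long_acc - short_acc > margin else "short"


def instance_level_pair(
    samples: List[Sample], style: str
) -> Optional[Tuple[Sample, Sample]]:
    """Instance level: (chosen, rejected) within one style group.

    chosen is the shortest correct response; rejected is an incorrect response
    if one exists, otherwise the longest correct one. Returns None if no
    informative pair can be formed.
    """
    group = [s for s in samples if s.style == style]
    correct = [s for s in group if s.correct]
    if not correct:
        return None
    chosen = min(correct, key=lambda s: s.length)
    pool = [s for s in group if s is not chosen]
    if not pool:
        return None
    # Incorrect responses rank first; among equals, the longer one is rejected.
    rejected = max(pool, key=lambda s: (not s.correct, s.length))
    if rejected.correct and rejected.length <= chosen.length:
        return None
    return chosen, rejected
```

A short usage example under the same assumptions:

```python
samples = [
    Sample("long", "a", True, 900),
    Sample("long", "b", True, 1200),
    Sample("short", "c", True, 80),
    Sample("short", "d", False, 60),
]
preferred_style = group_level_preference(samples)     # long group is more accurate here
long_pair = instance_level_pair(samples, "long")      # shorter correct trace vs. longer one
short_pair = instance_level_pair(samples, "short")    # correct trace vs. incorrect one
```

The resulting pairs could then be fed to any pairwise preference objective; which objective the paper uses is not stated in the abstract.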

arXiv information

Authors	Haotian Luo, Haiying He, Yibo Wang, Jinluan Yang, Rui Liu, Naiqiang Tan, Xiaochun Cao, Dacheng Tao, Li Shen
Published	2025-04-30 14:01:45+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
