Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training

Summary

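Mixture-of-Experts (MoE) architectures in Large Reasoning Models (LRMs) achieve impressive reasoning capabilities by selectively activating experts to facilitate structured cognitive processes.
Despite notable progress, existing reasoning models often suffer from cognitive inefficiencies such as overthinking and underthinking.
To address these limitations, we introduce Reinforcing Cognitive Experts (RICE), a new inference-time steering method designed to improve reasoning performance without additional training or complex heuristics.
Using normalized Pointwise Mutual Information (nPMI), we systematically identify specialized experts, termed "cognitive experts," that orchestrate meta-level reasoning operations characterized by tokens such as "<think>".
Empirical evaluations with leading MoE-based LRMs (DeepSeek-R1 and Qwen3-235B) on rigorous quantitative and scientific reasoning benchmarks show noticeable and consistent improvements in reasoning accuracy, cognitive efficiency, and cross-domain generalization.
Importantly, this lightweight approach substantially outperforms common reasoning-steering techniques, such as prompt design and decoding constraints, while preserving the model's general instruction-following skills.
These results highlight reinforcing cognitive experts as a promising, practical, and interpretable direction for improving cognitive efficiency in advanced reasoning models.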

Abstract (original)

Mixture-of-Experts (MoE) architectures within Large Reasoning Models (LRMs) have achieved impressive reasoning capabilities by selectively activating experts to facilitate structured cognitive processes. Despite notable advances, existing reasoning models often suffer from cognitive inefficiencies like overthinking and underthinking. To address these limitations, we introduce a novel inference-time steering methodology called Reinforcing Cognitive Experts (RICE), designed to improve reasoning performance without additional training or complex heuristics. Leveraging normalized Pointwise Mutual Information (nPMI), we systematically identify specialized experts, termed “cognitive experts,” that orchestrate meta-level reasoning operations characterized by tokens like “<think>”. Empirical evaluations with leading MoE-based LRMs (DeepSeek-R1 and Qwen3-235B) on rigorous quantitative and scientific reasoning benchmarks demonstrate noticeable and consistent improvements in reasoning accuracy, cognitive efficiency, and cross-domain generalization. Crucially, our lightweight approach substantially outperforms prevalent reasoning-steering techniques, such as prompt design and decoding constraints, while preserving the model’s general instruction-following skills. These results highlight reinforcing cognitive experts as a promising, practical, and interpretable direction to enhance cognitive efficiency within advanced reasoning models.
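
The abstract does not spell out how nPMI is applied, so the following is only a minimal sketch of the general idea, not the authors' implementation. It uses the standard definition nPMI(x, y) = log(p(x, y) / (p(x) p(y))) / (-log p(x, y)), which ranges from -1 (never co-occur) to 1 (always co-occur). The function names, the per-token record format, and the toy data are all assumptions made for illustration: each record pairs the set of experts routed for one token with a flag saying whether that token is a reasoning-marker token.

```python
import math
from collections import Counter


def npmi(joint: int, count_x: int, count_y: int, total: int) -> float:
    """Normalized PMI from raw co-occurrence counts; the result lies in [-1, 1]."""
    if joint == 0:
        return -1.0  # limiting value when the two events never co-occur
    p_xy = joint / total
    if p_xy == 1.0:
        return 1.0  # both events occur on every token; -log p_xy would be 0
    p_x = count_x / total
    p_y = count_y / total
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


def rank_experts(records):
    """Rank experts by nPMI with marker tokens.

    records: iterable of (active_expert_ids, is_marker_token) pairs, one per
    token. This format is hypothetical, chosen only for the sketch.
    """
    total = 0
    marker_total = 0
    expert_counts = Counter()
    joint_counts = Counter()
    for experts, is_marker in records:
        total += 1
        marker_total += int(is_marker)
        for expert in set(experts):
            expert_counts[expert] += 1
            if is_marker:
                joint_counts[expert] += 1
    scores = {
        expert: npmi(joint_counts[expert], expert_counts[expert], marker_total, total)
        for expert in expert_counts
    }
    # Highest nPMI first: these experts fire most specifically on marker tokens.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy routing trace (made-up data): expert 0 is active only on marker tokens.
toy_records = [
    ({0, 1}, True),
    ({0, 2}, True),
    ({1, 2}, False),
    ({2, 3}, False),
]
ranking = rank_experts(toy_records)
top_two = [expert for expert, _ in ranking[:2]]
```

In this toy trace, expert 0 scores highest because it is routed on every marker token and on no other token. Taking the top two entries mirrors the "two experts" of the title, but how the selected experts are then reinforced at inference time is not described in the abstract and is left out of the sketch.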

arXiv information

Authors	Mengru Wang,Xingyu Chen,Yue Wang,Zhiwei He,Jiahao Xu,Tian Liang,Qiuzhi Liu,Yunzhi Yao,Wenxuan Wang,Ruotian Ma,Haitao Mi,Ningyu Zhang,Zhaopeng Tu,Xiaolong Li,Dong Yu
Published	2025-05-20 17:59:16+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
