Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models

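Abstract

Recent progress in large reasoning models (LRMs) has shown that scaling test-time computation is an effective way to strengthen reasoning across multiple tasks.
However, LRMs commonly suffer from "overthinking": the model generates highly redundant reasoning steps that yield only limited performance gains.
Existing work mitigates overthinking through fine-tuning, which requires additional data and unconventional training setups, risks safety misalignment, and generalizes poorly.
Through empirical analysis, the authors identify an important characteristic of LRM behavior: placing an external CoT generated by a smaller model between the thinking tokens ($\texttt{<think>}$ and $\texttt{</think>}$) effectively steers the model toward generating fewer thoughts.
Building on this insight, they propose ThoughtMani, a simple yet efficient pipeline that lets LRMs bypass unnecessary intermediate steps and significantly reduce computational cost.
They conduct extensive experiments to validate the utility and efficiency of ThoughtMani.
For example, when applied to QwQ-32B on the LiveBench/Code dataset, ThoughtMani maintains the original performance while reducing the output token count by approximately 30%, with little overhead from the CoT generator.
They also find that ThoughtMani improves safety alignment by an average of 10%.
Because model vendors typically serve models of several sizes simultaneously, ThoughtMani offers an effective way to build more efficient and accessible LRMs for real-world applications.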

Abstract (original)

Recent advancements in large reasoning models (LRMs) have demonstrated the effectiveness of scaling test-time computation to enhance reasoning capabilities in multiple tasks. However, LRMs typically suffer from "overthinking" problems, where models generate significantly redundant reasoning steps while bringing limited performance gains. Existing work relies on fine-tuning to mitigate overthinking, which requires additional data, unconventional training setups, risky safety misalignment, and poor generalization. Through empirical analysis, we reveal an important characteristic of LRM behaviors that placing external CoTs generated by smaller models between the thinking token ($\texttt{<think>}$ and $\texttt{</think>}$) can effectively manipulate the model to generate fewer thoughts. Building on these insights, we propose a simple yet efficient pipeline, ThoughtMani, to enable LRMs to bypass unnecessary intermediate steps and reduce computational costs significantly. We conduct extensive experiments to validate the utility and efficiency of ThoughtMani. For instance, when applied to QwQ-32B on the LiveBench/Code dataset, ThoughtMani keeps the original performance and reduces output token counts by approximately 30%, with little overhead from the CoT generator. Furthermore, we find that ThoughtMani enhances safety alignment by an average of 10%. Since model vendors typically serve models of different sizes simultaneously, ThoughtMani provides an effective way to construct more efficient and accessible LRMs for real-world applications.
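
The core idea described in the abstract, placing a CoT written by a smaller model inside the larger model's thinking block, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the plain-text prompt layout, and the use of `<think>` / `</think>` as the thinking delimiters are assumptions made for illustration, and a real deployment would go through the target model's own chat template.

```python
# Illustrative sketch only; names and prompt layout are hypothetical,
# not taken from the ThoughtMani paper or any library.

THINK_OPEN = "<think>"    # assumed opening thinking token
THINK_CLOSE = "</think>"  # assumed closing thinking token


def build_manipulated_prompt(question: str, external_cot: str) -> str:
    """Return a prompt whose thinking block is pre-filled with an external CoT.

    The large reasoning model is expected to continue generating after the
    closing thinking token, instead of producing its own long reasoning.
    """
    cot = external_cot.strip()
    return f"{question.strip()}\n{THINK_OPEN}\n{cot}\n{THINK_CLOSE}\n"


def token_reduction(baseline_tokens: int, manipulated_tokens: int) -> float:
    """Fraction of output tokens saved relative to the unmodified model."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 1.0 - manipulated_tokens / baseline_tokens


if __name__ == "__main__":
    prompt = build_manipulated_prompt(
        "What is 2 + 3?",
        "Add the two numbers directly.",  # stand-in for a small model's CoT
    )
    print(prompt)
    # Made-up token counts, chosen only to show the arithmetic.
    print(f"reduction: {token_reduction(1000, 700):.0%}")
```

The token counts in the usage example are invented to show how a reduction figure such as the abstract's "approximately 30%" would be computed; they are not measurements.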

arXiv information

Authors	Yule Liu, Jingyi Zheng, Zhen Sun, Zifan Peng, Wenhan Dong, Zeyang Sha, Shiwen Cui, Weiqiang Wang, Xinlei He
Published	2025-04-18 11:07:19+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
