Efficient Inference for Large Reasoning Models: A Survey

Summary

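Large reasoning models (LRMs) substantially improve the reasoning ability of large language models (LLMs) by learning to reason, and they show promising performance on complex tasks.
However, their deliberative reasoning process leads to inefficiencies in token usage, memory consumption, and inference time.
This survey therefore reviews efficient inference methods designed specifically for LRMs, focusing on mitigating token inefficiency while preserving reasoning quality.
First, we introduce a taxonomy that groups recent methods into two main categories: (a) explicit compact Chain-of-Thought (CoT), which reduces tokens while keeping the explicit reasoning structure, and (b) implicit latent CoT, which encodes reasoning steps in hidden representations instead of explicit tokens.
We also discuss their strengths and weaknesses.
Next, we conduct empirical analyses of existing methods in terms of performance and efficiency.
We also present open challenges in this field, including human-centric controllable reasoning, the trade-off between interpretability and efficiency of reasoning, ensuring the safety of efficient reasoning, and broader applications of efficient reasoning.
In addition, we highlight key insights for improving the inference efficiency of LRMs through techniques such as model merging, new architectures, and agent routers.
We hope this work serves as a valuable guide that helps researchers overcome challenges in this vibrant field (https://github.com/yueliu1999/Awesome-Efficient-Inference-for-LRMs).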
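The summary names model merging as one route to more efficient LRM inference. The code below is a minimal sketch that assumes the simplest merging scheme: element-wise linear interpolation between two checkpoints with identical parameter names and shapes, for example a long-CoT reasoning model and a short-answer model fine-tuned from the same base. The survey does not prescribe this particular method. The function `merge_linear`, the dictionary-of-lists parameter format, and the weight `alpha` are all hypothetical illustrations.

```python
from typing import Dict, List

# Hypothetical checkpoint format: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def merge_linear(long_cot: Params, short_cot: Params, alpha: float) -> Params:
    """Linearly interpolate two checkpoints that share the same architecture.

    alpha = 0.0 returns the long-CoT weights and alpha = 1.0 returns the
    short-CoT weights. Plain interpolation is assumed here only for illustration.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if long_cot.keys() != short_cot.keys():
        raise ValueError("checkpoints must have the same parameter names")
    merged: Params = {}
    for name, long_values in long_cot.items():
        short_values = short_cot[name]
        if len(long_values) != len(short_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Weighted average of corresponding entries.
        merged[name] = [
            (1.0 - alpha) * a + alpha * b for a, b in zip(long_values, short_values)
        ]
    return merged


if __name__ == "__main__":
    long_model = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
    short_model = {"layer.weight": [3.0, 0.0], "layer.bias": [1.0]}
    print(merge_linear(long_model, short_model, alpha=0.25))
    # → {'layer.weight': [1.5, 1.5], 'layer.bias': [0.25]}
```

In practice `alpha` would be tuned on held-out data to balance accuracy against output length. Real checkpoints would also be framework tensors rather than Python lists.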

Summary (original)

Large Reasoning Models (LRMs) significantly improve the reasoning ability of Large Language Models (LLMs) by learning to reason, exhibiting promising performance in complex task-solving. However, their deliberative reasoning process leads to inefficiencies in token usage, memory consumption, and inference time. Thus, this survey provides a review of efficient inference methods designed specifically for LRMs, focusing on mitigating token inefficiency while preserving the reasoning quality. First, we introduce a taxonomy to group the recent methods into two main categories: (a) explicit compact Chain-of-Thought (CoT), which reduces tokens while keeping the explicit reasoning structure, and (b) implicit latent CoT, which encodes reasoning steps within hidden representations instead of explicit tokens. Meanwhile, we discuss their strengths and weaknesses. Then, we conduct empirical analyses on existing methods from performance and efficiency aspects. Besides, we present open challenges in this field, including human-centric controllable reasoning, trade-off between interpretability and efficiency of reasoning, ensuring safety of efficient reasoning, and broader applications of efficient reasoning. In addition, we highlight key insights for enhancing LRMs’ inference efficiency via techniques such as model merging, new architectures, and agent routers. We hope this work serves as a valuable guide, helping researchers overcome challenges in this vibrant field (https://github.com/yueliu1999/Awesome-Efficient-Inference-for-LRMs).
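The abstract also lists agent routers as a way to improve inference efficiency. The code below is a minimal sketch that assumes a router sends each query either to a fast non-reasoning model or to an LRM, based on an estimated difficulty score. The keyword-and-length heuristic, the threshold, and the model callables are hypothetical stand-ins. A practical router would more likely use a learned classifier.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in: a "model" is any callable mapping a prompt to an answer.
Model = Callable[[str], str]


def estimate_difficulty(query: str) -> float:
    """Toy difficulty score in [0, 1]; a placeholder for a learned classifier."""
    cues = ("prove", "derive", "step by step", "optimize")
    cue_hits = sum(cue in query.lower() for cue in cues)  # count matched cue phrases
    length_score = min(len(query.split()) / 100.0, 1.0)   # longer queries look harder
    return min(1.0, 0.5 * length_score + 0.25 * cue_hits)


@dataclass
class Router:
    fast_model: Model       # cheap model that answers directly
    reasoning_model: Model  # expensive LRM that emits long chains of thought
    threshold: float = 0.5  # hypothetical cut-off, to be tuned on validation data

    def route(self, query: str) -> str:
        """Name the model that would handle the query."""
        return "reasoning" if estimate_difficulty(query) >= self.threshold else "fast"

    def answer(self, query: str) -> str:
        """Dispatch the query to the chosen model and return its answer."""
        model = self.reasoning_model if self.route(query) == "reasoning" else self.fast_model
        return model(query)


if __name__ == "__main__":
    router = Router(
        fast_model=lambda q: f"[fast] {q}",
        reasoning_model=lambda q: f"[lrm] {q}",
    )
    print(router.answer("What is the capital of France?"))
    print(router.answer("Prove and derive the bound step by step."))
```

The design choice here is to spend long reasoning only where it is likely to pay off. Easy queries skip the LRM entirely, which saves tokens and latency.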

arXiv information

Authors	Yue Liu, Jiaying Wu, Yufei He, Hongcheng Gao, Hongyu Chen, Baolong Bi, Ruihan Gong, Jiaheng Zhang, Zhiqi Huang, Bryan Hooi
Published	2025-06-16 16:51:48+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
