InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models

Summary

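Advanced reasoning in large language models achieves remarkable performance on challenging tasks, but the prevailing long-context reasoning paradigm has three critical limitations: computation that scales quadratically with sequence length, reasoning bounded by the maximum context length, and performance degradation beyond the pre-training context window.
Existing approaches mainly compress reasoning chains and leave the underlying scaling problem unaddressed.
To overcome these challenges, the authors introduce InftyThink, a paradigm that turns monolithic reasoning into an iterative process with intermediate summarization.
By interleaving short reasoning segments with concise progress summaries, the approach allows unbounded reasoning depth while keeping computational cost bounded.
This produces a characteristic sawtooth memory pattern that significantly reduces computational complexity compared with traditional approaches.
The authors also develop a method for reconstructing long-context reasoning datasets into the iterative format, converting OpenR1-Math into 333K training instances.
Experiments across multiple model architectures show that the approach reduces computational cost while improving performance, with Qwen2.5-Math-7B improving by 3-13% across the MATH500, AIME24, and GPQA_diamond benchmarks.
The work challenges the assumed trade-off between reasoning depth and computational efficiency, and offers a more scalable approach to complex reasoning without architectural modifications.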
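
The interleaving of reasoning segments and summaries described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: `iterative_reasoning`, `toy_reason_step`, and `toy_summarize` are hypothetical names, the two toy functions stand in for calls to a language model, and context size is measured in characters rather than tokens.

```python
from typing import Callable, List, Tuple


def iterative_reasoning(
    question: str,
    reason_step: Callable[[str, str], Tuple[str, bool]],
    summarize: Callable[[str, str], str],
    max_iterations: int = 8,
) -> Tuple[str, List[int]]:
    """Alternate short reasoning segments with progress summaries.

    Returns the final summary and the peak context size (in characters)
    reached in each iteration.
    """
    summary = ""
    context_sizes: List[int] = []
    for _ in range(max_iterations):
        # The context holds only the question and the latest summary,
        # never the full history of earlier segments.
        prompt = question + summary
        segment, done = reason_step(question, summary)
        context_sizes.append(len(prompt) + len(segment))
        # The segment is discarded after being folded into a new summary.
        summary = summarize(summary, segment)
        if done:
            break
    return summary, context_sizes


def toy_reason_step(question: str, summary: str) -> Tuple[str, bool]:
    """Hypothetical stand-in for a model call: emits a fixed-size segment."""
    steps_done = int(summary.split(":")[1]) if summary else 0
    segment = "x" * 50
    return segment, steps_done + 1 >= 4  # pretend four segments suffice


def toy_summarize(summary: str, segment: str) -> str:
    """Hypothetical stand-in for summarization: a constant-size progress note."""
    steps_done = int(summary.split(":")[1]) if summary else 0
    return f"done:{steps_done + 1}"


final_summary, sizes = iterative_reasoning("Q", toy_reason_step, toy_summarize)
print(final_summary, sizes)
```

In this toy run the per-iteration context size stays flat instead of growing with every segment, which is the bounded, sawtooth-like behavior the summary refers to. A single monolithic pass over the same four segments would end with all 200 segment characters in context at once.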

Summary (original)

Advanced reasoning in large language models has achieved remarkable performance on challenging tasks, but the prevailing long-context reasoning paradigm faces critical limitations: quadratic computational scaling with sequence length, reasoning constrained by maximum context boundaries, and performance degradation beyond pre-training context windows. Existing approaches primarily compress reasoning chains without addressing the fundamental scaling problem. To overcome these challenges, we introduce InftyThink, a paradigm that transforms monolithic reasoning into an iterative process with intermediate summarization. By interleaving short reasoning segments with concise progress summaries, our approach enables unbounded reasoning depth while maintaining bounded computational costs. This creates a characteristic sawtooth memory pattern that significantly reduces computational complexity compared to traditional approaches. Furthermore, we develop a methodology for reconstructing long-context reasoning datasets into our iterative format, transforming OpenR1-Math into 333K training instances. Experiments across multiple model architectures demonstrate that our approach reduces computational costs while improving performance, with Qwen2.5-Math-7B showing 3-13% improvements across MATH500, AIME24, and GPQA_diamond benchmarks. Our work challenges the assumed trade-off between reasoning depth and computational efficiency, providing a more scalable approach to complex reasoning without architectural modifications.
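
To make the quadratic-versus-bounded cost claim concrete, here is a rough back-of-the-envelope calculation in Python. It is a toy model, not the paper's cost analysis: it assumes each generated token pays a cost equal to the number of tokens it attends to, ignores the question tokens, and uses made-up lengths (8000 reasoning tokens, 2000-token segments, 200-token summaries). All function names are hypothetical.

```python
def attention_cost(prefix_len: int, generated_len: int) -> int:
    """Toy cost: the t-th generated token attends to prefix_len + t tokens."""
    return sum(prefix_len + t for t in range(1, generated_len + 1))


def monolithic_cost(total_reasoning: int) -> int:
    """Cost of generating the whole reasoning chain in one growing context."""
    return attention_cost(0, total_reasoning)


def iterative_cost(total_reasoning: int, segment_len: int, summary_len: int) -> int:
    """Cost when reasoning is split into segments, each followed by a summary.

    Every iteration after the first starts from the previous summary only.
    """
    cost = 0
    prefix = 0  # assumption: no summary exists before the first segment
    remaining = total_reasoning
    while remaining > 0:
        seg = min(segment_len, remaining)
        # Each iteration generates one segment plus one new summary.
        cost += attention_cost(prefix, seg + summary_len)
        prefix = summary_len
        remaining -= seg
    return cost


mono = monolithic_cost(8000)
iterative = iterative_cost(8000, segment_len=2000, summary_len=200)
print(mono, iterative, round(iterative / mono, 3))
```

Under these assumptions the iterative scheme costs roughly a third of the monolithic pass even though it generates extra summary tokens, because no single token ever attends to more than one summary and one segment. The exact ratio depends entirely on the chosen lengths and says nothing about the paper's measured results.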

arXiv information

Authors	Yuchen Yan,Yongliang Shen,Yang Liu,Jin Jiang,Mengdi Zhang,Jian Shao,Yueting Zhuang
Published	2025-03-13 16:00:47+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
