Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space

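Abstract

Reasoning ability, a core component of human intelligence, remains a major challenge for large language models (LLMs) in the pursuit of AGI.
Model performance has improved under the training scaling law, but important challenges remain, particularly on the training-algorithm side, such as catastrophic forgetting and the limited availability of new training data.
As an alternative, test-time scaling improves reasoning performance by increasing test-time computation without updating parameters.
Unlike earlier methods in this paradigm, which focus on token space, the authors propose using latent space for more effective reasoning and closer adherence to the test-time scaling law.
They introduce LatentSeek, a new framework that strengthens LLM reasoning through Test-Time Instance-level Adaptation (TTIA) within the model's latent space.
Specifically, LatentSeek uses policy gradient to iteratively update latent representations, guided by self-generated reward signals.
LatentSeek is evaluated on a range of reasoning benchmarks, including GSM8K, MATH-500, and AIME2024, across multiple LLM architectures.
The results show that LatentSeek consistently outperforms strong baselines such as Chain-of-Thought prompting and fine-tuning-based methods.
The analysis further shows that LatentSeek is highly efficient: it typically converges within a few iterations on problems of average complexity, yet still benefits from additional iterations, which highlights the potential of test-time scaling in latent space.
These findings position LatentSeek as a lightweight, scalable, and effective solution for enhancing the reasoning capabilities of LLMs.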

Abstract (original)

Reasoning ability, a core component of human intelligence, continues to pose a significant challenge for Large Language Models (LLMs) in the pursuit of AGI. Although model performance has improved under the training scaling law, significant challenges remain, particularly with respect to training algorithms, such as catastrophic forgetting, and the limited availability of novel training data. As an alternative, test-time scaling enhances reasoning performance by increasing test-time computation without parameter updating. Unlike prior methods in this paradigm focused on token space, we propose leveraging latent space for more effective reasoning and better adherence to the test-time scaling law. We introduce LatentSeek, a novel framework that enhances LLM reasoning through Test-Time Instance-level Adaptation (TTIA) within the model’s latent space. Specifically, LatentSeek leverages policy gradient to iteratively update latent representations, guided by self-generated reward signals. LatentSeek is evaluated on a range of reasoning benchmarks, including GSM8K, MATH-500, and AIME2024, across multiple LLM architectures. Results show that LatentSeek consistently outperforms strong baselines, such as Chain-of-Thought prompting and fine-tuning-based methods. Furthermore, our analysis demonstrates that LatentSeek is highly efficient, typically converging within a few iterations for problems of average complexity, while also benefiting from additional iterations, thereby highlighting the potential of test-time scaling in the latent space. These findings position LatentSeek as a lightweight, scalable, and effective solution for enhancing the reasoning capabilities of LLMs.
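The core idea in the abstract, updating latent representations by policy gradient while the model weights stay frozen, can be illustrated with a toy sketch. This is not the authors' implementation: the frozen decoding matrix `W`, the function names, the hyperparameters, and especially `toy_reward` (which compares against a known target, whereas the paper's reward is self-generated) are all assumptions made for illustration. The sketch applies a REINFORCE-style update with a mean baseline to a small latent array `z`.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def toy_reward(tokens, target):
    # Hypothetical stand-in for the self-generated reward in the abstract:
    # the fraction of positions where the sampled token matches a target.
    return float(np.mean(tokens == target))

def expected_reward(z, W, target):
    # Exact expected toy reward under the current token distribution.
    probs = softmax(z @ W)
    return float(np.mean(probs[np.arange(len(target)), target]))

def latent_policy_gradient(z, W, target, steps=100, samples=64, lr=0.1, seed=0):
    # REINFORCE on the latents z (shape T x d). The decoder W (d x V) is
    # frozen, mirroring "no parameter updates" at test time.
    rng = np.random.default_rng(seed)
    z = z.copy()
    T, V = z.shape[0], W.shape[1]
    for _ in range(steps):
        probs = softmax(z @ W)  # (T, V) per-position token distribution
        draws = np.array([[rng.choice(V, p=probs[t]) for t in range(T)]
                          for _ in range(samples)])
        rewards = np.array([toy_reward(d, target) for d in draws])
        baseline = rewards.mean()  # variance-reduction baseline
        grad = np.zeros_like(z)
        for tokens, r in zip(draws, rewards):
            onehot = np.eye(V)[tokens]  # (T, V)
            # grad of log pi(tokens | z) w.r.t. z is (onehot - probs) @ W.T
            grad += (r - baseline) * (onehot - probs) @ W.T
        z += lr * grad / samples  # gradient ascent on expected reward
    return z

# Toy setup: 4 latent positions, latent width 8, vocabulary of 5 tokens.
setup_rng = np.random.default_rng(1)
W = setup_rng.normal(size=(8, 5))
z0 = 0.01 * setup_rng.normal(size=(4, 8))
target = np.array([1, 3, 0, 2])
z1 = latent_policy_gradient(z0, W, target)
print(expected_reward(z0, W, target), expected_reward(z1, W, target))
```

Only `z` changes across iterations, so the adaptation is per instance and is discarded once the problem is answered. In a real LLM the latents would be hidden states and the reward would come from the model's own evaluation of its output, neither of which this toy reproduces.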

arXiv information

Authors	Hengli Li, Chenxi Li, Tong Wu, Xuekai Zhu, Yuxuan Wang, Zhaoxin Yu, Eric Hanchen Jiang, Song-Chun Zhu, Zixia Jia, Ying Nian Wu, Zilong Zheng
Published	2025-05-19 16:26:02+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
