ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning

Summary

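Recent work on reasoning in large language models (LLMs) has sought to improve performance further by integrating meta-thinking, which lets a model monitor, evaluate, and control its own reasoning process for more adaptive and effective problem solving.
However, current single-agent approaches lack a design specialized for acquiring meta-thinking, which limits their effectiveness.
To address this challenge, we introduce Reinforced Meta-thinking Agents (ReMA), a new framework that uses multi-agent reinforcement learning (MARL) to elicit meta-thinking behaviors, encouraging LLMs to think about thinking.
ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for strategic oversight and plan generation, and a low-level reasoning agent responsible for detailed execution.
Through iterative reinforcement learning with aligned objectives, these agents explore and learn to collaborate, which improves generalization and robustness.
Experimental results show that ReMA outperforms single-agent RL baselines on complex reasoning tasks, including competition-level mathematics benchmarks and LLM-as-a-Judge benchmarks.
Comprehensive ablation studies further illustrate the evolving dynamics of each agent and offer valuable insight into how the meta-thinking reasoning process enhances the reasoning capabilities of LLMs.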
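
The two-agent decomposition described above can be pictured with a minimal sketch. This is not the authors' implementation: the names `Policy`, `Rollout`, and `rollout` are hypothetical, the two "agents" are plain callables standing in for LLM policies, and the prompt format is an assumption. It shows only the data flow (the high-level agent produces a plan, the low-level agent answers conditioned on it) and the aligned objective (one reward shared by both agents). No training step is shown.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: any callable mapping a prompt string to a completion string.
Policy = Callable[[str], str]


@dataclass
class Rollout:
    question: str
    plan: str      # output of the high-level meta-thinking agent
    answer: str    # output of the low-level reasoning agent
    reward: float  # single scalar shared by both agents (aligned objective)


def rollout(
    question: str,
    high: Policy,
    low: Policy,
    reward_fn: Callable[[str, str], float],
) -> Rollout:
    """Run one high-level -> low-level pass and score the final answer."""
    plan = high(question)
    # Assumed prompt format: the low-level agent sees the question and the plan.
    answer = low(f"{question}\nPlan: {plan}")
    return Rollout(question, plan, answer, reward_fn(question, answer))


if __name__ == "__main__":
    # Toy stand-ins for the two LLM policies and the reward (all hypothetical).
    toy_high = lambda q: "add the two numbers"
    toy_low = lambda p: "4" if "Plan: add" in p else "?"
    toy_reward = lambda q, a: 1.0 if a == "4" else 0.0
    print(rollout("What is 2 + 2?", toy_high, toy_low, toy_reward))
```

In a multi-agent RL setup, the shared `reward` field is the quantity each agent's update would use, so an improvement in either agent is credited through the same final-answer signal.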

Summary (original)

Recent research on Reasoning of Large Language Models (LLMs) has sought to further enhance their performance by integrating meta-thinking — enabling models to monitor, evaluate, and control their reasoning processes for more adaptive and effective problem-solving. However, current single-agent work lacks a specialized design for acquiring meta-thinking, resulting in low efficacy. To address this challenge, we introduce Reinforced Meta-thinking Agents (ReMA), a novel framework that leverages Multi-Agent Reinforcement Learning (MARL) to elicit meta-thinking behaviors, encouraging LLMs to think about thinking. ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions. Through iterative reinforcement learning with aligned objectives, these agents explore and learn collaboration, leading to improved generalization and robustness. Experimental results demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks, including competitive-level mathematical benchmarks and LLM-as-a-Judge benchmarks. Comprehensive ablation studies further illustrate the evolving dynamics of each distinct agent, providing valuable insights into how the meta-thinking reasoning process enhances the reasoning capabilities of LLMs.

arXiv information

Authors	Ziyu Wan, Yunxiang Li, Yan Song, Hanjing Wang, Linyi Yang, Mark Schmidt, Jun Wang, Weinan Zhang, Shuyue Hu, Ying Wen
Published	2025-03-12 16:05:31+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
