Evaluating Multi-Agent Coordination Abilities in Large Language Models

Summary

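A pivotal goal of contemporary AI research is to develop agents proficient in multi-agent coordination, enabling effective collaboration with both humans and other systems.
Large language models (LLMs), with their notable ability to understand, generate, and interpret language in a human-like manner, stand out as promising candidates for developing such agents.
In this study, we build agents using LLMs and evaluate their effectiveness in a variety of coordination scenarios.
We introduce the LLM-Coordination (LLM-Co) framework, designed specifically to enable LLMs to play coordination games.
Using the LLM-Co framework, we run evaluations in three game environments and organize them into five aspects: Theory of Mind, Situated Reasoning, Sustained Coordination, Robustness to Partners, and Explicit Assistance.
First, the Theory of Mind and Situated Reasoning evaluations reveal the ability of LLMs to infer a partner's intentions and reason about their own actions accordingly.
Next, the Sustained Coordination and Robustness to Partners evaluations further show that LLMs can coordinate with unknown partners in complex long-horizon tasks, outperforming reinforcement learning baselines.
Finally, to test Explicit Assistance, the ability of an agent to offer help proactively, we introduce two new layouts to the Overcooked-AI benchmark and examine whether agents can prioritize helping their partners, sacrificing time they could have spent on their own tasks.
This research highlights the promising capabilities of LLMs in sophisticated coordination environments and reveals their potential for building strong real-world agents for multi-agent coordination.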

Summary (original)

A pivotal aim in contemporary AI research is to develop agents proficient in multi-agent coordination, enabling effective collaboration with both humans and other systems. Large Language Models (LLMs), with their notable ability to understand, generate, and interpret language in a human-like manner, stand out as promising candidates for the development of such agents. In this study, we build and assess the effectiveness of agents crafted using LLMs in various coordination scenarios. We introduce the LLM-Coordination (LLM-Co) Framework, specifically designed to enable LLMs to play coordination games. With the LLM-Co framework, we conduct our evaluation with three game environments and organize the evaluation into five aspects: Theory of Mind, Situated Reasoning, Sustained Coordination, Robustness to Partners, and Explicit Assistance. First, the evaluation of the Theory of Mind and Situated Reasoning reveals the capabilities of LLMs to infer the partner’s intention and reason about actions accordingly. Then, the evaluation around Sustained Coordination and Robustness to Partners further showcases the ability of LLMs to coordinate with an unknown partner in complex long-horizon tasks, outperforming Reinforcement Learning baselines. Lastly, to test Explicit Assistance, which refers to the ability of an agent to offer help proactively, we introduce two novel layouts into the Overcooked-AI benchmark, examining if agents can prioritize helping their partners, sacrificing time that could have been spent on their tasks. This research underscores the promising capabilities of LLMs in sophisticated coordination environments and reveals the potential of LLMs in building strong real-world agents for multi-agent coordination.

arXiv information

Authors	Saaket Agashe, Yue Fan, Xin Eric Wang
Published	2023-10-05 21:18:15+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
