ROTATE: Regret-driven Open-ended Training for Ad Hoc Teamwork

要約

以前に見えないパートナーと協力できるAIエージェントの開発は、アドホックチームワーク（AHT）として知られるマルチエージェント学習における基本的な一般化課題です。
既存のAHTアプローチは通常、2段階のパイプラインを採用します。最初に、チームメイトの固定集団が展開時間に見られるチームメイトの代表であるべきだという考えで生成され、第二に、AHTエージェントは人口のエージェントとよく協力するように訓練されています。
これまで、研究コミュニティは、各段階の個別のアルゴリズムの設計に焦点を当ててきました。
この分離は、可能性のある動作のカバレッジが限られているチームメイトプールを生成するアルゴリズムにつながり、生成されたチームメイトがAHTエージェントのために簡単に学ぶことができるかどうかを無視します。
さらに、AHTエージェントをトレーニングするためのアルゴリズムは通常、トレーニングチームメイトのセットを静的として扱うため、トレーニングチームメイトの配布を制御することなく、以前に見えなかったパートナーエージェントに一般化しようとします。
このホワイトペーパーでは、ADHOCエージェントと敵対チームメイトジェネレーターの間の自由回答形式の学習プロセスとして問題を再定式化することにより、AHTの統一されたフレームワークを紹介します。
AHTエージェントの改善とその欠陥を調査するチームメイトを生成することとを交互にする、後悔したオープンエンドのトレーニングアルゴリズムであるRotateを紹介します。
多様なAHT環境にわたる広範な実験は、目に見えない評価チームメイトに一般化することでベースラインを大幅に上回ることを示しており、したがって、堅牢で一般化可能なチームワークの新しい基準を確立します。

要約(オリジナル)

Developing AI agents capable of collaborating with previously unseen partners is a fundamental generalization challenge in multi-agent learning, known as Ad Hoc Teamwork (AHT). Existing AHT approaches typically adopt a two-stage pipeline, where first, a fixed population of teammates is generated with the idea that they should be representative of the teammates that will be seen at deployment time, and second, an AHT agent is trained to collaborate well with agents in the population. To date, the research community has focused on designing separate algorithms for each stage. This separation has led to algorithms that generate teammate pools with limited coverage of possible behaviors, and that ignore whether the generated teammates are easy to learn from for the AHT agent. Furthermore, algorithms for training AHT agents typically treat the set of training teammates as static, thus attempting to generalize to previously unseen partner agents without assuming any control over the distribution of training teammates. In this paper, we present a unified framework for AHT by reformulating the problem as an open-ended learning process between an ad hoc agent and an adversarial teammate generator. We introduce ROTATE, a regret-driven, open-ended training algorithm that alternates between improving the AHT agent and generating teammates that probe its deficiencies. Extensive experiments across diverse AHT environments demonstrate that ROTATE significantly outperforms baselines at generalizing to an unseen set of evaluation teammates, thus establishing a new standard for robust and generalizable teamwork.

arxiv情報

著者	Caroline Wang,Arrasy Rahman,Jiaxun Cui,Yoonchang Sung,Peter Stone
発行日	2025-05-29 17:24:54+00:00
arxivサイト	arxiv_id(pdf)

提供元, 利用サービス

arxiv.jp, Google

ROTATE: Regret-driven Open-ended Training for Ad Hoc Teamwork

要約

要約(オリジナル)

arxiv情報

提供元, 利用サービス

最近の投稿

最近のコメント

アーカイブ

カテゴリー