Fixing Incomplete Value Function Decomposition for Multi-Agent Reinforcement Learning

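Summary

Value function decomposition methods for cooperative multi-agent reinforcement learning compose joint values from individual per-agent utilities and train them with a joint objective.
To keep action selection consistent between the individual utilities and the joint values, the composition must satisfy the individual-global max (IGM) property.
Satisfying IGM by itself is straightforward, but most existing methods (e.g., VDN, QMIX) have limited representational capacity and cannot represent the full class of IGM values, while the one exception without this limitation (QPLEX) is unnecessarily complex.
This work presents a simple formulation of the full class of IGM values, which leads naturally to QFIX, a new family of value function decomposition models that extends the representational capacity of prior models by means of a thin ‘fixing’ layer.
The authors derive multiple variants of QFIX and implement three of them in two well-known multi-agent frameworks.
An empirical evaluation on multiple SMACv2 and Overcooked environments confirms that QFIX (i) improves the performance of prior methods, (ii) learns more stably and performs better than its main competitor QPLEX, and (iii) achieves this while using the simplest and smallest mixing models.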
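To make the IGM property concrete: a composition satisfies IGM when the joint action chosen by maximizing the joint value equals the tuple of actions each agent picks by greedily maximizing its own utility. The sketch below is a brute-force check of that condition for one fixed set of utility tables; the function names (`greedy_joint_action`, `joint_argmax`, `satisfies_igm`, and the three mixers) and all numbers are hypothetical illustrations, not code from the paper. The QMIX-style mixer uses fixed positive weights as a simplification (QMIX itself produces its nonnegative weights from the state with hypernetworks), and a single passing instance only illustrates IGM rather than proving it for all inputs.

```python
from itertools import product


def greedy_joint_action(utilities):
    """Decentralized selection: each agent takes the argmax of its own utility."""
    return tuple(max(range(len(q)), key=lambda a: q[a]) for q in utilities)


def joint_argmax(utilities, mix):
    """Centralized selection: argmax of the composed joint value over all joint actions."""
    joint_actions = product(*(range(len(q)) for q in utilities))
    return max(joint_actions,
               key=lambda ja: mix([q[a] for q, a in zip(utilities, ja)]))


def satisfies_igm(utilities, mix):
    """IGM holds for these utilities when both selection rules pick the same joint action."""
    return greedy_joint_action(utilities) == joint_argmax(utilities, mix)


def vdn_mix(values):
    # VDN-style composition: the joint value is the sum of per-agent utilities.
    return sum(values)


def monotonic_mix(values, weights=(0.7, 1.3), bias=0.5):
    # Simplified QMIX-style composition: strictly positive weights keep the joint
    # value monotonic in every utility, which is sufficient for IGM.
    return sum(w * v for w, v in zip(weights, values)) + bias


def non_monotonic_mix(values):
    # A negative weight on agent 1 breaks monotonicity and, for these inputs, IGM.
    return values[0] - values[1]


utilities = [[1.0, 3.0, 2.0], [0.5, -1.0, 4.0]]  # illustrative numbers
print(greedy_joint_action(utilities))               # (1, 2)
print(satisfies_igm(utilities, vdn_mix))            # True
print(satisfies_igm(utilities, monotonic_mix))      # True
print(satisfies_igm(utilities, non_monotonic_mix))  # False
```

Both VDN and QMIX build the joint value as a monotonic function of the per-agent utilities alone, which guarantees IGM but also restricts which joint values they can express; that restriction is the representational gap the summary refers to.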

Summary (original)

Value function decomposition methods for cooperative multi-agent reinforcement learning compose joint values from individual per-agent utilities, and train them using a joint objective. To ensure that the action selection process between individual utilities and joint values remains consistent, it is imperative for the composition to satisfy the individual-global max (IGM) property. Although satisfying IGM itself is straightforward, most existing methods (e.g., VDN, QMIX) have limited representation capabilities and are unable to represent the full class of IGM values, and the one exception that has no such limitation (QPLEX) is unnecessarily complex. In this work, we present a simple formulation of the full class of IGM values that naturally leads to the derivation of QFIX, a novel family of value function decomposition models that expand the representation capabilities of prior models by means of a thin ‘fixing’ layer. We derive multiple variants of QFIX, and implement three variants in two well-known multi-agent frameworks. We perform an empirical evaluation on multiple SMACv2 and Overcooked environments, which confirms that QFIX (i) succeeds in enhancing the performance of prior methods, (ii) learns more stably and performs better than its main competitor QPLEX, and (iii) achieves this while employing the simplest and smallest mixing models.
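The idea of a thin ‘fixing’ layer can be illustrated with a small tabular sketch. Assume, as our own illustration and not necessarily the exact QFIX parameterization, a transform of the form Q_fix(a) = w(a) · (Q_inner(a) − max_b Q_inner(b)) + b with w(a) > 0. If the inner model (here VDN) has a unique greedy joint action, that action keeps its rank and receives the value b, while every other joint action scores strictly lower, so IGM is preserved. Because w may depend on the whole joint action, the fixed values can match any target that shares the same greedy action and is strictly lower elsewhere, including a target no additive model can represent. The helpers `fix` and `solve_weights` are hypothetical; a real model would learn w and b with networks conditioned on histories or states instead of solving for them in closed form.

```python
from itertools import product


def fix(inner_q, weight, bias):
    """Hypothetical fixing transform over a table of joint-action values.

    Q_fix(a) = weight[a] * (inner_q[a] - max_b inner_q[b]) + bias.
    With weight[a] > 0 and a unique maximizer of inner_q, that maximizer is
    preserved and gets the value `bias`; all other joint actions score lower.
    """
    best = max(inner_q.values())
    return {a: weight[a] * (q - best) + bias for a, q in inner_q.items()}


def solve_weights(inner_q, target):
    """Closed-form positive weights so that fix(inner_q, w, b) reproduces `target`.

    Assumes inner_q and target share the same unique maximizer.
    """
    greedy = max(inner_q, key=inner_q.get)
    bias = target[greedy]
    best = inner_q[greedy]
    weight = {a: 1.0 if a == greedy else (target[a] - bias) / (q - best)
              for a, q in inner_q.items()}
    return weight, bias


utilities = [[1.0, 3.0, 2.0], [0.5, -1.0, 4.0]]  # illustrative per-agent utilities
joint_actions = list(product(range(3), range(3)))
vdn_q = {a: utilities[0][a[0]] + utilities[1][a[1]] for a in joint_actions}
greedy = max(vdn_q, key=vdn_q.get)  # (1, 2), the per-agent argmaxes


def coordination_target(a):
    # Illustrative target with the same greedy action: full coordination pays 10,
    # partial coordination (-5) is penalized more than full miscoordination (0).
    # No additive form f(a0) + g(a1) can represent this table.
    hits = (a[0] == greedy[0]) + (a[1] == greedy[1])
    return {2: 10.0, 1: -5.0, 0: 0.0}[hits]


target = {a: coordination_target(a) for a in joint_actions}
weight, bias = solve_weights(vdn_q, target)
fixed_q = fix(vdn_q, weight, bias)

print(all(w > 0 for w in weight.values()))                             # True
print(max(fixed_q, key=fixed_q.get) == greedy)                          # True
print(all(abs(fixed_q[a] - target[a]) < 1e-9 for a in joint_actions))  # True
```

The design point the sketch highlights is that the fixing step only rescales how far each joint action falls below the greedy one, so it cannot change which joint action is greedy; the extra expressiveness comes entirely from letting the positive scale depend on the joint action.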

arXiv information

Authors	Andrea Baisero, Rupali Bhati, Shuo Liu, Aathira Pillai, Christopher Amato
Published	2025-05-15 16:36:18+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
