Navigating Trade-offs: Policy Summarization for Multi-Objective Reinforcement Learning

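Summary

Multi-objective reinforcement learning (MORL) is used to solve problems involving multiple objectives.
An MORL agent must make decisions based on the diverse signals provided by distinct reward functions.
Training an MORL agent yields a set of solutions (policies), each presenting a distinct trade-off among the objectives (expected returns).
Rather than relying on a single policy, MORL enhances explainability by enabling fine-grained comparisons of the policies in the solution set based on their trade-offs.
However, the solution set is typically large and multi-dimensional, and each policy (e.g., a neural network) is represented by its objective values.
We propose an approach for clustering the solution set generated by MORL.
By considering both policy behavior and objective values, our clustering method can reveal the relationship between policy behaviors and regions in the objective space.
This approach enables decision makers (DMs) to identify overarching trends and insights in the solution set rather than examining each policy individually.
We tested the method in four multi-objective environments and found that it outperformed traditional k-medoids clustering.
Additionally, we include a case study that demonstrates its real-world application.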

Summary (original)

Multi-objective reinforcement learning (MORL) is used to solve problems involving multiple objectives. An MORL agent must make decisions based on the diverse signals provided by distinct reward functions. Training an MORL agent yields a set of solutions (policies), each presenting distinct trade-offs among the objectives (expected returns). MORL enhances explainability by enabling fine-grained comparisons of policies in the solution set based on their trade-offs as opposed to having a single policy. However, the solution set is typically large and multi-dimensional, where each policy (e.g., a neural network) is represented by its objective values. We propose an approach for clustering the solution set generated by MORL. By considering both policy behavior and objective values, our clustering method can reveal the relationship between policy behaviors and regions in the objective space. This approach can enable decision makers (DMs) to identify overarching trends and insights in the solution set rather than examining each policy individually. We tested our method in four multi-objective environments and found it outperformed traditional k-medoids clustering. Additionally, we include a case study that demonstrates its real-world application.
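The abstract describes clustering an MORL solution set by considering both objective values and policy behavior, with k-medoids as the baseline. The following is not the authors' method; it is a minimal sketch under stated assumptions: each policy is summarized by an objective-value vector and a behavior vector (here, hypothetical action-frequency vectors), the two spaces are blended into one distance with a hypothetical weight `alpha`, and the result is clustered with plain k-medoids. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalized_distances(points: Sequence[Sequence[float]]) -> Matrix:
    """Pairwise Euclidean distances divided by the largest one, so values lie in [0, 1]."""
    n = len(points)
    d = [[euclidean(points[i], points[j]) for j in range(n)] for i in range(n)]
    max_d = max(max(row) for row in d) or 1.0  # avoid dividing by zero if all points coincide
    return [[v / max_d for v in row] for row in d]


def combined_distances(objectives, behaviors, alpha: float = 0.5) -> Matrix:
    """Hypothetical mix (an assumption, not the paper's formula):
    alpha * objective-space distance + (1 - alpha) * behavior-space distance."""
    d_obj = normalized_distances(objectives)
    d_beh = normalized_distances(behaviors)
    n = len(objectives)
    return [[alpha * d_obj[i][j] + (1 - alpha) * d_beh[i][j] for j in range(n)]
            for i in range(n)]


def assign(dist: Matrix, medoids: List[int]) -> List[int]:
    """Label each policy with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[j][medoids[c]])
            for j in range(len(dist))]


def k_medoids(dist: Matrix, k: int, max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating (Voronoi-iteration) k-medoids on a precomputed distance matrix."""
    n = len(dist)
    # Greedy deterministic initialization: start from the most central policy,
    # then repeatedly add the policy that most lowers the total nearest-medoid distance.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        def total_cost(c: int) -> float:
            return sum(min(min(dist[j][m] for m in medoids), dist[j][c]) for j in range(n))
        medoids.append(min((c for c in range(n) if c not in medoids), key=total_cost))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [j for j in range(n) if labels[j] == c]
            if not members:  # keep an empty cluster's medoid unchanged
                new_medoids.append(old)
                continue
            # Update step: the member with the smallest summed distance to its cluster.
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


if __name__ == "__main__":
    # Toy solution set: six policies, two objectives, made-up behavior vectors.
    objectives = [[1.0, 9.0], [1.2, 8.8], [5.0, 5.0], [5.1, 4.9], [9.0, 1.0], [8.8, 1.2]]
    behaviors = [[0.9, 0.1], [0.9, 0.1], [0.5, 0.5], [0.5, 0.5], [0.1, 0.9], [0.1, 0.9]]
    medoids, labels = k_medoids(combined_distances(objectives, behaviors, alpha=0.5), k=3)
    print("medoids:", medoids, "labels:", labels)
```

Setting `alpha=1.0` ignores behavior and reduces this to ordinary k-medoids on objective values, roughly the kind of baseline the abstract compares against; lowering `alpha` lets policies with similar returns but different behavior fall into different clusters. The abstract does not state how the paper actually represents behavior or combines the two spaces.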

arXiv information

Authors	Zuzanna Osika, Jazmin Zatarain-Salazar, Frans A. Oliehoek, Pradeep K. Murukannaiah
Published	2024-11-07 15:26:38+00:00
arXiv site	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google
