Fidelity-Induced Interpretable Policy Extraction for Reinforcement Learning

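Summary

Deep reinforcement learning (DRL) has achieved remarkable success in sequential decision-making problems.
However, existing DRL agents make decisions opaquely, which prevents users from establishing trust or scrutinizing the agents' weaknesses.
Recent work has developed interpretable policy extraction (IPE) methods to explain how an agent acts, but their explanations are often inconsistent with the agent's behavior and therefore frequently fail to explain it.
To address this problem, we propose a new method, Fidelity-Induced Policy Extraction (FIPE).
Specifically, we begin by analyzing the optimization mechanism of existing IPE methods and detail how they neglect consistency while increasing cumulative reward.
We then design a fidelity-induced mechanism by integrating a fidelity measurement into the reinforcement learning feedback.
We conduct experiments in the complex control environment of StarCraft II, a domain that current IPE methods typically avoid.
The experimental results show that FIPE outperforms the baselines in interaction performance and consistency while remaining easy to understand.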

Abstract (original)

Deep Reinforcement Learning (DRL) has achieved remarkable success in sequential decision-making problems. However, existing DRL agents make decisions in an opaque fashion, hindering the user from establishing trust and scrutinizing weaknesses of the agents. While recent research has developed Interpretable Policy Extraction (IPE) methods for explaining how an agent takes actions, their explanations are often inconsistent with the agent's behavior and thus frequently fail to explain it. To tackle this issue, we propose a novel method, Fidelity-Induced Policy Extraction (FIPE). Specifically, we start by analyzing the optimization mechanism of existing IPE methods, elaborating on the issue of ignoring consistency while increasing cumulative rewards. We then design a fidelity-induced mechanism by integrating a fidelity measurement into the reinforcement learning feedback. We conduct experiments in the complex control environment of StarCraft II, an arena typically avoided by current IPE methods. The experimental results demonstrate that FIPE outperforms the baselines in terms of interaction performance and consistency, while remaining easy to understand.
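
The abstract does not state how the fidelity measurement is computed or how it enters the reinforcement learning feedback. As a rough illustration only, the sketch below assumes fidelity is the fraction of states in which the extracted policy chooses the same action as the DRL agent, and that the feedback is the environment reward plus a weighted agreement bonus. The function names, the additive form, and the `weight` parameter are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def fidelity(student_actions: Sequence[int], teacher_actions: Sequence[int]) -> float:
    """Fraction of states where the extracted policy picks the DRL agent's action.

    Assumed definition for illustration; the paper may measure fidelity differently.
    """
    if len(student_actions) != len(teacher_actions):
        raise ValueError("action sequences must have the same length")
    if not student_actions:
        return 0.0
    matches = sum(s == t for s, t in zip(student_actions, teacher_actions))
    return matches / len(student_actions)


def fidelity_induced_reward(
    env_reward: float,
    student_action: int,
    teacher_action: int,
    weight: float = 1.0,
) -> float:
    """Environment reward plus a bonus when the two policies agree.

    Hypothetical additive shaping; `weight` trades off reward against consistency.
    """
    agreement = 1.0 if student_action == teacher_action else 0.0
    return env_reward + weight * agreement


# Extracted policy agrees with the agent in 3 of 4 states.
print(fidelity([0, 1, 2, 1], [0, 1, 1, 1]))  # 0.75
# Agreement adds the bonus on top of the environment reward.
print(fidelity_induced_reward(0.5, 2, 2, weight=0.25))  # 0.75
```

Under these assumptions, a policy that only maximizes `env_reward` can drift away from the agent it is meant to explain, whereas the agreement term penalizes that drift at every step.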

arXiv information

Authors	Xiao Liu, Wubing Chen, Mao Tan
Published	2023-09-12 10:03:32+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
