Explainable Reinforcement Learning via Temporal Policy Decomposition

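Abstract

We study the explainability of reinforcement learning (RL) policies from a temporal perspective, focusing on the sequence of future outcomes associated with individual actions.
In RL, value functions compress information about the rewards collected across multiple trajectories and over an infinite horizon, which yields a compact representation of knowledge.
This compression, however, obscures the temporal details inherent in sequential decision-making and poses a key challenge for interpretability.
We introduce Temporal Policy Decomposition (TPD), a novel explainability approach that explains individual RL actions in terms of their Expected Future Outcomes (EFOs).
These explanations decompose generalized value functions into a sequence of EFOs, one for each time step up to a prediction horizon of interest, revealing when specific outcomes are expected to occur.
Using fixed-horizon temporal difference learning, we devise an off-policy method that learns EFOs for both optimal and suboptimal actions, enabling contrastive explanations built from the EFOs of different state-action pairs.
Our experiments show that TPD generates accurate explanations that (i) clarify the policy's future strategy and the anticipated trajectory for a given action, and (ii) improve understanding of the reward composition, making it easier to fine-tune the reward function to match human expectations.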
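
The decomposition described in the abstract can be written out as a short worked equation. The notation below is an assumption made for illustration and is not taken from the paper: o denotes an outcome signal, \pi a deterministic policy being explained, h the prediction horizon, and \gamma a discount factor. Under those assumptions, a fixed-horizon generalized value function is a discounted sum of per-step EFOs, and each EFO bootstraps from the EFO one step shorter at the next state.

```latex
% Assumed notation (illustrative, not the paper's): o = outcome signal,
% \pi = deterministic policy being explained, h = prediction horizon.
\begin{align}
E^{\pi}_{k}(s,a) &= \mathbb{E}_{\pi}\!\left[\, o_{t+k+1} \mid S_t = s,\ A_t = a \,\right],
  \qquad k = 0, \dots, h-1 \\
Q^{\pi}_{h}(s,a) &= \sum_{k=0}^{h-1} \gamma^{k}\, E^{\pi}_{k}(s,a) \\
E^{\pi}_{0}(s,a) &= \mathbb{E}\!\left[\, o_{t+1} \mid S_t = s,\ A_t = a \,\right] \\
E^{\pi}_{k}(s,a) &= \mathbb{E}\!\left[\, E^{\pi}_{k-1}\big(S_{t+1}, \pi(S_{t+1})\big) \mid S_t = s,\ A_t = a \,\right],
  \qquad k \ge 1
\end{align}
```

Because the first action a is a free argument and only later actions follow \pi, the same quantities are defined for suboptimal actions, which is what makes a contrastive comparison between two actions in the same state possible.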

Abstract (original)

We investigate the explainability of Reinforcement Learning (RL) policies from a temporal perspective, focusing on the sequence of future outcomes associated with individual actions. In RL, value functions compress information about rewards collected across multiple trajectories and over an infinite horizon, allowing a compact form of knowledge representation. However, this compression obscures the temporal details inherent in sequential decision-making, presenting a key challenge for interpretability. We present Temporal Policy Decomposition (TPD), a novel explainability approach that explains individual RL actions in terms of their Expected Future Outcome (EFO). These explanations decompose generalized value functions into a sequence of EFOs, one for each time step up to a prediction horizon of interest, revealing insights into when specific outcomes are expected to occur. We leverage fixed-horizon temporal difference learning to devise an off-policy method for learning EFOs for both optimal and suboptimal actions, enabling contrastive explanations consisting of EFOs for different state-action pairs. Our experiments demonstrate that TPD generates accurate explanations that (i) clarify the policy’s future strategy and anticipated trajectory for a given action and (ii) improve understanding of the reward composition, facilitating fine-tuning of the reward function to align with human expectations.
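
As a rough illustration of the idea in the abstract, the sketch below learns per-step expected outcomes with a fixed-horizon temporal difference update on a toy problem. Everything in it is an assumption made for the example and not the authors' implementation: the five-state chain, the "goal entered" outcome, the always-move-right target policy, and the names `step`, `target_policy`, `learn_efos`, and `explain` are all hypothetical.

```python
import random

N_STATES, GOAL = 5, 4      # toy chain of states 0..4; state 4 is an absorbing goal
LEFT, RIGHT = 0, 1
HORIZON = 4                # number of future time steps to explain


def step(s, a):
    """Deterministic toy dynamics. The outcome is 1.0 when the goal is entered."""
    if s == GOAL:
        return s, 0.0
    s_next = max(s - 1, 0) if a == LEFT else s + 1
    return s_next, float(s_next == GOAL)


def target_policy(s):
    """Policy being explained (assumed for this sketch): always move right."""
    return RIGHT


def learn_efos(n_samples=5000, alpha=0.5, seed=0):
    """Off-policy tabular estimate of efo[k][s][a], the expected outcome
    k steps after taking action a in state s and then following target_policy."""
    rng = random.Random(seed)
    efo = [[[0.0, 0.0] for _ in range(N_STATES)] for _ in range(HORIZON)]
    for _ in range(n_samples):
        # Behaviour data: uniformly random state-action pairs.
        s, a = rng.randrange(N_STATES), rng.randrange(2)
        s_next, outcome = step(s, a)
        a_next = target_policy(s_next)
        # Step 0 regresses on the observed outcome; step k bootstraps from
        # the step k-1 estimate at the next state under the target policy.
        targets = [outcome] + [efo[k - 1][s_next][a_next] for k in range(1, HORIZON)]
        for k in range(HORIZON):
            efo[k][s][a] += alpha * (targets[k] - efo[k][s][a])
    return efo


def explain(efo, s, a):
    """Return the sequence of expected outcomes for one state-action pair."""
    return [round(efo[k][s][a], 3) for k in range(HORIZON)]


if __name__ == "__main__":
    efo = learn_efos()
    print("state 2, right:", explain(efo, 2, RIGHT))
    print("state 2, left: ", explain(efo, 2, LEFT))
```

In this toy setup the learned sequence for moving right from state 2 approaches [0, 1, 0, 0], while the sequence for moving left approaches [0, 0, 0, 1]. Comparing the two gives a small contrastive explanation: the suboptimal action delays reaching the goal by two steps, a detail that a single scalar value would hide.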

arXiv information

Authors	Franco Ruggeri, Alessio Russo, Rafia Inam, Karl Henrik Johansson
Published	2025-01-07 16:10:09+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
