RACCER: Towards Reachable and Certain Counterfactual Explanations for Reinforcement Learning

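Summary

Reinforcement learning (RL) algorithms have been applied successfully to many tasks, but because they rely on neural networks, their behavior is difficult to understand and trust.
Counterfactual explanations are human-friendly explanations that give users actionable advice on how to change a model's inputs to obtain a desired output from a black-box system.
However, current approaches to generating counterfactuals in RL ignore the stochastic and sequential nature of RL tasks, and can produce counterfactuals that are hard to obtain or that do not lead to the desired outcome.
This work proposes RACCER, the first RL-specific approach to generating counterfactual explanations of an RL agent's behavior.
It first proposes and implements a set of RL-specific counterfactual properties that ensure counterfactuals are easy to reach and highly likely to yield the desired outcome.
It then uses a heuristic tree search over the agent's execution trajectories to find the most suitable counterfactuals according to those properties.
RACCER is evaluated on two tasks, and a user study shows that RL-specific counterfactuals help users understand the agent's behavior better than current state-of-the-art approaches.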

Abstract (original)

While reinforcement learning (RL) algorithms have been successfully applied to numerous tasks, their reliance on neural networks makes their behavior difficult to understand and trust. Counterfactual explanations are human-friendly explanations that offer users actionable advice on how to alter the model inputs to achieve the desired output from a black-box system. However, current approaches to generating counterfactuals in RL ignore the stochastic and sequential nature of RL tasks and can produce counterfactuals which are difficult to obtain or do not deliver the desired outcome. In this work, we propose RACCER, the first RL-specific approach to generating counterfactual explanations for the behaviour of RL agents. We first propose and implement a set of RL-specific counterfactual properties that ensure easily reachable counterfactuals with highly-probable desired outcomes. We use a heuristic tree search of agent’s execution trajectories to find the most suitable counterfactuals based on the defined properties. We evaluate RACCER in two tasks as well as conduct a user study to show that RL-specific counterfactuals help users better understand agent’s behavior compared to the current state-of-the-art approaches.
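To make the idea of searching execution trajectories for reachable and certain counterfactuals more concrete, here is a minimal toy sketch. It is not RACCER's implementation: the abstract describes a heuristic tree search, while this sketch substitutes a plain breadth-first search. The 1-D track environment, the `policy` function, the `SLIP` probability, and the `certainty / path length` score are all hypothetical choices made only for illustration.

```python
from collections import deque

# All names and numbers below are hypothetical, chosen only for illustration.
SLIP = 0.1                       # chance that an action fails and the state stays put
ACTIONS = {"left": -1, "right": 1}
N = 6                            # the track has positions 0..N


def policy(state):
    """Stand-in for a black-box agent: it heads right only from position 3 onward."""
    return "right" if state >= 3 else "left"


def step(state, action):
    """Intended (successful) result of an action, clipped to the track."""
    return min(max(state + ACTIONS[action], 0), N)


def find_counterfactual(start, desired_action, max_depth=4):
    """Breadth-first search over action sequences starting at `start`.

    Returns (score, state, path, certainty) for the best-scoring state in which
    the policy picks `desired_action`, or None if no such state is found within
    `max_depth` actions. `certainty` is the probability that every action on
    the path succeeds.
    """
    best = None
    queue = deque([(start, [], 1.0)])
    seen = {start}
    while queue:
        state, path, certainty = queue.popleft()
        if path and policy(state) == desired_action:
            # Assumed score: fewer actions (reachable) and a higher success
            # probability (certain) are both rewarded.
            score = certainty / len(path)
            if best is None or score > best[0]:
                best = (score, state, path, certainty)
        if len(path) == max_depth:
            continue
        for action in ACTIONS:
            nxt = step(state, action)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [action], certainty * (1 - SLIP)))
    return best


if __name__ == "__main__":
    print(find_counterfactual(1, "right"))
```

Starting from position 1, where this toy policy picks "left", the search returns position 3, reached by two "right" moves with a certainty of about 0.81. Positions further along the track also trigger "right", but they score lower because they take more actions and are less likely to be reached.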

arXiv information

Authors	Jasmina Gajcin, Ivana Dusparic
Published	2023-03-08 09:47:00+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
