Goal-oriented inference of environment from redundant observations

Summary

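Title: Goal-oriented inference of environment from redundant observations

Summary:
– An agent learns to organize its decision-making behavior to achieve a behavioral goal such as reward maximization. Reinforcement learning is often used for this optimization.
– Learning an optimal behavioral strategy can be difficult under uncertainty, when the events needed for learning are only partially observable.
– However, real-world environments also produce many events that are unrelated to reward delivery. Assuming a Redundantly Observable Markov Decision Process (ROMDP), we propose a goal-oriented reinforcement learning method that efficiently learns the state transition rules among the reward-related "core states".
– The model starts with a small number of core states and gradually adds new core states until it achieves an optimal behavioral strategy consistent with the Bellman equation.
– Because the resulting inference model contains only the reward-related core states, it has high explainability. The proposed method also suits online learning, as it suppresses memory consumption and improves learning speed.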

Abstract (original)

An agent learns to organize decision behavior to achieve a behavioral goal, such as reward maximization, and reinforcement learning is often used for this optimization. Learning an optimal behavioral strategy is difficult when the events necessary for learning are only partially observable, a setting known as a Partially Observable Markov Decision Process (POMDP). However, the real-world environment also produces many events irrelevant to reward delivery and to an optimal behavioral strategy. The conventional methods for POMDPs, which attempt to infer transition rules among all observations, including irrelevant states, are ineffective in such an environment. Supposing a Redundantly Observable Markov Decision Process (ROMDP), here we propose a method for goal-oriented reinforcement learning that efficiently learns state transition rules among reward-related "core states" from redundant observations. Starting with a small number of initial core states, our model gradually adds new core states to the transition diagram until it achieves an optimal behavioral strategy consistent with the Bellman equation. We demonstrate that the resultant inference model outperforms the conventional method for POMDPs. We emphasize that our model, which contains only the core states, has high explainability. Furthermore, the proposed method suits online learning, as it suppresses memory consumption and improves learning speed.
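The abstract does not say how consistency with the Bellman equation is tested, so the following is only a minimal sketch of one common reading: for a small tabular model, compute the largest violation of the Bellman optimality equation (the "Bellman residual") and treat a near-zero value as consistent. The function names, the toy two-state model, and the discount factor of 0.9 are all assumptions made for illustration; this is not the authors' algorithm and it does not cover how core states are added.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Standard value iteration on a tabular model (illustrative only).

    P[a, s, t] is the probability of moving from state s to state t
    under action a; R[s, a] is the expected immediate reward.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V


def bellman_residual(P, R, V, gamma=0.9):
    """Largest absolute violation of the Bellman optimality equation."""
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return float(np.max(np.abs(Q.max(axis=1) - V)))


# Hypothetical two-state, two-action model (not taken from the paper).
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay in the current state
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch to the other state
])
R = np.array([
    [0.0, 0.0],  # state 0: no reward for either action
    [1.0, 0.0],  # state 1: reward 1 for staying
])

V = value_iteration(P, R)
residual = bellman_residual(P, R, V)
untrained_residual = bellman_residual(P, R, np.zeros(2))
```

In this sketch, `residual` is close to zero once value iteration has converged, while `untrained_residual` is large because an all-zero value function ignores the reward in state 1. A method that grows its state set could, under these assumptions, use such a residual as a stopping signal.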

arXiv information

Authors	Kazuki Takahashi, Tomoki Fukai, Yutaka Sakai, Takashi Takekawa
Published	2023-05-08 03:00:59+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
