Thinker: Learning to Plan and Act

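Summary

We propose the Thinker algorithm, a new approach that lets reinforcement learning agents autonomously interact with and make use of a learned world model.
The Thinker algorithm wraps the environment with a world model and introduces new actions designed for interacting with that world model.
With these model-interaction actions, the agent can plan by proposing alternative plans to the world model before choosing the final action to execute in the environment.
Because the agent learns how to plan on its own, this approach removes the need for a hand-crafted planning algorithm, and the agent's plans can be interpreted easily through visualization.
We demonstrate the algorithm's effectiveness with experiments on the game of Sokoban and the Atari 2600 benchmark, where it achieves state-of-the-art performance and competitive results, respectively.
Visualizations of agents trained with the Thinker algorithm show that they have learned to plan effectively with the world model and to select better actions.
The generality of the algorithm opens a new research direction on how world models can be used in reinforcement learning and how planning can be integrated seamlessly into an agent's decision-making process.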

Abstract (original)

We propose the Thinker algorithm, a novel approach that enables reinforcement learning agents to autonomously interact with and utilize a learned world model. The Thinker algorithm wraps the environment with a world model and introduces new actions designed for interacting with the world model. These model-interaction actions enable agents to perform planning by proposing alternative plans to the world model before selecting a final action to execute in the environment. This approach eliminates the need for hand-crafted planning algorithms by enabling the agent to learn how to plan autonomously and allows for easy interpretation of the agent’s plan with visualization. We demonstrate the algorithm’s effectiveness through experimental results in the game of Sokoban and the Atari 2600 benchmark, where the Thinker algorithm achieves state-of-the-art performance and competitive results, respectively. Visualizations of agents trained with the Thinker algorithm demonstrate that they have learned to plan effectively with the world model to select better actions. The algorithm’s generality opens a new research direction on how a world model can be used in reinforcement learning and how planning can be seamlessly integrated into an agent’s decision-making process.
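
The wrapping idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class names (`ToyChainEnv`, `ToyWorldModel`, `ModelWrappedEnv`), the fixed budget of imagined steps, the `reset_plan` flag, and the use of an exact hand-written model in place of a learned one are all assumptions made for illustration. It only shows the general pattern of an agent trying alternative plans against a model before one action reaches the real environment.

```python
class ToyChainEnv:
    """Hypothetical real environment: a walk on the integers, reward at `goal`."""

    def __init__(self, goal=3):
        self.goal = goal
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action is -1 (left) or +1 (right)
        self.pos += action
        reward = 1.0 if self.pos == self.goal else 0.0
        return self.pos, reward


class ToyWorldModel:
    """Stand-in for a learned world model (assumption: here it is exact)."""

    def __init__(self, goal=3):
        self.goal = goal

    def predict(self, state, action):
        next_state = state + action
        reward = 1.0 if next_state == self.goal else 0.0
        return next_state, reward


class ModelWrappedEnv:
    """Wraps an environment so that imagined steps precede each real step."""

    def __init__(self, env, model, imagined_steps=3):
        self.env = env
        self.model = model
        self.imagined_steps = imagined_steps

    def _obs(self, reward, is_real):
        return {
            "real_state": self.real_state,
            "imagined_state": self.imagined_state,
            "reward": reward,
            "is_real": is_real,
        }

    def reset(self):
        self.real_state = self.env.reset()
        self.imagined_state = self.real_state
        self.budget = self.imagined_steps
        return self._obs(0.0, False)

    def step(self, action, reset_plan=False):
        if self.budget > 0:
            # Imagined step: only the model is queried, the real env is untouched.
            if reset_plan:
                # Abandon the current plan and start a new one from the real state.
                self.imagined_state = self.real_state
            self.imagined_state, predicted = self.model.predict(
                self.imagined_state, action
            )
            self.budget -= 1
            return self._obs(predicted, False)
        # Budget spent: the action is executed in the real environment.
        self.real_state, reward = self.env.step(action)
        self.imagined_state = self.real_state
        self.budget = self.imagined_steps
        return self._obs(reward, True)


env = ModelWrappedEnv(ToyChainEnv(), ToyWorldModel(), imagined_steps=3)
env.reset()
plan_a = env.step(-1)                   # imagined: try moving left
plan_b = env.step(+1, reset_plan=True)  # imagined: new plan from the real state
plan_c = env.step(+1)                   # imagined: continue that plan
final = env.step(+1)                    # real step in the environment
```

In this sketch the agent sees one uniform `step` interface, so a standard reinforcement learning method could in principle be trained on the wrapped environment; how the actual algorithm structures its actions and observations is not specified in the abstract.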

arXiv information

Authors	Stephen Chung, Ivan Anokhin, David Krueger
Published	2023-07-27 16:40:14+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
