Learning Logic Specifications for Soft Policy Guidance in POMCP

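Summary

Partially Observable Monte Carlo Planning (POMCP) is an efficient solver for partially observable Markov decision processes (POMDPs).
It scales to large state spaces by computing an approximation of the optimal policy locally and online, using a strategy based on Monte Carlo tree search.
However, POMCP struggles with sparse reward functions, where a reward is obtained only when the final goal is reached, especially in environments with large state spaces and long horizons.
Recently, logic specifications have been integrated into POMCP to guide exploration and to satisfy safety requirements.
However, such policy-related rules must be defined manually by domain experts, especially in real-world scenarios.
In this paper, we use inductive logic programming to learn logic specifications from traces of POMCP executions, i.e., sets of belief-action pairs generated by the planner.
Specifically, we learn rules expressed in the answer set programming paradigm.
We then integrate them into POMCP to provide a soft policy bias toward promising actions.
On two benchmark scenarios, rocksample and battery, we show that integrating rules learned from small task instances can improve performance with fewer Monte Carlo simulations and on larger task instances.
Our modified version of POMCP is publicly available at https://github.com/GiuMaz/pomcp_clingo.git.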
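POMCP represents the belief as a set of sampled states (particles), so a rule defined over beliefs has to be evaluated on quantities derived from those particles. The Python sketch below illustrates this for a rocksample-like domain. Everything in it is hypothetical: the `Particle` layout, the `ROCK_POSITIONS` map, the thresholds, and the two rules, which are hand-written stand-ins for the kind of answer set programming rule an inductive learner might produce, not rules taken from the paper or from the pomcp_clingo repository. The function returns the set of actions the rules consider promising.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Set, Tuple

@dataclass(frozen=True)
class Particle:
    """One sampled state of a hypothetical rocksample-like belief."""
    agent: Tuple[int, int]        # agent grid cell
    rocks_good: Tuple[bool, ...]  # True if rock i is valuable in this sample

# Hypothetical map: grid cell of each rock.
ROCK_POSITIONS = [(1, 1), (3, 2)]

def prob_good(belief: List[Particle], rock: int) -> float:
    """Estimate P(rock is valuable) as the fraction of particles where it is."""
    return sum(p.rocks_good[rock] for p in belief) / len(belief)

def suggested_actions(belief: List[Particle], sample_at: float = 0.8,
                      check_lo: float = 0.3, check_hi: float = 0.7) -> Set[str]:
    """Evaluate two hand-written, illustrative rules on belief features.

    Informal ASP-style reading (hypothetical, not rules from the paper):
      sample   :- at(R), prob_good(R, P), P >= sample_at.
      check(R) :- prob_good(R, P), P > check_lo, P < check_hi.
    """
    # Use the most frequent agent cell across particles as the agent position.
    agent = Counter(p.agent for p in belief).most_common(1)[0][0]
    actions: Set[str] = set()
    for rock, cell in enumerate(ROCK_POSITIONS):
        p = prob_good(belief, rock)
        if agent == cell and p >= sample_at:
            actions.add("sample")
        if check_lo < p < check_hi:
            actions.add(f"check_{rock}")
    return actions
```

For example, with five particles all at (1, 1), rock 0 valuable in every particle and rock 1 valuable in two of them, `suggested_actions` returns `{"sample", "check_1"}`: the agent stands on a rock it believes is valuable, and rock 1 is still uncertain.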

Summary (original)

Partially Observable Monte Carlo Planning (POMCP) is an efficient solver for Partially Observable Markov Decision Processes (POMDPs). It allows scaling to large state spaces by computing an approximation of the optimal policy locally and online, using a Monte Carlo Tree Search based strategy. However, POMCP suffers from sparse reward functions, namely, rewards achieved only when the final goal is reached, particularly in environments with large state spaces and long horizons. Recently, logic specifications have been integrated into POMCP to guide exploration and to satisfy safety requirements. However, such policy-related rules require manual definition by domain experts, especially in real-world scenarios. In this paper, we use inductive logic programming to learn logic specifications from traces of POMCP executions, i.e., sets of belief-action pairs generated by the planner. Specifically, we learn rules expressed in the paradigm of answer set programming. We then integrate them inside POMCP to provide soft policy bias toward promising actions. In the context of two benchmark scenarios, rocksample and battery, we show that the integration of learned rules from small task instances can improve performance with fewer Monte Carlo simulations and in larger task instances. We make our modified version of POMCP publicly available at https://github.com/GiuMaz/pomcp_clingo.git.
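One common way to turn suggested actions into a soft bias, rather than a hard constraint, is the "preferred actions" initialization described in the original POMCP paper by Silver and Veness. When a node is expanded, suggested actions start with a number of pseudo-visits and an optimistic value, so UCB1 favors them early but can still switch to other actions as real returns accumulate. Whether the authors use exactly this mechanism is an assumption here; the minimal Python sketch below only illustrates the technique. `BeliefNode`, `expand`, `ucb_select`, `update`, and the defaults `n_prior=10`, `v_prior=1.0`, and `c=1.0` are hypothetical names and values.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set

@dataclass
class ActionStats:
    visits: int = 0
    value: float = 0.0  # running mean of simulated returns

@dataclass
class BeliefNode:
    """Hypothetical minimal POMCP history node with per-action statistics."""
    visits: int = 0
    children: Dict[str, ActionStats] = field(default_factory=dict)

def expand(node: BeliefNode, actions: Iterable[str], suggested: Set[str],
           n_prior: int = 10, v_prior: float = 1.0) -> None:
    """Create action children; rule-suggested actions get pseudo-visits and
    an optimistic value (the soft bias), all others start empty."""
    for a in actions:
        if a in suggested:
            node.children[a] = ActionStats(visits=n_prior, value=v_prior)
            node.visits += n_prior  # keep N(h) consistent with sum of N(ha)
        else:
            node.children[a] = ActionStats()

def ucb_select(node: BeliefNode, c: float = 1.0) -> Optional[str]:
    """UCB1 action choice; an unvisited action is always tried first."""
    best, best_score = None, -math.inf
    for a, s in node.children.items():
        if s.visits == 0:
            return a
        score = s.value + c * math.sqrt(math.log(node.visits) / s.visits)
        if score > best_score:
            best, best_score = a, score
    return best

def update(node: BeliefNode, action: str, ret: float) -> None:
    """Incremental mean update after one simulation returned `ret`."""
    s = node.children[action]
    s.visits += 1
    node.visits += 1
    s.value += (ret - s.value) / s.visits
```

Because unvisited actions take priority, every non-suggested action is still tried at least once, so the bias only reorders exploration and never prunes an action. After "north" and "south" each return 0, `ucb_select(node, c=0.5)` prefers the suggested "sample" thanks to its prior value.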

arXiv information

Authors	Giulio Mazzi, Daniele Meli, Alberto Castellini, Alessandro Farinelli
Published	2023-03-16 09:37:10+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
