Improving the performance of Learned Controllers in Behavior Trees using Value Function Estimates at Switching Boundaries

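Abstract

Behavior trees are a modular way to build an overall controller from a set of sub-controllers that solve different sub-problems.
These sub-controllers can be created in various ways, such as classical model-based control or reinforcement learning (RL).
If each sub-controller satisfies the preconditions of the next sub-controller, the overall controller achieves the overall goal.
However, even if every sub-controller is locally optimal at achieving the preconditions of the next one with respect to some performance metric, such as completion time, the overall controller can be far from optimal with respect to that same metric.
This paper shows how the performance of the overall controller improves when approximations of value functions are used to inform the design of a sub-controller about the needs of the next one.
It also shows how, under certain assumptions, this leads to a globally optimal controller when the process is applied to all sub-controllers.
Finally, the result also holds when some sub-controllers are already given: if we are constrained to use certain existing sub-controllers, the overall controller is globally optimal subject to that constraint.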

Abstract (original)

Behavior trees represent a modular way to create an overall controller from a set of sub-controllers solving different sub-problems. These sub-controllers can be created in different ways, such as classical model based control or reinforcement learning (RL). If each sub-controller satisfies the preconditions of the next sub-controller, the overall controller will achieve the overall goal. However, even if all sub-controllers are locally optimal in achieving the preconditions of the next, with respect to some performance metric such as completion time, the overall controller might be far from optimal with respect to the same performance metric. In this paper we show how the performance of the overall controller can be improved if we use approximations of value functions to inform the design of a sub-controller of the needs of the next one. We also show how, under certain assumptions, this leads to a globally optimal controller when the process is executed on all sub-controllers. Finally, this result also holds when some of the sub-controllers are already given, i.e., if we are constrained to use some existing sub-controllers the overall controller will be globally optimal given this constraint.
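The gap between locally optimal and globally optimal sub-controllers can be illustrated with a toy example. The sketch below is not the authors' method or code; it assumes a hypothetical 1-D chain of states with unit step cost, a first sub-controller that must reach one of two boundary states (the preconditions of the next sub-controller), and made-up cost-to-go values `next_value` for the next sub-controller at those boundary states. The function names `plan_to_boundary` and `rollout` are also illustrative. Charging the next sub-controller's value estimate as a terminal cost at the switching boundary changes which boundary state the first sub-controller aims for.

```python
INF = float("inf")


def plan_to_boundary(n_states, terminal_cost):
    """Value iteration on a 1-D chain with unit step cost.

    terminal_cost maps each boundary state to the cost charged on arrival.
    Boundary states are absorbing; returns the cost-to-go of every state.
    """
    value = [terminal_cost.get(s, INF) for s in range(n_states)]
    for _ in range(n_states):  # enough sweeps for a chain of this length
        for s in range(n_states):
            if s in terminal_cost:
                continue
            neighbours = [n for n in (s - 1, s + 1) if 0 <= n < n_states]
            value[s] = min(1 + value[n] for n in neighbours)
    return value


def rollout(start, value, terminal_cost):
    """Follow the greedy policy until a boundary state is reached."""
    state, steps = start, 0
    while state not in terminal_cost:
        neighbours = [n for n in (state - 1, state + 1) if 0 <= n < len(value)]
        state = min(neighbours, key=lambda n: value[n])
        steps += 1
    return state, steps


# Assumed (made-up) cost-to-go of the NEXT sub-controller at each boundary state.
next_value = {3: 10.0, 8: 2.0}
n_states, start = 11, 4

# Locally optimal design: every boundary state is equally good.
local_cost = {s: 0.0 for s in next_value}
local_entry, local_steps = rollout(
    start, plan_to_boundary(n_states, local_cost), local_cost
)
local_total = local_steps + next_value[local_entry]

# Value-informed design: arrival is charged the next controller's cost-to-go.
informed_entry, informed_steps = rollout(
    start, plan_to_boundary(n_states, next_value), next_value
)
informed_total = informed_steps + next_value[informed_entry]

print(local_entry, local_total)        # 3 11.0
print(informed_entry, informed_total)  # 8 6.0
```

In this toy setting the locally optimal sub-controller reaches the nearest boundary state in one step but hands over a state that is expensive for the next sub-controller, while the value-informed one accepts a longer first stage in exchange for a lower overall completion time.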

arXiv information

Authors	Mart Kartasev, Petter Ögren
Published	2023-05-30 09:59:53+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
