FORESEE: Prediction with Expansion-Compression Unscented Transform for Online Policy Optimization

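Abstract

Propagating state distributions through a generic, uncertain nonlinear dynamical model is known to be intractable and usually requires numerical or analytical approximation.
We introduce a state prediction method called the Expansion-Compression Unscented Transform and use it to solve a class of online policy optimization problems.
Our proposed algorithm propagates a finite number of sigma points through a state-dependent distribution, which requires increasing the number of sigma points at each time step to represent the resulting distribution.
We call this the expansion operation.
To keep the algorithm scalable, we augment the expansion operation with a compression operation based on moment matching, which keeps the number of sigma points constant across predictions over multiple time steps.
Its performance is shown empirically to be comparable to Monte Carlo at a much lower computational cost.
Under state and control input constraints, the state prediction is then used together with a proposed variant of constrained gradient descent to update policy parameters online in a receding horizon fashion.
The framework is implemented as a differentiable computational graph for policy training.
We demonstrate the framework on a quadrotor stabilization task, as part of a benchmark comparison in safe-control-gym, and on optimizing the parameters of a Control Barrier Function based controller in a leader-follower problem.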

Abstract (original)

Propagating state distributions through a generic, uncertain nonlinear dynamical model is known to be intractable and usually begets numerical or analytical approximations. We introduce a method for state prediction, called the Expansion-Compression Unscented Transform, and use it to solve a class of online policy optimization problems. Our proposed algorithm propagates a finite number of sigma points through a state-dependent distribution, which dictates an increase in the number of sigma points at each time step to represent the resulting distribution; this is what we call the expansion operation. To keep the algorithm scalable, we augment the expansion operation with a compression operation based on moment matching, thereby keeping the number of sigma points constant across predictions over multiple time steps. Its performance is empirically shown to be comparable to Monte Carlo but at a much lower computational cost. Under state and control input constraints, the state prediction is subsequently used in tandem with a proposed variant of constrained gradient-descent for online update of policy parameters in a receding horizon fashion. The framework is implemented as a differentiable computational graph for policy training. We showcase our framework for a quadrotor stabilization task as part of a benchmark comparison in safe-control-gym and for optimizing the parameters of a Control Barrier Function based controller in a leader-follower problem.
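The expansion and compression operations described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the next state given a sigma point x is Gaussian with a state-dependent mean and covariance, uses the standard symmetric sigma-point set with an assumed spread parameter `kappa`, and matches only the first two moments during compression. All function names, including the toy pendulum-like dynamics `toy_mean` and `toy_cov`, are hypothetical.

```python
import numpy as np

def sigma_points(mean, cov, kappa=1.0):
    """Standard symmetric sigma points (2d+1 of them) matching mean and covariance."""
    d = mean.shape[0]
    # Small jitter keeps the Cholesky factorization defined for near-singular covariances.
    L = np.linalg.cholesky((d + kappa) * cov + 1e-12 * np.eye(d))
    # Rows of L.T are the columns of L, so each row below is mean +/- one column.
    points = np.vstack([mean, mean + L.T, mean - L.T])
    weights = np.full(2 * d + 1, 1.0 / (2.0 * (d + kappa)))
    weights[0] = kappa / (d + kappa)
    return points, weights

def moments(points, weights):
    """Weighted mean and covariance of a sigma-point set."""
    mean = weights @ points
    centered = points - mean
    cov = (weights[:, None] * centered).T @ centered
    return mean, cov

def expand(points, weights, mean_fn, cov_fn):
    """Expansion: each sigma point spawns 2d+1 successors drawn from its own
    state-dependent next-state distribution (assumed Gaussian here)."""
    new_points, new_weights = [], []
    for x, w in zip(points, weights):
        p, v = sigma_points(mean_fn(x), cov_fn(x))
        new_points.append(p)
        new_weights.append(w * v)  # product weights still sum to one
    return np.vstack(new_points), np.concatenate(new_weights)

def compress(points, weights):
    """Compression: moment-match the expanded set back to 2d+1 sigma points
    (first two moments only in this sketch)."""
    return sigma_points(*moments(points, weights))

def predict(mean0, cov0, mean_fn, cov_fn, horizon):
    """Alternate expansion and compression over a prediction horizon."""
    points, weights = sigma_points(mean0, cov0)
    for _ in range(horizon):
        points, weights = compress(*expand(points, weights, mean_fn, cov_fn))
    return points, weights

# Hypothetical toy dynamics: a pendulum-like mean with state-dependent noise.
def toy_mean(x):
    return x + 0.1 * np.array([x[1], -np.sin(x[0])])

def toy_cov(x):
    return 0.01 * (1.0 + x[0] ** 2) * np.eye(2)

points, weights = predict(np.array([0.5, 0.0]), 0.01 * np.eye(2),
                          toy_mean, toy_cov, horizon=10)
pred_mean, pred_cov = moments(points, weights)
```

Without the compression step, the number of sigma points in this sketch would grow by a factor of 2d+1 at every time step; with it, the set stays at 2d+1 points for the whole horizon.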

arXiv information

Authors	Hardik Parwana, Dimitra Panagou
Published	2024-02-01 02:14:20+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
