Efficient Deep Learning of Robust, Adaptive Policies using Tube MPC-Guided Data Augmentation

Abstract

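Deploying agile autonomous systems in challenging, unstructured environments requires the ability to adapt and robustness to uncertainty.
Existing robust and adaptive controllers, such as those based on model predictive control (MPC), can achieve excellent performance, but at the cost of heavy online computation onboard.
Strategies for efficiently learning robust, onboard-deployable policies from MPC have emerged, but they still lack fundamental adaptation capabilities.
In this work, we extend an existing efficient imitation learning (IL) algorithm for robust policy learning from MPC so that it can learn policies that adapt to challenging model and environment uncertainties.
The key idea of our approach is to modify the IL procedure by conditioning the policy on a learned, lower-dimensional representation of the model and environment that can be estimated efficiently online.
We tailor the approach to the task of learning an adaptive position and attitude control policy that tracks trajectories under challenging disturbances on a multirotor.
Our evaluation, performed in a high-fidelity simulation environment, shows that a high-quality adaptive policy can be obtained in about $1.3$ hours.
We also demonstrate empirically that the policy adapts rapidly to uncertainties both inside and outside the training distribution, achieving an average position error of $6.1$ cm under a wind disturbance that corresponds to about $50\%$ of the robot's weight and is $36\%$ larger than the maximum wind seen during training.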

Abstract (original)

The deployment of agile autonomous systems in challenging, unstructured environments requires adaptation capabilities and robustness to uncertainties. Existing robust and adaptive controllers, such as the ones based on MPC, can achieve impressive performance at the cost of heavy online onboard computations. Strategies that efficiently learn robust and onboard-deployable policies from MPC have emerged, but they still lack fundamental adaptation capabilities. In this work, we extend an existing efficient IL algorithm for robust policy learning from MPC with the ability to learn policies that adapt to challenging model/environment uncertainties. The key idea of our approach consists in modifying the IL procedure by conditioning the policy on a learned lower-dimensional model/environment representation that can be efficiently estimated online. We tailor our approach to the task of learning an adaptive position and attitude control policy to track trajectories under challenging disturbances on a multirotor. Our evaluation is performed in a high-fidelity simulation environment and shows that a high-quality adaptive policy can be obtained in about $1.3$ hours. We additionally empirically demonstrate rapid adaptation to in- and out-of-training-distribution uncertainties, achieving a $6.1$ cm average position error under a wind disturbance that corresponds to about $50\%$ of the weight of the robot and that is $36\%$ larger than the maximum wind seen during training.
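
The conditioning idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the dimensions, the untrained random MLPs, the encoder that maps privileged model/environment parameters to a latent at training time, and the estimator that infers the latent from recent history online are all hypothetical. The sketch only shows the data flow, in which the policy receives the state concatenated with a low-dimensional latent `z`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, ENV_DIM, LATENT_DIM, ACTION_DIM, HISTORY_LEN = 10, 6, 3, 4, 5


def init_mlp(sizes):
    """Return random (weight, bias) pairs for a small MLP. The networks are untrained."""
    return [
        (rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def mlp(params, x):
    """Forward pass with tanh on hidden layers and a linear output layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x


# Assumed training-time encoder: model/environment parameters -> latent z.
encoder = init_mlp([ENV_DIM, 16, LATENT_DIM])
# Assumed online estimator: recent (state, action) history -> latent z.
estimator = init_mlp([HISTORY_LEN * (STATE_DIM + ACTION_DIM), 32, LATENT_DIM])
# Policy conditioned on the latent: (state, z) -> action.
policy = init_mlp([STATE_DIM + LATENT_DIM, 32, ACTION_DIM])


def act(state, z):
    """Evaluate the policy on the state concatenated with the latent z."""
    return mlp(policy, np.concatenate([state, z]))


state = rng.normal(size=STATE_DIM)

# Training time (assumed): z comes from parameters known in simulation.
env_params = rng.normal(size=ENV_DIM)
z_train = mlp(encoder, env_params)
action_train = act(state, z_train)

# Deployment (assumed): z is estimated from a short history instead.
history = rng.normal(size=(HISTORY_LEN, STATE_DIM + ACTION_DIM))
z_online = mlp(estimator, history.ravel())
action_online = act(state, z_online)
```

Because the policy only ever sees `z`, the module that produces it can differ between training and deployment without changing the policy's input shape.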

arXiv information

Authors	Tong Zhao, Andrea Tagliabue, Jonathan P. How
Published	2023-03-28 02:22:47+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
