Efficient Deep Learning of Robust, Adaptive Policies using Tube MPC-Guided Data Augmentation

Summary

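Deploying agile autonomous systems in challenging, unstructured environments requires the ability to adapt and robustness to uncertainty.
Existing robust and adaptive controllers, such as those based on model predictive control (MPC), can achieve excellent performance, but at the cost of heavy online onboard computation.
Strategies that efficiently learn robust, onboard-deployable policies from MPC have emerged, but they still lack fundamental adaptation capabilities.
In this work, we extend an existing efficient imitation learning (IL) algorithm for robust policy learning from MPC so that it can learn policies that adapt to challenging model/environment uncertainties.
The key idea of our approach is to modify the IL procedure by conditioning the policy on a learned low-dimensional model/environment representation that can be estimated efficiently online.
We tailor the approach to the task of learning an adaptive position and attitude control policy that tracks trajectories under challenging disturbances on a multirotor.
Evaluations in simulation show that a high-quality adaptive policy can be obtained in about $1.3$ hours.
We also empirically demonstrate rapid adaptation to in- and out-of-training-distribution uncertainties, achieving an average position error of $6.1$ cm under wind disturbances that correspond to about $50\%$ of the robot's weight and are $36\%$ larger than the maximum wind seen during training.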

Summary (original)

The deployment of agile autonomous systems in challenging, unstructured environments requires adaptation capabilities and robustness to uncertainties. Existing robust and adaptive controllers, such as those based on model predictive control (MPC), can achieve impressive performance at the cost of heavy online onboard computations. Strategies that efficiently learn robust and onboard-deployable policies from MPC have emerged, but they still lack fundamental adaptation capabilities. In this work, we extend an existing efficient Imitation Learning (IL) algorithm for robust policy learning from MPC with the ability to learn policies that adapt to challenging model/environment uncertainties. The key idea of our approach consists in modifying the IL procedure by conditioning the policy on a learned lower-dimensional model/environment representation that can be efficiently estimated online. We tailor our approach to the task of learning an adaptive position and attitude control policy to track trajectories under challenging disturbances on a multirotor. Evaluations in simulation show that a high-quality adaptive policy can be obtained in about $1.3$ hours. We additionally empirically demonstrate rapid adaptation to in- and out-of-training-distribution uncertainties, achieving a $6.1$ cm average position error under wind disturbances that correspond to about $50\%$ of the weight of the robot, and that are $36\%$ larger than the maximum wind seen during training.
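The conditioning idea in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: the class name `AdaptivePolicy`, the linear maps, the random weights, and all dimensions (`state_dim`, `action_dim`, `latent_dim`, `history_len`) are hypothetical choices made only to show the data flow, in which a short history of states and actions is compressed into a low-dimensional representation `z` and the action is then computed from the current state together with `z`.

```python
import random


def linear(weights, x):
    """Multiply a row-major weight matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class AdaptivePolicy:
    """Hypothetical policy: action = f(state, z), z estimated from recent history."""

    def __init__(self, state_dim, action_dim, latent_dim, history_len, seed=0):
        rng = random.Random(seed)
        step_dim = state_dim + action_dim
        # Estimator (assumed linear): flattened history -> representation z.
        self.W_est = random_matrix(latent_dim, history_len * step_dim, rng)
        # Policy head (assumed linear): [state, z] -> action.
        self.W_pi = random_matrix(action_dim, state_dim + latent_dim, rng)
        # History window of (state, action) pairs, initialised to zeros.
        self.history = [[0.0] * step_dim for _ in range(history_len)]

    def estimate_latent(self):
        """Estimate z online from the stored state-action history."""
        flat = [v for step in self.history for v in step]
        return linear(self.W_est, flat)

    def act(self, state):
        """Compute an action conditioned on the state and the estimated z."""
        z = self.estimate_latent()
        action = linear(self.W_pi, list(state) + z)
        # Slide the window: drop the oldest step, append the newest.
        self.history = self.history[1:] + [list(state) + action]
        return action, z


policy = AdaptivePolicy(state_dim=6, action_dim=4, latent_dim=3, history_len=5)
action, z = policy.act([0.1] * 6)
action2, z2 = policy.act([0.1] * 6)
```

In this sketch the representation starts at zero and changes as the history fills with observed states and actions, which is the mechanism that would let a trained policy react to a disturbance it cannot measure directly.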

arXiv information

Authors	Tong Zhao, Andrea Tagliabue, Jonathan P. How
Published	2023-10-02 17:34:48+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
