End-to-end Reinforcement Learning for Time-Optimal Quadcopter Flight

Abstract

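Aggressive, time-optimal control of quadcopters is a major challenge in robotics.
State-of-the-art approaches use reinforcement learning (RL) to train optimal neural policies.
A critical hurdle, however, is the sim-to-real gap, which is often addressed by employing a robust inner-loop controller. This is an abstraction that, in theory, constrains the optimality of the trained controller and requires margins to counter potential disturbances.
In contrast, our novel approach introduces high-speed quadcopter control using end-to-end RL (E2E) that sends commands directly to the motors.
To bridge the reality gap, we incorporate a learned residual model and an adaptive method that can compensate for modeling errors in thrust and moments.
We compare our E2E approach against a state-of-the-art network that commands thrust and body rates to an INDI inner-loop controller, in both simulated and real-world flight.
E2E shows a significant 1.39-second advantage in simulation and a 0.17-second advantage in real-world tests, highlighting the potential of end-to-end reinforcement learning.
The performance drop observed from simulation to reality indicates room for further improvement, such as refining the strategies used to address the reality gap or exploring offline reinforcement learning with real flight data.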

Abstract (original)

Aggressive time-optimal control of quadcopters poses a significant challenge in the field of robotics. The state-of-the-art approach leverages reinforcement learning (RL) to train optimal neural policies. However, a critical hurdle is the sim-to-real gap, often addressed by employing a robust inner loop controller, an abstraction that, in theory, constrains the optimality of the trained controller, necessitating margins to counter potential disturbances. In contrast, our novel approach introduces high-speed quadcopter control using end-to-end RL (E2E) that gives direct motor commands. To bridge the reality gap, we incorporate a learned residual model and an adaptive method that can compensate for modeling errors in thrust and moments. We compare our E2E approach against a state-of-the-art network that commands thrust and body rates to an INDI inner loop controller, both in simulated and real-world flight. E2E showcases a significant 1.39-second advantage in simulation and a 0.17-second edge in real-world testing, highlighting end-to-end reinforcement learning’s potential. The performance drop observed from simulation to reality shows potential for further improvement, including refining strategies to address the reality gap or exploring offline reinforcement learning with real flight data.

arXiv information

Authors	Robin Ferede, Christophe De Wagter, Dario Izzo, Guido C. H. E. de Croon
Published	2023-11-28 16:50:57+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
