Inverse RL Scene Dynamics Learning for Nonlinear Predictive Control in Autonomous Vehicles

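Abstract

This paper introduces the Deep Learning-based Nonlinear Model Predictive Controller with Scene Dynamics (DL-NMPC-SD), a method for autonomous navigation.
DL-NMPC-SD combines an a priori nominal vehicle model with a scene dynamics model learned from temporal range sensing information.
The scene dynamics model estimates the desired vehicle trajectory and also adjusts the true system model used by the underlying model predictive controller.
The authors propose encoding the scene dynamics model within the layers of a deep neural network, which acts as a nonlinear approximator for the high-order state space of the operating conditions.
The model is learned from temporal sequences of range sensing observations and system states, both integrated by an Augmented Memory component.
Using Inverse Reinforcement Learning and the Bellman optimality principle, the learning controller is trained with a modified version of the Deep Q-Learning algorithm, so that the desired state trajectory can be estimated as an optimal action-value function.
DL-NMPC-SD is evaluated against the baseline Dynamic Window Approach (DWA) and against two state-of-the-art methods, one End2End and one based on reinforcement learning.
Performance is measured in three experiments: i) in the GridSim virtual environment, ii) on indoor and outdoor navigation tasks using the RovisLab AMTU (Autonomous Mobile Test Unit) platform, and iii) on a full-scale autonomous test vehicle driving on public roads.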
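
For reference, the Bellman optimality principle and the standard Deep Q-Learning objective mentioned above can be written as follows. This is the textbook form only; the abstract does not spell out the modified version the authors train with, nor how the reward is obtained through Inverse Reinforcement Learning. The symbols s, a, r, γ, θ and θ⁻ (state, action, reward, discount factor, network parameters, target-network parameters) are notation assumed here, not taken from the paper.

```latex
% Bellman optimality equation for the optimal action-value function
Q^{*}(s, a) = \mathbb{E}_{s'}\left[ r(s, a) + \gamma \max_{a'} Q^{*}(s', a') \right]

% Standard Deep Q-Learning loss with target-network parameters \theta^{-}
L(\theta) = \mathbb{E}_{(s, a, r, s')}\left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2} \right]
```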

Abstract (original)

This paper introduces the Deep Learning-based Nonlinear Model Predictive Controller with Scene Dynamics (DL-NMPC-SD) method for autonomous navigation. DL-NMPC-SD uses an a-priori nominal vehicle model in combination with a scene dynamics model learned from temporal range sensing information. The scene dynamics model is responsible for estimating the desired vehicle trajectory, as well as to adjust the true system model used by the underlying model predictive controller. We propose to encode the scene dynamics model within the layers of a deep neural network, which acts as a nonlinear approximator for the high order state-space of the operating conditions. The model is learned based on temporal sequences of range sensing observations and system states, both integrated by an Augmented Memory component. We use Inverse Reinforcement Learning and the Bellman optimality principle to train our learning controller with a modified version of the Deep Q-Learning algorithm, enabling us to estimate the desired state trajectory as an optimal action-value function. We have evaluated DL-NMPC-SD against the baseline Dynamic Window Approach (DWA), as well as against two state-of-the-art End2End and reinforcement learning methods, respectively. The performance has been measured in three experiments: i) in our GridSim virtual environment, ii) on indoor and outdoor navigation tasks using our RovisLab AMTU (Autonomous Mobile Test Unit) platform and iii) on a full scale autonomous test vehicle driving on public roads.
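
The idea of pairing an a-priori nominal vehicle model with a learned correction inside a predictive controller can be illustrated with a small sketch. Everything below is an assumption made for illustration: the kinematic bicycle model, the additive form of the correction, the 2.5 m wheelbase, the function names, and the exhaustive search over candidate control sequences, which stands in for a real nonlinear optimizer. It is not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float, float]   # x [m], y [m], heading [rad]
Control = Tuple[float, float]        # speed [m/s], steering angle [rad]
Correction = Callable[[State, Control], State]


def nominal_step(state: State, control: Control, dt: float = 0.1,
                 wheelbase: float = 2.5) -> State:
    """Hypothetical nominal model: one Euler step of a kinematic bicycle."""
    x, y, heading = state
    speed, steer = control
    return (x + speed * math.cos(heading) * dt,
            y + speed * math.sin(heading) * dt,
            heading + speed / wheelbase * math.tan(steer) * dt)


def zero_correction(state: State, control: Control) -> State:
    """Placeholder for a learned model; returns no adjustment."""
    return (0.0, 0.0, 0.0)


def rollout(state: State, controls: Sequence[Control],
            correction: Correction, dt: float = 0.1) -> List[State]:
    """Predict the horizon: nominal step plus an additive correction (assumed form)."""
    trajectory = []
    for control in controls:
        nx, ny, nh = nominal_step(state, control, dt)
        dx, dy, dh = correction(state, control)
        state = (nx + dx, ny + dy, nh + dh)
        trajectory.append(state)
    return trajectory


def tracking_cost(trajectory: Sequence[State],
                  reference: Sequence[Tuple[float, float]]) -> float:
    """Sum of squared position errors against a desired trajectory."""
    return sum((x - rx) ** 2 + (y - ry) ** 2
               for (x, y, _), (rx, ry) in zip(trajectory, reference))


def best_control_sequence(state: State,
                          candidates: Sequence[Sequence[Control]],
                          reference: Sequence[Tuple[float, float]],
                          correction: Correction) -> Sequence[Control]:
    """Pick the candidate with the lowest predicted tracking cost."""
    return min(candidates,
               key=lambda c: tracking_cost(rollout(state, c, correction), reference))
```

In this sketch, `reference` plays the role of the desired trajectory and `correction` the role of the learned adjustment to the system model. In a receding-horizon loop, only the first control of the selected sequence would be applied before re-planning from the next measured state.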

arXiv information

Authors	Sorin Grigorescu,Mihai Zaha
Published	2025-04-02 03:46:37+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google