Data-efficient Deep Reinforcement Learning for Vehicle Trajectory Control

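Summary

Advanced vehicle control is a fundamental building block in the development of autonomous driving systems.
Reinforcement learning (RL) promises control performance superior to classical approaches while keeping computational demands low at deployment.
However, standard RL approaches such as soft actor-critic (SAC) require large amounts of training data to be collected, which makes them impractical for real-world applications.
To address this problem, we apply recently developed data-efficient deep RL methods to vehicle trajectory control.
Our investigation focuses on three methods so far unexplored for vehicle control: randomized ensemble double Q-learning (REDQ), probabilistic ensembles with trajectory sampling and a model predictive path integral optimizer (PETS-MPPI), and model-based policy optimization (MBPO).
For trajectory control, we find that the standard model-based RL formulation used in approaches such as PETS-MPPI and MBPO is not suitable.
We therefore propose a new formulation that splits dynamics prediction and vehicle localization.
A benchmark study on the CARLA simulator shows that the three identified data-efficient deep RL approaches learn control strategies on a par with or better than SAC while reducing the required number of environment interactions by more than one order of magnitude.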
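
The summary above names REDQ without describing it. As background, here is a minimal sketch of the target computation usually associated with REDQ: an ensemble of target Q-functions is evaluated at the next state-action pair, a random subset is drawn, and the minimum over that subset (minus an entropy term, as in SAC) forms the bootstrap target. The function name `redq_target`, the array layout, and the default hyperparameters are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def redq_target(reward, done, next_q_values, next_log_prob, rng,
                subset_size=2, gamma=0.99, alpha=0.2):
    """Compute REDQ-style bootstrap targets for a batch (illustrative sketch).

    reward, done, next_log_prob: arrays of shape (B,)
    next_q_values: array of shape (N, B) holding the N target Q-estimates
                   at the next state and a freshly sampled next action.
    """
    n_ensemble = next_q_values.shape[0]
    # Draw a random subset of ensemble members without replacement.
    idx = rng.choice(n_ensemble, size=subset_size, replace=False)
    # Pessimistic estimate: minimum over the sampled subset only.
    min_q = next_q_values[idx].min(axis=0)
    # Soft value with entropy bonus, as in SAC.
    soft_value = min_q - alpha * next_log_prob
    # No bootstrapping past terminal transitions.
    return reward + gamma * (1.0 - done) * soft_value


# Hypothetical usage with random placeholder values.
rng = np.random.default_rng(0)
q_next = rng.normal(size=(10, 4))   # N = 10 ensemble members, batch of 4
targets = redq_target(
    reward=np.ones(4),
    done=np.zeros(4),
    next_q_values=q_next,
    next_log_prob=np.full(4, -1.0),
    rng=rng,
)
```

The sketch covers only the target. REDQ additionally performs many gradient updates per environment step (a high update-to-data ratio), which is where most of its data efficiency is usually attributed.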

Abstract (original)

Advanced vehicle control is a fundamental building block in the development of autonomous driving systems. Reinforcement learning (RL) promises to achieve control performance superior to classical approaches while keeping computational demands low during deployment. However, standard RL approaches like soft-actor critic (SAC) require extensive amounts of training data to be collected and are thus impractical for real-world application. To address this issue, we apply recently developed data-efficient deep RL methods to vehicle trajectory control. Our investigation focuses on three methods, so far unexplored for vehicle control: randomized ensemble double Q-learning (REDQ), probabilistic ensembles with trajectory sampling and model predictive path integral optimizer (PETS-MPPI), and model-based policy optimization (MBPO). We find that in the case of trajectory control, the standard model-based RL formulation used in approaches like PETS-MPPI and MBPO is not suitable. We, therefore, propose a new formulation that splits dynamics prediction and vehicle localization. Our benchmark study on the CARLA simulator reveals that the three identified data-efficient deep RL approaches learn control strategies on a par with or better than SAC, yet reduce the required number of environment interactions by more than one order of magnitude.
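
The abstract does not say how the split between dynamics prediction and vehicle localization is implemented. The following is only one plausible reading, offered as a sketch: a learned model predicts the vehicle's dynamic quantities (here body-frame velocities and yaw rate), while the pose is advanced separately by known kinematics. All names (`localize`, `rollout_step`, `toy_dynamics_model`), the state layout, and the Euler integration are assumptions and should not be taken as the authors' formulation.

```python
import numpy as np


def localize(pose, dyn_state, dt):
    """Advance the pose [x, y, yaw] by explicit Euler integration.

    dyn_state = [vx, vy, yaw_rate] is given in the vehicle body frame
    (assumed layout, for illustration only).
    """
    x, y, yaw = pose
    vx, vy, yaw_rate = dyn_state
    # Rotate body-frame velocity into the world frame, then integrate.
    x_new = x + dt * (vx * np.cos(yaw) - vy * np.sin(yaw))
    y_new = y + dt * (vx * np.sin(yaw) + vy * np.cos(yaw))
    yaw_new = yaw + dt * yaw_rate
    return np.array([x_new, y_new, yaw_new])


def rollout_step(pose, dyn_state, action, dynamics_model, dt=0.1):
    """One model-based step with prediction and localization kept separate."""
    # Localization uses known geometry and the velocities at the step start.
    next_pose = localize(pose, dyn_state, dt)
    # Only the dynamic quantities are predicted by the (learned) model.
    next_dyn_state = dynamics_model(dyn_state, action)
    return next_pose, next_dyn_state


def toy_dynamics_model(dyn_state, action):
    """Placeholder for a learned model; it simply holds the state constant."""
    return np.asarray(dyn_state, dtype=float)


# Hypothetical usage: drive straight along the x-axis for one step.
pose, dyn = rollout_step(
    pose=np.array([0.0, 0.0, 0.0]),
    dyn_state=np.array([5.0, 0.0, 0.0]),
    action=np.array([0.0, 0.0]),
    dynamics_model=toy_dynamics_model,
)
```

Under this reading, the learned model never has to predict absolute position, so it can stay independent of where on the track the vehicle happens to be.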

arXiv information

Authors	Bernd Frauenknecht, Tobias Ehlgen, Sebastian Trimpe
Published	2023-11-30 09:38:59+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
