SoftCTRL: Soft conservative KL-control of Transformer Reinforcement Learning for Autonomous Driving

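Summary

In recent years, motion planning for urban self-driving vehicles (SDVs) has become a widely studied problem because of the complex interactions among road components.
To tackle it, many methods rely on large-scale, human-sampled data processed through imitation learning (IL).
Although effective, IL alone cannot adequately address safety and reliability concerns.
Combining IL with reinforcement learning (RL) by adding the KL divergence between the RL policy and the IL policy to the RL loss can mitigate IL's weaknesses, but it suffers from over-conservation caused by IL's covariate shift.
To address this limitation, we introduce a method that combines IL with RL through an implicit entropy-KL control, which offers a simple way to reduce this over-conservative behavior.
In particular, we validate the method on a variety of challenging simulated urban scenarios from an unseen dataset. The results indicate that, although IL performs well on imitation tasks, the proposed method significantly improves robustness (over 17% reduction in failures) and generates human-like driving behavior.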
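
To make the idea of "adding the KL divergence between the RL policy and the IL policy to the RL loss" concrete, here is a minimal sketch for a discrete action distribution. It is not the paper's formulation, which the abstract does not spell out: the function names, the weights `beta` and `alpha`, and the explicit entropy bonus in `soft_kl_regularized_loss` are all assumptions made for illustration, and the paper describes its entropy-KL control as implicit.

```python
import math


def entropy(p):
    """Shannon entropy H(p) of a discrete distribution, in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def cross_entropy(p, q):
    """Cross-entropy H(p, q) = -sum_a p(a) * log q(a)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0.0)


def kl_divergence(p, q):
    """KL(p || q), written as cross-entropy minus entropy."""
    return cross_entropy(p, q) - entropy(p)


def kl_regularized_loss(rl_loss, pi_rl, pi_il, beta=0.1):
    """Plain KL-control: penalize the RL policy for drifting from the IL policy.

    beta is a hypothetical penalty weight, not a value from the paper.
    """
    return rl_loss + beta * kl_divergence(pi_rl, pi_il)


def soft_kl_regularized_loss(rl_loss, pi_rl, pi_il, beta=0.1, alpha=0.05):
    """Assumed 'softened' variant: the same KL penalty plus an entropy bonus.

    Rewarding entropy (weight alpha, hypothetical) is one simple way to
    counteract a KL term that keeps the policy too close to the IL prior.
    """
    plain = kl_regularized_loss(rl_loss, pi_rl, pi_il, beta)
    return plain - alpha * entropy(pi_rl)


if __name__ == "__main__":
    pi_il = [0.90, 0.05, 0.05]  # illustrative IL policy over three actions
    pi_rl = [0.70, 0.20, 0.10]  # illustrative RL policy over the same actions
    print("KL(pi_rl || pi_il):", kl_divergence(pi_rl, pi_il))
    print("plain loss:", kl_regularized_loss(1.0, pi_rl, pi_il))
    print("soft loss: ", soft_kl_regularized_loss(1.0, pi_rl, pi_il))
```

Writing the KL term as cross-entropy minus entropy shows why an entropy term interacts naturally with KL-control: the penalty already contains the policy's negative entropy, so changing the weight on that part relaxes how tightly the RL policy is tied to the IL policy.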

Abstract (original)

In recent years, motion planning for urban self-driving cars (SDV) has become a popular problem due to its complex interaction of road components. To tackle this, many methods have relied on large-scale, human-sampled data processed through Imitation learning (IL). Although effective, IL alone cannot adequately handle safety and reliability concerns. Combining IL with Reinforcement learning (RL) by adding KL divergence between RL and IL policy to the RL loss can alleviate IL’s weakness but suffer from over-conservation caused by covariate shift of IL. To address this limitation, we introduce a method that combines IL with RL using an implicit entropy-KL control that offers a simple way to reduce the over-conservation characteristic. In particular, we validate different challenging simulated urban scenarios from the unseen dataset, indicating that although IL can perform well in imitation tasks, our proposed method significantly improves robustness (over 17% reduction in failures) and generates human-like driving behavior.

arXiv information

Authors	Minh Tri Huynh, Duc Dung Nguyen
Published	2024-10-30 07:18:00+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
