HE-Drive: Human-Like End-to-End Driving with Vision Language Models

Summary

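This paper proposes HE-Drive, the first human-like-centric end-to-end autonomous driving system that generates trajectories that are both temporally consistent and comfortable.
Recent studies have shown that imitation learning-based planners and learning-based trajectory scorers can effectively generate and select accurate trajectories that closely mimic expert demonstrations.
However, such trajectory planners and scorers tend to produce trajectories that are temporally inconsistent and uncomfortable.
To address this, HE-Drive first extracts key 3D spatial representations through sparse perception. These serve as conditional inputs to a motion planner based on conditional denoising diffusion probabilistic models (DDPMs), which generates temporally consistent multi-modal trajectories.
A trajectory scorer guided by vision-language models (VLMs) then selects the most comfortable trajectory from these candidates to control the vehicle, ensuring human-like end-to-end driving.
Experiments show that HE-Drive not only achieves state-of-the-art performance (reducing the average collision rate by 71% compared with VAD) and efficiency (1.9x faster than SparseDrive) on the challenging nuScenes and OpenScene datasets, but also provides the most comfortable driving experience on real-world data. For more information, see the project website (https://jmwang0117.github.io/HE-Drive/).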
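
The generate-then-score idea in the summary can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `jerk_cost` and `select_most_comfortable` are hypothetical, and the jerk-based cost is a simple hand-written stand-in for the VLM-guided scorer, not the scoring method used by HE-Drive. The candidate trajectories, which the paper obtains from a diffusion-based planner, are hard-coded here.

```python
from typing import List, Tuple

# A trajectory is a list of (x, y) waypoints sampled at a fixed time step.
Trajectory = List[Tuple[float, float]]


def finite_diff(points: Trajectory, dt: float) -> Trajectory:
    """First-order finite difference of consecutive 2D points."""
    return [
        ((x1 - x0) / dt, (y1 - y0) / dt)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]


def jerk_cost(traj: Trajectory, dt: float = 0.5) -> float:
    """Sum of squared jerk (third derivative of position).

    Assumption: low jerk is used as a rough proxy for comfort.
    This is NOT the paper's VLM-guided scorer.
    """
    velocity = finite_diff(traj, dt)
    acceleration = finite_diff(velocity, dt)
    jerk = finite_diff(acceleration, dt)
    return sum(jx * jx + jy * jy for jx, jy in jerk)


def select_most_comfortable(candidates: List[Trajectory], dt: float = 0.5) -> int:
    """Return the index of the candidate with the lowest jerk cost."""
    return min(range(len(candidates)), key=lambda i: jerk_cost(candidates[i], dt))


# Two hard-coded candidates standing in for planner output.
smooth = [(float(t), 0.0) for t in range(6)]  # constant speed, straight line
jerky = [(0.0, 0.0), (1.0, 0.0), (1.5, 0.0), (3.5, 0.0), (4.0, 0.0), (6.0, 0.0)]

best = select_most_comfortable([jerky, smooth])
```

Here `best` indexes the constant-speed candidate, because its acceleration and jerk are zero throughout. A learned or VLM-guided scorer would replace `jerk_cost` while leaving the selection step unchanged.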

Summary (original)

In this paper, we propose HE-Drive: the first human-like-centric end-to-end autonomous driving system to generate trajectories that are both temporally consistent and comfortable. Recent studies have shown that imitation learning-based planners and learning-based trajectory scorers can effectively generate and select accurate trajectories that closely mimic expert demonstrations. However, such trajectory planners and scorers face the dilemma of generating temporally inconsistent and uncomfortable trajectories. To solve the above problems, our HE-Drive first extracts key 3D spatial representations through sparse perception, which then serve as conditional inputs for a Conditional Denoising Diffusion Probabilistic Models (DDPMs)-based motion planner to generate temporally consistent multi-modal trajectories. A Vision-Language Models (VLMs)-guided trajectory scorer subsequently selects the most comfortable trajectory from these candidates to control the vehicle, ensuring human-like end-to-end driving. Experiments show that HE-Drive not only achieves state-of-the-art performance (i.e., reduces the average collision rate by 71% compared with VAD) and efficiency (i.e., 1.9X faster than SparseDrive) on the challenging nuScenes and OpenScene datasets but also provides the most comfortable driving experience on real-world data. For more information, visit the project website: https://jmwang0117.github.io/HE-Drive/.

arXiv information

Authors	Junming Wang, Xingyu Zhang, Zebin Xing, Songen Gu, Xiaoyang Guo, Yang Hu, Ziying Song, Qian Zhang, Xiaoxiao Long, Wei Yin
Published	2024-10-07 14:06:16+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
