DiffVLA: Vision-Language Guided Diffusion Planning for Autonomous Driving

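Summary

Research interest in end-to-end autonomous driving has surged owing to its fully differentiable design, which integrates the modular tasks of perception, prediction, and planning and so allows the whole system to be optimized toward the ultimate goal.
Despite the great potential of the end-to-end paradigm, existing methods struggle in several respects, including expensive BEV (bird's-eye view) computation, limited action diversity, and sub-optimal decisions in complex real-world scenarios.
To address these challenges, we propose Diff-VLA, a novel hybrid sparse-dense diffusion policy empowered by a vision-language model (VLM).
We explore a sparse diffusion representation for efficient multi-modal driving behavior.
In addition, we rethink the effectiveness of VLM driving decisions and improve trajectory generation guidance through deep interaction across agents, map instances, and VLM output.
Our method shows superior performance in the Autonomous Grand Challenge 2025, which contains challenging real and reactive synthetic scenarios.
Our method achieves 45.0 PDMS.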

Abstract (original)

Research interest in end-to-end autonomous driving has surged owing to its fully differentiable design integrating modular tasks, i.e. perception, prediction and planning, which enables optimization in pursuit of the ultimate goal. Despite the great potential of the end-to-end paradigm, existing methods suffer from several aspects including expensive BEV (bird’s eye view) computation, action diversity, and sub-optimal decision in complex real-world scenarios. To address these challenges, we propose a novel hybrid sparse-dense diffusion policy, empowered by a Vision-Language Model (VLM), called Diff-VLA. We explore the sparse diffusion representation for efficient multi-modal driving behavior. Moreover, we rethink the effectiveness of VLM driving decision and improve the trajectory generation guidance through deep interaction across agent, map instances and VLM output. Our method shows superior performance in Autonomous Grand Challenge 2025 which contains challenging real and reactive synthetic scenarios. Our method achieves 45.0 PDMS.

arXiv information

Authors	Anqing Jiang, Yu Gao, Zhigang Sun, Yiru Wang, Jijun Wang, Jinghao Chai, Qian Cao, Yuweng Heng, Hao Jiang, Zongzheng Zhang, Xianda Guo, Hao Sun, Hao Zhao
Published	2025-05-27 06:45:19+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google