DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models

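Summary

A primary hurdle for autonomous driving in urban environments is understanding complex, long-tail scenarios, such as challenging road conditions and subtle human behaviors.
We introduce DriveVLM, an autonomous driving system that leverages Vision-Language Models (VLMs) to enhance scene understanding and planning.
DriveVLM integrates a unique combination of reasoning modules for scene description, scene analysis, and hierarchical planning.
Furthermore, recognizing the limitations of VLMs in spatial reasoning and their heavy computational requirements, we propose DriveVLM-Dual, a hybrid system that combines the strengths of DriveVLM with the traditional autonomous driving pipeline.
Experiments on both the nuScenes dataset and the SUP-AD dataset demonstrate the effectiveness of DriveVLM and DriveVLM-Dual in handling complex and unpredictable driving conditions.
Finally, we deploy DriveVLM-Dual on a production vehicle and verify that it is effective in real-world autonomous driving environments.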

Summary (original)

A primary hurdle of autonomous driving in urban environments is understanding complex and long-tail scenarios, such as challenging road conditions and delicate human behaviors. We introduce DriveVLM, an autonomous driving system leveraging Vision-Language Models (VLMs) for enhanced scene understanding and planning capabilities. DriveVLM integrates a unique combination of reasoning modules for scene description, scene analysis, and hierarchical planning. Furthermore, recognizing the limitations of VLMs in spatial reasoning and heavy computational requirements, we propose DriveVLM-Dual, a hybrid system that synergizes the strengths of DriveVLM with the traditional autonomous driving pipeline. Experiments on both the nuScenes dataset and our SUP-AD dataset demonstrate the efficacy of DriveVLM and DriveVLM-Dual in handling complex and unpredictable driving conditions. Finally, we deploy the DriveVLM-Dual on a production vehicle, verifying it is effective in real-world autonomous driving environments.

arXiv information

Authors	Xiaoyu Tian, Junru Gu, Bailin Li, Yicheng Liu, Yang Wang, Zhiyong Zhao, Kun Zhan, Peng Jia, Xianpeng Lang, Hang Zhao
Published	2024-06-25 17:55:35+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
