Co-driver: VLM-based Autonomous Driving Assistant with Human-like Behavior and Understanding for Complex Road Scenes

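Summary

Recent research on autonomous driving solutions based on large language models shows promising prospects in the fields of planning and control.
However, the heavy computational demands and hallucinations of large language models continue to hinder the tasks of predicting precise trajectories and issuing control signals.
To address this problem, we propose Co-driver, a novel autonomous driving assistant system that gives autonomous vehicles adjustable driving behaviors based on its understanding of road scenes.
We present a pipeline involving the CARLA simulator and Robot Operating System 2 (ROS2) that verifies the effectiveness of the system on a single NVIDIA RTX 4090 (24 GB) GPU while exploiting the textual output of the visual language model.
We also contribute a dataset containing an image set and a corresponding prompt set for fine-tuning the system's visual language model module.
On a real-world driving dataset, the system achieved success rates of 96.16% in night scenes and 89.7% in gloomy scenes in terms of reasonable predictions.
The Co-driver dataset will be released at https://github.com/ZionGo6/Co-driver.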

Summary (original)

Recent research about Large Language Model based autonomous driving solutions shows a promising picture in planning and control fields. However, heavy computational resources and hallucinations of Large Language Models continue to hinder the tasks of predicting precise trajectories and instructing control signals. To address this problem, we propose Co-driver, a novel autonomous driving assistant system to empower autonomous vehicles with adjustable driving behaviors based on the understanding of road scenes. A pipeline involving the CARLA simulator and Robot Operating System 2 (ROS2) verifying the effectiveness of our system is presented, utilizing a single Nvidia 4090 24G GPU while exploiting the capacity of textual output of the Visual Language Model. Besides, we also contribute a dataset containing an image set and a corresponding prompt set for fine-tuning the Visual Language Model module of our system. In the real-world driving dataset, our system achieved 96.16% success rate in night scenes and 89.7% in gloomy scenes regarding reasonable predictions. Our Co-driver dataset will be released at https://github.com/ZionGo6/Co-driver.
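
The abstract does not describe how Co-driver turns the VLM's textual output into driving behavior, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the VLM is prompted to answer with a JSON object such as {"scene": "night"}; the scene labels, the `DrivingBehavior` fields, the preset values, and `parse_vlm_reply` are all hypothetical. It illustrates the general idea of using textual scene understanding to select an adjustable driving behavior rather than predicting trajectories or control signals directly.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DrivingBehavior:
    """Hypothetical behavior parameters that a downstream planner could consume."""
    max_speed_kmh: float   # speed cap for the current scene
    min_gap_s: float       # time headway to the lead vehicle, in seconds
    headlights_on: bool


# Hypothetical presets keyed by the scene labels the VLM is prompted to emit.
BEHAVIOR_PRESETS = {
    "normal": DrivingBehavior(max_speed_kmh=50.0, min_gap_s=2.0, headlights_on=False),
    "night": DrivingBehavior(max_speed_kmh=40.0, min_gap_s=3.0, headlights_on=True),
    "gloomy": DrivingBehavior(max_speed_kmh=35.0, min_gap_s=3.5, headlights_on=True),
}


def parse_vlm_reply(reply: str) -> DrivingBehavior:
    """Map a VLM's JSON text reply to a behavior preset.

    Falls back to the most cautious preset (lowest speed cap) when the reply
    is not valid JSON, lacks a string "scene" field, or names an unknown scene.
    """
    cautious = min(BEHAVIOR_PRESETS.values(), key=lambda b: b.max_speed_kmh)
    try:
        scene = json.loads(reply)["scene"].strip().lower()
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        return cautious
    return BEHAVIOR_PRESETS.get(scene, cautious)


if __name__ == "__main__":
    print(parse_vlm_reply('{"scene": "Night"}'))
    print(parse_vlm_reply("the road looks dark and wet"))
```

Restricting the VLM to a small, fixed label set and falling back to the most cautious preset on malformed or unknown replies is one simple way to limit the effect of hallucinated output; the mechanism actually used in Co-driver may differ.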

arXiv information

Authors	Ziang Guo, Artem Lykov, Zakhar Yagudin, Mikhail Konenkov, Dzmitry Tsetserukou
Published	2024-05-09 16:17:41+00:00

Source, services used

arxiv.jp, Google