Generative Planning with 3D-vision Language Pre-training for End-to-End Autonomous Driving

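Abstract

Autonomous driving is a challenging task that requires perceiving and understanding the surrounding environment in order to plan safe trajectories.
Existing vision-based end-to-end models have achieved promising results, but they still face challenges in visual understanding, decision reasoning, and scene generalization.
To address these issues, a generative planning model with 3D-vision language pre-training, named GPVL, is proposed for end-to-end autonomous driving.
The proposed paradigm has two key aspects.
First, a 3D-vision language pre-training module is designed to bridge the gap between visual perception and linguistic understanding in the bird's-eye view.
Second, a cross-modal language model is introduced to generate holistic driving decisions and fine-grained trajectories from perception and navigation information in an auto-regressive manner.
Experiments on the challenging nuScenes dataset show that the proposed scheme achieves excellent performance compared with state-of-the-art methods.
In addition, GPVL shows strong generalization ability and real-time potential when handling high-level commands in various scenarios.
The authors believe that the effective, robust, and efficient performance of GPVL is crucial for the practical application of future autonomous driving systems.
The code is available at https://github.com/ltp1995/GPVL.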

Abstract (original)

Autonomous driving is a challenging task that requires perceiving and understanding the surrounding environment for safe trajectory planning. While existing vision-based end-to-end models have achieved promising results, these methods are still facing the challenges of vision understanding, decision reasoning and scene generalization. To solve these issues, a generative planning with 3D-vision language pre-training model named GPVL is proposed for end-to-end autonomous driving. The proposed paradigm has two significant aspects. On one hand, a 3D-vision language pre-training module is designed to bridge the gap between visual perception and linguistic understanding in the bird’s eye view. On the other hand, a cross-modal language model is introduced to generate holistic driving decisions and fine-grained trajectories with perception and navigation information in an auto-regressive manner. Experiments on the challenging nuScenes dataset demonstrate that the proposed scheme achieves excellent performances compared with state-of-the-art methods. Besides, the proposed GPVL presents strong generalization ability and real-time potential when handling high-level commands in various scenarios. It is believed that the effective, robust and efficient performance of GPVL is crucial for the practical application of future autonomous driving systems. Code is available at https://github.com/ltp1995/GPVL

arXiv information

Authors	Tengpeng Li, Hanli Wang, Xianfei Li, Wenlong Liao, Tao He, Pai Peng
Published	2025-01-15 15:20:46+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
