Prompting Multi-Modal Tokens to Enhance End-to-End Autonomous Driving Imitation Learning with LLMs

Summary

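The use of large language models (LLMs) in reinforcement learning, particularly as planners, has attracted considerable attention in recent literature.
However, much of the existing work focuses on planning models for robotics that convert the outputs of perception models into language, thereby adopting a 'pure-language' strategy.
This work proposes a hybrid end-to-end learning framework for autonomous driving that combines basic driving imitation learning with an LLM prompted by multi-modality tokens.
Rather than simply converting the perception results of a separately trained model into pure-language input, the novelty lies in two aspects:
1) Camera and LiDAR sensor inputs are integrated end-to-end into learnable multi-modality tokens, which intrinsically mitigates the description bias introduced by separate, pre-trained perception models.
2) Instead of letting the LLM drive directly, the paper explores a hybrid setting in which the LLM helps the driving model correct its mistakes and handle complicated scenarios.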
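The abstract gives no architecture details for the multi-modality tokens. The sketch below shows one plausible reading and rests on a simple assumed design. Camera and LiDAR feature vectors are each linearly projected into the LLM's embedding width. Each projected vector is tagged with a modality-type embedding, and the results are prepended to the embedded text prompt. All names here are hypothetical and not taken from the paper, including `MultiModalTokenizer`, `build_prompt`, `d_cam`, `d_lidar` and `d_model`. In a real system the projection weights would be trained end-to-end with the driving objective rather than left at random initialization. The feature vectors would come from image and LiDAR backbones.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # d_out rows, each of length d_in


def init_linear(d_in: int, d_out: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Small random weights and zero bias, standing in for trainable parameters."""
    weight = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
    bias = [0.0] * d_out
    return weight, bias


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Affine map y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


class MultiModalTokenizer:
    """Hypothetical tokenizer: projects camera and LiDAR features into LLM token space."""

    def __init__(self, d_cam: int, d_lidar: int, d_model: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.cam_proj = init_linear(d_cam, d_model, rng)
        self.lidar_proj = init_linear(d_lidar, d_model, rng)
        # Modality-type embeddings let the LLM tell camera tokens from LiDAR tokens.
        self.cam_type = [rng.gauss(0.0, 0.02) for _ in range(d_model)]
        self.lidar_type = [rng.gauss(0.0, 0.02) for _ in range(d_model)]

    def _embed(self, feats: List[Vector], proj: Tuple[Matrix, Vector], type_emb: Vector) -> List[Vector]:
        tokens = []
        for f in feats:
            projected = linear(f, *proj)
            tokens.append([p + t for p, t in zip(projected, type_emb)])
        return tokens

    def tokens(self, cam_feats: List[Vector], lidar_feats: List[Vector]) -> List[Vector]:
        """Camera tokens first, then LiDAR tokens, all of width d_model."""
        return (self._embed(cam_feats, self.cam_proj, self.cam_type)
                + self._embed(lidar_feats, self.lidar_proj, self.lidar_type))


def build_prompt(sensor_tokens: List[Vector], text_embeddings: List[Vector]) -> List[Vector]:
    """Prepend sensor tokens to the embedded text prompt, forming one input sequence."""
    return sensor_tokens + text_embeddings


if __name__ == "__main__":
    tokenizer = MultiModalTokenizer(d_cam=8, d_lidar=4, d_model=16)
    cam = [[0.1] * 8 for _ in range(3)]    # e.g. 3 image-patch features
    lidar = [[0.5] * 4 for _ in range(2)]  # e.g. 2 BEV-cell features
    text = [[0.0] * 16 for _ in range(5)]  # 5 already-embedded prompt tokens
    seq = build_prompt(tokenizer.tokens(cam, lidar), text)
    print(len(seq), len(seq[0]))  # 10 16
```

The sensor tokens are continuous vectors in the LLM's input space, not words. This is what distinguishes the approach from the 'pure-language' strategy: no intermediate text description is generated, so the description bias of a separate perception model never enters the prompt.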
Experiments show that the proposed method achieves a driving score of 49.21% and a route completion rate of 91.34% in offline evaluation in CARLA.
These results are comparable to those of state-of-the-art driving models.
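The abstract does not define its metrics. The gap between a 91.34% route completion and a 49.21% driving score is easier to read under the standard CARLA Leaderboard definition, which is assumed here rather than confirmed by the paper:

- For each route $i$, the score is $R_i P_i$.
- $R_i$ is the route completion.
- $P_i = \prod_j p_j^{n_{ij}}$ is the infraction penalty, where $p_j$ is the coefficient for infraction type $j$ and $n_{ij}$ is its count on route $i$.
- The driving score is the mean of $R_i P_i$ over all routes.

The coefficients below are the Leaderboard 1.0 values as we understand them. Treat them as an assumption and check the official CARLA documentation for the version in use. Leaderboard 2.0 adds further infraction types.

```python
from dataclasses import dataclass, field
from math import prod
from typing import Dict, List

# Assumed CARLA Leaderboard 1.0 infraction coefficients (verify for your version).
PENALTY: Dict[str, float] = {
    "collision_pedestrian": 0.50,
    "collision_vehicle": 0.60,
    "collision_static": 0.65,
    "red_light": 0.70,
    "stop_sign": 0.80,
}


@dataclass
class RouteResult:
    route_completion: float  # R_i: percentage of the route completed (0-100)
    infractions: Dict[str, int] = field(default_factory=dict)  # infraction type -> count


def infraction_penalty(infractions: Dict[str, int]) -> float:
    """P_i: each occurrence multiplies the penalty by its coefficient (1.0 if none)."""
    return prod(PENALTY[kind] ** count for kind, count in infractions.items())


def driving_score(routes: List[RouteResult]) -> float:
    """Mean over routes of R_i * P_i: scored per route first, then averaged."""
    per_route = [r.route_completion * infraction_penalty(r.infractions) for r in routes]
    return sum(per_route) / len(per_route)


if __name__ == "__main__":
    routes = [
        RouteResult(100.0, {"collision_vehicle": 1}),  # finished, but hit a vehicle
        RouteResult(80.0),                             # no infractions
    ]
    print(round(driving_score(routes), 2))  # 70.0
```

In this toy example the mean route completion is 90%, but the driving score is only 70. Completing most of a route while still committing infractions produces the same pattern as the reported 91.34% versus 49.21%.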

Abstract (original)

The utilization of Large Language Models (LLMs) within the realm of reinforcement learning, particularly as planners, has garnered a significant degree of attention in recent scholarly literature. However, a substantial proportion of existing research predominantly focuses on planning models for robotics that transmute the outputs derived from perception models into linguistic forms, thus adopting a 'pure-language' strategy. In this research, we propose a hybrid End-to-End learning framework for autonomous driving by combining basic driving imitation learning with LLMs based on multi-modality prompt tokens. Instead of simply converting perception results from the separated train model into pure language input, our novelty lies in two aspects. 1) The end-to-end integration of visual and LiDAR sensory input into learnable multi-modality tokens, thereby intrinsically alleviating description bias by separated pre-trained perception models. 2) Instead of directly letting LLMs drive, this paper explores a hybrid setting of letting LLMs help the driving model correct mistakes and complicated scenarios. The results of our experiments suggest that the proposed methodology can attain driving scores of 49.21%, coupled with an impressive route completion rate of 91.34% in the offline evaluation conducted via CARLA. These performance metrics are comparable to the most advanced driving models.

arXiv information

Authors	Yiqun Duan, Qiang Zhang, Renjing Xu
Published	2024-07-29 11:43:31+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
