PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning

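Abstract

Vehicle motion planning is a key component of autonomous driving technology.
Current rule-based vehicle motion planning methods work well in common scenarios but generalize poorly to long-tail situations.
Learning-based methods, meanwhile, have not yet outperformed rule-based approaches in large-scale closed-loop scenarios.
To address these problems, we propose PlanAgent, the first mid-to-mid planning system based on a multi-modal large language model (MLLM).
The MLLM serves as a cognitive agent that brings human-like knowledge, interpretability, and common-sense reasoning to closed-loop planning.
Specifically, PlanAgent draws on the MLLM's capabilities through three core modules.
First, the Environment Transformation module builds a bird's-eye-view (BEV) map and a lane-graph-based textual description from the environment as inputs.
Second, the Reasoning Engine module introduces a hierarchical chain of thought that runs from scene understanding to lateral and longitudinal motion instructions and culminates in planner code generation.
Finally, a Reflection module is integrated to simulate and evaluate the generated planner, reducing the MLLM's uncertainty.
PlanAgent inherits the MLLM's common-sense reasoning and generalization ability, which lets it handle both common scenarios and complex long-tail scenarios effectively.
PlanAgent is evaluated on the large-scale, challenging nuPlan benchmark.
A comprehensive set of experiments shows that PlanAgent outperforms the existing state of the art on the closed-loop motion planning task.
The code will be released soon.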

Abstract (original)

Vehicle motion planning is an essential component of autonomous driving technology. Current rule-based vehicle motion planning methods perform satisfactorily in common scenarios but struggle to generalize to long-tailed situations. Meanwhile, learning-based methods have yet to achieve superior performance over rule-based approaches in large-scale closed-loop scenarios. To address these issues, we propose PlanAgent, the first mid-to-mid planning system based on a Multi-modal Large Language Model (MLLM). MLLM is used as a cognitive agent to introduce human-like knowledge, interpretability, and common-sense reasoning into the closed-loop planning. Specifically, PlanAgent leverages the power of MLLM through three core modules. First, an Environment Transformation module constructs a Bird’s Eye View (BEV) map and a lane-graph-based textual description from the environment as inputs. Second, a Reasoning Engine module introduces a hierarchical chain-of-thought from scene understanding to lateral and longitudinal motion instructions, culminating in planner code generation. Last, a Reflection module is integrated to simulate and evaluate the generated planner for reducing MLLM’s uncertainty. PlanAgent is endowed with the common-sense reasoning and generalization capability of MLLM, which empowers it to effectively tackle both common and complex long-tailed scenarios. Our proposed PlanAgent is evaluated on the large-scale and challenging nuPlan benchmarks. A comprehensive set of experiments convincingly demonstrates that PlanAgent outperforms the existing state-of-the-art in the closed-loop motion planning task. Codes will be soon released.
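The three-module flow described in the abstract (Environment Transformation, Reasoning Engine, Reflection) can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not the authors' implementation: the abstract gives no interfaces, so every name here (SceneInput, PlannerParams, transform_environment, plan_with_reflection, the score threshold, the retry limit) is hypothetical. The MLLM call and the closed-loop simulator are passed in as plain callables, and the "generated planner" is reduced to two numbers rather than real planner code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SceneInput:
    """Hypothetical output of the Environment Transformation step."""
    bev_map: List[List[int]]  # toy stand-in for a rendered BEV image
    lane_text: str            # lane-graph-based textual description


@dataclass
class PlannerParams:
    """Hypothetical stand-in for the planner the Reasoning Engine generates."""
    target_speed: float    # longitudinal instruction, m/s
    lateral_offset: float  # lateral instruction, m


def transform_environment(lanes: List[Tuple[str, float]]) -> SceneInput:
    """Build a toy BEV grid and a text description from (lane_id, speed_limit) pairs."""
    bev = [[0] * 4 for _ in lanes]  # one empty row per lane
    text = "; ".join(f"lane {lane_id}: limit {limit:.1f} m/s" for lane_id, limit in lanes)
    return SceneInput(bev_map=bev, lane_text=text)


def plan_with_reflection(
    scene: SceneInput,
    reason: Callable[[SceneInput, str], PlannerParams],  # assumed MLLM wrapper
    simulate: Callable[[PlannerParams], float],          # assumed simulator score
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> Tuple[PlannerParams, float]:
    """Ask for a plan, score it in simulation, and retry with feedback if it scores low."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback = ""
    best_params, best_score = None, float("-inf")
    for _ in range(max_rounds):
        params = reason(scene, feedback)
        score = simulate(params)
        if score > best_score:
            best_params, best_score = params, score
        if score >= threshold:
            break  # the reflection step accepts this plan
        feedback = f"score {score:.2f} is below {threshold:.2f}; revise the plan"
    return best_params, best_score
```

To try it, pass any function that maps a scene and a feedback string to PlannerParams as reason, and any function that returns a score for those parameters as simulate. Keeping both as injected callables means the loop itself makes no claim about which model or simulator is used.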

arXiv information

Authors	Yupeng Zheng, Zebin Xing, Qichao Zhang, Bu Jin, Pengfei Li, Yuhang Zheng, Zhongpu Xia, Kun Zhan, Xianpeng Lang, Yaran Chen, Dongbin Zhao
Published	2024-06-04 07:48:11+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
