HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation

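Summary

Driving World Models (DWMs) have become essential to autonomous driving because they enable prediction of future scenes.
However, existing DWMs are limited to scene generation and do not incorporate scene understanding, which involves interpreting and reasoning about the driving environment.
This paper presents HERMES, a unified Driving World Model.
It seamlessly integrates 3D scene understanding and future scene evolution (generation) through a unified framework in driving scenarios.
Specifically, HERMES uses a Bird's-Eye View (BEV) representation to consolidate multi-view spatial information while preserving geometric relationships and interactions.
It also introduces world queries, which incorporate world knowledge into BEV features via causal attention in the Large Language Model (LLM), enabling contextual enrichment for understanding and generation tasks.
The authors conduct comprehensive studies on the nuScenes and OmniDrive-nuScenes datasets to validate the method's effectiveness.
HERMES achieves state-of-the-art performance, reducing generation error by 32.4% and improving understanding metrics such as CIDEr by 8.0%.
The model and code will be publicly released at https://github.com/LMD0311/HERMES.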

Abstract (original)

Driving World Models (DWMs) have become essential for autonomous driving by enabling future scene prediction. However, existing DWMs are limited to scene generation and fail to incorporate scene understanding, which involves interpreting and reasoning about the driving environment. In this paper, we present a unified Driving World Model named HERMES. We seamlessly integrate 3D scene understanding and future scene evolution (generation) through a unified framework in driving scenarios. Specifically, HERMES leverages a Bird’s-Eye View (BEV) representation to consolidate multi-view spatial information while preserving geometric relationships and interactions. We also introduce world queries, which incorporate world knowledge into BEV features via causal attention in the Large Language Model (LLM), enabling contextual enrichment for understanding and generation tasks. We conduct comprehensive studies on nuScenes and OmniDrive-nuScenes datasets to validate the effectiveness of our method. HERMES achieves state-of-the-art performance, reducing generation error by 32.4% and improving understanding metrics such as CIDEr by 8.0%. The model and code will be publicly released at https://github.com/LMD0311/HERMES.
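
The abstract states only that world queries pick up information through causal attention in the LLM; it does not give the token layout or any implementation detail. As a rough illustration of what a causal mask implies, the sketch below runs single-head scaled dot-product attention over a toy sequence. It is not the HERMES implementation. The ordering (BEV tokens first, world queries appended after them), the toy vectors, and the names `causal_attention`, `bev_tokens`, and `world_queries` are all assumptions made for this example, and the learned query/key/value projections of a real LLM are omitted.

```python
import math


def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Position i may only attend to positions j <= i.
    Returns (outputs, weights), where weights[i][j] is the attention
    that position i places on position j.
    """
    dim = len(queries[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        # Score only the visible prefix; skipping j > i is the causal mask.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the visible prefix.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        # Masked (future) positions receive exactly zero weight.
        row = [e / total for e in exps] + [0.0] * (len(keys) - i - 1)
        weights.append(row)
        outputs.append(
            [sum(w * v[d] for w, v in zip(row, values))
             for d in range(len(values[0]))]
        )
    return outputs, weights


# Hypothetical toy inputs: 3 "BEV" tokens followed by 2 "world query" tokens.
# The ordering is an assumption of this sketch, not taken from the paper.
bev_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
world_queries = [[0.5, 0.5], [1.0, -1.0]]
sequence = bev_tokens + world_queries

# Self-attention without learned projections (a simplification).
outputs, weights = causal_attention(sequence, sequence, sequence)
num_bev = len(bev_tokens)

for i, row in enumerate(weights):
    kind = "bev" if i < num_bev else "world_query"
    print(kind, [round(w, 3) for w in row])
```

Under this assumed ordering, every world-query row places nonzero weight on all BEV tokens, while no BEV row places any weight on a world query. That one-directional flow is the property a causal mask provides; how HERMES actually arranges and uses its tokens should be checked against the paper and the released code.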

arXiv information

Authors	Xin Zhou, Dingkang Liang, Sifan Tu, Xiwu Chen, Yikang Ding, Dingyuan Zhang, Feiyang Tan, Hengshuang Zhao, Xiang Bai
Published	2025-01-24 18:59:51+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
