GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control

Summary

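Recent advances in world models have transformed dynamic environment simulation, allowing systems to foresee future states and assess potential actions.
In autonomous driving, these capabilities help vehicles anticipate the behavior of other road users, perform risk-aware planning, accelerate training in simulation, and adapt to novel scenarios, thereby improving safety and reliability.
Current approaches struggle to maintain robust 3D geometric consistency or accumulate artifacts when handling occlusions, and both properties are critical for reliable safety assessment in autonomous navigation tasks.
To address this, the authors introduce GeoDrive, which explicitly integrates robust 3D geometry conditions into driving world models to improve spatial understanding and action controllability.
Specifically, the method first extracts a 3D representation from the input frame and then obtains its 2D rendering along a user-specified ego-car trajectory.
To enable dynamic modeling, a dynamic editing module is applied during training to enhance the renderings by editing the positions of the vehicles.
Extensive experiments show that the method significantly outperforms existing models in both action accuracy and 3D spatial awareness, leading to more realistic, adaptable, and reliable scene modeling for safer autonomous driving.
In addition, the model generalizes to novel trajectories and offers interactive scene editing capabilities, such as object editing and object trajectory control.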
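
The "extract a 3D representation, then render it along the ego trajectory" step can be illustrated with a small sketch. The abstract does not say which 3D representation or renderer GeoDrive uses, so everything here is an assumption made for illustration: a per-pixel depth map stands in for the 3D representation, a pinhole camera with a z-buffer stands in for the renderer, and the names `unproject`, `to_new_camera`, `render_depth`, and `render_along_trajectory` are hypothetical.

```python
import math
from typing import List, Tuple

Point3 = Tuple[float, float, float]
Pose = Tuple[float, Point3]                     # (yaw about the vertical axis, translation)
Intrinsics = Tuple[float, float, float, float]  # fx, fy, cx, cy (pinhole model)


def unproject(depth: List[List[float]], k: Intrinsics) -> List[Point3]:
    """Lift every pixel with a valid depth to a 3D point in the source camera frame."""
    fx, fy, cx, cy = k
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def to_new_camera(p: Point3, pose: Pose) -> Point3:
    """Express p in the frame of a camera placed at `pose` (x right, y down, z forward)."""
    yaw, (tx, ty, tz) = pose
    c, s = math.cos(yaw), math.sin(yaw)
    dx, dy, dz = p[0] - tx, p[1] - ty, p[2] - tz
    # Inverse of a yaw rotation about the y axis, applied to the offset from the camera.
    return (c * dx - s * dz, dy, s * dx + c * dz)


def render_depth(points: List[Point3], k: Intrinsics,
                 width: int, height: int) -> List[List[float]]:
    """Project points to the image plane; a z-buffer keeps the nearest point per pixel."""
    fx, fy, cx, cy = k
    image = [[0.0] * width for _ in range(height)]  # 0.0 marks an empty pixel
    for x, y, z in points:
        if z <= 0:
            continue  # behind the camera
        u = round(fx * x / z + cx)
        v = round(fy * y / z + cy)
        if 0 <= u < width and 0 <= v < height:
            if image[v][u] == 0.0 or z < image[v][u]:
                image[v][u] = z
    return image


def render_along_trajectory(depth: List[List[float]], k: Intrinsics,
                            trajectory: List[Pose]) -> List[List[List[float]]]:
    """Render the lifted point cloud once per ego pose in the trajectory."""
    height, width = len(depth), len(depth[0])
    cloud = unproject(depth, k)
    return [
        render_depth([to_new_camera(p, pose) for p in cloud], k, width, height)
        for pose in trajectory
    ]


# A flat wall 10 units ahead, seen from the start pose and after driving 5 units forward.
depth = [[10.0] * 3 for _ in range(3)]
frames = render_along_trajectory(
    depth, (1.0, 1.0, 1.0, 1.0),
    [(0.0, (0.0, 0.0, 0.0)), (0.0, (0.0, 0.0, 5.0))],
)
```

In this toy setup the first frame reproduces the input depth map, while the second frame keeps only the centre pixel because the other points leave the field of view. Sparse, hole-ridden renderings of this kind are the sort of geometric condition a generative model would then have to complete; how GeoDrive does that is not described in the abstract.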

Abstract (original)

Recent advancements in world models have revolutionized dynamic environment simulation, allowing systems to foresee future states and assess potential actions. In autonomous driving, these capabilities help vehicles anticipate the behavior of other road users, perform risk-aware planning, accelerate training in simulation, and adapt to novel scenarios, thereby enhancing safety and reliability. Current approaches exhibit deficiencies in maintaining robust 3D geometric consistency or accumulating artifacts during occlusion handling, both critical for reliable safety assessment in autonomous navigation tasks. To address this, we introduce GeoDrive, which explicitly integrates robust 3D geometry conditions into driving world models to enhance spatial understanding and action controllability. Specifically, we first extract a 3D representation from the input frame and then obtain its 2D rendering based on the user-specified ego-car trajectory. To enable dynamic modeling, we propose a dynamic editing module during training to enhance the renderings by editing the positions of the vehicles. Extensive experiments demonstrate that our method significantly outperforms existing models in both action accuracy and 3D spatial awareness, leading to more realistic, adaptable, and reliable scene modeling for safer autonomous driving. Additionally, our model can generalize to novel trajectories and offers interactive scene editing capabilities, such as object editing and object trajectory control.

arXiv information

Authors	Anthony Chen, Wenzhao Zheng, Yida Wang, Xueyang Zhang, Kun Zhan, Peng Jia, Kurt Keutzer, Shanghang Zhang
Published	2025-05-29 12:41:53+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
