DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation

Abstract

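World models have proven effective in autonomous driving, particularly for generating multi-view driving videos.
However, generating customized driving videos remains a significant challenge.
This paper proposes DriveDreamer-2, which builds on the DriveDreamer framework and incorporates a large language model (LLM) to generate user-defined driving videos.
Specifically, an LLM interface first converts the user's query into agent trajectories.
An HDMap that complies with traffic regulations is then generated from those trajectories.
Finally, the paper proposes a Unified Multi-View Model to strengthen the temporal and spatial coherence of the generated driving videos.
DriveDreamer-2 is the first world model to generate customized driving videos, and it can produce uncommon driving videos (such as a vehicle abruptly cutting in) in a user-friendly manner.
In addition, experimental results show that the generated videos improve the training of driving perception methods (such as 3D detection and tracking).
Furthermore, the video generation quality of DriveDreamer-2 surpasses other state-of-the-art methods, with FID and FVD scores of 11.2 and 55.7, which are relative improvements of 30% and 50%, respectively.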
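
The three stages described above (query to trajectories, trajectories to HDMap, HDMap to multi-view video) can be pictured as a simple data flow. The Python sketch below is only an illustration under stated assumptions: every function, class, camera name, and coordinate convention in it is hypothetical, the LLM is replaced by a keyword rule, and the video model is replaced by a stub that returns frame labels. It is not DriveDreamer-2's code and does not reflect its actual interfaces.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) in metres; a hypothetical ego-centred convention


@dataclass
class Scene:
    """Hypothetical container for what the first two stages produce."""
    trajectories: Dict[str, List[Point]]
    lanes: List[List[Point]]


def query_to_trajectories(query: str, steps: int = 5) -> Dict[str, List[Point]]:
    """Stage 1 stand-in: a keyword rule where a real system would call an LLM."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    ego = [(0.0, 2.0 * t) for t in range(steps)]  # ego drives straight ahead
    if "cut" in query.lower():
        # The other vehicle starts one lane to the left (x = -3.5) and merges to x = 0.
        other = [(-3.5 + 3.5 * t / (steps - 1), 6.0 + 2.0 * t) for t in range(steps)]
    else:
        # Without the keyword, the other vehicle stays in its own lane.
        other = [(-3.5, 6.0 + 2.0 * t) for t in range(steps)]
    return {"ego": ego, "other": other}


def trajectories_to_hdmap(trajectories: Dict[str, List[Point]]) -> List[List[Point]]:
    """Stage 2 stand-in: one straight lane centerline per starting lateral offset."""
    xs = sorted({traj[0][0] for traj in trajectories.values()})
    ys = [y for traj in trajectories.values() for _, y in traj]
    return [[(x, min(ys)), (x, max(ys))] for x in xs]


def generate_video(scene: Scene, views: List[str]) -> Dict[str, List[str]]:
    """Stage 3 stand-in: return one label per frame per view instead of pixels."""
    n_frames = len(scene.trajectories["ego"])
    return {view: [f"{view}_frame_{t}" for t in range(n_frames)] for view in views}


def run_pipeline(query: str) -> Dict[str, List[str]]:
    """Chain the three stand-in stages in the order the abstract describes."""
    trajectories = query_to_trajectories(query)
    scene = Scene(trajectories, trajectories_to_hdmap(trajectories))
    return generate_video(scene, ["front", "front_left", "front_right"])


video = run_pipeline("a vehicle abruptly cuts in")
```

In this toy version, the cut-in query makes the other vehicle finish in the ego lane, and the lane list covers both lanes that the agents start in. Each stub could in principle be swapped for a learned model without changing the order of the stages.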

Abstract (original)

World models have demonstrated superiority in autonomous driving, particularly in the generation of multi-view driving videos. However, significant challenges still exist in generating customized driving videos. In this paper, we propose DriveDreamer-2, which builds upon the framework of DriveDreamer and incorporates a Large Language Model (LLM) to generate user-defined driving videos. Specifically, an LLM interface is initially incorporated to convert a user’s query into agent trajectories. Subsequently, a HDMap, adhering to traffic regulations, is generated based on the trajectories. Ultimately, we propose the Unified Multi-View Model to enhance temporal and spatial coherence in the generated driving videos. DriveDreamer-2 is the first world model to generate customized driving videos, it can generate uncommon driving videos (e.g., vehicles abruptly cut in) in a user-friendly manner. Besides, experimental results demonstrate that the generated videos enhance the training of driving perception methods (e.g., 3D detection and tracking). Furthermore, video generation quality of DriveDreamer-2 surpasses other state-of-the-art methods, showcasing FID and FVD scores of 11.2 and 55.7, representing relative improvements of 30% and 50%.
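
The abstract gives the FID and FVD scores and the relative improvements, but not the baseline scores they are measured against. As a rough reading aid, the sketch below back-computes the baselines those figures would imply. It assumes that relative improvement means (baseline - score) / baseline, which is a common convention for lower-is-better metrics such as FID and FVD but is not stated in the abstract; the function name is made up for this example. Because the quoted percentages are presumably rounded, the results are approximate and should not be read as reported numbers.

```python
def implied_baseline(score: float, relative_improvement: float) -> float:
    """Return baseline b such that (b - score) / b == relative_improvement.

    Assumes a lower-is-better metric; this definition is an assumption,
    not something the abstract states.
    """
    if not 0.0 <= relative_improvement < 1.0:
        raise ValueError("relative_improvement must be in [0, 1)")
    return score / (1.0 - relative_improvement)


fid_baseline = implied_baseline(11.2, 0.30)  # about 16.0
fvd_baseline = implied_baseline(55.7, 0.50)  # about 111.4
print(f"implied FID baseline: {fid_baseline:.1f}")
print(f"implied FVD baseline: {fvd_baseline:.1f}")
```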

arXiv information

Authors	Guosheng Zhao, Xiaofeng Wang, Zheng Zhu, Xinze Chen, Guan Huang, Xiaoyi Bao, Xingang Wang
Published	2024-03-11 16:03:35+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
