X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability

Abstract

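Diffusion models are advancing autonomous driving by enabling realistic data synthesis, predictive end-to-end planning, and closed-loop simulation, with a primary focus on temporally consistent generation.
However, generating large-scale 3D scenes, which requires spatial coherence, remains underexplored.
In this paper, we propose X-Scene, a novel framework for large-scale driving scene generation that achieves both geometric intricacy and appearance fidelity while offering flexible controllability.
Specifically, X-Scene supports multi-granular control: low-level conditions, such as user-provided or text-driven layouts, for detailed scene composition, and high-level semantic guidance, such as user intent and LLM-enriched text prompts, for efficient customization.
To enhance geometric and visual fidelity, we introduce a unified pipeline that sequentially generates 3D semantic occupancy and the corresponding multi-view images while ensuring alignment between the two modalities.
In addition, we extend the generated local region into a large-scale scene through consistency-aware scene outpainting, which extrapolates new occupancy and images conditioned on previously generated regions, enhancing spatial continuity and preserving visual coherence.
The resulting scenes are lifted into high-quality 3DGS representations, supporting diverse applications such as scene exploration.
Comprehensive experiments show that X-Scene significantly advances controllability and fidelity in large-scale driving scene generation, strengthening data generation and simulation for autonomous driving.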

Abstract (original)

Diffusion models are advancing autonomous driving by enabling realistic data synthesis, predictive end-to-end planning, and closed-loop simulation, with a primary focus on temporally consistent generation. However, the generation of large-scale 3D scenes that require spatial coherence remains underexplored. In this paper, we propose X-Scene, a novel framework for large-scale driving scene generation that achieves both geometric intricacy and appearance fidelity, while offering flexible controllability. Specifically, X-Scene supports multi-granular control, including low-level conditions such as user-provided or text-driven layout for detailed scene composition and high-level semantic guidance such as user-intent and LLM-enriched text prompts for efficient customization. To enhance geometrical and visual fidelity, we introduce a unified pipeline that sequentially generates 3D semantic occupancy and the corresponding multiview images, while ensuring alignment between modalities. Additionally, we extend the generated local region into a large-scale scene through consistency-aware scene outpainting, which extrapolates new occupancy and images conditioned on the previously generated area, enhancing spatial continuity and preserving visual coherence. The resulting scenes are lifted into high-quality 3DGS representations, supporting diverse applications such as scene exploration. Comprehensive experiments demonstrate that X-Scene significantly advances controllability and fidelity for large-scale driving scene generation, empowering data generation and simulation for autonomous driving.

arXiv information

Authors	Yu Yang, Alan Liang, Jianbiao Mei, Yukai Ma, Yong Liu, Gim Hee Lee
Published	2025-06-16 14:43:18+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
