Simulating the Real World: A Unified Survey of Multimodal Generative Models

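Summary

Understanding and replicating the real world is a key challenge in artificial general intelligence (AGI) research.
To this end, many existing approaches, such as world models, aim to capture the fundamental principles that govern the physical world, enabling more accurate simulation and meaningful interaction.
However, current methods treat modalities such as 2D (images), video, 3D, and 4D representations as independent domains, overlooking their interdependencies.
Moreover, these methods typically focus on isolated dimensions of reality without systematically integrating the connections between them.
This paper presents a unified survey of multimodal generative models, organized around the progression of data dimensionality in real-world simulation.
Specifically, it begins with 2D generation (appearance), moves on to video generation (appearance + dynamics) and 3D generation (appearance + geometry), and culminates in 4D generation, which integrates all of these dimensions.
To the best of our knowledge, this is the first attempt to systematically unify the study of 2D, video, 3D, and 4D generation within a single framework.
To guide future research, we provide a comprehensive review of datasets, evaluation metrics, and future directions, and offer insights for newcomers.
This survey aims to serve as a bridge for advancing research on multimodal generative models and real-world simulation within a unified framework.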

Summary (original)

Understanding and replicating the real world is a critical challenge in Artificial General Intelligence (AGI) research. To achieve this, many existing approaches, such as world models, aim to capture the fundamental principles governing the physical world, enabling more accurate simulations and meaningful interactions. However, current methods often treat different modalities, including 2D (images), videos, 3D, and 4D representations, as independent domains, overlooking their interdependencies. Additionally, these methods typically focus on isolated dimensions of reality without systematically integrating their connections. In this survey, we present a unified survey of multimodal generative models that investigates the progression of data dimensionality in real-world simulation. Specifically, this survey starts from 2D generation (appearance), then moves to video (appearance+dynamics) and 3D generation (appearance+geometry), and finally culminates in 4D generation, which integrates all dimensions. To the best of our knowledge, this is the first attempt to systematically unify the study of 2D, video, 3D, and 4D generation within a single framework. To guide future research, we provide a comprehensive review of datasets, evaluation metrics, and future directions, fostering insights for newcomers. This survey serves as a bridge to advance the study of multimodal generative models and real-world simulation within a unified framework.
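The survey's organizing idea is that each modality adds a dimension of reality: 2D captures appearance, video adds dynamics, 3D adds geometry, and 4D combines them all. As an illustrative sketch only (the names `Dimension`, `MODALITIES`, and `covers` are hypothetical and do not come from the paper), this progression can be encoded as bit flags, which turns "4D integrates all dimensions" into a simple set union.

```python
from enum import Flag, auto


class Dimension(Flag):
    """Aspects of reality a generated modality captures (illustrative only)."""
    APPEARANCE = auto()
    DYNAMICS = auto()
    GEOMETRY = auto()


# Hypothetical mapping mirroring the survey's 2D -> video / 3D -> 4D progression.
MODALITIES = {
    "2D": Dimension.APPEARANCE,
    "video": Dimension.APPEARANCE | Dimension.DYNAMICS,
    "3D": Dimension.APPEARANCE | Dimension.GEOMETRY,
    "4D": Dimension.APPEARANCE | Dimension.DYNAMICS | Dimension.GEOMETRY,
}


def covers(modality: str, dims: Dimension) -> bool:
    """Return True if the modality captures every aspect in `dims`."""
    return dims in MODALITIES[modality]


if __name__ == "__main__":
    # 4D is the union of what video and 3D each add on top of appearance.
    print(MODALITIES["video"] | MODALITIES["3D"] == MODALITIES["4D"])
    print(covers("video", Dimension.GEOMETRY))
```

Flags are used here rather than a plain list because membership and union checks then read directly as the survey's "appearance + dynamics" style notation.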

arXiv information

Authors	Yuqi Hu, Longguang Wang, Xian Liu, Ling-Hao Chen, Yuwei Guo, Yukai Shi, Ce Liu, Anyi Rao, Zeyu Wang, Hui Xiong
Published	2025-03-06 17:31:43+00:00
arXiv page	arxiv_id(pdf)

Provided by, services used

arxiv.jp, Google
