PhysGen3D: Crafting a Miniature Interactive World from a Single Image

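Summary

Imagining physically plausible outcomes from a single image requires a deep understanding of how the world behaves dynamically.
To address this, the authors introduce PhysGen3D, a new framework that converts a single image into an amodal, camera-centric, interactive 3D scene.
By combining advanced image-based geometric and semantic understanding with physics-based simulation, PhysGen3D builds an interactive 3D world from a static image, and can "imagine" and simulate future scenarios based on user input.
At its core, PhysGen3D estimates the 3D shapes, poses, and physical and lighting properties of objects, capturing the physical attributes that drive realistic object interactions.
The framework lets users specify precise initial conditions, such as object speed or material properties, for greater control over the generated video.
PhysGen3D is evaluated against closed-source state-of-the-art (SOTA) image-to-video models, including Pika, Kling, and Gen-3, and is shown to generate videos with realistic physics while offering more flexibility and finer-grained control.
The results indicate that PhysGen3D strikes a unique balance of photorealism, physical plausibility, and user-driven interactivity, opening new possibilities for generating dynamic, physics-grounded video from an image.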

Abstract (original)

Envisioning physically plausible outcomes from a single image requires a deep understanding of the world’s dynamics. To address this, we introduce PhysGen3D, a novel framework that transforms a single image into an amodal, camera-centric, interactive 3D scene. By combining advanced image-based geometric and semantic understanding with physics-based simulation, PhysGen3D creates an interactive 3D world from a static image, enabling us to ‘imagine’ and simulate future scenarios based on user input. At its core, PhysGen3D estimates 3D shapes, poses, physical and lighting properties of objects, thereby capturing essential physical attributes that drive realistic object interactions. This framework allows users to specify precise initial conditions, such as object speed or material properties, for enhanced control over generated video outcomes. We evaluate PhysGen3D’s performance against closed-source state-of-the-art (SOTA) image-to-video models, including Pika, Kling, and Gen-3, showing PhysGen3D’s capacity to generate videos with realistic physics while offering greater flexibility and fine-grained control. Our results show that PhysGen3D achieves a unique balance of photorealism, physical plausibility, and user-driven interactivity, opening new possibilities for generating dynamic, physics-grounded video from an image.

arXiv information

Authors	Boyuan Chen, Hanxiao Jiang, Shaowei Liu, Saurabh Gupta, Yunzhu Li, Hao Zhao, Shenlong Wang
Published	2025-03-26 17:31:04+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
