ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation

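Summary

Generating human-scene interaction (HSI) is important for applications in embodied AI, virtual reality, and robotics.
Existing methods can synthesize realistic human motion in 3D scenes and generate plausible human-object interactions, but they depend heavily on datasets of paired 3D scenes and motion capture data, which are costly and time-consuming to collect across diverse environments and interactions.
We present ZeroHSI, a new approach that enables zero-shot 4D human-scene interaction synthesis by integrating video generation with neural human rendering.
Our key insight is to leverage the rich motion priors learned by state-of-the-art video generation models trained on vast amounts of natural human movement and interaction, and to use differentiable rendering to reconstruct human-scene interactions.
ZeroHSI can synthesize realistic human motion both in static scenes and in environments with dynamic objects, without requiring any ground-truth motion data.
We evaluate ZeroHSI on a curated dataset of varied indoor and outdoor scenes with different interaction prompts, demonstrating its ability to generate diverse, contextually appropriate human-scene interactions.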

Summary (original)

Human-scene interaction (HSI) generation is crucial for applications in embodied AI, virtual reality, and robotics. While existing methods can synthesize realistic human motions in 3D scenes and generate plausible human-object interactions, they heavily rely on datasets containing paired 3D scene and motion capture data, which are expensive and time-consuming to collect across diverse environments and interactions. We present ZeroHSI, a novel approach that enables zero-shot 4D human-scene interaction synthesis by integrating video generation and neural human rendering. Our key insight is to leverage the rich motion priors learned by state-of-the-art video generation models, which have been trained on vast amounts of natural human movements and interactions, and use differentiable rendering to reconstruct human-scene interactions. ZeroHSI can synthesize realistic human motions in both static scenes and environments with dynamic objects, without requiring any ground-truth motion data. We evaluate ZeroHSI on a curated dataset of different types of various indoor and outdoor scenes with different interaction prompts, demonstrating its ability to generate diverse and contextually appropriate human-scene interactions.

arXiv information

Authors	Hongjie Li, Hong-Xing Yu, Jiaman Li, Jiajun Wu
Published	2024-12-24 18:55:38+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
