FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors

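Summary

Text-driven object insertion in 3D scenes is a new task that enables intuitive scene editing through natural language. However, existing methods based on 2D editing often depend on spatial priors such as 2D masks or 3D bounding boxes, and they struggle to keep the inserted object consistent. These limitations hinder flexibility and scalability in real-world applications. This paper proposes FreeInsert, a new framework that uses foundation models such as MLLMs, LGMs, and diffusion models to decouple object generation from spatial placement. This makes unsupervised, flexible object insertion into 3D scenes possible without spatial priors. FreeInsert begins with an MLLM-based parser that extracts structured semantics from the user's instruction, including the object type, spatial relationships, and attachment regions. These semantics guide both the 3D-consistent reconstruction of the inserted object and the learning of its degrees of freedom. The spatial reasoning ability of the MLLM is used to initialize the object's pose and scale. A hierarchical, spatially aware refinement stage then integrates the spatial semantics with the MLLM-inferred priors to improve the placement. Finally, the object's appearance is refined using the inserted-object image to raise visual fidelity. Experimental results show that FreeInsert achieves semantically coherent, spatially accurate, and visually realistic 3D insertions without relying on spatial priors, providing a user-friendly and flexible editing experience.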

Abstract (original)

Text-driven object insertion in 3D scenes is an emerging task that enables intuitive scene editing through natural language. However, existing 2D editing-based methods often rely on spatial priors such as 2D masks or 3D bounding boxes, and they struggle to ensure consistency of the inserted object. These limitations hinder flexibility and scalability in real-world applications. In this paper, we propose FreeInsert, a novel framework that leverages foundation models including MLLMs, LGMs, and diffusion models to disentangle object generation from spatial placement. This enables unsupervised and flexible object insertion in 3D scenes without spatial priors. FreeInsert starts with an MLLM-based parser that extracts structured semantics, including object types, spatial relationships, and attachment regions, from user instructions. These semantics guide both the reconstruction of the inserted object for 3D consistency and the learning of its degrees of freedom. We leverage the spatial reasoning capabilities of MLLMs to initialize object pose and scale. A hierarchical, spatially aware refinement stage further integrates spatial semantics and MLLM-inferred priors to enhance placement. Finally, the appearance of the object is improved using the inserted-object image to enhance visual fidelity. Experimental results demonstrate that FreeInsert achieves semantically coherent, spatially precise, and visually realistic 3D insertions without relying on spatial priors, offering a user-friendly and flexible editing experience.
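The abstract describes a parser that outputs structured semantics (object type, spatial relationship, attachment region) and a placement step that sets the object's pose and scale. A minimal sketch of what such data might look like, and of applying a placement to object points, is given below. Everything in it is an assumption made for illustration: the class names `InsertionSemantics` and `Placement`, the function `place_points`, the example instruction, and the choice of a uniform scale, a single yaw angle about the z-axis, and a translation as the degrees of freedom. The abstract does not specify the actual parameterization, and no MLLM is called here.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class InsertionSemantics:
    """Hypothetical container for the fields the abstract says the parser extracts."""
    object_type: str
    spatial_relation: str
    attachment_region: str


@dataclass
class Placement:
    """Assumed degrees of freedom: uniform scale, yaw about z, translation."""
    scale: float
    yaw: float  # radians, rotation about the z-axis
    translation: Point


def place_points(points: List[Point], placement: Placement) -> List[Point]:
    """Scale, rotate about z, then translate each point (a similarity transform)."""
    c, s = math.cos(placement.yaw), math.sin(placement.yaw)
    tx, ty, tz = placement.translation
    placed = []
    for x, y, z in points:
        # Apply the uniform scale first.
        x, y, z = x * placement.scale, y * placement.scale, z * placement.scale
        # Rotate in the xy-plane, then shift into the scene.
        placed.append((c * x - s * y + tx, s * x + c * y + ty, z + tz))
    return placed


# Hand-written stand-in for what a parser might return for a
# hypothetical instruction such as "put a teddy bear on the sofa".
semantics = InsertionSemantics(
    object_type="teddy bear",
    spatial_relation="on",
    attachment_region="sofa",
)

# Hypothetical initial placement; in the described pipeline this would be
# initialized by MLLM reasoning and then refined.
initial = Placement(scale=2.0, yaw=math.pi / 2, translation=(0.0, 0.0, 1.0))
object_points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
scene_points = place_points(object_points, initial)
```

Keeping the semantics and the placement in separate structures mirrors the disentanglement the abstract emphasizes: the object can be generated once, while only the few placement parameters are adjusted during refinement.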

arXiv information

Authors	Chenxi Li, Weijie Wang, Qiang Li, Bruno Lepri, Nicu Sebe, Weizhi Nie
Published	2025-05-02 14:53:56+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
