Steerable Scene Generation with Post Training and Inference-Time Search

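Summary

Training robots in simulation requires diverse 3D scenes that reflect the specific challenges of the downstream task.
However, scenes that meet strict task requirements, such as highly cluttered environments with plausible spatial arrangements, are rare and expensive to curate by hand.
Instead, we generate large-scale scene data with procedural models that approximate realistic environments for robotic manipulation, and then adapt that data to task-specific goals.
We do this by training a unified diffusion-based generative model that predicts which objects to place from a fixed asset library, together with their SE(3) poses.
The model serves as a flexible scene prior that can be adapted through reinforcement-learning-based post training, conditional generation, or inference-time search, steering generation toward downstream objectives even when they differ from the original data distribution.
Our method enables goal-directed scene synthesis that respects physical feasibility and scales across scene types.
We introduce a novel MCTS-based inference-time search strategy for diffusion models, enforce feasibility via projection and simulation, and release a dataset of more than 44 million SE(3) scenes spanning five diverse environments.
Website with videos, code, data, and model weights: https://steerable-scene-generation.github.io/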
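
The summary describes a scene as a selection of objects from a fixed asset library, each with an SE(3) pose. As a minimal sketch of what such a representation could look like, the snippet below stores each object as an asset index plus a translation and a 3x3 rotation matrix, and checks that a scene is well formed. The names `ASSET_LIBRARY`, `PlacedObject`, `rotation_z`, `is_rotation`, and `is_valid_scene` are hypothetical and chosen for illustration; they are not taken from the paper or its released code, and the paper's actual encoding may differ.

```python
import math
from dataclasses import dataclass

# Hypothetical fixed asset library; the entries are illustrative only.
ASSET_LIBRARY = ("mug", "plate", "bowl")


@dataclass(frozen=True)
class PlacedObject:
    """One object in a scene: which asset it is and its SE(3) pose."""
    asset_index: int      # index into ASSET_LIBRARY
    translation: tuple    # (x, y, z) position; units are an assumption
    rotation: tuple       # 3x3 rotation matrix as nested tuples, row-major


def rotation_z(angle):
    """Rotation matrix for a rotation of `angle` radians about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def is_rotation(R, tol=1e-9):
    """Return True if R is in SO(3): R^T R = I and det(R) = +1."""
    for i in range(3):
        for j in range(3):
            dot = sum(R[k][i] * R[k][j] for k in range(3))
            expected = 1.0 if i == j else 0.0
            if abs(dot - expected) > tol:
                return False
    det = (
        R[0][0] * (R[1][1] * R[2][2] - R[1][2] * R[2][1])
        - R[0][1] * (R[1][0] * R[2][2] - R[1][2] * R[2][0])
        + R[0][2] * (R[1][0] * R[2][1] - R[1][1] * R[2][0])
    )
    return abs(det - 1.0) <= tol


def is_valid_scene(scene):
    """Check that every object refers to a known asset and has a valid pose."""
    return all(
        0 <= obj.asset_index < len(ASSET_LIBRARY)
        and len(obj.translation) == 3
        and is_rotation(obj.rotation)
        for obj in scene
    )


# Example: a single mug rotated 90 degrees about the vertical axis.
scene = [PlacedObject(0, (0.1, 0.2, 0.0), rotation_z(math.pi / 2))]
```

This check covers only the data format. Physical feasibility, which the summary says is enforced via projection and simulation, is a separate concern that this sketch does not attempt.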

Summary (original)

Training robots in simulation requires diverse 3D scenes that reflect the specific challenges of downstream tasks. However, scenes that satisfy strict task requirements, such as high-clutter environments with plausible spatial arrangement, are rare and costly to curate manually. Instead, we generate large-scale scene data using procedural models that approximate realistic environments for robotic manipulation, and adapt it to task-specific goals. We do this by training a unified diffusion-based generative model that predicts which objects to place from a fixed asset library, along with their SE(3) poses. This model serves as a flexible scene prior that can be adapted using reinforcement learning-based post training, conditional generation, or inference-time search, steering generation toward downstream objectives even when they differ from the original data distribution. Our method enables goal-directed scene synthesis that respects physical feasibility and scales across scene types. We introduce a novel MCTS-based inference-time search strategy for diffusion models, enforce feasibility via projection and simulation, and release a dataset of over 44 million SE(3) scenes spanning five diverse environments. Website with videos, code, data, and model weights: https://steerable-scene-generation.github.io/

arXiv information

Authors	Nicholas Pfaff, Hongkai Dai, Sergey Zakharov, Shun Iwase, Russ Tedrake
Published	2025-05-07 22:07:42+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
