Nabla-R2D3: Effective and Efficient 3D Diffusion Alignment with 2D Rewards

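Abstract

Generating high-quality, photorealistic 3D assets remains a longstanding challenge in 3D vision and computer graphics.
State-of-the-art generative models such as diffusion models have made significant progress in 3D generation, but they often fall short of human-designed content because of their limited ability to follow instructions, align with human preferences, or produce realistic textures, geometry, and physical attributes.
This paper introduces Nabla-R2D3, a highly effective and sample-efficient reinforcement learning alignment framework for 3D-native diffusion models that uses 2D rewards.
It builds on the recently proposed Nabla-GFlowNet method, which matches the score function to reward gradients in a principled manner for reward finetuning, and it enables effective adaptation of 3D diffusion models using only 2D reward signals.
Extensive experiments show that, unlike vanilla finetuning baselines, which either struggle to converge or suffer from reward hacking, Nabla-R2D3 consistently achieves higher rewards and less prior forgetting within a few finetuning steps.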

Abstract (original)

Generating high-quality and photorealistic 3D assets remains a longstanding challenge in 3D vision and computer graphics. Although state-of-the-art generative models, such as diffusion models, have made significant progress in 3D generation, they often fall short of human-designed content due to limited ability to follow instructions, align with human preferences, or produce realistic textures, geometries, and physical attributes. In this paper, we introduce Nabla-R2D3, a highly effective and sample-efficient reinforcement learning alignment framework for 3D-native diffusion models using 2D rewards. Built upon the recently proposed Nabla-GFlowNet method, which matches the score function to reward gradients in a principled manner for reward finetuning, our Nabla-R2D3 enables effective adaptation of 3D diffusion models using only 2D reward signals. Extensive experiments show that, unlike vanilla finetuning baselines which either struggle to converge or suffer from reward hacking, Nabla-R2D3 consistently achieves higher rewards and reduced prior forgetting within a few finetuning steps.
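
The idea of steering a 3D model with only 2D reward signals can be illustrated with a toy sketch. The code below is not the Nabla-R2D3 or Nabla-GFlowNet algorithm: it assumes a hypothetical orthographic projection (`render_view`) in place of a differentiable renderer and a hypothetical hand-written score (`reward_2d`) in place of a learned 2D reward model. It only shows how a reward defined on rendered views yields a gradient with respect to 3D parameters through the chain rule.

```python
import math

def render_view(points, angle):
    """Toy 'renderer' (assumption): rotate about the y-axis, then project orthographically."""
    c, s = math.cos(angle), math.sin(angle)
    # Image coordinates are (u, v) = (c*x + s*z, y); depth is discarded.
    return [(c * x + s * z, y) for x, y, z in points]

def reward_2d(pixels):
    """Hypothetical 2D reward: higher when projected points lie near the image center."""
    return -sum(u * u + v * v for u, v in pixels) / len(pixels)

def multiview_reward(points, angles):
    """Average the 2D reward over several camera angles."""
    return sum(reward_2d(render_view(points, a)) for a in angles) / len(angles)

def reward_gradient(points, angles):
    """Gradient of multiview_reward w.r.t. every 3D coordinate, via the chain rule."""
    n, m = len(points), len(angles)
    grads = []
    for x, y, z in points:
        gx = gy = gz = 0.0
        for a in angles:
            c, s = math.cos(a), math.sin(a)
            u = c * x + s * z
            # d reward_2d / du = -2u/n with du/dx = c and du/dz = s;
            # d reward_2d / dv = -2y/n with dv/dy = 1.
            gx += -2.0 * u * c / n
            gz += -2.0 * u * s / n
            gy += -2.0 * y / n
        grads.append((gx / m, gy / m, gz / m))
    return grads

def ascent_step(points, angles, lr=0.1):
    """One gradient-ascent step on the 3D points using only the 2D reward."""
    grads = reward_gradient(points, angles)
    return [tuple(p + lr * g for p, g in zip(pt, gr)) for pt, gr in zip(points, grads)]

points = [(1.0, 0.5, -0.3), (-0.7, 0.2, 0.9)]
angles = [0.0, math.pi / 3, 2 * math.pi / 3]
updated = ascent_step(points, angles)
```

In this sketch the 3D points are updated directly. In a diffusion setting the reward gradient would instead inform how the model's score function is finetuned, which is the part the abstract attributes to Nabla-GFlowNet and which is not reproduced here.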

arXiv information

Authors	Qingming Liu, Zhen Liu, Dinghuai Zhang, Kui Jia
Published	2025-06-18 17:59:59+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
