Reward-Guided Controlled Generation for Inference-Time Alignment in Diffusion Models: Tutorial and Review

Summary

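This tutorial is an in-depth guide to inference-time guidance and alignment methods for optimizing downstream reward functions in diffusion models.
Diffusion models are known for their generative modeling capabilities, but practical applications in fields such as biology often require generating samples that maximize specific metrics (for example stability, protein affinity, or closeness to a target structure).
In these scenarios, a diffusion model can be adapted not only to generate realistic samples but also to explicitly maximize the desired measure at inference time, without any fine-tuning.
The tutorial covers the foundations of such inference-time algorithms.
It reviews these methods from a unified perspective and shows that current techniques, such as Sequential Monte Carlo (SMC)-based guidance, value-based sampling, and classifier guidance, all aim to approximate a soft optimal denoising process (a policy, in RL terms).
That process combines the pre-trained denoising process with a value function, which serves as a look-ahead function predicting the terminal reward from an intermediate state.
Within this framework, the authors present several novel algorithms not yet covered in the literature.
They also discuss (1) fine-tuning methods combined with inference-time techniques, (2) inference-time algorithms based on search algorithms such as Monte Carlo tree search, which have received limited attention in current research, and
(3) connections between inference-time algorithms in language models and in diffusion models.
The code for this tutorial, on protein design, is available at https://github.com/masa-ue/AlignInversePro.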
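
The idea of combining a pre-trained denoising process with a look-ahead value function can be illustrated on a toy problem. The Python below is a minimal sketch under stated assumptions, not the tutorial's code and not taken from the linked repository: the reward, the one-step transition `pretrained_step`, the look-ahead `value`, and the parameters `alpha`, `c`, and `n_candidates` are all hypothetical stand-ins. At each step it draws several candidates from the toy pre-trained transition and picks one with probability proportional to exp(value / alpha), which is one simple way to approximate a value-weighted denoising step. It is a single-trajectory importance-sampling sketch, not a full SMC implementation.

```python
import numpy as np

def reward(x):
    # Hypothetical terminal reward: prefers samples near 2.0.
    return -(x - 2.0) ** 2

def pretrained_step(x, rng, c=0.9):
    # Toy stand-in for one pre-trained denoising step (assumption):
    # an AR(1) transition that leaves N(0, 1) stationary.
    return c * x + np.sqrt(1.0 - c**2) * rng.standard_normal(np.shape(x))

def value(x, steps_left, c=0.9):
    # Crude look-ahead (assumption): the reward evaluated at the expected
    # terminal state given the current state, E[x_0 | x] = c**steps_left * x.
    return reward((c ** steps_left) * x)

def sample(n_steps=30, n_candidates=16, alpha=0.5, seed=0, guide=True):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal()  # start from the toy prior N(0, 1)
    for t in range(n_steps, 0, -1):
        # Propose candidates from the pre-trained transition only.
        cands = pretrained_step(np.full(n_candidates, x), rng)
        if guide:
            # Weight candidates by exp(value / alpha) and resample one.
            logits = value(cands, t - 1) / alpha
            w = np.exp(logits - logits.max())  # subtract max for stability
            x = rng.choice(cands, p=w / w.sum())
        else:
            x = cands[0]  # plain pre-trained sampling
    return float(x)

if __name__ == "__main__":
    guided = np.array([sample(seed=s) for s in range(200)])
    plain = np.array([sample(seed=s, guide=False) for s in range(200)])
    print("mean reward, guided:", reward(guided).mean())
    print("mean reward, plain: ", reward(plain).mean())
```

In this sketch a smaller `alpha` pushes samples harder toward high reward, while a larger `alpha` keeps them closer to the toy pre-trained distribution. No model weights are updated, which is the sense in which the guidance happens at inference time.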

Abstract (original)

This tutorial provides an in-depth guide on inference-time guidance and alignment methods for optimizing downstream reward functions in diffusion models. While diffusion models are renowned for their generative modeling capabilities, practical applications in fields such as biology often require sample generation that maximizes specific metrics (e.g., stability, affinity in proteins, closeness to target structures). In these scenarios, diffusion models can be adapted not only to generate realistic samples but also to explicitly maximize desired measures at inference time without fine-tuning. This tutorial explores the foundational aspects of such inference-time algorithms. We review these methods from a unified perspective, demonstrating that current techniques — such as Sequential Monte Carlo (SMC)-based guidance, value-based sampling, and classifier guidance — aim to approximate soft optimal denoising processes (a.k.a. policies in RL) that combine pre-trained denoising processes with value functions serving as look-ahead functions that predict from intermediate states to terminal rewards. Within this framework, we present several novel algorithms not yet covered in the literature. Furthermore, we discuss (1) fine-tuning methods combined with inference-time techniques, (2) inference-time algorithms based on search algorithms such as Monte Carlo tree search, which have received limited attention in current research, and (3) connections between inference-time algorithms in language models and diffusion models. The code of this tutorial on protein design is available at https://github.com/masa-ue/AlignInversePro

arXiv information

Authors	Masatoshi Uehara, Yulai Zhao, Chenyu Wang, Xiner Li, Aviv Regev, Sergey Levine, Tommaso Biancalani
Published	2025-01-16 17:37:35+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
