From discrete-time policies to continuous-time diffusion samplers: Asymptotic equivalences and faster training

Summary

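We study the problem of training neural stochastic differential equations (diffusion models) to sample from a Boltzmann distribution without access to samples from the target.
Existing methods for training such models enforce time-reversal between the generative process and the noising process, using either differentiable simulation or off-policy reinforcement learning (RL).
We prove that families of objectives become equivalent in the limit of infinitesimal discretization steps, linking entropic RL methods (GFlowNets) with continuous-time objects (partial differential equations and path space measures).
We further show that an appropriate choice of coarse time discretization during training greatly improves sample efficiency and allows the use of time-local objectives, achieving competitive performance on standard sampling benchmarks at reduced computational cost.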

Summary (original)

We study the problem of training neural stochastic differential equations, or diffusion models, to sample from a Boltzmann distribution without access to target samples. Existing methods for training such models enforce time-reversal of the generative and noising processes, using either differentiable simulation or off-policy reinforcement learning (RL). We prove equivalences between families of objectives in the limit of infinitesimal discretization steps, linking entropic RL methods (GFlowNets) with continuous-time objects (partial differential equations and path space measures). We further show that an appropriate choice of coarse time discretization during training allows greatly improved sample efficiency and the use of time-local objectives, achieving competitive performance on standard sampling benchmarks with reduced computational cost.
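The time discretization mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' training method: it only simulates a one-dimensional SDE dX_t = drift(X_t, t) dt + sigma dW_t with the standard Euler-Maruyama scheme, once with a coarse step and once with a fine step. The names `euler_maruyama` and `toy_drift` are hypothetical, and the hand-written drift stands in for what would be a learned neural network.

```python
import math
import random


def euler_maruyama(drift, x0, sigma, n_steps, t_end=1.0, seed=0):
    """Simulate dX_t = drift(X_t, t) dt + sigma dW_t on [0, t_end].

    Uses n_steps uniform steps and returns the list of n_steps + 1 states.
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    x = float(x0)
    path = [x]
    for k in range(n_steps):
        t = k * dt
        # One Euler-Maruyama step: deterministic drift plus Gaussian increment.
        x = x + drift(x, t) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def toy_drift(x, t):
    # Assumption for illustration: a simple pull toward 0, not a trained model.
    return -x


# Same SDE, two discretizations: 10 coarse steps versus 100 fine steps.
coarse_path = euler_maruyama(toy_drift, x0=0.0, sigma=1.0, n_steps=10)
fine_path = euler_maruyama(toy_drift, x0=0.0, sigma=1.0, n_steps=100)
print(len(coarse_path), len(fine_path))  # 11 101
```

The coarse path needs ten times fewer drift evaluations per trajectory, which is the kind of cost difference that makes the choice of discretization during training matter.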

arXiv information

Authors	Julius Berner, Lorenz Richter, Marcin Sendera, Jarrid Rector-Brooks, Nikolay Malkin
Published	2025-01-10 18:18:25+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
