Trajectory balance: Improved credit assignment in GFlowNets

Summary

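Generative flow networks (GFlowNets) are a method for learning a stochastic policy that generates compositional objects, such as graphs or strings, from a given unnormalized density through a sequence of actions, where many possible action sequences may lead to the same object.
The learning objectives previously proposed for GFlowNets, flow matching and detailed balance, are analogous to temporal difference learning, and we find them prone to inefficient credit propagation across long action sequences.
We therefore propose trajectory balance, a new learning objective for GFlowNets, as a more efficient alternative to the previously used objectives.
We prove that any global minimizer of the trajectory balance objective can define a policy that samples exactly from the target distribution.
In experiments on four distinct domains, we empirically demonstrate the benefits of the trajectory balance objective for GFlowNet convergence, diversity of generated samples, and robustness to long action sequences and large action spaces.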

Abstract (original)

Generative flow networks (GFlowNets) are a method for learning a stochastic policy for generating compositional objects, such as graphs or strings, from a given unnormalized density by sequences of actions, where many possible action sequences may lead to the same object. We find previously proposed learning objectives for GFlowNets, flow matching and detailed balance, which are analogous to temporal difference learning, to be prone to inefficient credit propagation across long action sequences. We thus propose a new learning objective for GFlowNets, trajectory balance, as a more efficient alternative to previously used objectives. We prove that any global minimizer of the trajectory balance objective can define a policy that samples exactly from the target distribution. In experiments on four distinct domains, we empirically demonstrate the benefits of the trajectory balance objective for GFlowNet convergence, diversity of generated samples, and robustness to long action sequences and large action spaces.
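
As a rough illustration of the objective named in the abstract, the following is a minimal sketch of the squared log-ratio form in which the trajectory balance loss is commonly written for one complete trajectory: the log of the estimated partition function plus the summed forward log-probabilities is compared against the log-reward plus the summed backward log-probabilities. The function name, argument names, and toy numbers are assumptions made for illustration and do not come from this page.

```python
import math


def trajectory_balance_loss(log_z, log_pf, log_pb, log_reward):
    """Squared log-ratio for a single complete trajectory (illustrative sketch).

    log_z      -- learned estimate of the log partition function (hypothetical name)
    log_pf     -- forward-policy log-probabilities, one per step of the trajectory
    log_pb     -- backward-policy log-probabilities, one per step of the trajectory
    log_reward -- log of the unnormalized reward of the final object
    """
    forward = log_z + sum(log_pf)        # log(Z * product of forward probabilities)
    backward = log_reward + sum(log_pb)  # log(R(x) * product of backward probabilities)
    return (forward - backward) ** 2


# Toy setting (assumed): two objects with rewards 1 and 3, each reached in one step.
# A policy that picks them with probabilities 0.25 and 0.75, with Z = 4, samples
# proportionally to the reward, so the loss should be close to zero for both.
log_z = math.log(4.0)
loss_a = trajectory_balance_loss(log_z, [math.log(0.25)], [0.0], math.log(1.0))
loss_b = trajectory_balance_loss(log_z, [math.log(0.75)], [0.0], math.log(3.0))
print(loss_a, loss_b)
```

Because the loss is computed over a whole trajectory at once, the reward signal reaches every step of the action sequence in a single update, which is the credit-assignment property the abstract contrasts with flow matching and detailed balance.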

arXiv information

Authors	Nikolay Malkin, Moksh Jain, Emmanuel Bengio, Chen Sun, Yoshua Bengio
Published	2023-10-04 16:30:14+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
