Bellman Diffusion Models

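Summary

Diffusion models have been highly successful as generative architectures.
Recently, they have also been shown to be effective for modelling policies in offline reinforcement learning and imitation learning.
This work explores using diffusion as a model class for the successor state measure (SSM) of a policy.
The authors find that enforcing the Bellman flow constraints leads to a simple Bellman update on the diffusion step distribution.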

Abstract (original)

Diffusion models have seen tremendous success as generative architectures. Recently, they have been shown to be effective at modelling policies for offline reinforcement learning and imitation learning. We explore using diffusion as a model class for the successor state measure (SSM) of a policy. We find that enforcing the Bellman flow constraints leads to a simple Bellman update on the diffusion step distribution.
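
For readers unfamiliar with the term, the Bellman flow constraint on a successor state measure can be written out as below. This is a sketch in one common textbook convention (normalized, discounted, counting states from the next time step), not necessarily the notation or normalization used by the authors. The symbols M^π, P, γ, s, a, s' are assumptions introduced here for illustration. The abstract does not state the form of the resulting update on the diffusion step distribution, so it is not reproduced.

```latex
% Assumed convention: normalized discounted successor state measure
% of a policy \pi, with transition kernel P and discount \gamma \in [0, 1).
M^{\pi}(s' \mid s, a)
  = (1 - \gamma) \sum_{t = 0}^{\infty} \gamma^{t}
    \Pr\left(s_{t+1} = s' \mid s_0 = s,\, a_0 = a,\, \pi\right)

% Bellman flow constraint: peel off the t = 0 term and shift the rest by one step.
M^{\pi}(s' \mid s, a)
  = (1 - \gamma)\, P(s' \mid s, a)
  + \gamma\, \mathbb{E}_{s_1 \sim P(\cdot \mid s, a),\; a_1 \sim \pi(\cdot \mid s_1)}
    \left[ M^{\pi}(s' \mid s_1, a_1) \right]
```

Under this convention M^π(· | s, a) is a probability distribution over future states, which is what makes a generative model such as a diffusion model a plausible model class for it.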

arXiv information

Authors	Liam Schramm, Abdeslam Boularias
Published	2024-07-16 20:40:08+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
