SinFusion: Training Diffusion Models on a Single Image or Video


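Our image/video-specific diffusion model (SinFusion) learns the appearance and dynamics of a single image or video while utilizing the conditioning capabilities of diffusion models.
It can then generate diverse new video samples of the same dynamic scene, extrapolate short videos into long ones (both forward and backward in time), and perform video upsampling.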


Diffusion models have exhibited tremendous progress in image and video generation, exceeding GANs in quality and diversity. However, they are usually trained on very large datasets and are not naturally adapted to manipulate a given input image or video. In this paper, we show how this can be resolved by training a diffusion model on a single input image or video. Our image/video-specific diffusion model (SinFusion) learns the appearance and dynamics of the single image or video, while utilizing the conditioning capabilities of diffusion models. It can solve a wide array of image/video-specific manipulation tasks. In particular, our model can learn the motion and dynamics of a single input video from only a few frames. It can then generate diverse new video samples of the same dynamic scene, extrapolate short videos into long ones (both forward and backward in time), and perform video upsampling. When trained on a single image, our model shows performance and capabilities comparable to those of previous single-image models in various image manipulation tasks.


Authors: Yaniv Nikankin, Niv Haim, Michal Irani
Published: 2022-11-21 18:59:33+00:00

Categories: cs.CV, cs.LG