Kandinsky 3: Text-to-Image Synthesis for Multifunctional Generative Framework

Summary

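Text-to-image (T2I) diffusion models are widely used as the basis for image manipulation methods such as editing, image fusion, and inpainting. Image-to-video (I2V) and text-to-video (T2V) models are likewise built on top of T2I models.
Kandinsky 3 is a new T2I model based on latent diffusion that achieves a high level of quality and photorealism.
The key feature of the new architecture is the simplicity and efficiency with which it can be adapted to many types of generation tasks.
We extend the base T2I model to various applications and create a multifunctional generation system that includes text-guided inpainting/outpainting, image fusion, text-image fusion, image variation generation, and I2V and T2V generation.
We also present a distilled version of the T2I model that runs inference in 4 steps of the reverse process, 3 times faster than the base model and without reducing image quality.
We deployed a user-friendly demo system in which all of these features can be tested publicly.
In addition, we released the source code and checkpoints for Kandinsky 3 and the extended models.
Human evaluations show that Kandinsky 3 achieves one of the highest quality scores among open-source generation systems.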

Abstract (original)

Text-to-image (T2I) diffusion models are popular for introducing image manipulation methods, such as editing, image fusion, inpainting, etc. At the same time, image-to-video (I2V) and text-to-video (T2V) models are also built on top of T2I models. We present Kandinsky 3, a novel T2I model based on latent diffusion, achieving a high level of quality and photorealism. The key feature of the new architecture is the simplicity and efficiency of its adaptation for many types of generation tasks. We extend the base T2I model for various applications and create a multifunctional generation system that includes text-guided inpainting/outpainting, image fusion, text-image fusion, image variations generation, I2V and T2V generation. We also present a distilled version of the T2I model, evaluating inference in 4 steps of the reverse process without reducing image quality and 3 times faster than the base model. We deployed a user-friendly demo system in which all the features can be tested in the public domain. Additionally, we released the source code and checkpoints for the Kandinsky 3 and extended models. Human evaluations show that Kandinsky 3 demonstrates one of the highest quality scores among open source generation systems.

arXiv information

Authors	Vladimir Arkhipkin, Viacheslav Vasilev, Andrei Filatov, Igor Pavlov, Julia Agafonova, Nikolai Gerasimenko, Anna Averchenkova, Evelina Mironova, Anton Bukashkin, Konstantin Kulikov, Andrey Kuznetsov, Denis Dimitrov
Published	2024-10-28 14:22:08+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
