MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis

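Abstract

Generative modeling and representation learning are two key tasks in computer vision.
However, the models for these tasks are usually trained separately, which overlooks how each task could benefit the other and adds training and model-maintenance overhead.
This work proposes MAsked Generative Encoder (MAGE), the first framework to unify SOTA image generation and self-supervised representation learning.
The key insight is that using variable masking ratios in masked image modeling pre-training allows generative training (very high masking ratios) and representation learning (lower masking ratios) within the same training framework.
Inspired by previous generative models, MAGE uses semantic tokens learned by a vector-quantized GAN at its inputs and outputs and combines them with masking.
Adding a contrastive loss to the encoder output further improves the representation.
The generation and representation learning capabilities of MAGE are evaluated extensively.
On ImageNet-1K, a single MAGE ViT-L model obtains 9.10 FID on class-unconditional image generation and 78.9% top-1 accuracy with linear probing, achieving state-of-the-art performance in both image generation and representation learning.
Code is available at https://github.com/LTH14/mage.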
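
The variable masking ratio idea can be illustrated with a minimal sketch. This is not the authors' implementation: the truncated-Gaussian sampler, its parameters (`low`, `high`, `mean`, `std`), the `MASK_ID` placeholder, the codebook size of 1024, and the 16x16 token grid are all hypothetical choices made for illustration, and the token ids are random stand-ins for what a vector-quantized GAN tokenizer would produce.

```python
import random

MASK_ID = -1  # hypothetical placeholder id standing in for a [MASK] token


def sample_mask_ratio(rng, low=0.5, high=1.0, mean=0.55, std=0.25):
    """Draw a masking ratio from a Gaussian truncated to [low, high].

    All four parameters are illustrative assumptions. Rejection sampling
    keeps the sketch dependency-free.
    """
    while True:
        r = rng.gauss(mean, std)
        if low <= r <= high:
            return r


def mask_tokens(tokens, ratio, rng):
    """Mask roughly `ratio` of the tokens at random positions.

    Returns the masked sequence and a per-position flag list.
    """
    n = len(tokens)
    n_mask = max(1, min(n, round(ratio * n)))
    masked_idx = set(rng.sample(range(n), n_mask))
    masked = [MASK_ID if i in masked_idx else t for i, t in enumerate(tokens)]
    flags = [i in masked_idx for i in range(n)]
    return masked, flags


rng = random.Random(0)
# Random stand-in for a 16x16 grid of codebook indices (assumed sizes).
tokens = [rng.randrange(1024) for _ in range(256)]
ratio = sample_mask_ratio(rng)
masked, flags = mask_tokens(tokens, ratio, rng)
```

In a setup like this, a model would be trained to reconstruct the original ids at the flagged positions. Ratios near `high` resemble the generative regime described above, while ratios near `low` resemble the representation learning regime.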

Abstract (original)

Generative modeling and representation learning are two key tasks in computer vision. However, these models are typically trained independently, which ignores the potential for each task to help the other, and leads to training and model maintenance overheads. In this work, we propose MAsked Generative Encoder (MAGE), the first framework to unify SOTA image generation and self-supervised representation learning. Our key insight is that using variable masking ratios in masked image modeling pre-training can allow generative training (very high masking ratio) and representation learning (lower masking ratio) under the same training framework. Inspired by previous generative models, MAGE uses semantic tokens learned by a vector-quantized GAN at inputs and outputs, combining this with masking. We can further improve the representation by adding a contrastive loss to the encoder output. We extensively evaluate the generation and representation learning capabilities of MAGE. On ImageNet-1K, a single MAGE ViT-L model obtains 9.10 FID in the task of class-unconditional image generation and 78.9% top-1 accuracy for linear probing, achieving state-of-the-art performance in both image generation and representation learning. Code is available at https://github.com/LTH14/mage.

arXiv information

Authors	Tianhong Li, Huiwen Chang, Shlok Kumar Mishra, Han Zhang, Dina Katabi, Dilip Krishnan
Published	2022-11-16 18:59:02+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
