StyleNAT: Giving Each Head a New Perspective

Abstract

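Image generation is a long-sought but challenging task, and performing it efficiently is just as difficult.
Researchers often try to build a "one size fits all" generator, in which the parameter space differs little even across drastically different datasets.
Here we present StyleNAT, a new transformer-based framework targeting high-quality image generation with superior efficiency and flexibility.
At the core of our model is a carefully designed framework that partitions attention heads to capture local and global information, which is achieved using Neighborhood Attention (NA).
Because different heads can attend to different receptive fields, the model can better combine this information and adapt to the data at hand in a highly flexible manner.
StyleNAT attains a new SOTA FID score of 2.046 on FFHQ-256, beating prior art from convolutional models such as StyleGAN-XL and from transformers such as HIT and StyleSwin, and sets a new transformer SOTA on FFHQ-1024 with an FID score of 4.174.
Compared with StyleGAN-XL, these results represent a 6.4% improvement in FFHQ-256 score, a 28% reduction in the number of parameters, and a 56% improvement in sampling throughput.
Code and models will be open-sourced at https://github.com/SHI-Labs/StyleNAT.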
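
The idea of giving each attention head its own receptive field can be illustrated with a small sketch. The code below is a simplified 1D, pure-Python illustration and not the authors' implementation: the function names (`neighborhood_indices`, `neighborhood_attention_1d`, `multi_head_partitioned`), the kernel sizes, and the use of dilation to widen a head's receptive field are all assumptions made for this example. An image model would apply the same idea over 2D feature maps, with separate learned projections per head.

```python
import math


def neighborhood_indices(i, length, kernel_size, dilation=1):
    """Return the key positions that query position `i` attends to.

    Tokens are split into `dilation` interleaved groups; inside its group the
    query sees a window of `kernel_size` tokens. Near the borders the window
    is shifted rather than padded, so every query sees exactly `kernel_size` keys.
    """
    if kernel_size * dilation > length:
        raise ValueError("kernel_size * dilation must not exceed sequence length")
    group = i % dilation                      # which interleaved group i belongs to
    pos = i // dilation                       # position of i inside that group
    group_len = (length - group + dilation - 1) // dilation
    start = min(max(pos - kernel_size // 2, 0), group_len - kernel_size)
    return [group + (start + j) * dilation for j in range(kernel_size)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def neighborhood_attention_1d(q, k, v, kernel_size, dilation=1):
    """Single-head attention where each query only sees its neighborhood."""
    length = len(q)
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i in range(length):
        idx = neighborhood_indices(i, length, kernel_size, dilation)
        scores = [scale * sum(a * b for a, b in zip(q[i], k[j])) for j in idx]
        weights = softmax(scores)
        out.append([sum(w * v[j][c] for w, j in zip(weights, idx))
                    for c in range(len(v[0]))])
    return out


def multi_head_partitioned(q_heads, k_heads, v_heads, head_configs):
    """Run each head with its own (kernel_size, dilation), then concatenate."""
    per_head = [
        neighborhood_attention_1d(q, k, v, kernel_size, dilation)
        for q, k, v, (kernel_size, dilation)
        in zip(q_heads, k_heads, v_heads, head_configs)
    ]
    length = len(per_head[0])
    return [[x for head in per_head for x in head[i]] for i in range(length)]


# Illustrative setup: 16 tokens, 2 heads. Values here are hypothetical.
length = 16
queries = [[math.sin(i), math.cos(i)] for i in range(length)]
keys = [[math.cos(i), math.sin(i)] for i in range(length)]
values = [[float(i), 1.0] for i in range(length)]

# Head 0 stays local (dilation 1); head 1 reaches further (dilation 4).
head_configs = [(3, 1), (3, 4)]

# For brevity both heads reuse the same q/k/v; a real model would not.
mixed = multi_head_partitioned(
    [queries, queries], [keys, keys], [values, values], head_configs
)
```

In this sketch, position 8 attends to positions 7, 8, 9 in the local head and to positions 4, 8, 12 in the dilated head, so the concatenated output for each token mixes a narrow and a wide view of the sequence at the same per-head cost.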

Abstract (original)

Image generation has been a long sought-after but challenging task, and performing the generation task in an efficient manner is similarly difficult. Often researchers attempt to create a ‘one size fits all’ generator, where there are few differences in the parameter space for drastically different datasets. Herein, we present a new transformer-based framework, dubbed StyleNAT, targeting high-quality image generation with superior efficiency and flexibility. At the core of our model, is a carefully designed framework that partitions attention heads to capture local and global information, which is achieved through using Neighborhood Attention (NA). With different heads able to pay attention to varying receptive fields, the model is able to better combine this information, and adapt, in a highly flexible manner, to the data at hand. StyleNAT attains a new SOTA FID score on FFHQ-256 with 2.046, beating prior arts with convolutional models such as StyleGAN-XL and transformers such as HIT and StyleSwin, and a new transformer SOTA on FFHQ-1024 with an FID score of 4.174. These results show a 6.4% improvement on FFHQ-256 scores when compared to StyleGAN-XL with a 28% reduction in the number of parameters and 56% improvement in sampling throughput. Code and models will be open-sourced at https://github.com/SHI-Labs/StyleNAT .

arXiv information

Authors	Steven Walton, Ali Hassani, Xingqian Xu, Zhangyang Wang, Humphrey Shi
Published	2022-11-10 18:55:48+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
