FlexiViT: One Model for All Patch Sizes

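Summary

Vision Transformers convert images into sequences by slicing them into patches.
The size of these patches controls a speed/accuracy tradeoff: smaller patches give higher accuracy at greater computational cost, but changing the patch size typically requires retraining the model.
This paper shows that simply randomizing the patch size at training time yields a single set of weights that performs well across a wide range of patch sizes, so the model can be tailored to different compute budgets at deployment time.
The resulting model, called FlexiViT, is evaluated extensively on a wide range of tasks, including classification, image-text retrieval, open-world detection, panoptic segmentation, and semantic segmentation; it usually matches, and sometimes outperforms, standard ViT models trained at a single patch size in an otherwise identical setup.
Hence, FlexiViT training is a simple drop-in improvement for ViT that makes it easy to add compute-adaptive capabilities to most models relying on a ViT backbone architecture.
Code and pre-trained models are available at https://github.com/google-research/big_vision.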

Summary (original)

Vision Transformers convert images to sequences by slicing them into patches. The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost, but changing the patch size typically requires retraining the model. In this paper, we demonstrate that simply randomizing the patch size at training time leads to a single set of weights that performs well across a wide range of patch sizes, making it possible to tailor the model to different compute budgets at deployment time. We extensively evaluate the resulting model, which we call FlexiViT, on a wide range of tasks, including classification, image-text retrieval, open-world detection, panoptic segmentation, and semantic segmentation, concluding that it usually matches, and sometimes outperforms, standard ViT models trained at a single patch size in an otherwise identical setup. Hence, FlexiViT training is a simple drop-in improvement for ViT that makes it easy to add compute-adaptive capabilities to most models relying on a ViT backbone architecture. Code and pre-trained models are available at https://github.com/google-research/big_vision
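
The tradeoff described above can be made concrete with a small sketch. It only illustrates the tokenization side: how the patch size sets the sequence length, and what drawing a random patch size per training step might look like. The image size of 240, the list of patch sizes, and the helper names `num_tokens`, `patchify`, and `sample_patch_size` are assumptions chosen for illustration, not values or APIs stated in the abstract. How the model's weights are adapted to each patch size is not covered by the abstract and is omitted here.

```python
import random

# Assumed values for illustration only; every patch size divides the image size.
IMAGE_SIZE = 240
PATCH_SIZES = (8, 10, 12, 15, 16, 20, 24, 30, 40, 48)


def num_tokens(image_size, patch_size):
    """Number of patches (sequence length) for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("patch size must divide the image size")
    side = image_size // patch_size
    return side * side


def patchify(image, patch_size):
    """Split a square image (list of rows) into flattened patches, row-major."""
    size = len(image)
    if size % patch_size != 0:
        raise ValueError("patch size must divide the image size")
    per_side = size // patch_size
    patches = []
    for pr in range(per_side):
        for pc in range(per_side):
            patch = [
                image[pr * patch_size + r][pc * patch_size + c]
                for r in range(patch_size)
                for c in range(patch_size)
            ]
            patches.append(patch)
    return patches


def sample_patch_size(rng):
    """Draw one patch size uniformly at random (hypothetical sampling rule)."""
    return rng.choice(PATCH_SIZES)


if __name__ == "__main__":
    rng = random.Random(0)
    # Dummy single-channel image whose pixel value encodes its position.
    image = [[r * IMAGE_SIZE + c for c in range(IMAGE_SIZE)]
             for r in range(IMAGE_SIZE)]
    for step in range(3):
        p = sample_patch_size(rng)
        tokens = patchify(image, p)
        print(f"step {step}: patch size {p} -> {len(tokens)} tokens")
```

Under these assumptions, a patch size of 8 gives 900 tokens while a patch size of 48 gives 25, which is the compute range a single set of weights would have to cover.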

arXiv information

Authors	Lucas Beyer, Pavel Izmailov, Alexander Kolesnikov, Mathilde Caron, Simon Kornblith, Xiaohua Zhai, Matthias Minderer, Michael Tschannen, Ibrahim Alabdulmohsin, Filip Pavetic
Published	2022-12-15 18:18:38+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
