Could Giant Pretrained Image Models Extract Universal Representations?

Abstract

Frozen pretrained models have become a viable alternative to the pretraining-then-finetuning paradigm for transfer learning. However, with frozen models there are relatively few parameters available for adapting to downstream tasks, which is problematic in computer vision where tasks vary significantly in input/output format and the type of information that is of value. In this paper, we present a study of frozen pretrained models when applied to diverse and representative computer vision tasks, including object detection, semantic segmentation and video action recognition. From this empirical analysis, our work answers the questions of what pretraining task fits best with this frozen setting, how to make the frozen setting more flexible to various downstream tasks, and the effect of larger model sizes. We additionally examine the upper bound of performance using a giant frozen pretrained model with 3 billion parameters (SwinV2-G) and find that it reaches competitive performance on a varied set of major benchmarks with only one shared frozen base network: 60.0 box mAP and 52.2 mask mAP on COCO object detection test-dev, 57.6 val mIoU on ADE20K semantic segmentation, and 81.7 top-1 accuracy on Kinetics-400 action recognition. With this work, we hope to bring greater attention to this promising path of freezing pretrained image models.
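
To make the "frozen setting" concrete, here is a minimal toy sketch in plain Python. It is not the paper's method: the "backbone" below is only a fixed random projection standing in for a pretrained network such as SwinV2-G, the data is synthetic, and every name (`backbone`, `extract`, `head_w`, `head_b`) is hypothetical. The only point it illustrates is that the backbone weights are never updated while a small task head is trained on top of its features.

```python
import copy
import math
import random

random.seed(0)

IN_DIM, FEAT_DIM, N, LR, STEPS = 4, 8, 64, 0.1, 200

# Stand-in for a pretrained backbone (assumption: a fixed random projection).
# It is frozen: nothing below ever writes to it.
backbone = [[random.gauss(0, 1) for _ in range(IN_DIM)] for _ in range(FEAT_DIM)]
backbone_before = copy.deepcopy(backbone)


def extract(x):
    """Frozen feature extractor: tanh of a fixed linear map."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in backbone]


# Synthetic binary task: the label is 1 when the first input coordinate is positive.
inputs = [[random.gauss(0, 1) for _ in range(IN_DIM)] for _ in range(N)]
labels = [1.0 if x[0] > 0 else 0.0 for x in inputs]

# Because the backbone is frozen, features can be computed once and reused.
features = [extract(x) for x in inputs]

# Trainable head: a logistic-regression layer, initialised to zero.
head_w = [0.0] * FEAT_DIM
head_b = 0.0


def predict(f):
    z = sum(w * fi for w, fi in zip(head_w, f)) + head_b
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss():
    """Mean binary cross-entropy of the head on the cached features."""
    eps = 1e-12
    total = 0.0
    for f, y in zip(features, labels):
        p = predict(f)
        total -= y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps)
    return total / N


loss_before = mean_loss()

# Full-batch gradient descent that updates the head parameters only.
for _ in range(STEPS):
    grad_w = [0.0] * FEAT_DIM
    grad_b = 0.0
    for f, y in zip(features, labels):
        err = predict(f) - y
        for j in range(FEAT_DIM):
            grad_w[j] += err * f[j] / N
        grad_b += err / N
    head_w = [w - LR * g for w, g in zip(head_w, grad_w)]
    head_b -= LR * grad_b

loss_after = mean_loss()
print("loss before:", loss_before, "loss after:", loss_after)
print("backbone unchanged:", backbone == backbone_before)
```

In a real framework the same idea is usually expressed by disabling gradient updates for the backbone parameters and passing only the head parameters to the optimizer; the cached-features trick above works only because the frozen features never change between steps.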

arXiv information

Authors	Yutong Lin, Ze Liu, Zheng Zhang, Han Hu, Nanning Zheng, Stephen Lin, Yue Cao
Published	2022-11-03 17:57:10+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
