InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Summary

Title: InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Summary:

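– Compared with the progress of large-scale vision transformers (ViTs), large-scale models based on convolutional neural networks (CNNs) are still at an early stage.
– This work presents InternImage, a new large-scale CNN-based foundation model that, like ViTs, benefits from increases in parameters and training data.
– Unlike recent CNNs that focus on large dense kernels, InternImage uses deformable convolution as its core operator. The model therefore has both the large effective receptive field needed for downstream tasks such as detection and segmentation and adaptive spatial aggregation conditioned on input and task information.
– As a result, InternImage reduces the strict inductive bias of traditional CNNs and, like ViTs, can learn stronger and more robust patterns from massive data with large-scale parameters.
– The model's effectiveness is demonstrated on challenging benchmarks including ImageNet, COCO, and ADE20K. InternImage-H sets new records of 65.4 mAP on COCO test-dev and 62.9 mIoU on ADE20K, outperforming current leading CNNs and ViTs.
– The code will be released at https://github.com/OpenGVLab/InternImage.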
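To make "adaptive spatial aggregation" more concrete, here is a minimal sketch of a generic modulated deformable convolution evaluated at one output location of a single-channel image. It is an illustration under stated assumptions, not InternImage's operator: the abstract does not give the operator's details, and the names `bilinear_sample`, `deformable_conv_at`, and `GRID_3X3` are hypothetical. In a real network the offsets and modulation scalars would be predicted from the input features; here they are set by hand.

```python
import math

def bilinear_sample(img, y, x):
    """Sample img (a list of rows) at fractional (y, x); zero outside the image."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                # Bilinear weight falls off linearly with distance to the pixel.
                weight = (1 - abs(y - yi)) * (1 - abs(x - xi))
                value += weight * img[yi][xi]
    return value

# Regular 3x3 sampling grid, relative to the output location.
GRID_3X3 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def deformable_conv_at(img, py, px, weights, offsets, modulation):
    """Compute sum_k w_k * m_k * img(p + p_k + delta_p_k) at p = (py, px).

    offsets and modulation are hand-set here (assumption); a network
    would predict them from the input.
    """
    out = 0.0
    for (dy, dx), w_k, (oy, ox), m_k in zip(GRID_3X3, weights, offsets, modulation):
        out += w_k * m_k * bilinear_sample(img, py + dy + oy, px + dx + ox)
    return out

# Toy 4x4 image with value 4*row + col, and an averaging kernel.
img = [[float(4 * r + c) for c in range(4)] for r in range(4)]
weights = [1.0 / 9] * 9
ones = [1.0] * 9

# Zero offsets reduce to an ordinary 3x3 convolution.
plain = deformable_conv_at(img, 1, 1, weights, [(0.0, 0.0)] * 9, ones)
# Non-zero offsets move every sampling point, changing what is aggregated.
shifted = deformable_conv_at(img, 1, 1, weights, [(0.5, 0.5)] * 9, ones)
# Zero modulation suppresses every sampling point.
muted = deformable_conv_at(img, 1, 1, weights, [(0.0, 0.0)] * 9, [0.0] * 9)
```

With zero offsets and unit modulation the result matches a plain 3x3 average around the centre; changing the offsets moves the sampling points off the fixed grid, which is how the aggregation can adapt to the input instead of following a fixed kernel shape.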

Abstract (original)

Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early state. This work presents a new large-scale CNN-based foundation model, termed InternImage, which can obtain the gain from increasing parameters and training data like ViTs. Different from the recent CNNs that focus on large dense kernels, InternImage takes deformable convolution as the core operator, so that our model not only has the large effective receptive field required for downstream tasks such as detection and segmentation, but also has the adaptive spatial aggregation conditioned by input and task information. As a result, the proposed InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns with large-scale parameters from massive data like ViTs. The effectiveness of our model is proven on challenging benchmarks including ImageNet, COCO, and ADE20K. It is worth mentioning that InternImage-H achieved a new record 65.4 mAP on COCO test-dev and 62.9 mIoU on ADE20K, outperforming current leading CNNs and ViTs. The code will be released at https://github.com/OpenGVLab/InternImage.

arXiv information

Authors	Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao
Published	2023-04-17 11:51:12+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, OpenAI
