Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation

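Summary

This paper presents Mask DINO, a unified object detection and segmentation framework.
Mask DINO extends DINO (DETR with Improved Denoising Anchor Boxes) by adding a mask prediction branch that supports all image segmentation tasks (instance, panoptic, and semantic).
It takes the dot product of the query embeddings from DINO with a high-resolution pixel embedding map to predict a set of binary masks.
Several key components of DINO are extended to segmentation through a shared architecture and training process.
Mask DINO is simple, efficient, and scalable, and it can benefit from joint large-scale detection and segmentation datasets.
The experiments show that Mask DINO significantly outperforms all existing specialized segmentation methods, both with a ResNet-50 backbone and with a pre-trained model using a SwinL backbone.
Notably, among models under one billion parameters, Mask DINO establishes the best results to date on instance segmentation (54.5 AP on COCO), panoptic segmentation (59.4 PQ on COCO), and semantic segmentation (60.8 mIoU on ADE20K).
Code is available at https://github.com/IDEACVR/MaskDINO.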
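
The mask prediction step described in the summary (query embeddings dotted with a per-pixel embedding map) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `predict_masks`, the nested-list data layout, and the choice to binarize logits at 0.0 (equivalent to a sigmoid probability of 0.5) are all assumptions made for illustration. The abstract only states that a dot product yields a set of binary masks.

```python
def predict_masks(queries, pixel_map, logit_threshold=0.0):
    """Dot each query embedding with every pixel embedding (illustrative only).

    queries:   N query embeddings, each a list of C floats.
    pixel_map: H x W grid whose cells are pixel embeddings (lists of C floats).
    Returns N binary masks, each an H x W grid of 0/1 values.

    Assumption: a logit above `logit_threshold` marks the pixel as foreground.
    The default 0.0 corresponds to a sigmoid probability of 0.5.
    """
    masks = []
    for query in queries:
        mask = []
        for row in pixel_map:
            mask_row = []
            for pixel in row:
                # Dot product between one query and one pixel embedding.
                logit = sum(q * p for q, p in zip(query, pixel))
                mask_row.append(1 if logit > logit_threshold else 0)
            mask.append(mask_row)
        masks.append(mask)
    return masks


# Toy input: 2 queries, a 2 x 2 pixel map, embedding dimension C = 2.
queries = [[1.0, 0.0], [0.0, 1.0]]
pixel_map = [
    [[2.0, -1.0], [-3.0, 0.5]],
    [[0.1, 0.1], [-1.0, -1.0]],
]
masks = predict_masks(queries, pixel_map)
print(masks)  # one 2 x 2 binary mask per query
```

Each query produces its own mask, so the number of predicted masks equals the number of queries. In practice this would be a single batched tensor operation rather than nested loops, which are used here only to keep the sketch dependency-free.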

Abstract (original)

In this paper we present Mask DINO, a unified object detection and segmentation framework. Mask DINO extends DINO (DETR with Improved Denoising Anchor Boxes) by adding a mask prediction branch which supports all image segmentation tasks (instance, panoptic, and semantic). It makes use of the query embeddings from DINO to dot-product a high-resolution pixel embedding map to predict a set of binary masks. Some key components in DINO are extended for segmentation through a shared architecture and training process. Mask DINO is simple, efficient, and scalable, and it can benefit from joint large-scale detection and segmentation datasets. Our experiments show that Mask DINO significantly outperforms all existing specialized segmentation methods, both on a ResNet-50 backbone and a pre-trained model with SwinL backbone. Notably, Mask DINO establishes the best results to date on instance segmentation (54.5 AP on COCO), panoptic segmentation (59.4 PQ on COCO), and semantic segmentation (60.8 mIoU on ADE20K) among models under one billion parameters. Code is available at https://github.com/IDEACVR/MaskDINO.

arXiv information

Authors	Feng Li, Hao Zhang, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M. Ni, Heung-Yeung Shum
Published	2022-12-12 15:40:34+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
