Open-vocabulary Panoptic Segmentation with Embedding Modulation

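Abstract

Open-vocabulary image segmentation is attracting increasing attention because of its important real-world applications.
Traditional closed-vocabulary segmentation methods cannot characterize novel objects, while several recent open-vocabulary attempts give unsatisfactory results: performance on the closed vocabulary drops notably, and large amounts of extra data are required.
To address this, we propose OPSNet, an omnipotent and data-efficient framework for Open-vocabulary Panoptic Segmentation.
Specifically, the carefully designed Embedding Modulation module, together with several meticulous components, enables adequate embedding enhancement and information exchange between the segmentation model and the CLIP encoder, whose visual and linguistic representations are well aligned, yielding superior segmentation performance in both open- and closed-vocabulary settings with far less need for additional data.
Extensive experimental evaluations across multiple datasets (e.g., COCO, ADE20K, Cityscapes, and PascalContext) under various circumstances, in which the proposed OPSNet achieves state-of-the-art results, demonstrate the effectiveness and generality of the proposed approach.
The code and trained models will be made publicly available.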

Abstract (original)

Open-vocabulary image segmentation is attracting increasing attention due to its critical applications in the real world. Traditional closed-vocabulary segmentation methods are not able to characterize novel objects, whereas several recent open-vocabulary attempts obtain unsatisfactory results, i.e., notable performance reduction on the closed vocabulary and massive demand for extra data. To this end, we propose OPSNet, an omnipotent and data-efficient framework for Open-vocabulary Panoptic Segmentation. Specifically, the exquisitely designed Embedding Modulation module, together with several meticulous components, enables adequate embedding enhancement and information exchange between the segmentation model and the visual-linguistic well-aligned CLIP encoder, resulting in superior segmentation performance under both open- and closed-vocabulary settings with much fewer need of additional data. Extensive experimental evaluations are conducted across multiple datasets (e.g., COCO, ADE20K, Cityscapes, and PascalContext) under various circumstances, where the proposed OPSNet achieves state-of-the-art results, which demonstrates the effectiveness and generality of the proposed approach. The code and trained models will be made publicly available.

arXiv information

Authors	Xi Chen, Shuang Li, Ser-Nam Lim, Antonio Torralba, Hengshuang Zhao
Published	2023-03-20 17:58:48+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
