APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation

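Abstract

Few-shot semantic segmentation (FSS) aims to segment unseen classes using only a few labeled samples.
Current FSS methods generally assume that training and application scenarios share similar domains, and their performance degrades significantly when they are applied to a distinct domain.
To address this, we propose leveraging a cutting-edge foundation model, the Segment Anything Model (SAM), to improve generalization.
However, SAM performs unsatisfactorily on domains that differ from its training data, which consists mainly of natural scene images, and its interactive prompting mechanism does not support automatic segmentation of specific semantics.
We introduce APSeg, a novel auto-prompt network for cross-domain few-shot semantic segmentation (CD-FSS), designed to be auto-prompted to guide cross-domain segmentation.
Specifically, we propose a Dual Prototype Anchor Transformation (DPAT) module that fuses pseudo query prototypes, extracted based on cycle-consistency, with support prototypes, so that features can be transformed into a more stable, domain-agnostic space.
In addition, a Meta Prompt Generator (MPG) module automatically generates prompt embeddings, eliminating the need for manual visual prompts.
The result is an efficient model that can be applied directly to target domains without fine-tuning.
Extensive experiments on four cross-domain datasets show that our model outperforms the state-of-the-art CD-FSS method by 5.24% and 3.10% in average accuracy in the 1-shot and 5-shot settings, respectively.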

Abstract (original)

Few-shot semantic segmentation (FSS) endeavors to segment unseen classes with only a few labeled samples. Current FSS methods are commonly built on the assumption that their training and application scenarios share similar domains, and their performances degrade significantly while applied to a distinct domain. To this end, we propose to leverage the cutting-edge foundation model, the Segment Anything Model (SAM), for generalization enhancement. The SAM however performs unsatisfactorily on domains that are distinct from its training data, which primarily comprise natural scene images, and it does not support automatic segmentation of specific semantics due to its interactive prompting mechanism. In our work, we introduce APSeg, a novel auto-prompt network for cross-domain few-shot semantic segmentation (CD-FSS), which is designed to be auto-prompted for guiding cross-domain segmentation. Specifically, we propose a Dual Prototype Anchor Transformation (DPAT) module that fuses pseudo query prototypes extracted based on cycle-consistency with support prototypes, allowing features to be transformed into a more stable domain-agnostic space. Additionally, a Meta Prompt Generator (MPG) module is introduced to automatically generate prompt embeddings, eliminating the need for manual visual prompts. We build an efficient model which can be applied directly to target domains without fine-tuning. Extensive experiments on four cross-domain datasets show that our model outperforms the state-of-the-art CD-FSS method by 5.24% and 3.10% in average accuracy on 1-shot and 5-shot settings, respectively.
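
The abstract only names the idea behind DPAT: fusing support prototypes with pseudo query prototypes selected by cycle-consistency. The following is a minimal pure-Python sketch of that idea, not the authors' implementation. It rests on several assumptions: features are flat lists of per-pixel vectors, similarity is cosine similarity, a query pixel is kept when a support foreground pixel matches it and the reverse match lands back inside the support foreground, and fusion is a plain element-wise average. All function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def nearest(vec, candidates):
    """Index of the candidate most similar to vec (first one on ties)."""
    return max(range(len(candidates)), key=lambda i: cosine(vec, candidates[i]))

def support_prototype(support_feats, support_mask):
    """Masked average pooling: mean of the foreground support features."""
    return mean_vector([f for f, m in zip(support_feats, support_mask) if m])

def pseudo_query_prototype(support_feats, support_mask, query_feats):
    """Mean of query features that pass the assumed cycle-consistency check."""
    selected = set()
    for i, s in enumerate(support_feats):
        if not support_mask[i]:
            continue
        j = nearest(s, query_feats)                    # support -> query
        back = nearest(query_feats[j], support_feats)  # query -> support
        if support_mask[back]:                         # cycle ends in foreground
            selected.add(j)
    if not selected:
        return None
    return mean_vector([query_feats[j] for j in sorted(selected)])

def fuse_prototypes(support_proto, pseudo_proto):
    """Assumed fusion: element-wise average, falling back to support only."""
    if pseudo_proto is None:
        return list(support_proto)
    return [(a + b) / 2 for a, b in zip(support_proto, pseudo_proto)]

# Toy example: three support pixels (two foreground) and three query pixels.
support = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
mask = [1, 1, 0]
query = [[1.0, 0.1], [0.0, 1.0], [0.1, 0.9]]
anchor = fuse_prototypes(
    support_prototype(support, mask),
    pseudo_query_prototype(support, mask, query),
)
print(anchor)
```

In this sketch the fused vector stands in for the anchor that features would be transformed against. How APSeg actually builds and applies its anchors is not specified in the abstract.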

arXiv information

Authors	Weizhao He, Yang Zhang, Wei Zhuo, Linlin Shen, Jiaqi Yang, Songhe Deng, Liang Sun
Published	2024-06-13 03:10:17+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
