MSDNet: Multi-Scale Decoder for Few-Shot Semantic Segmentation via Transformer-Guided Prototyping

Abstract

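Few-shot semantic segmentation addresses the challenge of segmenting objects in query images using only a handful of annotated examples.
However, many previous state-of-the-art methods either discard intricate local semantic features or suffer from high computational complexity.
To address these challenges, we propose a new few-shot semantic segmentation framework based on the Transformer architecture.
Our approach introduces a spatial transformer decoder and a contextual mask generation module to improve the relational understanding between support and query images.
In addition, we introduce a multi-scale decoder that refines the segmentation mask by hierarchically incorporating features from different resolutions.
The approach also integrates global features from intermediate encoder stages to improve contextual understanding, while keeping the structure lightweight to reduce complexity.
This balance between performance and efficiency allows our method to achieve competitive results on benchmark datasets such as PASCAL-5^i and COCO-20^i in both 1-shot and 5-shot settings.
Notably, our model, with only 1.5 million parameters, delivers competitive performance while overcoming limitations of existing methodologies.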

Abstract (original)

Few-shot Semantic Segmentation addresses the challenge of segmenting objects in query images with only a handful of annotated examples. However, many previous state-of-the-art methods either have to discard intricate local semantic features or suffer from high computational complexity. To address these challenges, we propose a new Few-shot Semantic Segmentation framework based on the Transformer architecture. Our approach introduces the spatial transformer decoder and the contextual mask generation module to improve the relational understanding between support and query images. Moreover, we introduce a multi scale decoder to refine the segmentation mask by incorporating features from different resolutions in a hierarchical manner. Additionally, our approach integrates global features from intermediate encoder stages to improve contextual understanding, while maintaining a lightweight structure to reduce complexity. This balance between performance and efficiency enables our method to achieve competitive results on benchmark datasets such as PASCAL-5^i and COCO-20^i in both 1-shot and 5-shot settings. Notably, our model with only 1.5 million parameters demonstrates competitive performance while overcoming limitations of existing methodologies.
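
The abstract does not spell out how class prototypes are built from the support images, so the following is only a sketch of one widely used choice in few-shot segmentation, masked average pooling, and not necessarily the procedure MSDNet uses. The function names (`masked_average_pooling`, `k_shot_prototype`) and the nested-list feature layout are assumptions made for illustration. It also shows how the 1-shot and 5-shot settings differ only in how many support prototypes are averaged.

```python
from typing import List, Sequence, Tuple

Feature = List[List[List[float]]]  # assumed layout: [channels][height][width]
Mask = List[List[int]]             # [height][width], 1 = foreground, 0 = background


def masked_average_pooling(features: Feature, mask: Mask) -> List[float]:
    """Average each feature channel over the foreground pixels of the mask.

    Returns one value per channel, i.e. a prototype vector for the class.
    """
    area = sum(sum(row) for row in mask)
    if area == 0:
        raise ValueError("support mask has no foreground pixels")
    prototype = []
    for channel in features:
        total = sum(
            value * keep
            for feat_row, mask_row in zip(channel, mask)
            for value, keep in zip(feat_row, mask_row)
        )
        prototype.append(total / area)
    return prototype


def k_shot_prototype(supports: Sequence[Tuple[Feature, Mask]]) -> List[float]:
    """Average the per-image prototypes of K support examples (K=1 or K=5)."""
    prototypes = [masked_average_pooling(f, m) for f, m in supports]
    k = len(prototypes)
    return [sum(values) / k for values in zip(*prototypes)]


if __name__ == "__main__":
    # Toy support features: 2 channels on a 2x2 grid (made-up numbers).
    feats = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 10.0], [20.0, 30.0]]]
    one_shot = k_shot_prototype([(feats, [[1, 0], [0, 1]])])
    two_shot = k_shot_prototype([(feats, [[1, 0], [0, 1]]), (feats, [[1, 0], [0, 0]])])
    print(one_shot, two_shot)
```

In a real pipeline the features would come from an encoder and be stored as tensors; plain lists are used here only to keep the sketch dependency-free.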

arXiv information

Authors	Amirreza Fateh, Mohammad Reza Mohammadi, Mohammad Reza Jahed Motlagh
Published	2025-06-02 10:22:19+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
