Learning to Prompt Segment Anything Models

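Summary

Segment Anything Models (SAMs) such as SEEM and SAM have shown great potential in learning to segment anything.
The core design of SAMs is promptable segmentation, which takes a handcrafted prompt as input and returns the expected segmentation mask.
SAMs work with two types of prompts, spatial prompts (e.g., points) and semantic prompts (e.g., text), which together prompt SAMs to segment anything on downstream datasets.
Despite the important role of prompts, how to acquire suitable prompts for SAMs remains largely unexplored.
This work examines the architecture of SAMs and identifies two challenges in learning effective prompts for them.
To address these challenges, the authors propose spatial-semantic prompt learning (SSPrompt), which learns effective semantic and spatial prompts for better SAMs.
Specifically, SSPrompt introduces spatial prompt learning and semantic prompt learning, which optimize spatial and semantic prompts directly in the embedding space and selectively leverage the knowledge encoded in pre-trained prompt encoders.
Extensive experiments show that SSPrompt consistently achieves superior image segmentation performance across multiple widely adopted datasets.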
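
The idea of optimizing a prompt directly in the embedding space while selectively reusing a frozen encoder's output can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the per-dimension sigmoid gate, the names `combine` and `train_prompt`, and the squared-error loss against a fixed target vector, which merely stands in for a real segmentation loss.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def combine(frozen, learned, gate_logits):
    """Blend a frozen encoder embedding with a learnable one, per dimension.

    A gate near 1 keeps the frozen encoder's value; a gate near 0 relies on
    the learned value. This gating form is hypothetical.
    """
    return [
        sigmoid(g) * f + (1.0 - sigmoid(g)) * l
        for f, l, g in zip(frozen, learned, gate_logits)
    ]


def squared_error(prompt, target):
    """Toy stand-in for a segmentation loss."""
    return sum((p - t) ** 2 for p, t in zip(prompt, target))


def train_prompt(frozen, target, steps=500, lr=0.2, seed=0):
    """Gradient descent on the learnable embedding and gates only.

    The frozen embedding is never updated, mirroring a frozen prompt encoder.
    """
    rng = random.Random(seed)
    learned = [rng.uniform(-0.1, 0.1) for _ in frozen]
    gate_logits = [0.0 for _ in frozen]
    for _ in range(steps):
        for i, (f, t) in enumerate(zip(frozen, target)):
            a = sigmoid(gate_logits[i])
            p = a * f + (1.0 - a) * learned[i]
            dp = 2.0 * (p - t)                       # d loss / d prompt
            grad_learned = dp * (1.0 - a)            # chain rule through blend
            grad_gate = dp * (f - learned[i]) * a * (1.0 - a)
            learned[i] -= lr * grad_learned
            gate_logits[i] -= lr * grad_gate
    return learned, gate_logits


# Hypothetical 4-dimensional embeddings, chosen only for the demonstration.
frozen_embedding = [1.0, -0.5, 0.3, 0.8]
target_embedding = [0.2, -0.5, 0.9, 0.1]
learned, gate_logits = train_prompt(frozen_embedding, target_embedding)
final_prompt = combine(frozen_embedding, learned, gate_logits)
```

In a real setting the loss would come from the segmentation output of a frozen SAM, and the embeddings would be tensors updated by an automatic differentiation framework rather than by hand-written gradients.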

Summary (original)

Segment Anything Models (SAMs) like SEEM and SAM have demonstrated great potential in learning to segment anything. The core design of SAMs lies with Promptable Segmentation, which takes a handcrafted prompt as input and returns the expected segmentation mask. SAMs work with two types of prompts including spatial prompts (e.g., points) and semantic prompts (e.g., texts), which work together to prompt SAMs to segment anything on downstream datasets. Despite the important role of prompts, how to acquire suitable prompts for SAMs is largely under-explored. In this work, we examine the architecture of SAMs and identify two challenges for learning effective prompts for SAMs. To this end, we propose spatial-semantic prompt learning (SSPrompt) that learns effective semantic and spatial prompts for better SAMs. Specifically, SSPrompt introduces spatial prompt learning and semantic prompt learning, which optimize spatial prompts and semantic prompts directly over the embedding space and selectively leverage the knowledge encoded in pre-trained prompt encoders. Extensive experiments show that SSPrompt achieves superior image segmentation performance consistently across multiple widely adopted datasets.

arXiv information

Authors	Jiaxing Huang, Kai Jiang, Jingyi Zhang, Han Qiu, Lewei Lu, Shijian Lu, Eric Xing
Published	2024-01-09 16:24:25+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
