SLiMe: Segment Like Me

Summary

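Large vision-language models such as Stable Diffusion (SD) have driven significant progress on a variety of downstream tasks, including image editing, image correspondence, and 3D shape generation.
Inspired by these advances, we propose SLiMe, which explores leveraging these extensive vision-language models to segment images at any desired granularity using as few as one annotated sample.
SLiMe frames this problem as an optimization task.
Specifically, given a single training image and its segmentation mask, we first extract attention maps from the SD prior, including our novel "weighted accumulated self-attention map".
Then, using the extracted attention maps, the text embeddings of Stable Diffusion are optimized so that each of them learns about a single segmented region of the training image.
These learned embeddings highlight the segmented region in the attention maps, which can in turn be used to derive the segmentation map.
This enables SLiMe to segment any real-world image at inference time with the granularity of the segmented regions in the training image, using just one example.
Moreover, leveraging additional training data when it is available (the few-shot setting) improves SLiMe's performance.
We carried out a knowledge-rich set of experiments examining various design factors and showed that SLiMe outperforms other existing one-shot and few-shot segmentation methods.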
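
The optimization described above can be illustrated with a toy sketch. It does not use Stable Diffusion: the per-pixel features are synthetic stand-ins for the SD prior, the attention is a plain softmax over a few learnable embeddings, and the cross-entropy loss, the function names (`attention_maps`, `fit_embeddings`, `segment`, `toy_image`, `run_demo`) and all hyperparameters are assumptions made for illustration, not details taken from the paper. The weighted accumulated self-attention map is not modeled here. What the sketch does show is the overall shape of the idea: fit one embedding per segmented region on a single annotated image, then segment a new image by taking the argmax of its attention maps.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_maps(features, embeddings):
    """Per-pixel attention over the K embeddings; last axis has size K."""
    d = features.shape[-1]
    return softmax(features @ embeddings.T / np.sqrt(d), axis=-1)

def fit_embeddings(features, mask, num_regions, steps=200, lr=1.0, seed=0):
    """Optimize one embedding per region from a single image and its mask.

    Assumption: a mean cross-entropy loss between attention maps and the mask,
    minimized with plain gradient descent.
    """
    rng = np.random.default_rng(seed)
    d = features.shape[-1]
    emb = 0.01 * rng.standard_normal((num_regions, d))
    flat = features.reshape(-1, d)                  # (N, d) pixels
    onehot = np.eye(num_regions)[mask.reshape(-1)]  # (N, K) target maps
    for _ in range(steps):
        attn = attention_maps(flat, emb)            # (N, K)
        grad_logits = (attn - onehot) / len(flat)   # d(loss)/d(logits)
        emb -= lr * (grad_logits.T @ flat) / np.sqrt(d)
    return emb

def segment(features, emb):
    """Derive a segmentation map as the argmax over the attention maps."""
    return attention_maps(features, emb).argmax(axis=-1)

def toy_image(mask, num_regions, d=8, noise=0.3, seed=0):
    """Synthetic features: each region has its own prototype plus noise."""
    rng = np.random.default_rng(seed)
    prototypes = 3.0 * np.eye(num_regions, d)
    return prototypes[mask] + noise * rng.standard_normal(mask.shape + (d,))

def run_demo():
    # One annotated training image with three regions (0 = background).
    train_mask = np.zeros((16, 16), dtype=int)
    train_mask[4:12, 4:12] = 1
    train_mask[6:10, 6:10] = 2
    # An unseen image with a different layout of the same kinds of regions.
    test_mask = np.zeros((16, 16), dtype=int)
    test_mask[:, 8:] = 1
    test_mask[10:, :5] = 2
    emb = fit_embeddings(toy_image(train_mask, 3, seed=1), train_mask, num_regions=3)
    pred = segment(toy_image(test_mask, 3, seed=2), emb)
    return float((pred == test_mask).mean())    # pixel accuracy on the new image
```

Calling `run_demo()` returns the pixel accuracy on the unseen toy image. In the few-shot setting, the same loop could in principle accumulate gradients over several annotated images before each update.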

Summary (Original)

Significant strides have been made using large vision-language models, like Stable Diffusion (SD), for a variety of downstream tasks, including image editing, image correspondence, and 3D shape generation. Inspired by these advancements, we explore leveraging these extensive vision-language models for segmenting images at any desired granularity using as few as one annotated sample by proposing SLiMe. SLiMe frames this problem as an optimization task. Specifically, given a single training image and its segmentation mask, we first extract attention maps, including our novel ‘weighted accumulated self-attention map’ from the SD prior. Then, using the extracted attention maps, the text embeddings of Stable Diffusion are optimized such that, each of them, learn about a single segmented region from the training image. These learned embeddings then highlight the segmented region in the attention maps, which in turn can then be used to derive the segmentation map. This enables SLiMe to segment any real-world image during inference with the granularity of the segmented region in the training image, using just one example. Moreover, leveraging additional training data when available, i.e. few-shot, improves the performance of SLiMe. We carried out a knowledge-rich set of experiments examining various design factors and showed that SLiMe outperforms other existing one-shot and few-shot segmentation methods.

arXiv information

Authors	Aliasghar Khani, Saeid Asgari Taghanaki, Aditya Sanghi, Ali Mahdavi Amiri, Ghassan Hamarneh
Published	2023-09-29 15:14:51+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
