Zero-shot Referring Image Segmentation with Global-Local Context Features

Summary

Title: Zero-shot referring image segmentation using global-local context features

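Summary:
– Referring image segmentation (RIS) aims to find a segmentation mask given a referring expression that is grounded to a region of the input image.
– Collecting labeled datasets for RIS is costly and labor-intensive. To overcome this, we propose a simple yet effective zero-shot referring image segmentation method that leverages pre-trained cross-modal knowledge from CLIP.
– To obtain segmentation masks grounded to the input text, we propose a mask-guided visual encoder that captures both global and local contextual information of the input image.
– By using instance masks obtained from off-the-shelf mask proposal techniques, our method can segment fine-detailed, instance-level groundings.
– We also introduce a global-local text encoder, in which the global feature captures the complex sentence-level semantics of the entire input expression, while the local feature focuses on the target noun phrase extracted by a dependency parser.
– In experiments, the proposed method outperforms several zero-shot baselines for the task, and even a weakly supervised referring expression segmentation method, by substantial margins. Our code is available at https://github.com/Seonghoon-Yu/Zero-shot-RIS.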
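To make the global-local idea above concrete, here is a minimal sketch of how mask proposals could be ranked against a referring expression. It assumes each mask proposal already has a global and a local visual feature, and the expression has a global (whole sentence) and a local (target noun phrase) text feature, all produced by CLIP encoders that are replaced here by toy vectors. The linear mixing with weights `alpha` and `beta`, and the helper names `mix` and `select_mask`, are hypothetical illustrations, not the authors' released code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def mix(global_feat, local_feat, weight):
    """Hypothetical weighted mix: weight * global + (1 - weight) * local."""
    return [weight * g + (1.0 - weight) * l for g, l in zip(global_feat, local_feat)]


def select_mask(candidates, text_global, text_local, alpha=0.5, beta=0.5):
    """Pick the mask proposal whose mixed visual feature best matches the mixed text feature.

    candidates: list of (visual_global, visual_local) feature pairs, one per mask proposal.
    Returns (index of best proposal, list of all similarity scores).
    """
    text_feat = mix(text_global, text_local, beta)
    scores = [cosine(mix(vg, vl, alpha), text_feat) for vg, vl in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    # Two toy proposals: the second one points in the same direction as the text.
    proposals = [([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]), ([0.0, 1.0, 0.0], [0.0, 1.0, 0.0])]
    best, scores = select_mask(proposals, [0.0, 1.0, 0.0], [0.0, 1.0, 0.0])
    print(best, scores)
```

Mixing before comparison keeps a single score per proposal, so the final prediction is simply the highest-scoring instance mask; the real method may weight or combine the terms differently.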

Summary (original)

Referring image segmentation (RIS) aims to find a segmentation mask given a referring expression grounded to a region of the input image. Collecting labelled datasets for this task, however, is notoriously costly and labor-intensive. To overcome this issue, we propose a simple yet effective zero-shot referring image segmentation method by leveraging the pre-trained cross-modal knowledge from CLIP. In order to obtain segmentation masks grounded to the input text, we propose a mask-guided visual encoder that captures global and local contextual information of an input image. By utilizing instance masks obtained from off-the-shelf mask proposal techniques, our method is able to segment fine-detailed instance-level groundings. We also introduce a global-local text encoder where the global feature captures complex sentence-level semantics of the entire input expression while the local feature focuses on the target noun phrase extracted by a dependency parser. In our experiments, the proposed method outperforms several zero-shot baselines of the task and even the weakly supervised referring expression segmentation method with substantial margins. Our code is available at https://github.com/Seonghoon-Yu/Zero-shot-RIS.
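As an illustration of what a "local" view of one instance mask might look like, the sketch below crops the bounding box of a binary mask and blanks out pixels that fall outside the mask, so only the proposed instance remains. This is an assumption for illustration only: the abstract does not specify how the mask-guided encoder builds its local input, and the helper name `masked_crop` and the constant `fill` value are hypothetical. Images are plain 2-D lists of grayscale values to keep the example dependency-free.

```python
def masked_crop(image, mask, fill=0):
    """Crop the bounding box of a binary mask and replace pixels outside the mask with `fill`.

    image: 2-D list (H x W) of pixel values.
    mask:  2-D list (H x W) of 0/1 values marking one instance proposal.
    """
    # Rows and columns that contain at least one mask pixel define the bounding box.
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        raise ValueError("mask is empty")
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    # Keep masked pixels inside the box; everything else becomes the fill value.
    return [
        [image[r][c] if mask[r][c] else fill for c in range(left, right + 1)]
        for r in range(top, bottom + 1)
    ]


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    mask = [[0, 0, 0], [0, 1, 1], [0, 1, 0]]
    print(masked_crop(image, mask))
```

Here the result is `[[5, 6], [8, 0]]`: the crop covers the mask's bounding box, and the one pixel inside the box but outside the mask is replaced by the fill value. In a CLIP-based pipeline, a crop like this would typically be resized before being encoded.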

arXiv information

Authors	Seonghoon Yu, Paul Hongsuck Seo, Jeany Son
Published	2023-04-03 08:58:36+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
