Dual-Modal Prompting for Sketch-Based Image Retrieval

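Summary

Sketch-based image retrieval (SBIR) associates hand-drawn sketches with their corresponding realistic images.
In this study, we aim to tackle two major challenges of the task simultaneously: i) the zero-shot setting, which deals with unseen categories, and ii) the fine-grained setting, which refers to instance-level retrieval within a category.
Our key innovation lies in the realization that addressing this cross-category, fine-grained recognition task solely from the generalization perspective may be inadequate, because knowledge accumulated from a limited set of seen categories may not be fully valuable or transferable to unseen target categories.
Motivated by this, we propose a dual-modal prompting CLIP (DP-CLIP) network built around an adaptive prompting strategy.
Specifically, to help DP-CLIP adapt to unpredictable target categories, we use a set of images from the target category and the textual category label to construct, respectively, a set of category-adaptive prompt tokens and a set of channel scales.
By integrating this generated guidance, DP-CLIP can gain valuable category-centric insights, adapt efficiently to novel categories, and capture the unique discriminative clues needed for effective retrieval within each target category.
With these designs, DP-CLIP outperforms the state-of-the-art fine-grained zero-shot SBIR method by 7.3% in Acc.@1 on the Sketchy dataset.
Our method also achieves promising performance on two other category-level zero-shot SBIR benchmarks.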
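
The dual-modal guidance described above (prompt tokens built from target-category images, channel scales built from the category label) can be pictured with a small toy sketch. Everything here is an assumption made for illustration: the function names, the linear projections, the sigmoid-style gating, and the random weights and features are all hypothetical stand-ins, and the actual DP-CLIP architecture may differ substantially. A real model would use learned weights and CLIP encoder outputs.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def random_matrix(rows, cols, rng):
    """Random weights standing in for parameters a real model would learn."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def visual_prompts(support_feats, W_img):
    """Hypothetical: one prompt token per support image, via a linear map."""
    return [matvec(W_img, f) for f in support_feats]


def channel_scales(label_embedding, W_txt):
    """Hypothetical: per-channel scales in (0, 2) from the label embedding."""
    return [2.0 / (1.0 + math.exp(-z)) for z in matvec(W_txt, label_embedding)]


def adapt(tokens, prompts, scales):
    """Prepend the prompt tokens, then rescale every token channel-wise."""
    return [[s * c for s, c in zip(scales, tok)] for tok in prompts + tokens]


rng = random.Random(0)
dim = 4
# Toy stand-ins for encoder outputs (assumed, not real CLIP features).
support_feats = [[rng.random() for _ in range(dim)] for _ in range(3)]
label_embedding = [rng.random() for _ in range(dim)]
patch_tokens = [[rng.random() for _ in range(dim)] for _ in range(5)]

W_img = random_matrix(dim, dim, rng)
W_txt = random_matrix(dim, dim, rng)

prompts = visual_prompts(support_feats, W_img)
scales = channel_scales(label_embedding, W_txt)
adapted = adapt(patch_tokens, prompts, scales)
print(len(adapted), len(adapted[0]))  # 8 tokens, each of dimension 4
```

The point of the sketch is only the data flow: the image set contributes extra tokens to the sequence, while the text label modulates feature channels, so both modalities condition the encoder on the target category.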

Summary (original)

Sketch-based image retrieval (SBIR) associates hand-drawn sketches with their corresponding realistic images. In this study, we aim to tackle two major challenges of this task simultaneously: i) zero-shot, dealing with unseen categories, and ii) fine-grained, referring to intra-category instance-level retrieval. Our key innovation lies in the realization that solely addressing this cross-category and fine-grained recognition task from the generalization perspective may be inadequate since the knowledge accumulated from limited seen categories might not be fully valuable or transferable to unseen target categories. Inspired by this, in this work, we propose a dual-modal prompting CLIP (DP-CLIP) network, in which an adaptive prompting strategy is designed. Specifically, to facilitate the adaptation of our DP-CLIP toward unpredictable target categories, we employ a set of images within the target category and the textual category label to respectively construct a set of category-adaptive prompt tokens and channel scales. By integrating the generated guidance, DP-CLIP could gain valuable category-centric insights, efficiently adapting to novel categories and capturing unique discriminative clues for effective retrieval within each target category. With these designs, our DP-CLIP outperforms the state-of-the-art fine-grained zero-shot SBIR method by 7.3% in Acc.@1 on the Sketchy dataset. Meanwhile, in the other two category-level zero-shot SBIR benchmarks, our method also achieves promising performance.
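
The headline result is reported as Acc.@1, which in retrieval is commonly taken to mean the fraction of queries whose correct item is ranked first. The following is a minimal sketch of that metric under stated assumptions: cosine similarity as the ranking score, a gallery restricted to photos of a single category (the fine-grained setting), and made-up 2-D toy features. The function names and the data are hypothetical and are not taken from the paper's evaluation code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def acc_at_k(sketch_feats, photo_feats, gt_index, k=1):
    """Fraction of sketches whose paired photo is among the top-k matches."""
    hits = 0
    for query, target in zip(sketch_feats, gt_index):
        # Rank gallery photos from most to least similar to the sketch.
        ranked = sorted(
            range(len(photo_feats)),
            key=lambda j: cosine(query, photo_feats[j]),
            reverse=True,
        )
        hits += target in ranked[:k]
    return hits / len(sketch_feats)


# Toy gallery of three photos from one category, and three query sketches.
photos = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
sketches = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.2]]
paired = [0, 1, 2]  # index of the photo each sketch was drawn from

print(acc_at_k(sketches, photos, paired, k=1))  # two of three sketches hit
print(acc_at_k(sketches, photos, paired, k=2))  # all three hit within top 2
```

In this toy data the third sketch ranks the wrong photo first, so Acc.@1 is 2/3 while Acc.@2 is 1.0, which shows how the metric penalizes instance-level confusion within a category.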

arXiv information

Authors	Liying Gao, Bingliang Jiao, Peng Wang, Shizhou Zhang, Hanwang Zhang, Yanning Zhang
Published	2024-04-29 13:43:49+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
