Zero-Shot Composed Image Retrieval with Textual Inversion

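Summary

Composed Image Retrieval (CIR) aims to retrieve a target image based on a query consisting of a reference image and a relative caption that describes the difference between the two images.
Because existing methods rely on supervised learning, the substantial effort and cost of labeling CIR datasets hinder their widespread use.
In this work, we propose a new task, Zero-Shot CIR (ZS-CIR), which aims to address CIR without requiring a labeled training dataset.
Our approach, named zero-Shot composEd imAge Retrieval with textuaL invErsion (SEARLE), maps the visual features of the reference image to a pseudo-word token in the CLIP token embedding space and integrates it with the relative caption.
To support research on ZS-CIR, we introduce an open-domain benchmark dataset named Composed Image Retrieval on Common Objects in context (CIRCO), the first CIR dataset with multiple ground truths per query.
Experiments show that SEARLE outperforms the baselines on the two main CIR datasets, FashionIQ and CIRR, as well as on the proposed CIRCO.
The dataset, code, and model are publicly available at https://github.com/miccunifi/SEARLE.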
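The textual-inversion idea in the summary can be illustrated with a toy sketch of the data flow: reference image features are mapped to a pseudo-word embedding, that embedding is spliced into the token sequence of a prompt containing the relative caption, the sequence is encoded as text, and gallery images are ranked by cosine similarity. Everything here is a hypothetical stand-in, not the SEARLE implementation: the token table, the mapping network `phi` (a fixed random linear map rather than a trained network), the mean-pooling text encoder (real CLIP uses a transformer), the whitespace tokenizer, the tiny dimensions, and the prompt template `a photo of $ that {caption}` are all assumptions.

```python
import math
import random

EMBED_DIM = 8   # hypothetical; real CLIP token embeddings are much larger
IMAGE_DIM = 16  # hypothetical size of the reference image feature vector

_rng = random.Random(0)  # fixed seed so the toy example is deterministic


def _rand_vec(dim):
    return [_rng.gauss(0.0, 1.0) for _ in range(dim)]


def normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


class ToyTokenEmbedding:
    """Hypothetical word -> embedding lookup standing in for CLIP's token table."""

    def __init__(self):
        self.table = {}

    def __call__(self, word):
        if word not in self.table:
            self.table[word] = _rand_vec(EMBED_DIM)
        return self.table[word]


class ToyMappingNetwork:
    """Hypothetical stand-in for a learned network phi: image features -> pseudo-word
    embedding. Here it is only a fixed random linear map, not a trained model."""

    def __init__(self):
        self.weights = [_rand_vec(IMAGE_DIM) for _ in range(EMBED_DIM)]

    def __call__(self, image_features):
        return [sum(w * x for w, x in zip(row, image_features)) for row in self.weights]


def encode_text(token_embeddings):
    """Toy text encoder: mean-pool the token embeddings, then L2-normalize."""
    n = len(token_embeddings)
    pooled = [sum(tok[i] for tok in token_embeddings) / n for i in range(EMBED_DIM)]
    return normalize(pooled)


def build_query_tokens(image_features, caption, embed, phi, template="a photo of $ that {}"):
    """Fill the (assumed) template with the caption, split on whitespace as a stand-in
    for CLIP's tokenizer, and replace the placeholder '$' with phi(image_features)."""
    words = template.format(caption).split()
    pseudo = phi(image_features)
    return [pseudo if w == "$" else embed(w) for w in words]


def rank_gallery(query_vec, gallery_vecs):
    """Return gallery indices sorted by cosine similarity (all vectors are unit-norm)."""
    scores = [sum(q * g for q, g in zip(query_vec, gv)) for gv in gallery_vecs]
    return sorted(range(len(gallery_vecs)), key=lambda i: scores[i], reverse=True)


# Usage with made-up data: one reference image and a gallery of five images
embed = ToyTokenEmbedding()
phi = ToyMappingNetwork()
reference = _rand_vec(IMAGE_DIM)  # hypothetical features of the reference image
tokens = build_query_tokens(reference, "is red with long sleeves", embed, phi)
query = encode_text(tokens)
gallery = [normalize(_rand_vec(EMBED_DIM)) for _ in range(5)]  # hypothetical gallery embeddings
print(rank_gallery(query, gallery))
```

The point of the design is that no CIR-specific labels are needed at query time: the reference image is expressed as a "word" the frozen text encoder can read, so composition with the caption happens entirely in the text branch.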

Summary (original)

Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image and a relative caption that describes the difference between the two images. The high effort and cost required for labeling datasets for CIR hamper the widespread usage of existing methods, as they rely on supervised learning. In this work, we propose a new task, Zero-Shot CIR (ZS-CIR), that aims to address CIR without requiring a labeled training dataset. Our approach, named zero-Shot composEd imAge Retrieval with textuaL invErsion (SEARLE), maps the visual features of the reference image into a pseudo-word token in CLIP token embedding space and integrates it with the relative caption. To support research on ZS-CIR, we introduce an open-domain benchmarking dataset named Composed Image Retrieval on Common Objects in context (CIRCO), which is the first dataset for CIR containing multiple ground truths for each query. The experiments show that SEARLE exhibits better performance than the baselines on the two main datasets for CIR tasks, FashionIQ and CIRR, and on the proposed CIRCO. The dataset, the code and the model are publicly available at https://github.com/miccunifi/SEARLE .
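Because CIRCO provides several ground-truth images per query, a ranking metric that credits retrieving all of them is needed; the SEARLE paper reports mean average precision at K (mAP@K) on CIRCO. The sketch below shows one way to compute it. The function names are hypothetical, and dividing by min(K, number of ground truths) is an assumed normalization rather than a quotation of the paper's definition.

```python
def average_precision_at_k(ranked_ids, ground_truth_ids, k):
    """AP@K for one query that may have several ground-truth targets.

    Assumed normalization: divide by min(k, number of ground truths)."""
    gt = set(ground_truth_ids)
    if not gt:
        raise ValueError("a query needs at least one ground truth")
    hits = 0
    precision_sum = 0.0
    for rank, image_id in enumerate(ranked_ids[:k], start=1):
        if image_id in gt:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / min(k, len(gt))


def mean_average_precision_at_k(rankings, ground_truths, k):
    """Average AP@K over all queries; rankings[i] pairs with ground_truths[i]."""
    aps = [average_precision_at_k(r, g, k) for r, g in zip(rankings, ground_truths)]
    return sum(aps) / len(aps)
```

For a single-target query this reduces to the reciprocal rank of the target when it appears in the top K, so the metric stays comparable between queries with one and with many ground truths.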

arXiv information

Authors	Alberto Baldrati, Lorenzo Agnolucci, Marco Bertini, Alberto Del Bimbo
Published	2023-03-27 14:31:25+00:00

Source and services used

arxiv.jp, Google
