Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control

Abstract

Learned sparse retrieval (LSR) is a family of neural methods that encode queries and documents into sparse lexical vectors, which can be indexed and retrieved efficiently with an inverted index. We explore the application of LSR to the multimodal domain, with a focus on text-image retrieval. While LSR has seen success in text retrieval, its application to multimodal retrieval remains underexplored. Current approaches such as LexLIP and STAIR require complex multi-step training on massive datasets. Our proposed approach efficiently transforms dense vectors from a frozen dense model into sparse lexical vectors. We address the problems of high-dimension co-activation and semantic deviation through a new training algorithm that uses Bernoulli random variables to control query expansion. Experiments with two dense models (BLIP, ALBEF) and two datasets (MSCOCO, Flickr30k) show that the proposed algorithm effectively reduces co-activation and semantic deviation. Our best-performing sparsified model outperforms state-of-the-art text-image LSR models while requiring a shorter training time and less GPU memory. Our approach offers an effective solution for training LSR retrieval models in multimodal settings. Our code and model checkpoints are available at github.com/thongnt99/lsr-multimodal.
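
The two ideas named in the abstract, inverted-index scoring of sparse lexical vectors and a Bernoulli variable that gates query expansion, can be illustrated with a toy sketch. This is not the authors' training algorithm: the function names (`build_inverted_index`, `search`, `gate_expansion`), the parameter `p_expand`, the toy term weights, and the choice of a single Bernoulli draw per query are all assumptions made here for illustration only.

```python
import random
from collections import defaultdict

# Toy sparse lexical vectors (term -> weight); all values are made up.
DOCS = {
    "img1": {"dog": 1.2, "grass": 0.5},
    "img2": {"cat": 1.0, "sofa": 0.8},
    "img3": {"dog": 0.4, "beach": 0.9},
}
# "dog" is the original query term; "puppy" and "grass" are expansion terms.
QUERY = {"dog": 1.0, "puppy": 0.6, "grass": 0.2}


def build_inverted_index(docs):
    """Map each term to a posting list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, vec in docs.items():
        for term, weight in vec.items():
            if weight > 0:
                index[term].append((doc_id, weight))
    return index


def search(index, query_vec, top_k=3):
    """Rank documents by dot product, touching only the query's posting lists."""
    scores = defaultdict(float)
    for term, q_weight in query_vec.items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    # Sort by descending score, breaking ties by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


def gate_expansion(query_vec, original_terms, p_expand, rng):
    """Assumed gating rule: one Bernoulli(p_expand) draw per query.

    If the draw succeeds, all terms are kept; otherwise the vector is
    restricted to the terms that appear in the original query.
    """
    allow_expansion = rng.random() < p_expand
    if allow_expansion:
        return dict(query_vec)
    return {t: w for t, w in query_vec.items() if t in original_terms}


if __name__ == "__main__":
    index = build_inverted_index(DOCS)
    rng = random.Random(0)
    for p in (0.0, 1.0):
        gated = gate_expansion(QUERY, {"dog"}, p, rng)
        print(p, sorted(gated), search(index, gated))
```

With `p_expand` set to 0.0 the query is scored on its original term only, and with 1.0 the expansion terms always contribute; intermediate values mix the two behaviours at random, which is the general kind of control the abstract attributes to its Bernoulli variables.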

arXiv information

Authors	Thong Nguyen, Mariya Hendriksen, Andrew Yates, Maarten de Rijke
Published	2024-02-27 14:21:56+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
