‘This is my unicorn, Fluffy’: Personalizing frozen vision-language representations

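Summary

Large vision-and-language (V&L) models pretrained on web-scale data provide representations that are invaluable for numerous V&L problems.
However, it is unclear how they can be used to reason about user-specific visual concepts in unstructured language.
This problem arises in multiple domains, from personalized image retrieval to personalized interaction with smart devices.
We introduce a new learning setup called Personalized Vision & Language (PerVL), together with two new benchmark datasets for retrieving and segmenting user-specific ‘personalized’ concepts ‘in the wild’.
In PerVL, personalized concepts should be learned (1) independently of the downstream task, (2) in a way that lets a pretrained model reason about them in free language, and (3) without requiring personalized negative examples.
We propose an architecture for solving PerVL that operates by extending the input vocabulary of a pretrained model with new word embeddings for the new personalized concepts.
The model can then reason about these concepts simply by using them in a sentence.
We demonstrate that our approach learns personalized visual concepts from a few examples and can effectively apply them in image retrieval and semantic segmentation using rich textual queries.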
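
The idea of extending a frozen model's input vocabulary with a new word embedding can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the 3-dimensional embeddings, the mean-pooling stand-in for a text encoder, the placeholder token `<fluffy>`, and the choice to initialise the new embedding from the mean of a few example image embeddings. It is not the paper's training procedure; it only shows that existing vocabulary entries stay untouched while a new token becomes usable inside ordinary sentences.

```python
import math

# Toy "frozen" word-embedding table. Values are hypothetical, 3-D for readability.
FROZEN_VOCAB = {
    "a": [0.0, 0.0, 0.0],
    "photo": [0.1, 0.1, 0.1],
    "of": [0.0, 0.0, 0.0],
    "on": [0.0, 0.0, 0.0],
    "grass": [0.0, 1.0, 0.0],
    "beach": [0.0, 0.0, 1.0],
}


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def mean(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def encode_text(sentence, vocab):
    """Stand-in text encoder (assumption): normalized mean of token embeddings."""
    tokens = sentence.lower().split()
    return normalize(mean([vocab[t] for t in tokens]))


def add_concept(vocab, token, example_image_embeddings):
    """Return an extended copy of the vocabulary with one new word embedding.

    The new embedding is initialised from the mean of a few example image
    embeddings (an illustrative shortcut, not the paper's learning method).
    """
    extended = dict(vocab)  # existing entries are copied, never modified
    extended[token] = normalize(mean(example_image_embeddings))
    return extended


def retrieve(query, vocab, gallery):
    """Rank gallery image names by similarity to the encoded text query."""
    q = encode_text(query, vocab)
    return sorted(gallery, key=lambda name: cosine(q, gallery[name]), reverse=True)


# A few example embeddings of the personalized concept (hypothetical values).
examples = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]]
vocab = add_concept(FROZEN_VOCAB, "<fluffy>", examples)

# Hypothetical image embeddings living in the same space as the text.
gallery = {
    "fluffy_on_grass": [1.0, 1.0, 0.0],
    "fluffy_on_beach": [1.0, 0.0, 1.0],
    "empty_lawn": [0.0, 1.0, 0.0],
}

ranking = retrieve("a photo of <fluffy> on grass", vocab, gallery)
print(ranking)
```

In this toy setting the new token composes with existing words such as "grass" or "beach", so the same learned embedding serves different free-text queries without retraining anything else.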

Abstract (original)

Large Vision & Language models pretrained on web-scale data provide representations that are invaluable for numerous V&L problems. However, it is unclear how they can be used for reasoning about user-specific visual concepts in unstructured language. This problem arises in multiple domains, from personalized image retrieval to personalized interaction with smart devices. We introduce a new learning setup called Personalized Vision & Language (PerVL) with two new benchmark datasets for retrieving and segmenting user-specific ‘personalized’ concepts ‘in the wild’. In PerVL, one should learn personalized concepts (1) independently of the downstream task (2) allowing a pretrained model to reason about them with free language, and (3) does not require personalized negative examples. We propose an architecture for solving PerVL that operates by extending the input vocabulary of a pretrained model with new word embeddings for the new personalized concepts. The model can then reason about them by simply using them in a sentence. We demonstrate that our approach learns personalized visual concepts from a few examples and can effectively apply them in image retrieval and semantic segmentation using rich textual queries.

arXiv information

Authors	Niv Cohen, Rinon Gal, Eli A. Meirom, Gal Chechik, Yuval Atzmon
Published	2022-06-13 13:42:59+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
