VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning

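Summary

Human-annotated attributes serve as powerful semantic embeddings in zero-shot learning.
However, annotating them is labor-intensive and requires expert supervision.
Current unsupervised semantic embeddings, namely word embeddings, make it possible to transfer knowledge between classes.
However, word embeddings do not always reflect visual similarity, which degrades zero-shot performance.
We propose to discover semantic embeddings that capture discriminative visual properties for zero-shot learning, without requiring any human annotation.
Our model partitions a set of images from seen classes into clusters of local image regions according to their visual similarity, and further enforces the class discrimination and semantic relatedness of these clusters.
To associate these clusters with previously unseen classes, we use external knowledge such as word embeddings and propose a novel class relation discovery module.
Through quantitative and qualitative evaluation, we show that our model discovers semantic embeddings that capture the visual properties of both seen and unseen classes.
Furthermore, on three benchmarks we show that our visually-grounded semantic embeddings improve performance over word embeddings by a large margin across various ZSL models.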
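The clustering step in the summary can be illustrated with a toy sketch. This is not the authors' implementation. It swaps in plain k-means over hypothetical patch feature vectors and does not impose the class-discrimination or semantic-relatedness constraints described above. Each class's embedding is taken as the average, over that class's images, of the fraction of each image's patches assigned to each cluster. The function names, the first-k-points initialization, and the toy 2-D features are all assumptions made for illustration.

```python
from collections import defaultdict


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    # Assumption: deterministic init using the first k points (k-means++ is more usual).
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assign each patch feature to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments


def class_embeddings(patches, k):
    """patches: list of (class_name, image_id, feature_vector) tuples (hypothetical format)."""
    _, assignments = kmeans([f for _, _, f in patches], k)
    # Count how many patches of each image fall into each cluster.
    image_hist, image_class = {}, {}
    for (cls, img, _), a in zip(patches, assignments):
        image_hist.setdefault(img, [0.0] * k)[a] += 1.0
        image_class[img] = cls
    # Average the normalized per-image histograms within each class.
    sums = defaultdict(lambda: [0.0] * k)
    counts = defaultdict(int)
    for img, hist in image_hist.items():
        total = sum(hist)
        cls = image_class[img]
        for c in range(k):
            sums[cls][c] += hist[c] / total
        counts[cls] += 1
    return {cls: [v / counts[cls] for v in sums[cls]] for cls in sums}


# Hypothetical toy data: two seen classes, patches described by 2-D features.
patches = [
    ("zebra", "z1", [0.0, 0.0]),
    ("horse", "h1", [5.0, 5.0]),
    ("zebra", "z1", [0.1, 0.0]),
    ("zebra", "z2", [0.0, 0.1]),
    ("zebra", "z2", [5.1, 5.0]),
    ("horse", "h1", [4.9, 5.0]),
]
embeddings = class_embeddings(patches, k=2)
print(embeddings)
```

In this toy run the zebra embedding comes out as [0.75, 0.25] and the horse embedding as [0.0, 1.0]. Each dimension corresponds to one discovered cluster of image regions rather than to a human-named attribute, which is the point of replacing annotation with visual grouping.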

Summary (original)

Human-annotated attributes serve as powerful semantic embeddings in zero-shot learning. However, their annotation process is labor-intensive and needs expert supervision. Current unsupervised semantic embeddings, i.e., word embeddings, enable knowledge transfer between classes. However, word embeddings do not always reflect visual similarities and result in inferior zero-shot performance. We propose to discover semantic embeddings containing discriminative visual properties for zero-shot learning, without requiring any human annotation. Our model visually divides a set of images from seen classes into clusters of local image regions according to their visual similarity, and further imposes their class discrimination and semantic relatedness. To associate these clusters with previously unseen classes, we use external knowledge, e.g., word embeddings and propose a novel class relation discovery module. Through quantitative and qualitative evaluation, we demonstrate that our model discovers semantic embeddings that model the visual properties of both seen and unseen classes. Furthermore, we demonstrate on three benchmarks that our visually-grounded semantic embeddings further improve performance over word embeddings across various ZSL models by a large margin.
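The abstract names a class relation discovery module but gives no details of it. The sketch below assumes one simple way to transfer cluster-based embeddings to an unseen class: a softmax-weighted average of the seen-class embeddings, with weights derived from the cosine similarity of word embeddings. The function names, the temperature value, and all word vectors are hypothetical. This is an illustrative simplification, not the module proposed in the paper.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def softmax(scores, temperature=1.0):
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def unseen_embedding(unseen_word_vec, seen_word_vecs, seen_class_embs, temperature=0.1):
    """Weighted average of seen-class embeddings; weights come from word-vector similarity."""
    names = list(seen_word_vecs)
    weights = softmax([cosine(unseen_word_vec, seen_word_vecs[n]) for n in names], temperature)
    dim = len(seen_class_embs[names[0]])
    return [sum(w * seen_class_embs[n][d] for w, n in zip(weights, names)) for d in range(dim)]


# Hypothetical seen-class visual embeddings (one dimension per cluster) and word vectors.
seen_class_embs = {"zebra": [0.75, 0.25], "horse": [0.0, 1.0]}
seen_word_vecs = {"zebra": [1.0, 0.0], "horse": [0.0, 1.0]}

# An unseen class whose (hypothetical) word vector lies close to "zebra".
okapi = unseen_embedding([0.9, 0.1], seen_word_vecs, seen_class_embs)
print(okapi)
```

A low temperature makes the unseen class inherit mostly from its nearest seen classes in word-embedding space. A higher temperature spreads the weight more evenly across all seen classes.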

arXiv information

Authors	Wenjia Xu, Yongqin Xian, Jiuniu Wang, Bernt Schiele, Zeynep Akata
Published	2023-05-26 09:10:13+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
