Vision-Language Pseudo-Labels for Single-Positive Multi-Label Learning

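Abstract

This paper presents a new approach to single-positive multi-label learning.
In general multi-label learning, a model learns to predict multiple labels or categories for a single input image.
This contrasts with standard multi-class image classification, where the task is to predict a single label for an image from many possible labels.
Single-Positive Multi-label Learning (SPML) specifically considers learning to predict multiple labels when the training data contains only one annotation per image.
Because real-world data often contains instances that belong to several categories at once, multi-label learning is in many ways a more realistic task than single-label learning.
However, because of the inherent complexity and cost of collecting multiple high-quality annotations per instance, most common computer vision datasets predominantly contain single labels.
We propose a new approach called Vision-Language Pseudo-Labeling (VLPL), which uses a vision-language model to suggest strong positive and negative pseudo-labels, and outperforms the current SOTA methods by 5.5% on Pascal VOC, 18.4% on MS-COCO, 15.2% on NUS-WIDE, and 8.4% on CUB-Birds.
Code and data are available at https://github.com/mvrl/VLPL.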

Abstract (original)

This paper presents a novel approach to Single-Positive Multi-label Learning. In general multi-label learning, a model learns to predict multiple labels or categories for a single input image. This is in contrast with standard multi-class image classification, where the task is predicting a single label from many possible labels for an image. Single-Positive Multi-label Learning (SPML) specifically considers learning to predict multiple labels when there is only a single annotation per image in the training data. Multi-label learning is in many ways a more realistic task than single-label learning as real-world data often involves instances belonging to multiple categories simultaneously; however, most common computer vision datasets predominantly contain single labels due to the inherent complexity and cost of collecting multiple high quality annotations for each instance. We propose a novel approach called Vision-Language Pseudo-Labeling (VLPL), which uses a vision-language model to suggest strong positive and negative pseudo-labels, and outperforms the current SOTA methods by 5.5% on Pascal VOC, 18.4% on MS-COCO, 15.2% on NUS-WIDE, and 8.4% on CUB-Birds. Our code and data are available at https://github.com/mvrl/VLPL.
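
To make the idea of positive and negative pseudo-labels concrete, here is a minimal sketch in pure Python. It is not the authors' implementation (see the repository linked above for that). It assumes that a vision-language model has already produced an embedding for the image and one for each class name, and that pseudo-labels are assigned by thresholding cosine similarity. The function names, the toy embeddings, and the thresholds `pos_thresh` and `neg_thresh` are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pseudo_labels(image_emb, class_embs, observed, pos_thresh=0.8, neg_thresh=0.2):
    """Assign 1 (positive), 0 (negative), or None (unknown) to each class.

    `observed` is the single annotated positive label available in SPML.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    labels = {}
    for name, emb in class_embs.items():
        if name == observed:
            labels[name] = 1  # the one ground-truth annotation is always kept
            continue
        sim = cosine(image_emb, emb)
        if sim >= pos_thresh:
            labels[name] = 1  # confident positive pseudo-label
        elif sim <= neg_thresh:
            labels[name] = 0  # confident negative pseudo-label
        else:
            labels[name] = None  # too uncertain; leave unlabeled
    return labels


# Toy 3-dimensional embeddings standing in for vision-language model outputs.
image_emb = [0.9, 0.4, 0.0]
class_embs = {
    "dog": [1.0, 0.0, 0.0],
    "grass": [0.8, 0.6, 0.0],
    "car": [0.0, 0.0, 1.0],
    "person": [0.5, 0.0, 0.85],
}

result = pseudo_labels(image_emb, class_embs, observed="dog")
print(result)
```

In this toy example the single annotation is "dog"; "grass" is similar enough to the image to become a positive pseudo-label, "car" becomes a negative one, and "person" stays unlabeled. Similarity scores from a real vision-language model are distributed differently, so any thresholds would need to be tuned for the model and dataset.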

arXiv information

Authors	Xin Xing, Zhexiao Xiong, Abby Stylianou, Srikumar Sastry, Liyu Gong, Nathan Jacobs
Published	2023-10-24 16:36:51+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
