LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-tailed Multi-Label Visual Recognition

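Summary

Title: LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-tailed Multi-Label Visual Recognition

Summary:

– Long-tailed multi-label visual recognition (LTML) is highly challenging because of label co-occurrence and imbalanced data distribution.
– This work proposes prompt tuning with class-specific embedding loss (LMPT), a unified framework for LTML. It combines text and image modality data to capture semantic feature interactions between categories, and it improves performance on head and tail classes at the same time.
– Specifically, LMPT introduces an embedding loss function with a class-aware soft margin and re-weighting. It uses textual descriptions (captions) to learn class-specific contexts, which helps establish semantic relationships between classes, especially between head and tail classes.
– In addition, to account for class imbalance, the distribution-balanced loss is adopted as the classification loss function, further improving performance on tail classes without compromising head classes.
– Extensive experiments on the VOC-LT and COCO-LT datasets show that the method significantly surpasses previous state-of-the-art methods and zero-shot CLIP. The code is fully available at https://github.com/richard-peng-xia/LMPT.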
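
To make the loss design summarized above more concrete, here is a minimal Python sketch of an embedding loss with a class-aware margin and re-weighting. It is not the authors' implementation: the hinge form (distance for positive classes, margin minus distance for negative classes), the margin scaling by n^(-1/4), the inverse-frequency weights, and all function names are assumptions chosen for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def class_margins(class_counts, max_margin=0.5):
    # Assumption: rarer classes get larger margins, scaled as n ** -0.25
    # and normalised so the rarest class receives `max_margin`.
    raw = [n ** -0.25 for n in class_counts]
    scale = max_margin / max(raw)
    return [r * scale for r in raw]


def class_weights(class_counts):
    # Assumption: inverse-frequency weights, normalised to a mean of 1.
    inv = [1.0 / n for n in class_counts]
    mean = sum(inv) / len(inv)
    return [w / mean for w in inv]


def embedding_loss(caption_emb, prompt_embs, labels, margins, weights):
    """Weighted hinge-style embedding loss, averaged over classes.

    Positive class (label 1): penalise the cosine distance itself.
    Negative class (label 0): penalise only if the distance is below
    that class's margin.
    """
    total = 0.0
    for prompt, y, m, w in zip(prompt_embs, labels, margins, weights):
        dist = 1.0 - cosine(caption_emb, prompt)
        per_class = dist if y == 1 else max(0.0, m - dist)
        total += w * per_class
    return total / len(labels)


# Hypothetical class frequencies: one head, one medium, one tail class.
counts = [1000, 100, 10]
margins = class_margins(counts)
weights = class_weights(counts)

# Hypothetical 2-D embeddings standing in for caption and prompt features.
caption = [1.0, 0.0]
prompts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
loss = embedding_loss(caption, prompts, [1, 0, 0], margins, weights)
```

In this toy setup the tail class has the largest margin and the largest weight, so a negative tail-class prompt that sits too close to the caption contributes most to the loss.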

Summary (original)

Long-tailed multi-label visual recognition (LTML) task is a highly challenging task due to the label co-occurrence and imbalanced data distribution. In this work, we propose a unified framework for LTML, namely prompt tuning with class-specific embedding loss (LMPT), capturing the semantic feature interactions between categories by combining text and image modality data and improving the performance synchronously on both head and tail classes. Specifically, LMPT introduces the embedding loss function with class-aware soft margin and re-weighting to learn class-specific contexts with the benefit of textual descriptions (captions), which could help establish semantic relationships between classes, especially between the head and tail classes. Furthermore, taking into account the class imbalance, the distribution-balanced loss is adopted as the classification loss function to further improve the performance on the tail classes without compromising head classes. Extensive experiments are conducted on VOC-LT and COCO-LT datasets, which demonstrates that the proposed method significantly surpasses the previous state-of-the-art methods and zero-shot CLIP in LTML. Our codes are fully available at \url{https://github.com/richard-peng-xia/LMPT}.
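
The difficulty named in the first sentence of the abstract can be seen in a toy example. The sketch below uses hypothetical annotations (not taken from VOC-LT or COCO-LT) and a deliberately naive oversampling helper to show that, because labels co-occur, duplicating images of a rare class also adds instances of a frequent class.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy annotations: each image carries a set of labels.
annotations = [
    {"person", "dog"},
    {"person", "car"},
    {"person"},
    {"person", "dog"},
    {"person", "skateboard"},
    {"car"},
]


def class_counts(samples):
    """Number of images in which each label appears."""
    counts = Counter()
    for labels in samples:
        counts.update(labels)
    return counts


def cooccurrence(samples):
    """Number of images in which each unordered label pair appears together."""
    pairs = Counter()
    for labels in samples:
        for a, b in combinations(sorted(labels), 2):
            pairs[(a, b)] += 1
    return pairs


def oversample(samples, target):
    """Naive resampling: duplicate every image that contains `target`."""
    return samples + [s for s in samples if target in s]


counts = class_counts(annotations)
boosted = class_counts(oversample(annotations, "skateboard"))
pairs = cooccurrence(annotations)
```

Here `counts` shows "person" as the head class and "skateboard" as the tail class. After oversampling, `boosted` has one more "skateboard" image but also one more "person" image, so the imbalance is not removed by resampling alone.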

arXiv information

Authors	Peng Xia, Di Xu, Lie Ju, Ming Hu, Jun Chen, Zongyuan Ge
Published	2023-05-08 08:14:46+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, OpenAI
