Enhancing Features in Long-tailed Data Using Large Vision Model

Summary

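Language-based foundation models, such as large language models (LLMs) and large vision-language models (LVLMs), have been widely studied for long-tailed recognition.
However, the language data these models require is not available in every practical task.
This study explores using large vision models (LVMs) or visual foundation models (VFMs) to enhance the features of long-tailed data without any language information.
Specifically, features are extracted from the LVM and fused with the features in the baseline network's map and latent space to obtain augmented features.
In addition, several prototype-based losses are designed in the latent space to further exploit the potential of the augmented features.
In the experiments, the approach is validated on two benchmark datasets: ImageNet-LT and iNaturalist2018.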

Summary (original)

Language-based foundation models, such as large language models (LLMs) or large vision-language models (LVLMs), have been widely studied in long-tailed recognition. However, the need for linguistic data is not applicable to all practical tasks. In this study, we aim to explore using large vision models (LVMs) or visual foundation models (VFMs) to enhance long-tailed data features without any language information. Specifically, we extract features from the LVM and fuse them with features in the baseline network’s map and latent space to obtain the augmented features. Moreover, we design several prototype-based losses in the latent space to further exploit the potential of the augmented features. In the experimental section, we validate our approach on two benchmark datasets: ImageNet-LT and iNaturalist2018.
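
The abstract does not specify how the fusion or the prototype-based losses are implemented, so the following is only a minimal sketch under stated assumptions: fusion is taken to be concatenation of L2-normalized latent vectors, prototypes are taken to be per-class means of the fused features, and the loss is a softmax cross-entropy over negative squared distances to those prototypes. The function names (`fuse_features`, `class_prototypes`, `prototype_loss`) and the `temperature` parameter are hypothetical and do not come from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse_features(baseline_feat, lvm_feat):
    """Assumed fusion rule: concatenate the two L2-normalized latent vectors."""
    return l2_normalize(baseline_feat) + l2_normalize(lvm_feat)


def class_prototypes(features, labels):
    """Assumed prototype definition: the mean fused feature of each class."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], feat)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def prototype_loss(feature, label, prototypes, temperature=0.1):
    """Cross-entropy over negative squared distances to each class prototype."""
    logits = {
        cls: -sum((a - b) ** 2 for a, b in zip(feature, proto)) / temperature
        for cls, proto in prototypes.items()
    }
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(logits.values())
    log_z = top + math.log(sum(math.exp(v - top) for v in logits.values()))
    return log_z - logits[label]


# Toy data: two samples of a head class (0) and one sample of a tail class (1).
baseline = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
lvm = [[1.0, 0.0, 0.0], [1.0, 0.1, 0.0], [0.0, 0.0, 1.0]]
labels = [0, 0, 1]

fused = [fuse_features(b, v) for b, v in zip(baseline, lvm)]
prototypes = class_prototypes(fused, labels)
loss_true = prototype_loss(fused[0], 0, prototypes)
loss_wrong = prototype_loss(fused[0], 1, prototypes)
```

In this toy setup the loss for the true class is smaller than the loss for the wrong class, since the fused feature lies much closer to its own class prototype. A real implementation would operate on batched tensors and would also need the map-level fusion mentioned in the abstract, which this sketch omits.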

arXiv information

Authors	Pengxiao Han, Changkun Ye, Jinguang Tong, Cuicui Jiang, Jie Hong, Li Fang, Xuesong Li
Published	2025-04-22 12:31:32+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
