Few Shot Class Incremental Learning using Vision-Language models

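Abstract

Recent advances in deep learning have demonstrated performance comparable to human ability across a range of supervised computer vision tasks.
However, the common assumption that a large pool of training data covering every class is available before training often does not hold in real-world settings, where limited data for new classes is the norm.
The challenge is to integrate new classes with few samples into training so that the model accommodates them without degrading its performance on the base classes.
To address this need, the research community has proposed several solutions under the heading of few-shot class incremental learning (FSCIL).
In this work, we introduce an FSCIL framework that uses a language regularizer and a subspace regularizer.
During base training, the language regularizer helps incorporate semantic information extracted from a vision-language model.
During incremental training, the subspace regularizer helps the model capture the nuanced relationships between image and text semantics inherent to the base classes.
Our framework not only enables the model to accommodate new classes with limited data, but also ensures that performance on the base classes is preserved.
To demonstrate the effectiveness of our approach, we conduct comprehensive experiments on three FSCIL benchmarks, on which our framework achieves state-of-the-art performance.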
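
The abstract names the two regularizers but does not give their formulas. As a rough illustration only, the sketch below assumes one plausible form for each: a language regularizer that penalizes the cosine distance between a class weight vector and the text embedding of its class name, and a subspace regularizer that penalizes the squared distance between a novel class weight and its projection onto the span of the base class weights. The function names and both loss forms are hypothetical and are not taken from the paper; plain Python lists stand in for image and text embeddings.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def cosine_distance(u, v):
    """1 - cosine similarity; 0 when u and v point in the same direction."""
    return 1.0 - dot(u, v) / (norm(u) * norm(v))

def orthonormal_basis(vectors, eps=1e-10):
    """Modified Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        n = norm(w)
        if n > eps:  # skip vectors already inside the current span
            basis.append([wi / n for wi in w])
    return basis

def project(v, basis):
    """Orthogonal projection of v onto the span of an orthonormal basis."""
    out = [0.0] * len(v)
    for b in basis:
        c = dot(v, b)
        out = [o + c * bi for o, bi in zip(out, b)]
    return out

def language_regularizer(class_weights, text_embeddings):
    """Assumed form: mean cosine distance between each class weight
    and the text embedding of the same class."""
    dists = [cosine_distance(w, t) for w, t in zip(class_weights, text_embeddings)]
    return sum(dists) / len(dists)

def subspace_regularizer(novel_weight, base_weights):
    """Assumed form: squared distance from a novel class weight to its
    projection onto the subspace spanned by the base class weights."""
    basis = orthonormal_basis(base_weights)
    proj = project(novel_weight, basis)
    return sum((a - b) ** 2 for a, b in zip(novel_weight, proj))

if __name__ == "__main__":
    base = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # toy base class weights
    novel = [1.0, 1.0, 2.0]                     # toy novel class weight
    print(subspace_regularizer(novel, base))
    print(language_regularizer([[1.0, 0.0]], [[2.0, 0.0]]))
```

In a real training loop these terms would be added to the classification loss with tunable coefficients, and the embeddings would come from the backbone and from the text encoder of a vision-language model. Those details are not specified in the abstract.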

Abstract (original)

Recent advancements in deep learning have demonstrated remarkable performance comparable to human capabilities across various supervised computer vision tasks. However, the prevalent assumption of having an extensive pool of training data encompassing all classes prior to model training often diverges from real-world scenarios, where limited data availability for novel classes is the norm. The challenge emerges in seamlessly integrating new classes with few samples into the training data, demanding the model to adeptly accommodate these additions without compromising its performance on base classes. To address this exigency, the research community has introduced several solutions under the realm of few-shot class incremental learning (FSCIL). In this study, we introduce an innovative FSCIL framework that utilizes language regularizer and subspace regularizer. During base training, the language regularizer helps incorporate semantic information extracted from a Vision-Language model. The subspace regularizer helps in facilitating the model’s acquisition of nuanced connections between image and text semantics inherent to base classes during incremental training. Our proposed framework not only empowers the model to embrace novel classes with limited data, but also ensures the preservation of performance on base classes. To substantiate the efficacy of our approach, we conduct comprehensive experiments on three distinct FSCIL benchmarks, where our framework attains state-of-the-art performance.

arXiv information

Authors	Anurag Kumar, Chinmay Bharti, Saikat Dutta, Srikrishna Karanam, Biplab Banerjee
Published	2024-08-15 13:36:43+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
