LPT++: Efficient Training on Mixture of Long-tailed Experts

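Summary

This paper introduces LPT++, a comprehensive framework for long-tailed classification that combines parameter-efficient fine-tuning (PEFT) with a learnable model ensemble.
LPT++ enhances a frozen Vision Transformer (ViT) by integrating three core components.
The first is a universal long-tailed adaptation module, which aggregates long-tailed prompts and visual adapters to adapt the pretrained model to the target domain while also improving its discriminative ability.
The second is a mixture of long-tailed experts framework with a mixture-of-experts (MoE) scorer, which adaptively computes reweighting coefficients for the confidence scores of both visual-only and vision-language (VL) model experts to produce more accurate predictions.
Finally, LPT++ adopts a three-phase training framework in which each critical module is learned separately, yielding a stable and effective training paradigm for long-tailed classification.
The authors also propose LPT, a simpler version of LPT++ that integrates only a visual-only pretrained ViT and long-tailed prompts to form a single-model method.
LPT shows clearly how long-tailed prompts work while achieving comparable performance without VL pretrained models.
Experiments show that, with only about 1% additional trainable parameters, LPT++ achieves accuracy comparable to all of its counterparts.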
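
To make the MoE scorer idea more concrete, here is a minimal sketch of how adaptively computed reweighting coefficients could fuse the confidence scores of several experts. It is an illustration under stated assumptions, not the authors' implementation: the function names `softmax` and `moe_score`, the use of one scalar gate logit per expert, and the toy numbers are all hypothetical, and the summary does not say how the scorer is actually parameterized.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_score(expert_logits, gate_logits):
    """Fuse per-expert class confidences with reweighting coefficients.

    expert_logits: one list of class logits per expert (hypothetical inputs).
    gate_logits:   one scalar per expert; assumed here to come from a small
                   learned scorer, which this sketch does not model.
    """
    # Reweighting coefficients: non-negative and summing to 1.
    weights = softmax(gate_logits)
    # Per-expert confidence scores over the classes.
    confidences = [softmax(logits) for logits in expert_logits]
    num_classes = len(confidences[0])
    # Weighted sum of the expert confidences for each class.
    fused = [
        sum(w * conf[c] for w, conf in zip(weights, confidences))
        for c in range(num_classes)
    ]
    return weights, fused


# Toy example (made-up numbers): a visual-only expert and a VL expert
# disagree on a 3-class problem, and the gate favors the VL expert.
visual_only = [2.0, 0.5, 0.1]
vision_language = [0.2, 0.3, 2.5]
weights, fused = moe_score([visual_only, vision_language], gate_logits=[0.0, 1.0])
prediction = max(range(len(fused)), key=fused.__getitem__)
```

In this toy case the fused scores follow the expert with the larger coefficient. In a trained system the gate logits would presumably depend on the input, so different images could lean on different experts.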

Abstract (original)

We introduce LPT++, a comprehensive framework for long-tailed classification that combines parameter-efficient fine-tuning (PEFT) with a learnable model ensemble. LPT++ enhances frozen Vision Transformers (ViTs) through the integration of three core components. The first is a universal long-tailed adaptation module, which aggregates long-tailed prompts and visual adapters to adapt the pretrained model to the target domain, meanwhile improving its discriminative ability. The second is the mixture of long-tailed experts framework with a mixture-of-experts (MoE) scorer, which adaptively calculates reweighting coefficients for confidence scores from both visual-only and visual-language (VL) model experts to generate more accurate predictions. Finally, LPT++ employs a three-phase training framework, wherein each critical module is learned separately, resulting in a stable and effective long-tailed classification training paradigm. Besides, we also propose the simple version of LPT++ namely LPT, which only integrates visual-only pretrained ViT and long-tailed prompts to formulate a single model method. LPT can clearly illustrate how long-tailed prompts works meanwhile achieving comparable performance without VL pretrained models. Experiments show that, with only ~1% extra trainable parameters, LPT++ achieves comparable accuracy against all the counterparts.

arXiv information

Authors	Bowen Dong, Pan Zhou, Wangmeng Zuo
Published	2024-09-17 16:19:11+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
