uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures

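Summary

Masked Autoencoders (MAEs) learn rich low-level representations from unlabeled data, but they need substantial labeled data to adapt effectively to downstream tasks.
In contrast, Instance Discrimination (ID) emphasizes high-level semantics, offering a potential way to reduce the annotation requirements of MAEs.
Combining the two approaches makes it possible to address downstream tasks with limited labeled data, but naively integrating ID into MAEs lengthens training and raises computational cost.
To address this challenge, we introduce uaMix-MAE, an efficient ID tuning strategy that leverages unsupervised audio mixtures.
uaMix-MAE uses contrastive tuning to align the representations of pretrained MAEs, which facilitates effective adaptation to task-specific semantics.
To optimize the model with small amounts of unlabeled data, we propose an audio mixing technique that manipulates audio samples in both the input space and the virtual label space.
Experiments in low/few-shot settings demonstrate that uaMix-MAE achieves 4-6% accuracy improvements over various benchmarks when tuned with limited unlabeled data, such as AudioSet-20K.
Code is available at https://github.com/PLAN-Lab/uamix-MAE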

Abstract (original)

Masked Autoencoders (MAEs) learn rich low-level representations from unlabeled data but require substantial labeled data to effectively adapt to downstream tasks. Conversely, Instance Discrimination (ID) emphasizes high-level semantics, offering a potential solution to alleviate annotation requirements in MAEs. Although combining these two approaches can address downstream tasks with limited labeled data, naively integrating ID into MAEs leads to extended training times and high computational costs. To address this challenge, we introduce uaMix-MAE, an efficient ID tuning strategy that leverages unsupervised audio mixtures. Utilizing contrastive tuning, uaMix-MAE aligns the representations of pretrained MAEs, thereby facilitating effective adaptation to task-specific semantics. To optimize the model with small amounts of unlabeled data, we propose an audio mixing technique that manipulates audio samples in both input and virtual label spaces. Experiments in low/few-shot settings demonstrate that uaMix-MAE achieves 4-6% accuracy improvements over various benchmarks when tuned with limited unlabeled data, such as AudioSet-20K. Code is available at https://github.com/PLAN-Lab/uamix-MAE
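
The abstract only names the two ingredients (mixing in the input and virtual label spaces, and contrastive tuning), so the following is a rough, mixup-style sketch of how they could fit together, not the authors' implementation. The function names `mix_batch` and `soft_contrastive_loss`, the Beta-distributed mixing weight, the temperature value, and the use of raw sample vectors in place of encoder embeddings are all assumptions made for illustration; the actual method may mix spectrograms and use a different loss.

```python
import math
import random


def mix_batch(batch, alpha=1.0, rng=None):
    """Mix every clip with a randomly chosen partner from the same batch.

    Hypothetical helper: returns the mixed clips, a soft "virtual label"
    matrix, and the mixing weight. Row i of the label matrix puts weight
    lam on index i and (1 - lam) on the partner's index, so no human
    annotation is involved.
    """
    rng = rng or random.Random()
    n = len(batch)
    lam = rng.betavariate(alpha, alpha)  # assumed mixup-style weight in [0, 1]
    partner = list(range(n))
    rng.shuffle(partner)
    # Input-space mixing: convex combination of the two clips.
    mixed = [
        [lam * a + (1.0 - lam) * b for a, b in zip(batch[i], batch[partner[i]])]
        for i in range(n)
    ]
    # Virtual-label-space mixing: the same convex combination of one-hot indices.
    labels = [[0.0] * n for _ in range(n)]
    for i in range(n):
        labels[i][i] += lam
        labels[i][partner[i]] += 1.0 - lam
    return mixed, labels, lam


def soft_contrastive_loss(queries, keys, labels, temperature=0.2):
    """Cross-entropy between softmax(cosine similarity / T) and soft labels."""

    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]
    total = 0.0
    for qi, target in zip(q, labels):
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total -= sum(t * (l - log_z) for t, l in zip(target, logits))
    return total / len(q)


# Toy usage: random vectors stand in for audio clips, and the raw vectors
# stand in for encoder outputs (a real setup would embed both with the MAE).
rng = random.Random(0)
clips = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(4)]
mixed, labels, lam = mix_batch(clips, alpha=1.0, rng=rng)
loss = soft_contrastive_loss(mixed, clips, labels)
```

In this sketch the soft labels tell the contrastive objective that a mixed clip should be similar to both of its source clips in proportion to the mixing weight, which is one plausible reading of "manipulating samples in the virtual label space".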

arXiv information

Authors	Afrina Tabassum, Dung Tran, Trung Dang, Ismini Lourentzou, Kazuhito Koishida
Published	2024-03-14 17:13:37+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
