Self-supervised Adaptive Pre-training of Multilingual Speech Models for Language and Dialect Identification

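Summary

Pre-trained Transformer-based speech models perform strikingly well when fine-tuned on downstream tasks such as automatic speech recognition and spoken language identification (SLID).
Domain mismatch nevertheless remains a challenge: the domain of the pre-training data can differ from that of the labeled downstream data used for fine-tuning.
In multilingual tasks such as SLID, the pre-trained speech model may also not support every language in the downstream task.
To address this, the authors propose self-supervised adaptive pre-training (SAPT), which adapts the pre-trained model to the target domain and languages of the downstream task.
They apply SAPT to the XLSR-128 model and investigate how effective the approach is for SLID.
First, they show that SAPT improves XLSR performance on the FLEURS benchmark, with substantial gains of up to 40.1% for under-represented languages.
Second, they apply SAPT to four different datasets in a few-shot learning setting and show that it improves the sample efficiency of XLSR during fine-tuning.
The experiments provide strong empirical evidence that continual adaptation via self-supervision improves the downstream performance of multilingual speech models.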

Summary (original)

Pre-trained Transformer-based speech models have shown striking performance when fine-tuned on various downstream tasks such as automatic speech recognition and spoken language identification (SLID). However, the problem of domain mismatch remains a challenge in this area, where the domain of the pre-training data might differ from that of the downstream labeled data used for fine-tuning. In multilingual tasks such as SLID, the pre-trained speech model may not support all the languages in the downstream task. To address this challenge, we propose self-supervised adaptive pre-training (SAPT) to adapt the pre-trained model to the target domain and languages of the downstream task. We apply SAPT to the XLSR-128 model and investigate the effectiveness of this approach for the SLID task. First, we demonstrate that SAPT improves XLSR performance on the FLEURS benchmark with substantial gains up to 40.1% for under-represented languages. Second, we apply SAPT on four different datasets in a few-shot learning setting, showing that our approach improves the sample efficiency of XLSR during fine-tuning. Our experiments provide strong empirical evidence that continual adaptation via self-supervision improves downstream performance for multilingual speech models.
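The few-shot setting mentioned in the abstract fine-tunes on only a handful of labeled utterances per language. The abstract does not say how those utterances are selected, so the following is only a minimal sketch of one common way to build such a subset, assuming a list of (utterance ID, language) pairs. The function name `k_shot_subset`, the data layout, and the toy language labels are all hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def k_shot_subset(examples, k, seed=0):
    """Return at most k examples per language label.

    `examples` is a list of (utterance_id, language) pairs. Both this
    function and the data layout are illustrative assumptions.
    """
    # Group utterance IDs by their language label.
    by_language = defaultdict(list)
    for utt_id, language in examples:
        by_language[language].append(utt_id)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = []
    for language in sorted(by_language):
        utt_ids = by_language[language]
        # Languages with fewer than k utterances contribute all of them.
        chosen = rng.sample(utt_ids, min(k, len(utt_ids)))
        subset.extend((utt_id, language) for utt_id in chosen)
    return subset


# Toy data: three placeholder languages with different amounts of audio.
labels = ["lang_a"] * 5 + ["lang_b"] * 3 + ["lang_c"] * 1
toy = [(f"utt{i}", lang) for i, lang in enumerate(labels)]
subset = k_shot_subset(toy, k=2)
```

Sampling per language rather than over the whole pool keeps low-resource languages from being crowded out, which matters when the comparison is about sample efficiency across languages.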

arXiv information

Authors	Mohammed Maqsood Shaik, Dietrich Klakow, Badr M. Abdullah
Published	2023-12-12 14:58:08+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
