CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing

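Summary

We introduce Condition-Aware Self-Supervised Learning Representation (CA-SSLR), a generalist conditioning model that is broadly applicable to a range of speech-processing tasks.
Unlike standard fine-tuning, which optimizes for a downstream model, CA-SSLR integrates language and speaker embeddings derived from earlier layers, so that the SSL model is aware of the current language and speaker context.
This approach reduces reliance on the input audio features while preserving the integrity of the base SSLR.
CA-SSLR improves the model's capabilities and demonstrates generality on unseen tasks with minimal task-specific tuning.
The method uses linear modulation to adjust internal representations dynamically, giving fine-grained adaptability without significantly altering the original model's behavior.
Experiments show that CA-SSLR reduces the number of trainable parameters, mitigates overfitting, and excels on under-resourced and unseen tasks.
Specifically, on the ML-SUPERB benchmark CA-SSLR achieves a 10% relative reduction in LID errors and a 37% improvement in ASR CER, and on VoxCeleb-1 it reduces SV EER by 27%.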
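
The linear modulation mentioned in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of plain Python lists, and the choice to parameterize the scale as 1 plus a projection (so that zero weights leave the representation unchanged) are all assumptions made for illustration. The sketch only shows the general idea of a condition embedding producing a per-feature scale and shift that is applied to each frame of a hidden representation.

```python
from typing import List, Sequence, Tuple


def project(weights: Sequence[Sequence[float]], cond: Sequence[float]) -> List[float]:
    """Matrix-vector product: one output value per row of `weights`."""
    return [sum(w * c for w, c in zip(row, cond)) for row in weights]


def scale_and_shift(
    cond: Sequence[float],
    w_gamma: Sequence[Sequence[float]],
    w_beta: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Map a condition embedding to per-feature (gamma, beta).

    Assumption (not from the source): gamma = 1 + W_gamma @ cond, so that
    all-zero weights give gamma = 1 and beta = 0, i.e. the identity.
    """
    gamma = [1.0 + g for g in project(w_gamma, cond)]
    beta = project(w_beta, cond)
    return gamma, beta


def modulate(
    hidden: Sequence[Sequence[float]],
    gamma: Sequence[float],
    beta: Sequence[float],
) -> List[List[float]]:
    """Apply h' = gamma * h + beta feature-wise to every frame."""
    if len(gamma) != len(beta):
        raise ValueError("gamma and beta must have the same length")
    out = []
    for frame in hidden:
        if len(frame) != len(gamma):
            raise ValueError("frame width must match gamma/beta length")
        out.append([g * h + b for h, g, b in zip(frame, gamma, beta)])
    return out


# Hypothetical toy inputs: two frames with two features each, and a
# two-dimensional condition embedding (e.g. a language or speaker vector).
hidden = [[1.0, 2.0], [3.0, 4.0]]
cond = [0.5, -1.0]
zero = [[0.0, 0.0], [0.0, 0.0]]

gamma, beta = scale_and_shift(cond, zero, zero)
unchanged = modulate(hidden, gamma, beta)  # zero weights leave hidden as-is
```

Starting from an identity transform is one common way to add conditioning to a pretrained model without disturbing its behavior at the outset; whether CA-SSLR initializes its conditioning this way is not stated in the summary.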

Summary (original)

We introduce Condition-Aware Self-Supervised Learning Representation (CA-SSLR), a generalist conditioning model broadly applicable to various speech-processing tasks. Compared to standard fine-tuning methods that optimize for downstream models, CA-SSLR integrates language and speaker embeddings from earlier layers, making the SSL model aware of the current language and speaker context. This approach reduces the reliance on input audio features while preserving the integrity of the base SSLR. CA-SSLR improves the model’s capabilities and demonstrates its generality on unseen tasks with minimal task-specific tuning. Our method employs linear modulation to dynamically adjust internal representations, enabling fine-grained adaptability without significantly altering the original model behavior. Experiments show that CA-SSLR reduces the number of trainable parameters, mitigates overfitting, and excels in under-resourced and unseen tasks. Specifically, CA-SSLR achieves a 10% relative reduction in LID errors, a 37% improvement in ASR CER on the ML-SUPERB benchmark, and a 27% decrease in SV EER on VoxCeleb-1, demonstrating its effectiveness.
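
The reported figures are relative changes in error metrics. As a reading aid, the following sketch shows how a relative reduction is computed from a baseline and a new error rate. The numeric values are hypothetical placeholders chosen only to demonstrate the arithmetic; they are not results from the paper, which gives only the relative percentages here.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Return the fractional reduction of `new` relative to `baseline`.

    For example, a result of 0.10 corresponds to a 10% relative reduction.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline


def apply_relative_reduction(baseline: float, fraction: float) -> float:
    """Return the error rate obtained by reducing `baseline` by `fraction`."""
    return baseline * (1.0 - fraction)


# Hypothetical example: an error rate falling from 10.0% to 9.0%
# is a 1.0-point absolute drop and a 10% relative reduction.
example = relative_reduction(10.0, 9.0)
```

A relative reduction therefore says nothing about the absolute error level on its own; the same 27% relative decrease could describe a change from 2.0% to 1.46% or from 20% to 14.6%.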

arXiv information

Authors	Yen-Ju Lu, Jing Liu, Thomas Thebaud, Laureano Moro-Velazquez, Ariya Rastrow, Najim Dehak, Jesus Villalba
Published	2024-12-05 18:51:10+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
