Hypernetworks for Personalizing ASR to Atypical Speech

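Summary

Parameter-efficient fine-tuning (PEFT) for personalizing automatic speech recognition (ASR) has recently shown promise for adapting general-population models to atypical speech.
However, these approaches assume a priori knowledge of the atypical speech disorder being adapted for, and diagnosing that disorder requires expert knowledge that is not always available.
Even with this knowledge, data scarcity and high inter- and intra-speaker variability further limit the effectiveness of conventional fine-tuning.
To circumvent these challenges, we first identify the minimal set of model parameters required for ASR adaptation.
By analyzing each parameter's effect on adaptation performance, we halve the word error rate (WER) while adapting only 0.03% of all weights.
To remove the need for cohort-specific models, we then propose a novel use of a meta-learned hypernetwork that generates highly individualized, utterance-level adaptations on the fly for a diverse range of atypical speech characteristics.
Evaluating adaptation at the global, cohort, and individual levels, we find that the hypernetwork generalizes better to out-of-distribution speakers while maintaining an overall relative WER reduction of 75.2% using 0.1% of the full parameter budget.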
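Both headline results are stated in terms of word error rate. The sketch below shows how WER and relative WER reduction are computed, assuming the standard definition: word-level Levenshtein distance divided by the number of reference words. The paper's text normalization and scoring tool are not described here. Halving WER corresponds to a relative reduction of 0.5. A 75.2% relative reduction means the adapted WER is 24.8% of the baseline WER. The example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Share of the baseline WER removed by adaptation (0.5 = WER halved)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


# Made-up example: two substitutions out of five reference words.
print(word_error_rate("turn on the kitchen lights", "turn of the kitchen light"))  # 0.4
print(relative_wer_reduction(0.40, 0.20))  # 0.5
```

Relative reduction is used rather than absolute difference because baseline WERs on atypical speech vary widely across speakers. A relative figure stays comparable across those speakers.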

Summary (original)

Parameter-efficient fine-tuning (PEFT) for personalizing automatic speech recognition (ASR) has recently shown promise for adapting general population models to atypical speech. However, these approaches assume a priori knowledge of the atypical speech disorder being adapted for — the diagnosis of which requires expert knowledge that is not always available. Even given this knowledge, data scarcity and high inter/intra-speaker variability further limit the effectiveness of traditional fine-tuning. To circumvent these challenges, we first identify the minimal set of model parameters required for ASR adaptation. Our analysis of each individual parameter’s effect on adaptation performance allows us to reduce Word Error Rate (WER) by half while adapting 0.03% of all weights. Alleviating the need for cohort-specific models, we next propose the novel use of a meta-learned hypernetwork to generate highly individualized, utterance-level adaptations on-the-fly for a diverse set of atypical speech characteristics. Evaluating adaptation at the global, cohort and individual-level, we show that hypernetworks generalize better to out-of-distribution speakers, while maintaining an overall relative WER reduction of 75.2% using 0.1% of the full parameter budget.
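The sketch below shows the hypernetwork idea. A small network takes an embedding of the current utterance and outputs the adaptation parameters for a frozen base layer. Each utterance therefore gets its own adaptation without a cohort-specific model. This is a hypothetical, stdlib-only illustration, and the following choices are assumptions rather than the paper's architecture or its selected parameter set:

- the mean-pooled utterance embedding;
- the per-channel scale-and-shift adaptation;
- the layer sizes;
- all names (`HyperNetwork`, `mean_pool`, `adapt_activations`).

Only the forward pass is shown. The meta-learning procedure used to train the hypernetwork is omitted.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product with plain lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    """Small Gaussian initialization (std 0.1)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def mean_pool(frames: List[Vector]) -> Vector:
    """Collapse a variable-length utterance (frames x channels) into one embedding."""
    if not frames:
        raise ValueError("utterance must contain at least one frame")
    return [sum(col) / len(frames) for col in zip(*frames)]


class HyperNetwork:
    """Hypothetical hypernetwork: utterance embedding -> per-channel (scale, shift).

    Only these adaptation parameters are generated; the base ASR layer whose
    activations they modulate stays frozen.
    """

    def __init__(self, embed_dim: int, hidden_dim: int, num_channels: int, seed: int = 0):
        rng = random.Random(seed)
        self.num_channels = num_channels
        self.w1 = random_matrix(rng, hidden_dim, embed_dim)
        self.w2 = random_matrix(rng, 2 * num_channels, hidden_dim)

    def __call__(self, embedding: Vector) -> Tuple[Vector, Vector]:
        hidden = [math.tanh(h) for h in matvec(self.w1, embedding)]
        out = matvec(self.w2, hidden)
        c = self.num_channels
        # Scale is centred at 1, so a zero output leaves the frozen layer unchanged.
        scale = [1.0 + d for d in out[:c]]
        shift = out[c:]
        return scale, shift

    def num_parameters(self) -> int:
        """Number of weights in the hypernetwork itself."""
        return sum(len(row) for row in self.w1 + self.w2)


def adapt_activations(frames: List[Vector], scale: Vector, shift: Vector) -> List[Vector]:
    """Apply the generated per-channel affine adaptation to every frame."""
    return [[s * x + b for x, s, b in zip(frame, scale, shift)] for frame in frames]


# Usage with synthetic activations: 20 frames, 8 channels.
rng = random.Random(1)
frames = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(20)]
hyper = HyperNetwork(embed_dim=8, hidden_dim=4, num_channels=8)
scale, shift = hyper(mean_pool(frames))  # generated on the fly for this utterance
adapted = adapt_activations(frames, scale, shift)
```

Because the adaptation is computed from the utterance itself, no speaker or cohort label is needed at inference time. The generated scale is centered at 1, so an untrained or zero-output hypernetwork starts from the unadapted base model.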

arXiv information

Authors	Max Mueller-Eberstein, Dianna Yee, Karren Yang, Gautam Varma Mantena, Colin Lea
Published	2024-06-07 16:14:50+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google