Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users using Intermediate ASR Features and Human Memory Models

Abstract

Neural networks have been successfully used for non-intrusive speech intelligibility prediction. Recently, the use of feature representations sourced from intermediate layers of pre-trained self-supervised and weakly-supervised models has been found to be particularly useful for this task. This work combines the use of Whisper ASR decoder layer representations as neural network input features with an exemplar-based, psychologically motivated model of human memory to predict human intelligibility ratings for hearing-aid users. Substantial performance improvement over an established intrusive HASPI baseline system is found, including on enhancement systems and listeners unseen in the training data, with a root mean squared error of 25.3 compared with the baseline of 28.7.
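
To make the exemplar-based idea and the reported error metric concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each utterance has already been reduced to a fixed-length feature vector (for example, a pooled intermediate ASR representation; the extraction step is not shown), and it swaps in a simple softmax over cosine similarity as the retrieval rule. The names `exemplar_predict`, `rmse`, and `sharpness` are hypothetical and chosen only for illustration.

```python
import numpy as np


def exemplar_predict(query, exemplars, labels, sharpness=5.0):
    """Predict a score as a similarity-weighted average of stored exemplar labels.

    Assumption: softmax over cosine similarity is used as the weighting rule;
    the paper's actual memory model may differ.
    """
    query = np.asarray(query, dtype=float)
    exemplars = np.asarray(exemplars, dtype=float)
    labels = np.asarray(labels, dtype=float)

    # Cosine similarity between the query and every stored exemplar.
    q = query / np.linalg.norm(query)
    e = exemplars / np.linalg.norm(exemplars, axis=1, keepdims=True)
    sim = e @ q

    # Softmax weights; subtracting the max keeps the exponentials stable.
    logits = sharpness * sim
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    # The prediction is the weighted mean of the exemplar labels.
    return float(weights @ labels)


def rmse(predicted, target):
    """Root mean squared error between two equal-length sequences."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((predicted - target) ** 2)))


# Toy memory: two exemplars with intelligibility scores on a 0-100 scale.
memory = [[1.0, 0.0], [0.0, 1.0]]
scores = [100.0, 0.0]
print(exemplar_predict([1.0, 1.0], memory, scores))

# Relative RMSE reduction implied by the figures in the abstract.
relative_reduction = (28.7 - 25.3) / 28.7
print(f"{relative_reduction:.1%}")
```

A query that is equally similar to both toy exemplars receives the average of their scores, and a larger `sharpness` pushes the prediction toward the label of the nearest exemplar. The last two lines only restate the abstract's figures: an RMSE of 25.3 against a baseline of 28.7 is a reduction of roughly 12 percent.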

arXiv information

Authors	Rhiannon Mogridge, George Close, Robert Sutherland, Thomas Hain, Jon Barker, Stefan Goetze, Anton Ragni
Published	2024-01-24 17:31:07+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
