Automatic Speech Recognition System-Independent Word Error Rate Estimation

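Summary

Word error rate (WER) is a metric for evaluating the quality of transcriptions produced by automatic speech recognition (ASR) systems.
In many applications, it is important to estimate WER given a pair consisting of a speech utterance and its transcript.
Previous work on WER estimation focused on building models trained with a specific ASR system in mind (referred to as ASR system-dependent).
Such models are also domain-dependent, which makes them inflexible in real-world applications.
This paper proposes a hypothesis generation method for ASR System-Independent WER Estimation (SIWE).
In contrast to prior work, the WER estimators are trained on data that simulates ASR system output.
Hypotheses are generated using phonetically similar or linguistically more likely alternative words.
In WER estimation experiments, the proposed method performs similarly to ASR system-dependent WER estimators on in-domain data and achieves state-of-the-art performance on out-of-domain data.
On the out-of-domain data, the SIWE model outperformed the baseline estimators in root mean square error and Pearson correlation coefficient by 17.58% and 18.21% relative, respectively, on Switchboard and CALLHOME.
Performance improved further when the WER of the training set was close to the WER of the evaluation dataset.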
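
For readers unfamiliar with the metric, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The following is a minimal sketch of that conventional definition, assuming simple whitespace tokenization; the function name `word_error_rate` is illustrative and does not come from the paper. It shows the reference-based quantity that a WER estimator would aim to predict, not the estimator itself.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace; no text normalization
    (casing, punctuation) is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.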

Abstract (original)

Word error rate (WER) is a metric used to evaluate the quality of transcriptions produced by Automatic Speech Recognition (ASR) systems. In many applications, it is of interest to estimate WER given a pair of a speech utterance and a transcript. Previous work on WER estimation focused on building models that are trained with a specific ASR system in mind (referred to as ASR system-dependent). These are also domain-dependent and inflexible in real-world applications. In this paper, a hypothesis generation method for ASR System-Independent WER estimation (SIWE) is proposed. In contrast to prior work, the WER estimators are trained using data that simulates ASR system output. Hypotheses are generated using phonetically similar or linguistically more likely alternative words. In WER estimation experiments, the proposed method reaches a similar performance to ASR system-dependent WER estimators on in-domain data and achieves state-of-the-art performance on out-of-domain data. On the out-of-domain data, the SIWE model outperformed the baseline estimators in root mean square error and Pearson correlation coefficient by relative 17.58% and 18.21%, respectively, on Switchboard and CALLHOME. The performance was further improved when the WER of the training set was close to the WER of the evaluation dataset.
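
The abstract reports results in terms of root mean square error (RMSE) and Pearson correlation coefficient between estimated and actual WER. As a rough illustration of those two standard measures, here is a minimal sketch in plain Python; the function names and the list-of-floats inputs are assumptions made for this example, and the paper's exact evaluation protocol (for instance, whether scores are computed per utterance) is not stated here.

```python
import math


def rmse(predicted, target):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(target) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(predicted, target):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(target) or len(predicted) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        # Correlation is undefined when either sequence is constant.
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_p * var_t)
```

Lower RMSE and higher Pearson correlation both indicate that the estimated WER tracks the actual WER more closely.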

arXiv information

Authors	Chanho Park, Mingjie Chen, Thomas Hain
Published	2024-04-26 11:11:02+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
