Measuring the Accuracy of Automatic Speech Recognition Solutions

Summary

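For d/Deaf and hard of hearing (DHH) people, captions are an essential accessibility tool.
Significant advances in artificial intelligence (AI) have made automatic speech recognition (ASR) a part of many popular applications.
This makes captions easy to create and widely available. However, transcription must be highly accurate to be accessible.
Scientific publications and industry report very low error rates and claim that AI matches or even outperforms manual transcription.
At the same time, the DHH community reports serious problems with the accuracy and reliability of ASR.
There appears to be a mismatch between technical innovation and the real-life experience of people who depend on transcription.
Independent and comprehensive data is needed to capture the state of ASR.
We measured the performance of eleven common ASR services using recordings of higher education lectures.
We evaluated the influence of technical conditions such as streaming, the use of vocabularies, and differences between languages.
Our results show that accuracy varies widely between vendors and across individual audio samples.
We also measured significantly lower quality for streaming ASR, which is used for live events.
Our study shows that, despite recent improvements in ASR, common services lack reliable accuracy.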

Abstract (original)

For d/Deaf and hard of hearing (DHH) people, captioning is an essential accessibility tool. Significant developments in artificial intelligence (AI) mean that Automatic Speech Recognition (ASR) is now a part of many popular applications. This makes creating captions easy and broadly available – but transcription needs high levels of accuracy to be accessible. Scientific publications and industry report very low error rates, claiming AI has reached human parity or even outperforms manual transcription. At the same time, the DHH community reports serious issues with the accuracy and reliability of ASR. There seems to be a mismatch between technical innovations and the real-life experience of people who depend on transcription. Independent and comprehensive data is needed to capture the state of ASR. We measured the performance of eleven common ASR services with recordings of Higher Education lectures. We evaluated the influence of technical conditions like streaming, the use of vocabularies, and differences between languages. Our results show that accuracy ranges widely between vendors and across the individual audio samples. We also measured a significantly lower quality for streaming ASR, which is used for live events. Our study shows that despite the recent improvements of ASR, common services lack reliability in accuracy.

arXiv information

Authors	Korbinian Kuhn, Verena Kersken, Benedikt Reuter, Niklas Egger, Gottfried Zimmermann
Published	2024-08-29 06:38:55+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
