Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline

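Summary

Automatic speech recognition (ASR) output serves as input to downstream tasks and strongly affects end-user satisfaction.
Diagnosing and fixing the vulnerabilities of ASR models is therefore very important.
However, conventional evaluation of ASR systems produces a single composite quantitative metric, which cannot offer comprehensive insight into specific vulnerabilities.
This lack of detail carries over to the post-processing stage, where potential weaknesses become even harder to see.
Even when an ASR model recognizes utterances accurately, poor readability can hurt user satisfaction, creating a trade-off between recognition accuracy and user-friendliness.
Addressing this effectively requires considering both the speech level, which matters for recognition accuracy, and the text level, which matters for user-friendliness.
We therefore propose developing an Error Explainable Benchmark (EEB) dataset.
By considering both the speech level and the text level, the dataset enables a fine-grained understanding of a model's shortcomings.
Our proposal offers a structured path toward more "real-world-centric" evaluation, marking a clear shift away from abstracted conventional methods, enabling subtle system weaknesses to be detected and corrected, and ultimately aiming to improve the user experience.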
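
To illustrate why a single composite score can hide specific weaknesses, here is a minimal sketch. It assumes the composite metric is something like word error rate (WER), which the abstract does not name, and the function names `align_counts` and `error_breakdown` are hypothetical, not part of the proposed EEB dataset. It shows two hypotheses that receive the same WER while making different kinds of errors.

```python
def align_counts(ref, hyp):
    """Word-level Levenshtein alignment.

    Returns (cost, substitutions, deletions, insertions) for one
    minimum-cost alignment of the hypothesis against the reference.
    """
    r, h = ref.split(), hyp.split()
    # dp[i][j] holds (cost, subs, dels, inss) for r[:i] versus h[:j].
    dp = [[None] * (len(h) + 1) for _ in range(len(r) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(r) + 1):
        dp[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, len(h) + 1):
        dp[0][j] = (j, 0, 0, j)  # only insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            c, s, d, n = dp[i - 1][j - 1]
            if r[i - 1] == h[j - 1]:
                diag = (c, s, d, n)          # exact match, no cost
            else:
                diag = (c + 1, s + 1, d, n)  # substitution
            c, s, d, n = dp[i - 1][j]
            delete = (c + 1, s, d + 1, n)
            c, s, d, n = dp[i][j - 1]
            insert = (c + 1, s, d, n + 1)
            dp[i][j] = min(diag, delete, insert, key=lambda t: t[0])
    return dp[len(r)][len(h)]


def error_breakdown(ref, hyp):
    """Return the composite WER together with per-type error counts."""
    cost, subs, dels, inss = align_counts(ref, hyp)
    return {
        "wer": cost / len(ref.split()),
        "substitutions": subs,
        "deletions": dels,
        "insertions": inss,
    }


reference = "please call doctor smith tomorrow"
hyp_a = "please call doctor smyth tomorrow"  # a name is misrecognized
hyp_b = "please call doctor smith"           # the last word is dropped

print(error_breakdown(reference, hyp_a))
print(error_breakdown(reference, hyp_b))
```

Both hypotheses score a WER of 0.2, yet the first contains a substitution and the second a deletion. A breakdown of this kind is only a rough stand-in for the speech-level and text-level error types the proposal calls for.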

Summary (original)

Automatic speech recognition (ASR) outcomes serve as input for downstream tasks, substantially impacting the satisfaction level of end-users. Hence, the diagnosis and enhancement of the vulnerabilities present in the ASR model bear significant importance. However, traditional evaluation methodologies of ASR systems generate a singular, composite quantitative metric, which fails to provide comprehensive insight into specific vulnerabilities. This lack of detail extends to the post-processing stage, resulting in further obfuscation of potential weaknesses. Despite an ASR model’s ability to recognize utterances accurately, subpar readability can negatively affect user satisfaction, giving rise to a trade-off between recognition accuracy and user-friendliness. To effectively address this, it is imperative to consider both the speech-level, crucial for recognition accuracy, and the text-level, critical for user-friendliness. Consequently, we propose the development of an Error Explainable Benchmark (EEB) dataset. This dataset, while considering both speech- and text-level, enables a granular understanding of the model’s shortcomings. Our proposition provides a structured pathway for a more ‘real-world-centric’ evaluation, a marked shift away from abstracted, traditional methods, allowing for the detection and rectification of nuanced system weaknesses, ultimately aiming for an improved user experience.

arXiv information

Authors	Seonmin Koo, Chanjun Park, Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
Published	2024-01-26 03:42:45+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
