Chain of Correction for Full-text Speech Recognition with Large Language Models

要約

自動音声認識（ASR）のための大規模な言語モデル（LLMS）とのフルテキストエラー補正は、長いコンテキストでエラーを修正し、句読点の回復や逆テキスト正規化など、より広範なエラータイプに対処する可能性があるため、注目を集めています。
それにもかかわらず、安定性、制御可能性、完全性、流encyに関連する問題など、多くの課題が続きます。
これらの課題を軽減するために、このペーパーでは、LLMSとのフルテキストエラー補正のための補正チェーン（COC）を提案します。これは、通常のマルチターンチャット形式のガイダンスとして事前認識テキストを使用してセグメントごとにエラーセグメントを修正します。
また、COCはコンテキストに事前に認識された全文を使用して、モデルがグローバルなセマンティクスをよりよく把握し、コンテンツ全体の包括的な概要を維持できるようにします。
オープンソースのフルテキストエラー修正データセットCHFTを利用して、事前に訓練されたLLMを微調整して、COCフレームワークのパフォーマンスを評価します。
実験結果は、COCがフルテキストASR出力のエラーを効果的に修正し、ベースラインおよびベンチマークシステムを大幅に上回ることを示しています。
さらに、補正のしきい値を設定して、過補正と過剰補給のバランスをとり、COCモデルを非常に長いASR出力で外挿し、他の種類の情報を使用してエラー修正プロセスを導くことができるかどうかを調査する方法を分析します。

要約(オリジナル)

Full-text error correction with Large Language Models (LLMs) for Automatic Speech Recognition (ASR) has gained increased attention due to its potential to correct errors across long contexts and address a broader spectrum of error types, including punctuation restoration and inverse text normalization. Nevertheless, many challenges persist, including issues related to stability, controllability, completeness, and fluency. To mitigate these challenges, this paper proposes the Chain of Correction (CoC) for full-text error correction with LLMs, which corrects errors segment by segment using pre-recognized text as guidance within a regular multi-turn chat format. The CoC also uses pre-recognized full text for context, allowing the model to better grasp global semantics and maintain a comprehensive overview of the entire content. Utilizing the open-sourced full-text error correction dataset ChFT, we fine-tune a pre-trained LLM to evaluate the performance of the CoC framework. Experimental results demonstrate that the CoC effectively corrects errors in full-text ASR outputs, significantly outperforming baseline and benchmark systems. We further analyze how to set the correction threshold to balance under-correction and over-rephrasing, extrapolate the CoC model on extremely long ASR outputs, and investigate whether other types of information can be employed to guide the error correction process.

arxiv情報

著者	Zhiyuan Tang,Dong Wang,Zhikai Zhou,Yong Liu,Shen Huang,Shidong Shang
発行日	2025-04-02 09:06:23+00:00
arxivサイト	arxiv_id(pdf)

提供元, 利用サービス

arxiv.jp, Google

Chain of Correction for Full-text Speech Recognition with Large Language Models

要約

要約(オリジナル)

arxiv情報

提供元, 利用サービス

最近の投稿

最近のコメント

アーカイブ

カテゴリー