Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts

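Summary

Automatic speech recognition (ASR) transcripts contain recognition errors and a range of spoken-language phenomena, such as disfluencies, ungrammatical sentences, and incomplete sentences, all of which reduce readability.
To improve readability, we propose the Contextualized Spoken-to-Written conversion (CoS2W) task, which corrects ASR and grammatical errors and converts informal text into a formal style while preserving its content, drawing on context and auxiliary information.
The task is a natural fit for the in-context learning capabilities of large language models (LLMs).
To enable comprehensive comparison of different LLMs, we construct SWAB, a document-level Spoken-to-Written conversion of ASR Transcripts Benchmark dataset.
Using SWAB, we study how different granularity levels affect CoS2W performance and propose methods that exploit context and auxiliary information to improve the outputs.
The experiments show that LLMs have the potential to excel at the CoS2W task, particularly in grammaticality and formality, and that our methods help LLMs understand context and auxiliary information effectively.
We further investigate the effectiveness of using LLMs as evaluators and find that LLM evaluators correlate strongly with human evaluations on rankings of faithfulness and formality, which supports the reliability of LLM evaluators for the CoS2W task.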

Summary (original)

Automatic Speech Recognition (ASR) transcripts exhibit recognition errors and various spoken language phenomena such as disfluencies, ungrammatical sentences, and incomplete sentences, hence suffering from poor readability. To improve readability, we propose a Contextualized Spoken-to-Written conversion (CoS2W) task to address ASR and grammar errors and also transfer the informal text into the formal style with content preserved, utilizing contexts and auxiliary information. This task naturally matches the in-context learning capabilities of Large Language Models (LLMs). To facilitate comprehensive comparisons of various LLMs, we construct a document-level Spoken-to-Written conversion of ASR Transcripts Benchmark (SWAB) dataset. Using SWAB, we study the impact of different granularity levels on the CoS2W performance, and propose methods to exploit contexts and auxiliary information to enhance the outputs. Experimental results reveal that LLMs have the potential to excel in the CoS2W task, particularly in grammaticality and formality, our methods achieve effective understanding of contexts and auxiliary information by LLMs. We further investigate the effectiveness of using LLMs as evaluators and find that LLM evaluators show strong correlations with human evaluations on rankings of faithfulness and formality, which validates the reliability of LLM evaluators for the CoS2W task.

arXiv information

Authors	Jiaqing Liu, Chong Deng, Qinglin Zhang, Shilin Zhou, Qian Chen, Hai Yu, Wen Wang
Published	2025-01-24 07:10:48+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
