Text Injection for Capitalization and Turn-Taking Prediction in Speech Models

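Summary

Text injection for automatic speech recognition (ASR) supplements paired audio-text data with unpaired text-only data, and it has shown promising improvements in word error rate.
This study investigates text injection for auxiliary tasks, meaning the non-ASR tasks that an end-to-end (E2E) model often performs.
The authors use joint end-to-end and internal language model training (JEIT) as the text injection algorithm to train an ASR model that performs two auxiliary tasks.
The first is capitalization, a de-normalization task.
The second is turn-taking prediction, which tries to identify whether a user has finished their conversational turn in an interaction with a digital assistant.
The reported results show that the text injection method improves capitalization performance on long-tail data and improves recall for turn-taking detection.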

Abstract (original)

Text injection for automatic speech recognition (ASR), wherein unpaired text-only data is used to supplement paired audio-text data, has shown promising improvements for word error rate. This study examines the use of text injection for auxiliary tasks, which are the non-ASR tasks often performed by an E2E model. In this work, we use joint end-to-end and internal language model training (JEIT) as our text injection algorithm to train an ASR model which performs two auxiliary tasks. The first is capitalization, which is a de-normalization task. The second is turn-taking prediction, which attempts to identify whether a user has completed their conversation turn in a digital assistant interaction. We show results demonstrating that our text injection method boosts capitalization performance for long-tail data, and improves turn-taking detection recall.
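
To make the idea of training on paired and text-only data together more concrete, the following is a minimal sketch of a weighted joint objective. It is not taken from the paper: the function names (`token_nll`, `joint_loss`), the `text_weight` parameter, and the toy probabilities are all hypothetical, and the simple weighted sum is an assumption about how such losses might be combined.

```python
import math


def token_nll(probs_of_targets):
    """Average negative log-likelihood, given the probability the model
    assigned to each correct target token."""
    return -sum(math.log(p) for p in probs_of_targets) / len(probs_of_targets)


def joint_loss(paired_target_probs, text_only_target_probs, text_weight=0.5):
    """Weighted sum of a paired-data loss and a text-only loss.

    Assumption: the two terms are combined linearly, with `text_weight`
    as a hypothetical hyperparameter controlling the text-only term.
    """
    # Loss on audio-text pairs (stands in for the end-to-end ASR loss).
    e2e_loss = token_nll(paired_target_probs)
    # Loss on unpaired text (stands in for the internal language model loss).
    ilm_loss = token_nll(text_only_target_probs)
    return e2e_loss + text_weight * ilm_loss


# Toy values: probabilities assigned to the correct (e.g. capitalized) tokens.
paired = [0.9, 0.8, 0.7]
text_only = [0.6, 0.5]
loss = joint_loss(paired, text_only, text_weight=0.5)
print(f"joint loss: {loss:.4f}")
```

Setting `text_weight` to 0 in this sketch reduces it to training on paired data only, which is the baseline that text injection is meant to improve on.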

arXiv information

Authors	Shaan Bijwadia, Shuo-yiin Chang, Weiran Wang, Zhong Meng, Hao Zhang, Tara N. Sainath
Published	2023-08-14 18:28:04+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
