Zero-shot Domain-sensitive Speech Recognition with Prompt-conditioning Fine-tuning

Summary

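This work proposes a method for building domain-sensitive speech recognition models that make use of textual domain information by conditioning generation on a given text prompt.
This is achieved by fine-tuning a pre-trained end-to-end model (Whisper) to learn from demonstrations that include prompt examples.
The authors show that this ability generalizes across domains and across different prompt contexts: the model achieves a Word Error Rate (WER) reduction of up to 33% on unseen datasets from domains such as medical conversation, air traffic control communication, and financial meetings.
Because paired audio-transcript data is scarce, the method is further extended to text-only fine-tuning to achieve both domain sensitivity and domain adaptation.
The text-only fine-tuned model is also shown to attend to various prompt contexts, with its largest WER reduction, 29%, reached on the medical conversation dataset.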

Abstract (original)

In this work, we propose a method to create domain-sensitive speech recognition models that utilize textual domain information by conditioning its generation on a given text prompt. This is accomplished by fine-tuning a pre-trained, end-to-end model (Whisper) to learn from demonstrations with prompt examples. We show that this ability can be generalized to different domains and even various prompt contexts, with our model gaining a Word Error Rate (WER) reduction of up to 33% on unseen datasets from various domains, such as medical conversation, air traffic control communication, and financial meetings. Considering the limited availability of audio-transcript pair data, we further extend our method to text-only fine-tuning to achieve domain sensitivity as well as domain adaptation. We demonstrate that our text-only fine-tuned model can also attend to various prompt contexts, with the model reaching the most WER reduction of 29% on the medical conversation dataset.
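
The WER figures quoted in the abstract can be made concrete with a small sketch. The code below is not taken from the paper: it assumes the standard definition of WER (word-level edit distance divided by the number of reference words) and assumes the reported reductions are relative to a baseline WER, which the abstract does not state explicitly. The function names `word_error_rate` and `relative_wer_reduction` and the example sentences are hypothetical, chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer (assumed relative)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Illustrative sentences only, not data from the paper.
    reference = "the patient has a fever"
    hypothesis = "the patient has fever"
    # One deleted word over five reference words.
    print(word_error_rate(reference, hypothesis))
    # Made-up baseline and improved WER values, for illustration.
    print(relative_wer_reduction(0.30, 0.20))
```

Under this reading, a drop from a baseline WER of 0.30 to 0.20 would count as a reduction of roughly one third. If the authors instead report absolute differences in WER, the second function would not apply.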

arXiv information

Authors	Feng-Ting Liao, Yung-Chieh Chan, Yi-Chang Chen, Chan-Jan Hsu, Da-shan Shiu
Published	2023-10-06 03:41:30+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
