Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding

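Summary

Spoken language understanding (SLU) is a structured prediction task in the speech domain.
Recently, many SLU studies that treat the problem as a sequence-to-sequence task have achieved great success.
However, this approach is poorly suited to performing speech recognition and understanding simultaneously.
This paper proposes the joint speech recognition and structure learning framework (JSRSL), a span-based end-to-end SLU model that can accurately transcribe speech and extract structured content at the same time.
We run named entity recognition and intent classification experiments on the Chinese dataset AISHELL-NER and the English dataset SLURP.
The results show that the proposed method not only outperforms conventional sequence-to-sequence methods in both transcription and extraction, but also achieves state-of-the-art performance on both datasets.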

Abstract (original)

Spoken language understanding (SLU) is a structure prediction task in the field of speech. Recently, many works on SLU that treat it as a sequence-to-sequence task have achieved great success. However, this method is not suitable for simultaneous speech recognition and understanding. In this paper, we propose a joint speech recognition and structure learning framework (JSRSL), an end-to-end SLU model based on span, which can accurately transcribe speech and extract structured content simultaneously. We conduct experiments on named entity recognition and intent classification using the Chinese dataset AISHELL-NER and the English dataset SLURP. The results show that our proposed method not only outperforms the traditional sequence-to-sequence method in both transcription and extraction capabilities but also achieves state-of-the-art performance on the two datasets.

arXiv information

Authors	Jiliang Hu, Zuchao Li, Mengjia Shen, Haojun Ai, Sheng Li, Jun Zhang
Published	2025-01-13 13:43:46+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
