The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation

Summary

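End-to-end spoken language understanding (SLU) remains difficult to achieve, even with today's large pretrained language models for text and speech, and especially in multilingual settings.
Machine translation is an established, powerful pretraining objective for text. It lets a model capture the high-level semantics of an input utterance and the associations between languages. Speech models, which operate on low-level acoustic frames, would benefit from exactly these properties.
Motivated in particular by cross-lingual SLU, we show that speech translation (ST) is a good way to pretrain speech models for end-to-end SLU in both monolingual and cross-lingual scenarios.
With ST pretraining, our models outperform current baselines on monolingual and multilingual intent classification and on spoken question answering, using the SLURP, MINDS-14, and NMSQA benchmarks.
To verify the effectiveness of our methods, we also release two new benchmark datasets, built from both synthetic and real sources, for abstractive summarization from speech and for low-resource or zero-shot transfer from English to French.
We further show the value of preserving knowledge from the pretraining task, and for that purpose explore Bayesian transfer learning on pretrained speech models based on continual learning regularizers.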
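
The abstract does not name the continual learning regularizers it uses. The sketch below is an illustration only and assumes one common choice: an Elastic Weight Consolidation (EWC)-style quadratic penalty. The penalty keeps fine-tuned weights close to the ST-pretrained weights, and each weight is scaled by a diagonal Fisher estimate of its importance. Under a Laplace approximation this penalty corresponds to a Gaussian prior centred on the pretrained weights, which is one way to read "Bayesian transfer learning". The function names (`diagonal_fisher`, `quadratic_transfer_penalty`, `total_loss`) and the plain-list parameter representation are hypothetical. They keep the example dependency-free and do not reproduce the authors' code.

```python
from typing import List, Sequence


def diagonal_fisher(per_sample_grads: Sequence[Sequence[float]]) -> List[float]:
    """Estimate diagonal Fisher information as the mean squared per-sample gradient.

    Hypothetical helper: each inner sequence is the gradient of the pretraining
    loss w.r.t. all parameters for one sample, evaluated at the pretrained weights.
    """
    n = len(per_sample_grads)
    if n == 0:
        raise ValueError("need at least one gradient sample")
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def quadratic_transfer_penalty(
    params: Sequence[float],
    pretrained: Sequence[float],
    importance: Sequence[float],
    strength: float = 1.0,
) -> float:
    """EWC-style penalty: (strength / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    With every importance weight F_i equal to 1, this reduces to an
    L2-SP-style penalty (squared L2 distance to the pretrained weights).
    """
    if not (len(params) == len(pretrained) == len(importance)):
        raise ValueError("params, pretrained and importance must have equal length")
    return 0.5 * strength * sum(
        f * (p - p0) ** 2 for p, p0, f in zip(params, pretrained, importance)
    )


def total_loss(
    task_loss: float,
    params: Sequence[float],
    pretrained: Sequence[float],
    importance: Sequence[float],
    strength: float,
) -> float:
    """Downstream SLU loss plus the knowledge-preserving regularizer (hypothetical)."""
    return task_loss + quadratic_transfer_penalty(params, pretrained, importance, strength)
```

For example, take weights `[1.0, 2.0]`, pretrained weights `[0.0, 0.0]`, unit importance and `strength=1.0`. The penalty is then 0.5 × (1 + 4) = 2.5. Raising `strength` pulls the fine-tuned model harder toward what it learned during ST pretraining.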

Summary (original)

End-to-end spoken language understanding (SLU) remains elusive even with current large pretrained language models on text and speech, especially in multilingual cases. Machine translation has been established as a powerful pretraining objective on text as it enables the model to capture high-level semantics of the input utterance and associations between different languages, which is desired for speech models that work on lower-level acoustic frames. Motivated particularly by the task of cross-lingual SLU, we demonstrate that the task of speech translation (ST) is a good means of pretraining speech models for end-to-end SLU on both monolingual and cross-lingual scenarios. By introducing ST, our models give higher performance over current baselines on monolingual and multilingual intent classification as well as spoken question answering using SLURP, MINDS-14, and NMSQA benchmarks. To verify the effectiveness of our methods, we also release two new benchmark datasets from both synthetic and real sources, for the tasks of abstractive summarization from speech and low-resource or zero-shot transfer from English to French. We further show the value of preserving knowledge from the pretraining task, and explore Bayesian transfer learning on pretrained speech models based on continual learning regularizers for that.

arXiv information

Authors	Mutian He, Philip N. Garner
Published	2023-05-16 17:53:03+00:00
arXiv page	arxiv_id (pdf)

Source, service used

arxiv.jp, Google
