MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder

Summary

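Multilingual automatic speech recognition (ASR) in the medical domain serves as a foundational task for a range of downstream applications, including speech translation, spoken language understanding, and voice-activated assistants.
This technology improves patient care by enabling efficient communication across language barriers, easing shortages of specialized staff, and supporting better diagnosis and treatment, particularly during pandemics.
In this work, we introduce MultiMed, the first multilingual medical ASR dataset, together with the first collection of small-to-large end-to-end medical ASR models, covering five languages: Vietnamese, English, German, French, and Mandarin Chinese.
To the best of our knowledge, MultiMed is the world's largest medical ASR dataset across all major benchmarks: total duration, number of recording conditions, number of accents, and number of speaking roles.
We also present the first multilinguality study for medical ASR, comprising reproducible empirical baselines, a monolinguality-multilinguality analysis, a comparative study of Attention Encoder Decoder (AED) and hybrid models, a layer-wise ablation study for the AED, and a linguistic analysis for multilingual medical ASR.
All code, data, and models are available online: https://github.com/leduckhai/MultiMed/tree/master/MultiMed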

Abstract (original)

Multilingual automatic speech recognition (ASR) in the medical domain serves as a foundational task for various downstream applications such as speech translation, spoken language understanding, and voice-activated assistants. This technology enhances patient care by enabling efficient communication across language barriers, alleviating specialized workforce shortages, and facilitating improved diagnosis and treatment, particularly during pandemics. In this work, we introduce MultiMed, the first multilingual medical ASR dataset, along with the first collection of small-to-large end-to-end medical ASR models, spanning five languages: Vietnamese, English, German, French, and Mandarin Chinese. To our best knowledge, MultiMed stands as the world’s largest medical ASR dataset across all major benchmarks: total duration, number of recording conditions, number of accents, and number of speaking roles. Furthermore, we present the first multilinguality study for medical ASR, which includes reproducible empirical baselines, a monolinguality-multilinguality analysis, Attention Encoder Decoder (AED) vs Hybrid comparative study, a layer-wise ablation study for the AED, and a linguistic analysis for multilingual medical ASR. All code, data, and models are available online: https://github.com/leduckhai/MultiMed/tree/master/MultiMed

arXiv information

Authors	Khai Le-Duc, Phuc Phan, Tan-Hanh Pham, Bach Phan Tat, Minh-Huong Ngo, Truong-Son Hy
Published	2025-01-09 10:50:12+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
