MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation

Summary

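MuAViC is a multilingual audio-visual corpus for robust speech recognition and robust speech-to-text translation that provides 1200 hours of audio-visual speech in 9 languages.
It is fully transcribed and covers 6 English-to-X and 6 X-to-English translation directions.
To the best of our knowledge, it is the first open benchmark for audio-visual speech-to-text translation and the largest open benchmark for multilingual audio-visual speech recognition.
The baseline results show that MuAViC is effective for building noise-robust speech recognition and translation models.
The corpus is available at https://github.com/facebookresearch/muavic.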

Summary (original)

We introduce MuAViC, a multilingual audio-visual corpus for robust speech recognition and robust speech-to-text translation providing 1200 hours of audio-visual speech in 9 languages. It is fully transcribed and covers 6 English-to-X translation as well as 6 X-to-English translation directions. To the best of our knowledge, this is the first open benchmark for audio-visual speech-to-text translation and the largest open benchmark for multilingual audio-visual speech recognition. Our baseline results show that MuAViC is effective for building noise-robust speech recognition and translation models. We make the corpus available at https://github.com/facebookresearch/muavic.

arXiv information

Authors	Mohamed Anwar, Bowen Shi, Vedanuj Goswami, Wei-Ning Hsu, Juan Pino, Changhan Wang
Published	2023-03-07 16:41:01+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
