Multilingual context-based pronunciation learning for Text-to-Speech

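Summary

Phonetic information and linguistic knowledge are key components of a text-to-speech (TTS) front end.
Given a language, a lexicon can be collected offline, and grapheme-to-phoneme (G2P) relationships are usually modeled to predict the pronunciation of out-of-vocabulary (OOV) words.
In addition, post-lexical phonology, often defined as a rule-based system, is used to correct pronunciation within or between words.
In this work, we present a multilingual unified front-end system that addresses pronunciation-related tasks usually handled by separate modules.
We evaluate the proposed model on G2P conversion and on other language-specific challenges, such as homograph and polyphone disambiguation, post-lexical rules, and implicit diacritization.
We find that the multilingual model is competitive across languages and tasks, but that some trade-offs exist compared with equivalent monolingual solutions.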

Abstract (original)

Phonetic information and linguistic knowledge are essential components of a Text-to-Speech (TTS) front-end. Given a language, a lexicon can be collected offline and Grapheme-to-Phoneme (G2P) relationships are usually modeled in order to predict the pronunciation of out-of-vocabulary (OOV) words. Additionally, post-lexical phonology, often defined in the form of rule-based systems, is used to correct pronunciation within or between words. In this work we showcase a multilingual unified front-end system that addresses any pronunciation-related task typically handled by separate modules. We evaluate the proposed model on G2P conversion and other language-specific challenges, such as homograph and polyphone disambiguation, post-lexical rules and implicit diacritization. We find that the multilingual model is competitive across languages and tasks; however, some trade-offs exist when compared to equivalent monolingual solutions.
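The abstract contrasts the proposed unified model with the conventional modular front-end: lexicon lookup, a G2P fallback for OOV words, and post-lexical rules applied across word boundaries. The sketch below illustrates only that conventional pipeline, not the paper's model. Everything in it is a hypothetical assumption for illustration: the toy ARPAbet-style lexicon, the naive letter-to-phone table standing in for a trained G2P model, and the single post-lexical rule (English "the" pronounced /DH IY0/ before a vowel-initial word).

```python
# Hypothetical sketch of a conventional modular TTS pronunciation front-end.
# Not the paper's method: lexicon, G2P fallback, and rule are toy assumptions.

# Hypothetical toy lexicon (word -> ARPAbet-style phones); real systems load a curated dictionary.
LEXICON: dict[str, list[str]] = {
    "the": ["DH", "AH0"],
    "apple": ["AE1", "P", "AH0", "L"],
    "cat": ["K", "AE1", "T"],
}

# Hypothetical one-letter-one-phone table standing in for a trained G2P model.
LETTER_TO_PHONE: dict[str, str] = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F", "g": "G",
    "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L", "m": "M", "n": "N",
    "o": "AA", "p": "P", "q": "K", "r": "R", "s": "S", "t": "T", "u": "AH",
    "v": "V", "w": "W", "x": "K", "y": "Y", "z": "Z",
}

# ARPAbet vowel phones without stress digits.
VOWEL_PHONES = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
                "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


def g2p_fallback(word: str) -> list[str]:
    """Predict phones for an OOV word, one phone per known letter."""
    return [LETTER_TO_PHONE[ch] for ch in word if ch in LETTER_TO_PHONE]


def lookup(word: str) -> list[str]:
    """Use the lexicon when the word is known, else fall back to G2P."""
    if word in LEXICON:
        return list(LEXICON[word])
    return g2p_fallback(word)


def starts_with_vowel(phones: list[str]) -> bool:
    """True if the first phone (stress digit stripped) is a vowel."""
    return bool(phones) and phones[0].rstrip("012") in VOWEL_PHONES


def apply_post_lexical(words: list[str], prons: list[list[str]]) -> list[list[str]]:
    """Apply a cross-word rule: 'the' becomes DH IY0 before a vowel-initial word."""
    out = [list(p) for p in prons]
    for i in range(len(words) - 1):
        if words[i] == "the" and starts_with_vowel(out[i + 1]):
            out[i] = ["DH", "IY0"]
    return out


def front_end(text: str) -> list[list[str]]:
    """Tokenize on whitespace, look up each word, then apply post-lexical rules."""
    words = text.lower().split()
    prons = [lookup(w) for w in words]
    return apply_post_lexical(words, prons)


if __name__ == "__main__":
    print(front_end("The apple"))
    print(front_end("the dog"))
```

Each stage here is a separate module with its own data and logic, and a context-dependent choice (the form of "the") needs an explicit hand-written rule. This is the kind of fragmentation a single context-based multilingual model aims to absorb.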

arXiv information

Authors	Giulia Comini, Manuel Sam Ribeiro, Fan Yang, Heereen Shim, Jaime Lorenzo-Trueba
Published	2023-07-31 14:29:06+00:00
arXiv site	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
