EmphAssess : a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models

Summary (original)

We introduce EmphAssess, a prosodic benchmark designed to evaluate the capability of speech-to-speech models to encode and reproduce prosodic emphasis. We apply this to two tasks: speech resynthesis and speech-to-speech translation. In both cases, the benchmark evaluates the ability of the model to encode emphasis in the speech input and accurately reproduce it in the output, potentially across a change of speaker and language. As part of the evaluation pipeline, we introduce EmphaClass, a new model that classifies emphasis at the frame or word level.
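The abstract does not say how EmphAssess turns classifier outputs into a score. The Python sketch below is only a rough illustration under hypothetical assumptions. It assumes an emphasis classifier (standing in for EmphaClass) returns one emphasis probability per frame, and that each word maps to a span of frames. It also assumes input and output words are aligned one-to-one, as in same-language resynthesis, and that transfer is scored with precision, recall and F1 over emphasised words. The function names, the 0.5 threshold and the mean-pooling rule are illustrative choices, not the paper's method.

```python
from dataclasses import dataclass


@dataclass
class EmphasisScores:
    precision: float
    recall: float
    f1: float


def frame_to_word_labels(frame_probs, word_spans, threshold=0.5):
    """Hypothetical frame-to-word aggregation: a word counts as emphasised
    when the mean frame-level emphasis probability over its frames is at
    least `threshold`."""
    labels = []
    for start, end in word_spans:  # [start, end) frame indices of one word
        frames = frame_probs[start:end]
        mean = sum(frames) / len(frames) if frames else 0.0
        labels.append(mean >= threshold)
    return labels


def emphasis_transfer_scores(reference, predicted):
    """Compare aligned word-level labels. `reference` marks words emphasised
    in the input; `predicted` marks words detected as emphasised in the output."""
    if len(reference) != len(predicted):
        raise ValueError("word sequences must be aligned one-to-one")
    tp = sum(r and p for r, p in zip(reference, predicted))
    fp = sum(p and not r for r, p in zip(reference, predicted))
    fn = sum(r and not p for r, p in zip(reference, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return EmphasisScores(precision, recall, f1)


if __name__ == "__main__":
    # Toy output utterance: 8 frames, 3 words; only the middle word sounds emphasised.
    probs = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.3, 0.2]
    spans = [(0, 2), (2, 5), (5, 8)]
    predicted = frame_to_word_labels(probs, spans)
    # The input emphasised words 2 and 3, so only half of the emphasis was transferred.
    print(predicted, emphasis_transfer_scores([False, True, True], predicted))
```

In this toy run the output keeps emphasis on one of the two emphasised input words, so precision is 1.0 and recall 0.5. Cross-lingual translation would additionally need a word alignment between source and target, which this sketch does not attempt.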

arXiv information

Authors	Maureen de Seyssel, Antony D’Avirro, Adina Williams, Emmanuel Dupoux
Published	2023-12-21 17:47:33+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
