SpoofCeleb: Speech Deepfake Detection and SASV In The Wild

Summary

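This paper introduces SpoofCeleb, a dataset designed for Speech Deepfake Detection (SDD) and Spoofing-robust Automatic Speaker Verification (SASV). Its source data comes from real-world conditions, and its spoofing attacks are generated by Text-To-Speech (TTS) systems trained on that same data.
Robust recognition systems need training speech recorded in varied acoustic environments with different levels of noise.
However, current datasets typically contain clean, high-quality recordings (bona fide data) because of the requirements of TTS training.
Training a TTS model usually calls for studio-quality or otherwise well-recorded read speech.
Current SDD datasets are also of limited use for training SASV models because they lack speaker diversity.
SpoofCeleb relies on a fully automated pipeline, developed by the authors, that processes the VoxCeleb1 dataset and converts it into a form suitable for TTS training.
The authors then train 23 contemporary TTS systems.
SpoofCeleb comprises over 2.5 million utterances from 1,251 unique speakers, collected under natural, real-world conditions.
The dataset includes carefully partitioned training, validation, and evaluation sets with well-controlled experimental protocols.
Baseline results are presented for both the SDD and SASV tasks.
All data, protocols, and baselines are publicly available at https://jungjee.github.io/spoofceleb.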

Abstract (original)

This paper introduces SpoofCeleb, a dataset designed for Speech Deepfake Detection (SDD) and Spoofing-robust Automatic Speaker Verification (SASV), utilizing source data from real-world conditions and spoofing attacks generated by Text-To-Speech (TTS) systems also trained on the same real-world data. Robust recognition systems require speech data recorded in varied acoustic environments with different levels of noise to be trained. However, current datasets typically include clean, high-quality recordings (bona fide data) due to the requirements for TTS training; studio-quality or well-recorded read speech is typically necessary to train TTS models. Current SDD datasets also have limited usefulness for training SASV models due to insufficient speaker diversity. SpoofCeleb leverages a fully automated pipeline we developed that processes the VoxCeleb1 dataset, transforming it into a suitable form for TTS training. We subsequently train 23 contemporary TTS systems. SpoofCeleb comprises over 2.5 million utterances from 1,251 unique speakers, collected under natural, real-world conditions. The dataset includes carefully partitioned training, validation, and evaluation sets with well-controlled experimental protocols. We present the baseline results for both SDD and SASV tasks. All data, protocols, and baselines are publicly available at https://jungjee.github.io/spoofceleb.

arXiv information

Authors	Jee-weon Jung, Yihan Wu, Xin Wang, Ji-Hoon Kim, Soumi Maiti, Yuta Matsunaga, Hye-jin Shim, Jinchuan Tian, Nicholas Evans, Joon Son Chung, Wangyou Zhang, Seyun Um, Shinnosuke Takamichi, Shinji Watanabe
Published	2025-04-15 17:53:00+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
