"cs.SD" Category Archive

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Posted: November 21, 2023 | Author: jarxiv

Summary: In this paper, style diffusion and large speech language model (SLM)-based adversarial …

Categories: cs.AI, cs.CL, cs.LG, cs.SD, eess.AS

StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis

Posted: November 21, 2023 | Author: jarxiv

Summary: Text-to-Speech (TTS), with parallel TTS systems rapidly …

Categories: cs.CL, cs.LG, cs.SD, eess.AS

Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis

Posted: November 21, 2023 | Author: jarxiv

Summary: This study proposes a new method for modeling a large number of speakers. This …

Categories: cs.CL, cs.SD, eess.AS

The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation

Posted: November 21, 2023 | Author: jarxiv

Summary: Designed for evaluating music-and-language models, a high-quality audio and caption …

Categories: cs.AI, cs.CL, cs.SD, eess.AS

The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation

Posted: November 17, 2023 | Author: jarxiv

Summary: Designed for evaluating music-and-language models, a high-quality audio and caption …

Categories: cs.AI, cs.CL, cs.SD, eess.AS

R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces

Posted: November 16, 2023 | Author: jarxiv

Summary: In this paper, using speaker-invariant clustering (Spin), discrete acoustic units …

Categories: cs.CL, cs.SD, eess.AS

Can MusicGen Create Training Data for MIR Tasks?

Posted: November 16, 2023 | Author: jarxiv

Summary: Using an AI-based music generation system, we … music information retrieval (MIR) …

Categories: cs.AI, cs.SD, eess.AS

Open-vocabulary keyword spotting in any language through multilingual contrastive speech-phoneme pretraining

Posted: November 15, 2023 | Author: jarxiv

Summary: In this paper, covering more than 115 languages from diverse language families, fine-grained phoneme …

Categories: cs.CL, cs.SD, eess.AS

Retrieve and Copy: Scaling ASR Personalization to Large Catalogs

Posted: November 15, 2023 | Author: jarxiv

Summary: Personalization of automatic speech recognition (ASR) models, in many practical …

Categories: cs.CL, cs.IR, cs.SD, eess.AS

Unified Segment-to-Segment Framework for Simultaneous Sequence Generation

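Posted: November 15, 2023 | Author: jarxiv

Summary: Simultaneous sequence generation, in streaming speech recognition, simultaneous machine translation, simultaneous speech translation, and …

Categories: cs.AI, cs.CL, cs.SD, eess.AS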
