"cs.SD" Category Archive

ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation

Posted: May 30, 2023 | Author: jarxiv

Abstract: Languages that are not widely spoken, as well as … that are underrepresented in the training data …

Categories: cs.AI, cs.CL, cs.SD, eess.AS

Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model

Posted: May 30, 2023 | Author: jarxiv

Abstract: Owing to the immense scale of recent large language models (LLMs), instruction-based and …

Categories: cs.AI, cs.CL, cs.SD, eess.AS

Leveraging characteristics of the output probability distribution for identifying adversarial audio examples

Posted: May 29, 2023 | Author: jarxiv

Abstract: For machine-learning-based automatic speech recognition (ASR) systems, adversarial attacks …

Categories: cs.CR, cs.LG, cs.SD, eess.AS

DisfluencyFixer: A tool to enhance Language Learning through Speech To Speech Disfluency Correction

Posted: May 29, 2023 | Author: jarxiv

Abstract: Conversational speech often consists of deviations from the speech plan, producing disfluent utterances, …

Categories: cs.CL, cs.LG, cs.SD, eess.AS

UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units

Posted: May 29, 2023 | Author: jarxiv

Abstract: Direct speech-to-speech translation (S2ST), in which all components can be optimized jointly, …

Categories: cs.CL, cs.SD, eess.AS

Detecting the Severity of Major Depressive Disorder from Speech: A Novel HARD-Training Methodology

Posted: May 26, 2023 | Author: jarxiv

Abstract: Major depressive disorder (MDD) is common worldwide and carries high socioeconomic costs …

Categories: cs.LG, cs.SD, eess.AS, q-bio.QM

ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition

Posted: May 26, 2023 | Author: jarxiv

Abstract: In speech emotion recognition (SER), to cope with the inherent variability of speech signals, …

Categories: cs.CL, cs.SD, eess.AS

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

Posted: May 26, 2023 | Author: jarxiv

Abstract: In recent research, across a variety of tasks in a variety of modalities, model …

Categories: cs.CL, cs.SD, eess.AS

End-to-End Simultaneous Speech Translation with Differentiable Segmentation

Posted: May 26, 2023 | Author: jarxiv

Abstract: End-to-end simultaneous speech translation (SimulST), with streaming speech …

Categories: cs.AI, cs.CL, cs.SD, eess.AS

Lattice-Free Sequence Discriminative Training for Phoneme-Based Neural Transducers

Posted: May 26, 2023 | Author: jarxiv

Abstract: Recently, RNN transducers have achieved remarkable … on a variety of automatic speech recognition tasks …

Categories: cs.AI, cs.CL, cs.LG, cs.SD, eess.AS
