Category archive: eess.AS

QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis

Posted: March 15, 2023 | Author: jarxiv

Summary: Recent expressive text-to-speech (TTS) models, emotional speech …

Categories: cs.CL, cs.SD, eess.AS

Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy

Posted: March 15, 2023 | Author: jarxiv

Summary: Because they predict all target tokens in parallel, non-autoregressive models …

Categories: cs.CL, cs.SD, eess.AS

Improving CTC-based ASR Models with Gated Interlayer Collaboration

Posted: March 15, 2023 | Author: jarxiv

Summary: Typically, CTC-based automatic speech recognition (ASR) without an external language model …

Categories: cs.CL, cs.SD, eess.AS

Good Neighbors Are All You Need for Chinese Grapheme-to-Phoneme Conversion

Posted: March 15, 2023 | Author: jarxiv

Summary: Most Chinese grapheme-to-phoneme (G2P) systems, first, the input …

Categories: cs.CL, cs.LG, eess.AS

Efficient Speech Translation with Dynamic Latent Perceivers

Posted: March 15, 2023 | Author: jarxiv

Summary: In recent years, Transformers have been the dominant architecture for speech translation, and translation quality …

Categories: cs.CL, cs.SD, eess.AS

TriNet: stabilizing self-supervised learning from complete or slow collapse on ASR

Posted: March 15, 2023 | Author: jarxiv

Summary: Self-supervised learning (SSL) models, abrupt information collapse or slow dimensional …

Categories: cs.CL, cs.LG, eess.AS

Improving Accented Speech Recognition with Multi-Domain Training

Posted: March 15, 2023 | Author: jarxiv

Summary: With the rise of self-supervised learning, automatic speech recognition (ASR) systems now …

Categories: cs.CL, cs.LG, cs.SD, eess.AS

DECAR: Deep Clustering for learning general-purpose Audio Representations

Posted: March 15, 2023 | Author: jarxiv

Summary: A self-supervised pre-training approach for learning general-purpose audio representations …

Categories: cs.CL, cs.SD, eess.AS

Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised Style Extractor and Hierarchical Modeling in Speech Synthesis

Posted: March 15, 2023 | Author: jarxiv

Summary: Cross-speaker style transfer in speech synthesis, style from the source speaker to the …

Categories: cs.AI, cs.CL, cs.SD, eess.AS

A Study on Bias and Fairness In Deep Speaker Recognition

Posted: March 15, 2023 | Author: jarxiv

Summary: As a means of authenticating individuals and personalizing services, speaker recognition (SR) …

Categories: cs.AI, cs.SD, eess.AS
