Category archive: cs.SD

Automated Audio Captioning and Language-Based Audio Retrieval

Posted: May 16, 2023 | Author: jarxiv

Summary: This project includes (1) automated audio captioning and (2) language-based …

Categories: cs.CL, cs.IR, cs.SD, eess.AS

Understanding and Bridging the Modality Gap for Speech Translation

Posted: May 16, 2023 | Author: jarxiv

Summary: Leveraging (text) machine translation (MT) data for better end-to-end …

Categories: cs.AI, cs.CL, cs.SD, I.2.7

Back Translation for Speech-to-text Translation Without Transcripts

Posted: May 16, 2023 | Author: jarxiv

Summary: The success of end-to-end speech-to-text translation (ST) is often …

Categories: cs.AI, cs.CL, cs.SD, I.2.7

CryCeleb: A Speaker Verification Dataset Based on Infant Cry Sounds

Posted: May 16, 2023 | Author: jarxiv

Summary: In this paper, the Ubenwa CryCeleb dataset (a labeled infant cry …

Categories: cs.AI, cs.CL, cs.SD, eess.AS

Benchmarks and leaderboards for sound demixing tasks

Posted: May 15, 2023 | Author: jarxiv

Summary: Music demixing means that, from a given single audio signal, the drums, bass, …

Categories: cs.LG, cs.SD, eess.AS

Device-Robust Acoustic Scene Classification via Impulse Response Augmentation

Posted: May 15, 2023 | Author: jarxiv

Summary: For audio classification models, the ability to generalize to a variety of recording devices is a key performance factor. …

Categories: cs.LG, cs.SD, eess.AS

Better speech synthesis through scaling

Posted: May 15, 2023 | Author: jarxiv

Summary: In recent years, the field of image generation has been revolutionized by the application of autoregressive transformers and DDPMs …

Categories: cs.CL, cs.SD, eess.AS

Investigating the Sensitivity of Automatic Speech Recognition Systems to Phonetic Variation in L2 Englishes

Posted: May 15, 2023 | Author: jarxiv

Summary: On speech similar to what they were trained on, automatic speech recognition (ASR) systems achieve the best …

Categories: cs.CL, cs.SD, eess.AS

Improving Cascaded Unsupervised Speech Translation with Denoising Back-translation

Posted: May 15, 2023 | Author: jarxiv

Summary: Most speech translation models rely heavily on parallel data, and especially for low-resource …

Categories: cs.CL, cs.SD, eess.AS

Streaming Joint Speech Recognition and Disfluency Detection

Posted: May 12, 2023 | Author: jarxiv

Summary: Disfluency detection has mainly been solved with a pipeline approach, as post-processing of speech recognition …

Categories: cs.CL, cs.SD, eess.AS
