"cs.SD" Category Archive

On the Behavior of Intrusive and Non-intrusive Speech Enhancement Metrics in Predictive and Generative Settings

Posted on June 6, 2023 by jarxiv

Summary: The field of deep speech enhancement, since its inception, spectral mapp …

Categories: cs.LG, cs.SD, eess.AS

Multiple output samples for each input in a single-output Gaussian process

Posted on June 6, 2023 by jarxiv

Summary: In a standard Gaussian process (GP), for each input in the training set …

Categories: cs.CL, cs.LG, cs.SD, eess.AS

Pre-training for Speech Translation: CTC Meets Optimal Transport

Posted on June 6, 2023 by jarxiv

Summary: The gap between the speech and text modalities, in speech-to-text translation (ST …

Categories: cs.CL, cs.LG, cs.SD, eess.AS

N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition

Posted on June 6, 2023 by jarxiv

Summary: Whisper, a recently developed multilingual weakly supervised model, in monolingual settings and …

Categories: cs.CL, cs.SD, eess.AS

Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Posted on June 6, 2023 by jarxiv

Summary: Video-LLaMA, Large Language Models (LL …

Categories: cs.CL, cs.CV, cs.SD, eess.AS

Task-Agnostic Structured Pruning of Speech Representation Models

Posted on June 5, 2023 by jarxiv

Summary: Self-supervised pre-trained models such as Wav2vec2, Hubert, and WavLM …

Categories: cs.CL, cs.SD, eess.AS

Towards Robust FastSpeech 2 by Modelling Residual Multimodality

Posted on June 5, 2023 by jarxiv

Summary: State-of-the-art non-autoregressive speech synthesis models based on FastSpeech 2 …

Categories: cs.CL, cs.LG, cs.SD, eess.AS

End-to-end spoken language understanding using joint CTC loss and self-supervised, pretrained acoustic encoders

Posted on June 5, 2023 by jarxiv

Summary: In spoken language understanding (SLU), since no textual information is available, meaning directly from the speech signal …

Categories: cs.CL, cs.SD, eess.AS

Speaker-specific Thresholding for Robust Imposter Identification in Unseen Speaker Recognition

Posted on June 2, 2023 by jarxiv

Summary: Speaker identification systems, unlike the laboratory conditions under which they are trained and tested, …

Categories: cs.LG, cs.SD, eess.AS

Enhancing the Unified Streaming and Non-streaming Model with Contrastive Learning

Posted on June 2, 2023 by jarxiv

Summary: Unified streaming and non-streaming speech recognition models, their comprehensive …

Categories: cs.CL, cs.SD, eess.AS

