Category Archives: cs.SD

SQuId: Measuring Speech Naturalness in Many Languages

Posted on June 2, 2023 by jarxiv

Abstract: Much of text-to-speech research relies on human evaluation, which incurs heavy cost …

Categories: cs.CL, cs.LG, cs.SD, eess.AS

Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition

Posted on June 2, 2023 by jarxiv

Abstract: By incorporating additional contextual information, deep biasing methods …

Categories: cs.CL, cs.SD, eess.AS

UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model

Posted on June 2, 2023 by jarxiv

Abstract: This paper presents UnDiff, a diffusion probabilistic model capable of solving a variety of speech inverse tasks …

Categories: cs.AI, cs.SD, eess.AS

Iterative autoregression: a novel trick to improve your low-latency speech enhancement model

Posted on June 2, 2023 by jarxiv

Abstract: Streaming models are an essential component of real-time speech enhancement tools …

Categories: cs.AI, cs.SD, eess.AS

VOCALExplore: Pay-as-You-Go Video Data Exploration and Model Building [Technical Report]

Posted on June 2, 2023 by jarxiv

Abstract: To help users build domain-specific models over video datasets …

Categories: cs.CV, cs.DB, cs.SD, eess.AS

UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures

Posted on June 1, 2023 by jarxiv

Abstract: In reverberant conditions with multiple concurrent speakers, each microphone captures a mixture of multiple speakers at different locations …

Categories: cs.LG, cs.SD, eess.AS

Text-to-Speech Pipeline for Swiss German — A comparison

Posted on June 1, 2023 by jarxiv

Abstract: In this work, various Text-to-Speech (TTS) models …

Categories: cs.CL, cs.SD, eess.AS

MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets

Posted on June 1, 2023 by jarxiv

Abstract: In this paper, starting from how training targets are obtained …

Categories: cs.AI, cs.CL, cs.SD, eess.AS

Attention-Based Methods For Audio Question Answering

Posted on June 1, 2023 by jarxiv

Abstract: Audio question answering (AQA) is a task in which a system is given audio and a natural-language question …

Categories: cs.CL, cs.LG, cs.SD, eess.AS

Leveraging Semantic Information for Efficient Self-Supervised Emotion Recognition with Audio-Textual Distilled Models

Posted on May 31, 2023 by jarxiv

Abstract: Largely owing to implicit semantic modeling, self-supervised learning (SSL …

Categories: cs.LG, cs.SD, eess.AS
