Category archive: eess.AS

SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability

Posted: August 17, 2023 | Author: jarxiv

Abstract: Hotword customization is one of the important problems remaining in the ASR field, …

Categories: cs.CL, cs.SD, eess.AS

ChinaTelecom System Description to VoxCeleb Speaker Recognition Challenge 2023

Posted: August 17, 2023 | Author: jarxiv

Abstract: In this technical report, VoxCeleb2023 Speaker Recog …

Categories: cs.CL, cs.SD, eess.AS

Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition

Posted: August 17, 2023 | Author: jarxiv

Abstract: In this paper, from the latent space of the context encoder, hard negati …

Categories: cs.CL, eess.AS

Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator

Posted: August 17, 2023 | Author: jarxiv

Abstract: We … transcribed speech data, text-only data, or both …

Categories: cs.CL, cs.LG, cs.SD, eess.AS

Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction

Posted: August 17, 2023 | Author: jarxiv

Abstract: Text-to-Text Transfer Transformer (T5 …

Categories: cs.CL, cs.SD, eess.AS

Allophant: Cross-lingual Phoneme Recognition with Articulatory Attributes

Posted: August 17, 2023 | Author: jarxiv

Abstract: This paper proposes Allophant, a multilingual phoneme recognizer. …

Categories: cs.CL, cs.SD, eess.AS, I.2.7

Text Injection for Capitalization and Turn-Taking Prediction in Speech Models

Posted: August 16, 2023 | Author: jarxiv

Abstract: Text injection for automatic speech recognition (ASR), paired …

Categories: cs.CL, cs.LG, cs.SD, eess.AS

O-1: Self-training with Oracle and 1-best Hypothesis

Posted: August 16, 2023 | Author: jarxiv

Abstract: Mitigating training bias and unifying the training and evaluation metrics for speech recognition …

Categories: cs.CL, cs.LG, cs.SD, eess.AS

Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition

Posted: August 16, 2023 | Author: jarxiv

Abstract: By incorporating additional contextual information, deep biasing methods …

Categories: cs.CL, cs.SD, eess.AS

DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding

Posted: August 16, 2023 | Author: jarxiv

Abstract: In recent research on video-to-speech synthesis, which reconstructs speech from visual input alone, …

Categories: cs.CV, cs.LG, cs.SD, eess.AS

