"eess.AS" Category Archive

An analysis on the effects of speaker embedding choice in non auto-regressive TTS

Posted: July 20, 2023 | Author: jarxiv

Summary: In this paper, a non-autoregressive factorized multi-speaker speech synthesis architecture … Continue reading →

Categories: cs.AI, eess.AS | Comments are closed

Temporal Label-Refinement for Weakly-Supervised Audio-Visual Event Localization

Posted: July 20, 2023 | Author: jarxiv

Summary: Audio-Visual Event Localization (AVEL) is … Continue reading →

Categories: cs.CV, cs.LG, cs.SD, eess.AS | Comments are closed

SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs

Posted: July 19, 2023 | Author: jarxiv

Summary: In recent years, thanks to large-scale pre-trained speech language models (SLMs), … Continue reading →

Categories: cs.AI, cs.SD, eess.AS | Comments are closed

FlexiAST: Flexibility is What AST Needs

Posted: July 19, 2023 | Author: jarxiv

Summary: The objective of this work is, for Audio Spectrogram Transformers (AS … Continue reading →

Categories: cs.LG, cs.SD, eess.AS | Comments are closed

Model Adaptation for ASR in low-resource Indian Languages

Posted: July 18, 2023 | Author: jarxiv

Summary: The performance of automatic speech recognition (ASR), mainly with wav2vec2 and the like … Continue reading →

Categories: cs.CL, eess.AS | Comments are closed

BASS: Block-wise Adaptation for Speech Summarization

Posted: July 18, 2023 | Author: jarxiv

Summary: Relative to cascade baselines, the performance of end-to-end speech summarization … Continue reading →

Categories: cs.CL, cs.SD, eess.AS | Comments are closed

Multilingual Speech-to-Speech Translation into Multiple Target Languages

Posted: July 18, 2023 | Author: jarxiv

Summary: Speech-to-Speech Translation (S2ST) … Continue reading →

Categories: cs.CL, cs.SD, eess.AS | Comments are closed

Semi-supervised cross-lingual speech emotion recognition

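Posted: July 18, 2023 | Author: jarxiv

Summary: The performance of speech emotion recognition (SER) in a single language, with deep learning techniques … Continue reading →

Categories: cs.AI, cs.LG, cs.SD, eess.AS | Comments are closed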

Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling

Posted: July 17, 2023 | Author: jarxiv

Summary: We use an encoder pre-trained on speech recognition (ASR) … Continue reading →

Categories: cs.CL, cs.CV, cs.SD, eess.AS | Comments are closed

The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios

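Posted: July 17, 2023 | Author: jarxiv

Summary: The CHiME challenges, the development and evaluation of robust automatic speech recognition (ASR) systems … Continue reading →

Categories: cs.CL, cs.SD, eess.AS | Comments are closed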
