Category archive: eess.AS

Exploring Meta Information for Audio-based Zero-shot Bird Classification

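Posted on September 18, 2023 by jarxiv

Abstract: Owing to advances in passive acoustic monitoring and machine learning, for computational bioacoustics research … Continue reading →

Categories: cs.LG, cs.SD, eess.AS | Comments off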

Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network

Posted on September 18, 2023 by jarxiv

Abstract: Standard speaker diarization seeks to answer the question "who spoke when" … Continue reading →

Categories: cs.LG, cs.SD, eess.AS, stat.ML | Comments off

Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech

Posted on September 18, 2023 by jarxiv

Abstract: In this work, an upstream voice conversion (VC) model and a downstream Text-To-S … Continue reading →

Categories: cs.CL, cs.LG, cs.SD, eess.AS | Comments off

DiaCorrect: Error Correction Back-end For Speaker Diarization

Posted on September 18, 2023 by jarxiv

Abstract: In this work, in a simple yet effective way, the diarization system's … Continue reading →

Categories: cs.CL, cs.SD, eess.AS | Comments off

Mixture Encoder Supporting Continuous Speech Separation for Meeting Recognition

Posted on September 18, 2023 by jarxiv

Abstract: For many real-world applications of automatic speech recognition (ASR), overlapping speech … Continue reading →

Categories: cs.CL, cs.LG, cs.SD, eess.AS | Comments off

Augmenting conformers with structured state space models for online speech recognition

Posted on September 18, 2023 by jarxiv

Abstract: Online speech recognition, in which the model has access only to the left context, ASR … Continue reading →

Categories: cs.CL, cs.SD, eess.AS | Comments off

System Fingerprint Recognition for Deepfake Audio: An Initial Dataset and Investigation

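Posted on September 18, 2023 by jarxiv

Abstract: With the rapid progress of deep speech synthesis models, serious threats such as malicious content manipulation … Continue reading →

Categories: cs.AI, cs.SD, eess.AS | Comments off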

Text-Driven Foley Sound Generation With Latent Diffusion Model

Posted on September 18, 2023 by jarxiv

Abstract: Foley sound generation, synthesizing background sound for multimedia content … Continue reading →

Categories: cs.AI, cs.LG, cs.SD, eess.AS | Comments off

Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens

Posted on September 18, 2023 by jarxiv

Abstract: In this paper, a powerful and efficient Image-to-Speech caption … Continue reading →

Categories: cs.CL, cs.CV, eess.AS, eess.IV | Comments off

Visual Speech Recognition for Low-resource Languages with Automatic Labels From Whisper Model

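Posted on September 18, 2023 by jarxiv

Abstract: In this paper, for multiple languages, especially low-resource ones with a limited amount of labeled data, … Continue reading →

Categories: cs.AI, cs.CV, eess.AS | Comments off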
