"eess.AS" Category Archive

Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks

Posted on May 8, 2023 by jarxiv

Summary. Title: Hybrid transducer … for speech-to-text tasks

Categories: cs.CL, cs.SD, eess.AS

MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation

Posted on May 5, 2023 by jarxiv

Summary. Title: MedleyVox: An evaluation dataset for multiple singing voices separation. Summary: …

Categories: cs.LG, cs.SD, eess.AS

The language of sounds unheard: Exploring musical timbre semantics of large language models

Posted on May 5, 2023 by jarxiv

Summary. Title: The language of sounds unheard: musical timbre semantics of large language models …

Categories: cs.CL, cs.SD, eess.AS

End-to-end spoken language understanding using joint CTC loss and self-supervised, pretrained acoustic encoders

Posted on May 5, 2023 by jarxiv

Summary. Title: Joint CTC loss and self-supervised pretrained speech encoders …

Categories: cs.CL, cs.SD, eess.AS

NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

Posted on May 5, 2023 by jarxiv

Summary. Title: NaturalSpeech 2: Latent diffusion models are natural and zero-shot …

Categories: cs.AI, cs.CL, cs.LG, cs.SD, eess.AS

Unsupervised Improvement of Audio-Text Cross-Modal Representations

Posted on May 4, 2023 by jarxiv

Summary. Title: Unsupervised improvement of audio-text cross-modal representations. Summary: …

Categories: cs.LG, cs.SD, eess.AS

Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition

Posted on May 4, 2023 by jarxiv

Summary. Title: Multi-task learning … in end-to-end noise-robust speech recognition

Categories: cs.LG, cs.SD, eess.AS

What do End-to-End Speech Models Learn about Speaker, Language and Channel Information? A Layer-wise and Neuron-level Analysis

Posted on May 4, 2023 by jarxiv

Summary. Title: What end-to-end speech models … about speaker, language, and channel information

Categories: cs.CL, cs.SD, eess.AS

Egocentric Audio-Visual Noise Suppression

Posted on May 4, 2023 by jarxiv

Summary. Title: Egocentric audio-visual noise suppression …

Categories: cs.CL, cs.MM, cs.SD, eess.AS

Analysing the Impact of Audio Quality on the Use of Naturalistic Long-Form Recordings for Infant-Directed Speech Research

Posted on May 4, 2023 by jarxiv

Summary. Title: The impact of audio quality on long-form recordings in naturalistic infant-directed speech research …

Categories: cs.CL, cs.SD, eess.AS
