Category archive: eess.AS

Detecting Syllable-Level Pronunciation Stress with A Self-Attention Model

Posted: November 2, 2023 | Author: jarxiv

Abstract: One prerequisite for effective oral communication, especially for non-native speakers, … Continue reading →

Categories: cs.CL, cs.SD, eess.AS | Comments are closed

Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling

Posted: November 2, 2023 | Author: jarxiv

Abstract: As pre-trained speech recognition models grow in size, these … Continue reading →

Categories: cs.CL, cs.SD, eess.AS | Comments are closed

SegAugment: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations

Posted: November 2, 2023 | Author: jarxiv

Abstract: End-to-end speech translation is hampered by the scarcity of available data resources … Continue reading →

Categories: cs.CL, cs.SD, eess.AS | Comments are closed

End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation

Posted: November 2, 2023 | Author: jarxiv

Abstract: Conventional speech-to-text translation (ST) systems, based on single-speaker utterances, … Continue reading →

Categories: cs.CL, eess.AS | Comments are closed

Disentangling Voice and Content with Self-Supervision for Speaker Recognition

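Posted: November 2, 2023 | Author: jarxiv

Abstract: In speaker recognition, speaker characteristics and content are intermixed, so accurate speaker … from speech … Continue reading →

Categories: cs.AI, eess.AS | Comments are closed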

Deep Neural Networks for Automatic Speaker Recognition Do Not Learn Supra-Segmental Temporal Features

Posted: November 2, 2023 | Author: jarxiv

Abstract: Deep neural networks, in automatic speaker recognition and related tasks, … Continue reading →

Categories: cs.CV, cs.LG, cs.SD, eess.AS | Comments are closed

LAVSS: Location-Guided Audio-Visual Spatial Audio Separation

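Posted: November 1, 2023 | Author: jarxiv

Abstract: Existing machine learning research has shown promising results in monaural audio-visual separation (MAVS) … Continue reading →

Categories: cs.CV, cs.MM, cs.SD, eess.AS | Comments are closed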

CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model

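Posted: October 31, 2023 | Author: jarxiv

Abstract: Denoising diffusion probabilistic models (DDPMs) have shown promising performance in speech synthesis … Continue reading →

Categories: cs.AI, cs.CL, cs.LG, cs.MM, cs.SD, eess.AS | Comments are closed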

Exploring the Emotional Landscape of Music: An Analysis of Valence Trends and Genre Variations in Spotify Music Data

Posted: October 31, 2023 | Author: jarxiv

Abstract: In this paper, using Spotify music data, the Spotify AP … Continue reading →

Categories: cs.AI, cs.SD, eess.AS | Comments are closed

Intel Labs at Ego4D Challenge 2022: A Better Baseline for Audio-Visual Diarization

Posted: October 31, 2023 | Author: jarxiv

Abstract: In this report, the Ego4D Challenge 2022 audio-vi … Continue reading →

Categories: cs.CV, cs.SD, eess.AS | Comments are closed
