Category archive: "cs.SD"

uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures

Posted on: March 15, 2024 | Author: jarxiv

Abstract: Masked autoencoders (MAEs), from unlabeled data, rich … Continue reading →

Categories: cs.LG, cs.SD, eess.AS | Comments are closed

Mixture of Mixups for Multi-label Classification of Rare Anuran Sounds

Posted on: March 15, 2024 | Author: jarxiv

Abstract: Multi-label imbalanced classification poses a significant challenge in machine learning. … Continue reading →

Categories: cs.LG, cs.SD, eess.AS | Comments are closed

More than words: Advancements and challenges in speech recognition for singing

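Posted on: March 15, 2024 | Author: jarxiv

Abstract: In this paper, speech recognition for singing, a domain clearly distinct from standard speech recognition, … Continue reading →

Categories: cs.CL, cs.IR, cs.LG, cs.SD, eess.AS | Comments are closed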

M&M: Multimodal-Multitask Model Integrating Audiovisual Cues in Cognitive Load Assessment

Posted on: March 15, 2024 | Author: jarxiv

Abstract: In this paper, the AVCAffe dataset for cognitive load assessment (CLA) … Continue reading →

Categories: cs.CV, cs.MM, cs.SD, eess.AS | Comments are closed

Non-verbal information in spontaneous speech — towards a new framework of analysis

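Posted on: March 14, 2024 | Author: jarxiv

Abstract: Non-verbal signals in speech are encoded by prosody and, from conversational actions to attitudes and emotions, … Continue reading →

Categories: cs.CL, cs.LG, cs.SD, eess.AS | Comments are closed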

Improving Acoustic Word Embeddings through Correspondence Training of Self-supervised Speech Representations

Posted on: March 14, 2024 | Author: jarxiv

Abstract: Acoustic word embeddings (AWEs) are vector representations of spoken words. AWEs … Continue reading →

Categories: cs.CL, cs.SD, eess.AS | Comments are closed

StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation

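Posted on: March 13, 2024 | Author: jarxiv

Abstract: Diffusion models, between predictive and generative approaches to speech enhancement, the performance … Continue reading →

Categories: cs.LG, cs.SD, eess.AS | Comments are closed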

Boosting keyword spotting through on-device learnable user speech characteristics

Posted on: March 13, 2024 | Author: jarxiv

Abstract: Keyword spotting for always-on, TinyML-constrained applications … Continue reading →

Categories: cs.LG, cs.SD, eess.AS | Comments are closed

An Audio-textual Diffusion Model For Converting Speech Signals Into Ultrasound Tongue Imaging Data

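Posted on: March 13, 2024 | Author: jarxiv

Abstract: Acoustic-to-articulatory inversion (AAI), speech to data such as ultrasound tongue imaging (UTI) … Continue reading →

Categories: cs.CL, cs.SD, eess.AS | Comments are closed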

Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts

Posted on: March 13, 2024 | Author: jarxiv

Abstract: Whisper, a multitask and multilingual speech model covering 99 languages, … Continue reading →

Categories: cs.CL, cs.SD, eess.AS | Comments are closed
