Category archive: eess.AS

Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification

Posted: June 2, 2025 | Author: jarxiv

Abstract: Arabic dialect identification (ADI) systems, for the varieties of Arabic, inclusive …

Categories: cs.CL, cs.SD, eess.AS

‘Dyadosyncrasy’, Idiosyncrasy and Demographic Factors in Turn-Taking

Posted: June 2, 2025 | Author: jarxiv

Abstract: Turn-taking in dialogue follows universal constraints but also varies considerably. This study …

Categories: cs.CL, eess.AS

Mitigating Subgroup Disparities in Multi-Label Speech Emotion Recognition: A Pseudo-Labeling and Unsupervised Learning Approach

Posted: June 2, 2025 | Author: jarxiv

Abstract: Subgroup disparities and performance bias are increasingly studied in computational research …

Categories: cs.CL, cs.SD, eess.AS

V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow

Posted: June 2, 2025 | Author: jarxiv

Abstract: In this paper, natural and intelligible speech directly from silent talking-face videos …

Categories: cs.CV, cs.SD, eess.AS

Automatic classification of stop realisation with wav2vec2.0

Posted: June 2, 2025 | Author: jarxiv

Abstract: Modern phonetic research regularly uses automatic tools to annotate speech data …

Categories: cs.CL, cs.SD, eess.AS

Foundation Model Hidden Representations for Heart Rate Estimation from Auscultation

Posted: May 30, 2025 | Author: jarxiv

Abstract: Auscultation, particularly of heart sounds, is a non-invasive technique that provides essential vital sign information. …

Categories: cs.LG, cs.SD, eess.AS

Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis

Posted: May 30, 2025 | Author: jarxiv

Abstract: The rapid progress of foundation models and large language models (LLMs), multimodal input …

Categories: cs.CV, cs.MM, cs.SD, eess.AS

Effective Context in Neural Speech Models

Posted: May 29, 2025 | Author: jarxiv

Abstract: Modern neural speech models benefit from having longer context …

Categories: cs.CL, cs.SD, eess.AS

Effective and Efficient One-pass Compression of Speech Foundation Models Using Sparsity-aware Self-pinching Gates

Posted: May 29, 2025 | Author: jarxiv

Abstract: In this paper, model pruning and parameter updating tightly into a single stage …

Categories: cs.AI, cs.SD, eess.AS

VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models

Posted: May 28, 2025 | Author: jarxiv

Abstract: With the growing need for speech-based interaction models, end-to-end …

Categories: cs.CL, cs.SD, eess.AS
