"cs.SD" Category Archive

Sounding that Object: Interactive Object-Aware Image to Audio Generation

Posted on June 5, 2025 by jarxiv

Abstract: Generating accurate sounds for complex audio-visual scenes is …

Categories: cs.CV, cs.LG, cs.MM, cs.SD, eess.AS

Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC

Posted on June 4, 2025 by jarxiv

Abstract: Using self-supervised or supervised pre-trained speech foundation models (SFMs), …

Categories: cs.CL, cs.SD, eess.AS

TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models

Posted on June 4, 2025 by jarxiv

Abstract: This paper introduces TalkingMachines. TalkingMac …

Categories: cs.AI, cs.GR, cs.SD

Egocentric Speaker Classification in Child-Adult Dyadic Interactions: From Sensing to Computational Modeling

Posted on June 3, 2025 by jarxiv

Abstract: In autism spectrum disorder (ASD), social communication, repetitive behaviors, and …

Categories: cs.AI, cs.SD, eess.AS

SpeechT: Findings of the First Mentorship in Speech Translation

Posted on June 3, 2025 by jarxiv

Abstract: This work concerns the first speech translation …, held in December 2024 and January 2025 …

Categories: cs.CL, cs.SD

Bemba Speech Translation: Exploring a Low-Resource African Language

Posted on June 3, 2025 by jarxiv

Abstract: In this paper, the International Conference on Spoken Language Translation (IWSLT …

Categories: cs.CL, cs.SD, eess.AS

Efficient Speech Translation through Model Compression and Knowledge Distillation

Posted on June 3, 2025 by jarxiv

Abstract: For the efficient deployment of large audio-language models for speech translation, significant computational …

Categories: cs.CL, cs.SD, eess.AS

ReelWave: Multi-Agentic Movie Sound Generation through Multimodal LLM Conversation

Posted on June 3, 2025 by jarxiv

Abstract: Current audio generation conditioned on text or video …

Categories: cs.CV, cs.SD

Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification

Posted on June 2, 2025 by jarxiv

Abstract: Arabic dialect identification (ADI) systems … inclusive … for Arabic varieties …

Categories: cs.CL, cs.SD, eess.AS

Mitigating Subgroup Disparities in Multi-Label Speech Emotion Recognition: A Pseudo-Labeling and Unsupervised Learning Approach

Posted on June 2, 2025 by jarxiv

Abstract: Subgroup disparities and performance bias are increasingly studied in computational research …

Categories: cs.CL, cs.SD, eess.AS
