"eess.AS" Category Archive

Video-Guided Foley Sound Generation with Multimodal Controls

Posted on March 18, 2025 by jarxiv

Abstract: When generating sound effects for video, in many cases real-life sources and sound …

Categories: cs.CV, cs.MM, cs.SD, eess.AS

Are Deep Speech Denoising Models Robust to Adversarial Noise?

Posted on March 17, 2025 by jarxiv

Abstract: Deep noise suppression (DNS) models, in a variety of high-stakes speech applications …

Categories: cs.LG, cs.SD, eess.AS

Exploring the Potential of Large Multimodal Models as Effective Alternatives for Pronunciation Assessment

Posted on March 17, 2025 by jarxiv

Abstract: Large multimodal models (LMMs), with exceptional performance across a wide range of domains …

Categories: cs.CL, cs.SD, eess.AS

Reproducible Machine Learning-based Voice Pathology Detection: Introducing the Pitch Difference Feature

Posted on March 17, 2025 by jarxiv

Abstract: Purpose: The publicly available Saarbrücken Voice …

Categories: cs.AI, cs.LG, cs.SD, eess.AS

Designing Neural Synthesizers for Low Latency Interaction

Posted on March 17, 2025 by jarxiv

Abstract: Neural audio synthesis (NAS) models, with high-quality and expressive audio …

Categories: cs.AI, cs.LG, cs.SD, eess.AS

Whisper Speaker Identification: Leveraging Pre-Trained Multilingual Transformers for Robust Speaker Embeddings

Posted on March 14, 2025 by jarxiv

Abstract: Speaker identification in multilingual settings, particularly when conventional models are trained mainly on English data …

Categories: cs.AI, cs.SD, eess.AS, I.2

Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity

Posted on March 14, 2025 by jarxiv

Abstract: Architectures such as Linformer and Mamba have recently, as competitive Transformer …

Categories: cs.AI, cs.CL, cs.LG, cs.SD, eess.AS

AudioX: Diffusion Transformer for Anything-to-Audio Generation

Posted on March 14, 2025 by jarxiv

Abstract: Audio and music generation have emerged as important tasks in many applications …

Categories: cs.CV, cs.LG, cs.MM, cs.SD, eess.AS

Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model

Posted on March 13, 2025 by jarxiv

Abstract: Audio and visual data for training multimodal foundation models …

Categories: 68T, 68T10, 68T45, cs.CL, cs.IR, cs.MM, cs.SD, eess.AS

MAD Speech: Measures of Acoustic Diversity of Speech

Posted on March 12, 2025 by jarxiv

Abstract: Generative spoken language models produce speech in a wide range of voices, prosody, and recording conditions, natural …

Categories: cs.CL, eess.AS
