"cs.SD" Category Archive

Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering

Posted on March 20, 2025 by jarxiv

Summary: Recently, reinforcement learning (RL) has greatly enhanced the reasoning capabilities of large language models (LLMs) …

Categories: cs.AI, cs.CL, cs.SD, eess.AS

MoonCast: High-Quality Zero-Shot Podcast Generation

Posted on March 20, 2025 by jarxiv

Summary: Recent advances in text-to-speech synthesis produce high-quality short utterances for individual speakers …

Categories: cs.AI, cs.CL, cs.LG, cs.SD, eess.AS

AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation

Posted on March 19, 2025 by jarxiv

Summary: In end-to-end speech translation, the acoustic representations learned by the encoder are usually …

Categories: cs.CL, cs.SD, eess.AS

TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control

Posted on March 19, 2025 by jarxiv

Summary: Zero-shot singing voice synthesis (SVS) with style transfer and style control …

Categories: cs.CL, cs.SD, eess.AS

MoonCast: High-Quality Zero-Shot Podcast Generation

Posted on March 19, 2025 by jarxiv

Summary: Recent advances in text-to-speech synthesis produce high-quality short utterances for individual speakers …

Categories: cs.AI, cs.CL, cs.LG, cs.SD, eess.AS

Personalized Speech Emotion Recognition in Human-Robot Interaction using Vision Transformers

Posted on March 18, 2025 by jarxiv

Summary: Because emotions are an essential element of verbal communication, human-robot …

Categories: cs.HC, cs.RO, cs.SD, eess.AS

Video-Guided Foley Sound Generation with Multimodal Controls

Posted on March 18, 2025 by jarxiv

Summary: To generate sound effects for videos, in many cases real-life sources and sound …

Categories: cs.CV, cs.MM, cs.SD, eess.AS

Are Deep Speech Denoising Models Robust to Adversarial Noise?

Posted on March 17, 2025 by jarxiv

Summary: Deep noise suppression (DNS) models, across a variety of high-stakes speech applications …

Categories: cs.LG, cs.SD, eess.AS

Exploring the Potential of Large Multimodal Models as Effective Alternatives for Pronunciation Assessment

Posted on March 17, 2025 by jarxiv

Summary: Large multimodal models (LMMs) show exceptional performance across a wide range of domains …

Categories: cs.CL, cs.SD, eess.AS

Reproducible Machine Learning-based Voice Pathology Detection: Introducing the Pitch Difference Feature

Posted on March 17, 2025 by jarxiv

Summary: Objective: the publicly available Saarbrücken Voice …

Categories: cs.AI, cs.LG, cs.SD, eess.AS
