Category Archives: cs.SD

Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity

Posted on December 24, 2024 by jarxiv

Summary: Recently, architectures such as Linformer and Mamba … transformer …

Categories: cs.AI, cs.CL, cs.LG, cs.SD, eess.AS

Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis

Posted on December 24, 2024 by jarxiv

Summary: Recent advances in diffusion models have revolutionized audio-driven talking head synthesis …

Categories: cs.CV, cs.LG, cs.SD, eess.AS

RiTTA: Modeling Event Relations in Text-to-Audio Generation

Posted on December 23, 2024 by jarxiv

Summary: Text-to-Audio (TTA) generation models have advanced significantly, and detailed …

Categories: cs.LG, cs.SD, eess.AS

Data-Centric Improvements for Enhancing Multi-Modal Understanding in Spoken Conversation Modeling

Posted on December 23, 2024 by jarxiv

Summary: Conversational assistants are becoming increasingly common across a wide range of real-world applications …

Categories: cs.AI, cs.CL, cs.SD, eess.AS

ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis

Posted on December 20, 2024 by jarxiv

Summary: Prosody carries rich information beyond the literal meaning of words, and speech intelligibility …

Categories: cs.CL, cs.SD, eess.AS

Stable-V2A: Synthesis of Synchronized Sound Effects with Temporal and Semantic Controls

Posted on December 20, 2024 by jarxiv

Summary: Sound designers and Foley artists typically … of interest in a video …

Categories: cs.CV, cs.LG, cs.MM, cs.SD, eess.AS

GIRAFE: Glottal Imaging Dataset for Advanced Segmentation, Analysis, and Facilitative Playbacks Evaluation

Posted on December 20, 2024 by jarxiv

Summary: Advances in the development of facilitative playbacks extracted from high-speed videoendoscopic sequences of the vocal folds …

Categories: cs.AI, cs.CV, cs.SD, eess.AS

AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation

Posted on December 20, 2024 by jarxiv

Summary: We … for temporally aligned cross-modal conditioning …

Categories: cs.CV, cs.LG, cs.SD, eess.AS

I Know Your Feelings Before You Do: Predicting Future Affective Reactions in Human-Computer Dialogue

Posted on December 19, 2024 by jarxiv

Summary: Current spoken dialogue systems (SDS) often receive the user's speech and …

Categories: cs.HC, cs.RO, cs.SD, eess.AS

Certification of Speaker Recognition Models to Additive Perturbations

Posted on December 19, 2024 by jarxiv

Summary: Speaker recognition technology, from personal virtual assistants to secure access systems, …

Categories: cs.AI, cs.SD, eess.AS
