Category archive: cs.SD

Sylber: Syllabic Embedding Representation of Speech from Raw Audio

Posted: October 10, 2024 | Author: jarxiv

Summary: Syllables are compositional units of spoken language that play an important role in human speech perception and production …

Categories: cs.CL, cs.SD, eess.AS

CMMD: Contrastive Multi-Modal Diffusion for Video-Audio Conditional Modeling

Posted: October 10, 2024 | Author: jarxiv

Summary: A multimodal diffusion model tailored to bidirectional conditional generation of video and audio …

Categories: cs.CV, cs.LG, cs.MM, cs.SD, eess.AS

Presto! Distilling Steps and Layers for Accelerating Music Generation

Posted: October 8, 2024 | Author: jarxiv

Summary: Diffusion-based text-to-music (TTM) methods have advanced, but efficient, …

Categories: cs.AI, cs.LG, cs.SD, eess.AS

Non-Invasive Suicide Risk Prediction Through Speech Analysis

Posted: October 8, 2024 | Author: jarxiv

Summary: Specialized psychiatric assessment in emergency departments and care for patients at risk of suicidality …

Categories: cs.CL, cs.SD, eess.AS, I.2

Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition

Posted: October 7, 2024 | Author: jarxiv

Summary: A multimodal framework for audio generation, editing, and composition based on text or video input …

Categories: cs.CV, cs.LG, cs.SD, eess.AS

SonicSense: Object Perception from In-Hand Acoustic Vibration

Posted: October 4, 2024 | Author: jarxiv

Summary: We introduce SonicSense. SonicSense is a hardware and software …

Categories: cs.MM, cs.RO, cs.SD, eess.AS

Enhancing the analysis of murine neonatal ultrasonic vocalizations: Development, evaluation, and application of different mathematical models

Posted: October 2, 2024 | Author: jarxiv

Summary: Rodents use a broad range of ultrasonic vocalizations (USVs) for social communication …

Categories: cs.LG, cs.SD, eess.AS

Active Listener: Continuous Generation of Listener’s Head Motion Response in Dyadic Interactions

Posted: October 1, 2024 | Author: jarxiv

Summary: A key component of dyadic spoken interactions is the head motion that reflects a listener's response to the interlocutor's utterances …

Categories: cs.RO, cs.SD, eess.AS

AfriHuBERT: A self-supervised speech representation model for African languages

Posted: October 1, 2024 | Author: jarxiv

Summary: In this work, originally pretrained on 147 languages, the state-of-the-art ( …

Categories: cs.CL, cs.SD, eess.AS

Alignment-Free Training for Transducer-based Multi-Talker ASR

Posted: October 1, 2024 | Author: jarxiv

Summary: Extending the RNN transducer (RNNT) to recognize multi-talker speech …

Categories: cs.CL, cs.SD, eess.AS
