Category Archives: cs.SD

A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data

Posted on June 17, 2025 by jarxiv

Summary: A self-refining framework that improves ASR performance using only unlabeled datasets …

Categories: cs.AI, cs.CL, cs.SD, eess.AS

Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model

Posted on June 17, 2025 by jarxiv

Summary: With the emergence of GPT-4o-like large multimodal models (LMMs), text …

Categories: cs.AI, cs.CL, cs.CV, cs.SD, eess.AS

Non-intrusive Speech Quality Assessment with Diffusion Models Trained on Clean Speech

Posted on June 16, 2025 by jarxiv

Summary: Diffusion models have achieved great success in generating high-quality, natural speech samples …

Categories: cs.LG, cs.SD, eess.AS

Impact of Frame Rates on Speech Tokenizer: A Case Study on Mandarin and English

Posted on June 16, 2025 by jarxiv

Summary: Speech tokenizers play an important role in recent speech tasks and generally …

Categories: 68T10, cs.AI, cs.CL, cs.SD, eess.AS, I.2.7

Reimagining Dance: Real-time Music Co-creation between Dancers and AI

Posted on June 16, 2025 by jarxiv

Summary: Dance performance has traditionally followed a one-way relationship in which movement responds to music …

Categories: cs.AI, cs.HC, cs.SD, eess.AS

UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching

Posted on June 12, 2025 by jarxiv

Summary: Recent advances in text-to-speech (TTS) have made highly natural speech synthesis possible …

Categories: cs.LG, cs.SD, eess.AS

Regularizing Learnable Feature Extraction for Automatic Speech Recognition

Posted on June 12, 2025 by jarxiv

Summary: Because neural front-ends can be trained directly to fit the acoustic model, …

Categories: cs.CL, cs.LG, cs.SD, eess.AS

InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions

Posted on June 12, 2025 by jarxiv

Summary: End-to-end human animation with rich multimodal conditions, e.g. …

Categories: cs.AI, cs.CV, cs.SD

Teaching Physical Awareness to LLMs through Sounds

Posted on June 12, 2025 by jarxiv

Summary: Large language models (LLMs) have shown remarkable capabilities in text and multimodal processing …

Categories: cs.AI, cs.MM, cs.RO, cs.SD, eess.AS

Teaching Physical Awareness to LLMs through Sounds

Posted on June 11, 2025 by jarxiv

Summary: Large language models (LLMs) have shown remarkable capabilities in text and multimodal processing …

Categories: cs.AI, cs.MM, cs.RO, cs.SD, eess.AS
