"cs.SD" Category Archive

Towards Early Prediction of Self-Supervised Speech Model Performance

Posted on January 13, 2025 by jarxiv

Abstract: In self-supervised learning (SSL), pre-training and evaluation are resource-intensive …

Categories: cs.CL, cs.LG, cs.SD, eess.AS

xLSTM-SENet: xLSTM for Single-Channel Speech Enhancement

Posted on January 13, 2025 by jarxiv

Abstract: For speech enhancement, attention-based architectures such as Conformers …

Categories: cs.AI, cs.SD, eess.AS

FLowHigh: Towards Efficient and High-Quality Audio Super-Resolution with Single-Step Flow Matching

Posted on January 10, 2025 by jarxiv

Abstract: Audio super-resolution is challenging owing to its ill-posed nature. Recently, …

Categories: cs.AI, cs.CL, cs.SD, eess.AS

VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models

Posted on January 10, 2025 by jarxiv

Abstract: With the growing demand for developing speech-based interaction models, end-to-end speech …

Categories: cs.CL, cs.SD, eess.AS

MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder

Posted on January 10, 2025 by jarxiv

Abstract: Multilingual automatic speech recognition (ASR) in the medical domain … speech translation, spoken language understanding …

Categories: cs.CL, cs.SD, eess.AS

AccentBox: Towards High-Fidelity Zero-Shot Accent Generation

Posted on January 10, 2025 by jarxiv

Abstract: Recent Zero-Shot Text-to-Speech (ZS-TTS) …

Categories: cs.CL, cs.SD, eess.AS

Multi-class Decoding of Attended Speaker Direction Using Electroencephalogram and Audio Spatial Spectrum

Posted on January 10, 2025 by jarxiv

Abstract: Decoding the direction of a listener's focus from the listener's electroencephalogram (EEG) signals …

Categories: cs.AI, cs.CL, cs.SD, eess.AS

AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder

Posted on January 10, 2025 by jarxiv

Abstract: This article leverages a masked autoencoder for speech-signal analysis, control …

Categories: cs.AI, cs.SD

Seeing Sound: Assembling Sounds from Visuals for Audio-to-Image Generation

Posted on January 10, 2025 by jarxiv

Abstract: To train audio-to-image generative models, semantically aligned and diverse …

Categories: cs.CV, cs.SD, eess.AS

Channel-Aware Domain-Adaptive Generative Adversarial Network for Robust Speech Recognition

Posted on January 9, 2025 by jarxiv

Abstract: Pre-trained automatic speech recognition (ASR) systems, in matched domains, …

Categories: cs.AI, cs.CL, cs.SD, eess.AS
