"cs.SD" Category Archive

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

Posted: September 2, 2024 | Author: jarxiv

Abstract: Recent advances in language models have achieved significant progress. GPT-4o is a new …

Categories: cs.AI, cs.CL, cs.HC, cs.LG, cs.SD, eess.AS

Towards Efficient Modelling of String Dynamics: A Comparison of State Space and Koopman based Deep Learning Methods

Posted: August 30, 2024 | Author: jarxiv

Abstract: In this paper, modelling the dynamics of both linear and nonlinear stiff strings …

Categories: cs.LG, cs.SD, eess.AS, physics.comp-ph

Measuring the Accuracy of Automatic Speech Recognition Solutions

Posted: August 30, 2024 | Author: jarxiv

Abstract: For Deaf and hard-of-hearing (DHH) people, captions are …

Categories: cs.CL, cs.SD, eess.AS, I.2.7

SALSA: Speedy ASR-LLM Synchronous Aggregation

Posted: August 30, 2024 | Author: jarxiv

Abstract: Harnessing pre-trained LLMs, particularly for ASR in low-resource languages …

Categories: cs.CL, cs.LG, cs.SD, eess.AS

Innovative Speech-Based Deep Learning Approaches for Parkinson’s Disease Classification: A Systematic Review

Posted: August 30, 2024 | Author: jarxiv

Abstract: Parkinson's disease (PD) is the second most prevalent neurodegenerative disease worldwide …

Categories: cs.AI, cs.CL, cs.LG, cs.SD, eess.AS

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

Posted: August 30, 2024 | Author: jarxiv

Abstract: Recent advances in language models have achieved significant progress. GPT-4o is a new …

Categories: cs.AI, cs.CL, cs.HC, cs.LG, cs.SD, eess.AS

Easy, Interpretable, Effective: openSMILE for voice deepfake detection

Posted: August 30, 2024 | Author: jarxiv

Abstract: In this paper, the de facto standard in the field of voice authenticity and deepfake detection …

Categories: cs.AI, cs.SD, eess.AS

Multi-modal Adversarial Training for Zero-Shot Voice Cloning

Posted: August 29, 2024 | Author: jarxiv

Abstract: Trained to reconstruct speech from given text, text-to-speech …

Categories: cs.LG, cs.SD, eess.AS

SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models

Posted: August 29, 2024 | Author: jarxiv

Abstract: Scaling text-to-speech (TTS) to large datasets …

Categories: cs.CL, cs.SD, eess.AS

Beyond Levenshtein: Leveraging Multiple Algorithms for Robust Word Error Rate Computations And Granular Error Classifications

Posted: August 29, 2024 | Author: jarxiv

Abstract: Word error rate (WER) is a common measure of automatic speech recognition (ASR) accuracy …

Categories: cs.CL, cs.SD, eess.AS, I.2.7
