"cs.SD" Category Archive

From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition

Posted: May 23, 2025 | Author: jarxiv

Abstract: Recent advances in automatic speech recognition (ASR) have been largely driven by large-scale speech corpora …

Categories: cs.CL, cs.SD, eess.AS

Slamming: Training a Speech Language Model on One GPU in a Day

Posted: May 23, 2025 | Author: jarxiv

Abstract: Training a high-quality speech language model (SLM) on a single academic GPU in 24 hours …

Categories: cs.AI, cs.CL, cs.LG, cs.SD, eess.AS

Mitigating Subgroup Disparities in Multi-Label Speech Emotion Recognition: A Pseudo-Labeling and Unsupervised Learning Approach

Posted: May 22, 2025 | Author: jarxiv

Abstract: Subgroup disparities and performance bias are increasingly studied in computational research …

Categories: cs.CL, cs.SD, eess.AS

MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling

Posted: May 22, 2025 | Author: jarxiv

Abstract: Acquiring large-scale emotional speech data with strong consistency is a challenge for speech synthesis …

Categories: cs.CL, cs.SD, eess.AS

LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec

Posted: May 22, 2025 | Author: jarxiv

Abstract: Discrete speech tokens have shown strong potential for language-model-based speech generation …

Categories: cs.AI, cs.SD, eess.AS

dMel: Speech Tokenization made Simple

Posted: May 22, 2025 | Author: jarxiv

Abstract: Large language models leverage self-supervised pretraining on vast text data …

Categories: cs.AI, cs.CL, cs.SD, eess.AS

CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment

Posted: May 22, 2025 | Author: jarxiv

Abstract: Recent advances in audio-visual learning have shown promising results in learning representations across modalities …

Categories: cs.CV, cs.MM, cs.SD, eess.AS

Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach

Posted: May 22, 2025 | Author: jarxiv

Abstract: By integrating visual cues in noisy environments, audio-visual speech recognition (AVS …

Categories: cs.CV, cs.MM, cs.SD, eess.AS

Self-Supervised Frameworks for Speaker Verification via Bootstrapped Positive Sampling

Posted: May 21, 2025 | Author: jarxiv

Abstract: Recent developments in self-supervised learning (SSL) have shown significant potential for speaker verification (SV) …

Categories: cs.LG, cs.SD, eess.AS

Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples

Posted: May 21, 2025 | Author: jarxiv

Abstract: With recent advances in audio-aware large language models (ALLMs), audio …

Categories: cs.CL, cs.SD, eess.AS
