Category Archives: cs.SD

Simple and Controllable Music Generation

Posted on June 9, 2023 by jarxiv

Abstract: We tackle the task of conditional music generation. Compressed discrete music representation …

Categories: cs.AI, cs.LG, cs.SD, eess.AS

Arabic Dysarthric Speech Recognition Using Adversarial and Signal-Based Augmentation

Posted on June 8, 2023 by jarxiv

Abstract: Automatic speech recognition (ASR) has advanced considerably, but state-of-the-art ASR systems …

Categories: cs.CL, cs.SD

Label Aware Speech Representation Learning For Language Identification

Posted on June 8, 2023 by jarxiv

Abstract: In speech representation learning approaches for non-semantic tasks such as language recognition, classifier models …

Categories: cs.CL, cs.LG, cs.SD, eess.AS

Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages

Posted on June 8, 2023 by jarxiv

Abstract: In this work, an open-source multilingual speech resource for Zambian languages, Zamb …

Categories: cs.CL, cs.SD, eess.AS

Handling the Alignment for Wake Word Detection: A Comparison Between Alignment-Based, Alignment-Free and Hybrid Approaches

Posted on June 8, 2023 by jarxiv

Abstract: Wake word detection, in most intelligent-home and portable …

Categories: cs.CL, cs.SD, eess.AS

Self-supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces

Posted on June 7, 2023 by jarxiv

Abstract: Self-supervised speech representations are known to encode both speaker and phonetic information …

Categories: cs.CL, cs.SD

Topological Data Analysis for Speech Processing

Posted on June 7, 2023 by jarxiv

Abstract: Topological data analysis (TDA), applied to speech classification problems and pretrained speech models …

Categories: cs.CL, cs.LG, cs.SD, eess.AS, math.AT

GigaST: A 10,000-hour Pseudo Speech Translation Corpus

Posted on June 7, 2023 by jarxiv

Abstract: This paper introduces GigaST, a large-scale pseudo speech translation (ST) corpus …

Categories: cs.CL, cs.SD, eess.AS

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

Posted on June 7, 2023 by jarxiv

Abstract: Self-supervised learning (SSL), in the vision, text, and speech domains, large-scale …

Categories: cs.AI, cs.CL, cs.LG, cs.SD, eess.AS

Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Posted on June 7, 2023 by jarxiv

Abstract: We address large language models (LLMs) and the visual and auditory content in videos …

Categories: cs.CL, cs.CV, cs.SD, eess.AS
