Category archive: cs.SD

UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network

Posted: October 5, 2023 | Author: jarxiv

Abstract: Recent studies, by employing large language models with multi-task capabilities, …

Categories: cs.CL, cs.SD, eess.AS

Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages

Posted: October 5, 2023 | Author: jarxiv

Abstract: Designed to directly evaluate the code-switching capabilities of self-supervised speech encoders, …

Categories: cs.CL, cs.SD, eess.AS

Preserving Phonemic Distinctions for Ordinal Regression: A Novel Loss Function for Automatic Pronunciation Assessment

Posted: October 5, 2023 | Author: jarxiv

Abstract: Automatic pronunciation assessment (APA), for second-language (L2) learners of a given language, …

Categories: cs.CL, cs.SD, eess.AS

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Posted: October 4, 2023 | Author: jarxiv

Abstract: Recent advances in neural vocoding, mainly with time-domain Gene …

Categories: cs.LG, cs.SD, eess.AS

Preserving Phonemic Distinctions for Ordinal Regression: A Novel Loss Function for Automatic Pronunciation Assessment

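Posted: October 4, 2023 | Author: jarxiv

Abstract: Automatic pronunciation assessment (APA), regarding the pronunciation proficiency of second-language (L2) learners of a given language, …

Categories: cs.CL, cs.SD, eess.AS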

A Large-scale Dataset for Audio-Language Representation Learning

Posted: October 4, 2023 | Author: jarxiv

Abstract: The AI community, spurred on by large-scale multimodal datasets, powerful …

Categories: cs.CV, cs.MM, cs.SD, eess.AS

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Posted: October 3, 2023 | Author: jarxiv

Abstract: Pre-training speech models on large amounts of data has achieved remarkable success …

Categories: cs.CL, cs.SD, eess.AS

On decoder-only architecture for speech-to-text and large language model integration

Posted: October 3, 2023 | Author: jarxiv

Abstract: Large language models (LLMs) have achieved remarkable success in the field of natural language processing …

Categories: cs.CL, cs.SD, eess.AS

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

Posted: October 3, 2023 | Author: jarxiv

Abstract: In real-world applications, especially in streaming scenarios that require incremental generation, …

Categories: cs.CL, cs.SD, eess.AS

Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners

Posted: October 3, 2023 | Author: jarxiv

Abstract: In this work, a novel multi-window multi-head attention (MW- …

Categories: cs.AI, cs.SD, eess.AS

