"eess.AS" Category Archive

TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down Fusion

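Posted: January 26, 2024 | Author: jarxiv

Abstract: Audio-visual speech separation, across speech recognition, diarization, scene analysis, assistive technologies, and more … Continue reading →

Categories: cs.AI, cs.SD, eess.AS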

HyperSound: Generating Implicit Neural Representations of Audio Signals with Hypernetworks

Posted: January 26, 2024 | Author: jarxiv

Abstract: Implicit neural representations (INRs) are a rapidly growing research field, and … Continue reading →

Categories: cs.AI, cs.LG, cs.NE, cs.SD, eess.AS

SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation

Posted: January 26, 2024 | Author: jarxiv

Abstract: Benefiting from effective speech modeling, current speech large language models (SLL … Continue reading →

Categories: cs.CL, cs.SD, eess.AS

Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction

Posted: January 25, 2024 | Author: jarxiv

Abstract: Self-supervised learning (SSL) learns flexible speech representations from unlabeled data … Continue reading →

Categories: cs.LG, cs.SD, eess.AS

Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription

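Posted: January 25, 2024 | Author: jarxiv

Abstract: In recent years, research on music transcription, mainly on architecture design and instrument-specific data acquisition … Continue reading →

Categories: cs.LG, cs.SD, eess.AS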

Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks

Posted: January 25, 2024 | Author: jarxiv

Abstract: For the four tasks of speech recognition, speech synthesis, text generation, and speech continuation, we … Continue reading →

Categories: cs.LG, cs.SD, eess.AS

MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, ASR Error Detection, and ASR Error Correction

Posted: January 25, 2024 | Author: jarxiv

Abstract: A common approach in speech emotion recognition (SER) involves speech information and text … Continue reading →

Categories: cs.CL, cs.MM, cs.SD, eess.AS

PromptASR for contextualized ASR with controllable style

Posted: January 25, 2024 | Author: jarxiv

Abstract: Because prompts provide contextual information such as topics and logical relationships, … Continue reading →

Categories: cs.CL, cs.SD, eess.AS

SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering

Posted: January 25, 2024 | Author: jarxiv

Abstract: Spoken question answering (SQA) involves a machine finding the answer span within a given spoken passage … Continue reading →

Categories: cs.CL, cs.IR, cs.SD, eess.AS

SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation

Posted: January 25, 2024 | Author: jarxiv

Abstract: Benefiting from effective speech modeling, current speech large language models (SLL … Continue reading →

Categories: cs.CL, cs.SD, eess.AS
