Category Archives: eess.AS

An approach to optimize inference of the DIART speaker diarization pipeline

Posted on August 6, 2024 by jarxiv

Summary: Speaker diarization concerns the question of "who spoke when" in an audio file …

Categories: cs.CL, cs.SD, eess.AS

Clustering and Mining Accented Speech for Inclusive and Fair Speech Recognition

Posted on August 6, 2024 by jarxiv

Summary: Modern automatic speech recognition (ASR) systems are typically trained on tens of thousands of hours or more of speech data …

Categories: cs.AI, cs.SD, eess.AS

Language Model Can Listen While Speaking

Posted on August 6, 2024 by jarxiv

Summary: Dialogue serves as the most natural mode of human-computer interaction (HCI) …

Categories: cs.AI, cs.CL, cs.HC, cs.SD, eess.AS

MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models

Posted on August 5, 2024 by jarxiv

Summary: Multimodal models that jointly process audio and language show great potential in audio understanding …

Categories: cs.CL, cs.LG, cs.MM, cs.SD, eess.AS

Out-Of-Distribution Detection for Audio-visual Generalized Zero-Shot Learning: A General Framework

Posted on August 5, 2024 by jarxiv

Summary: In generalized zero-shot learning (GZSL), both seen and unseen classes …

Categories: cs.CV, cs.MM, cs.SD, eess.AS, eess.IV

ChordSync: Conformer-Based Alignment of Chord Annotations to Music Audio

Posted on August 4, 2024 by jarxiv

Summary: In the Western music tradition, chords are the main building blocks of harmony and a fundamental aspect of music …

Categories: 68P20, cs.LG, cs.MM, cs.SD, eess.AS, I.2.6

Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation

Posted on August 4, 2024 by jarxiv

Summary: Large-scale text-to-speech (TTS) models have made significant progress in recent years, but Chinese dialect …

Categories: cs.CL, cs.SD, eess.AS

SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data

Posted on August 4, 2024 by jarxiv

Summary: This paper covers three multimodal language understanding tasks: AV-ASR (audio-visual automatic speech …

Categories: cs.CL, cs.CV, eess.AS

YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation

Posted on August 2, 2024 by jarxiv

Summary: Multi-instrument music transcription converts polyphonic music recordings into scores assigned to each instrument …

Categories: cs.LG, cs.SD, eess.AS

Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms

Posted on August 2, 2024 by jarxiv

Summary: VoIP (Voice over Internet Protocol) communication …

Categories: cs.CL, cs.SD, eess.AS
