Category archive: "cs.SD"

YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation

Posted: August 2, 2024 | Author: jarxiv

Summary: Multi-instrument music transcription converts polyphonic music recordings into scores assigned to each instrument …

Categories: cs.LG, cs.SD, eess.AS

Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms

Posted: August 2, 2024 | Author: jarxiv

Summary: VoIP (Voice over Internet Protocol) …

Categories: cs.CL, cs.SD, eess.AS

Towards Assessing Data Replication in Music Generation with Music Similarity Metrics on Raw Audio

Posted: August 2, 2024 | Author: jarxiv

Summary: Owing to recent advances in music generation, creative music processes, current business …

Categories: cs.AI, cs.MM, cs.SD, eess.AS

Generative Expressive Conversational Speech Synthesis

Posted: August 2, 2024 | Author: jarxiv

Summary: Conversational speech synthesis (CSS), in user-agent conversation settings, …

Categories: cs.CL, cs.SD, eess.AS

Practical aspects for the creation of an audio dataset from field recordings with optimized labeling budget with AI-assisted strategy

Posted: August 1, 2024 | Author: jarxiv

Summary: Machine Listening concerns extracting relevant information from audio signals …

Categories: cs.LG, cs.SD, eess.AS

Beat this! Accurate beat tracking without DBN postprocessing

Posted: August 1, 2024 | Author: jarxiv

Summary: We pursue two objectives, generality across a diverse range of music and high accuracy, for beat …

Categories: cs.LG, cs.SD, eess.AS

On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition

Posted: August 1, 2024 | Author: jarxiv

Summary: With the rapid development of neural text-to-speech (TTS) systems, automatic …

Categories: cs.CL, cs.LG, cs.SD, eess.AS

Generative Expressive Conversational Speech Synthesis

Posted: August 1, 2024 | Author: jarxiv

Summary: Conversational speech synthesis (CSS), in user-agent conversation settings, …

Categories: cs.CL, cs.SD, eess.AS

Can LLMs ‘Reason’ in Music? An Evaluation of LLMs’ Capability of Music Understanding and Generation

Posted: August 1, 2024 | Author: jarxiv

Summary: Symbolic music, like language, can be encoded with discrete symbols. In recent research, G …

Categories: cs.CL, cs.MM, cs.SD, eess.AS

Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent

Posted: August 1, 2024 | Author: jarxiv

Summary: In this paper, a high-quality, human-like simultaneous speech translation (SiST) system …

Categories: cs.CL, cs.SD, eess.AS
