Category archive: eess.AS

Decoder-only Architecture for Streaming End-to-end Speech Recognition

Posted: August 2, 2024 | Author: jarxiv

Abstract: Decoder-only language models (LMs) … speech, including automatic speech recognition (ASR), … Continue reading →

Categories: cs.CL, eess.AS | Comments are closed

Towards Assessing Data Replication in Music Generation with Music Similarity Metrics on Raw Audio

Posted: August 2, 2024 | Author: jarxiv

Abstract: With recent advances in music generation, creative music processes, current business mo … Continue reading →

Categories: cs.AI, cs.MM, cs.SD, eess.AS | Comments are closed

Generative Expressive Conversational Speech Synthesis

Posted: August 2, 2024 | Author: jarxiv

Abstract: Conversational speech synthesis (CSS), in user-agent conversation settings, … Continue reading →

Categories: cs.CL, cs.SD, eess.AS | Comments are closed

Practical aspects for the creation of an audio dataset from field recordings with optimized labeling budget with AI-assisted strategy

Posted: August 1, 2024 | Author: jarxiv

Abstract: Machine Listening, extracting relevant information from audio signals, … Continue reading →

Categories: cs.LG, cs.SD, eess.AS | Comments are closed

Beat this! Accurate beat tracking without DBN postprocessing

Posted: August 1, 2024 | Author: jarxiv

Abstract: With two goals, generality across a diverse range of music and high accuracy, we … beat … Continue reading →

Categories: cs.LG, cs.SD, eess.AS | Comments are closed

Towards interfacing large language models with ASR systems using confidence measures and prompting

Posted: August 1, 2024 | Author: jarxiv

Abstract: The parameter size of large language models (LLMs) and interaction through prompting … Continue reading →

Categories: cs.CL, eess.AS | Comments are closed

On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition

Posted: August 1, 2024 | Author: jarxiv

Abstract: With the rapid development of neural text-to-speech (TTS) systems, automatic … Continue reading →

Categories: cs.CL, cs.LG, cs.SD, eess.AS | Comments are closed

Generative Expressive Conversational Speech Synthesis

Posted: August 1, 2024 | Author: jarxiv

Abstract: Conversational speech synthesis (CSS), in user-agent conversation settings, … Continue reading →

Categories: cs.CL, cs.SD, eess.AS | Comments are closed

Can LLMs ‘Reason’ in Music? An Evaluation of LLMs’ Capability of Music Understanding and Generation

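Posted: August 1, 2024 | Author: jarxiv

Abstract: Symbolic music, which resembles language, can be encoded in discrete symbols. In recent research, G … Continue reading →

Categories: cs.CL, cs.MM, cs.SD, eess.AS | Comments are closed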

Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent

Posted: August 1, 2024 | Author: jarxiv

Abstract: In this paper, … a high-quality, human-like simultaneous speech translation (SiST) system … Continue reading →

Categories: cs.CL, cs.SD, eess.AS | Comments are closed
