Category archive: eess.AS

BLSTM-Based Confidence Estimation for End-to-End Speech Recognition

Posted on December 25, 2023 by jarxiv

Abstract: In an automatic speech recognition (ASR) hypothesis, each recognized token (word, subw …

Categories: cs.CL, eess.AS

Creating New Voices using Normalizing Flows

Posted on December 25, 2023 by jarxiv

Abstract: For voice identities unseen during training, realistic, natural-sounding …

Categories: cs.AI, cs.SD, eess.AS

Unsupervised Melody-to-Lyric Generation

Posted on December 25, 2023 by jarxiv

Abstract: Automatic melody-to-lyric generation produces lyrics to fit a given melody …

Categories: cs.AI, cs.CL, cs.SD, eess.AS

UnIVAL: Unified Model for Image, Video, Audio and Language Tasks

Posted on December 25, 2023 by jarxiv

Abstract: Large language models (LLMs) have made the ambitious … of generalist agents …

Categories: cs.CV, cs.LG, cs.MM, cs.SD, eess.AS

BANSpEmo: A Bangla Emotional Speech Recognition Dataset

Posted on December 22, 2023 by jarxiv

Abstract: In audio and speech analysis, the ability to identify emotions from acoustic signals is essential. …

Categories: cs.HC, cs.LG, cs.SD, eess.AS

Leveraging Visual Supervision for Array-based Active Speaker Detection and Localization

Posted on December 22, 2023 by jarxiv

Abstract: Conventional audio-visual approaches to active speaker detection (ASD) …

Categories: cs.LG, cs.SD, eess.AS, eess.IV, eess.SP

Speech Translation with Large Language Models: An Industrial Practice

Posted on December 22, 2023 by jarxiv

Abstract: Large language models (LLMs) have achieved great success across a variety of tasks …

Categories: cs.CL, cs.SD, eess.AS

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Posted on December 22, 2023 by jarxiv

Abstract: Recently, instruction-following audio-language models have attracted broad attention for audio interaction with humans …

Categories: cs.CL, cs.LG, eess.AS

EmphAssess : a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models

Posted on December 22, 2023 by jarxiv

Abstract: … designed to evaluate the ability of speech-to-speech models to encode and reproduce prosodic emphasis …

Categories: cs.CL, cs.SD, eess.AS

On the choice of the optimal temporal support for audio classification with Pre-trained embeddings

Posted on December 22, 2023 by jarxiv

Abstract: Current state-of-the-art audio analysis systems … pre-trained embedding models …

Categories: cs.AI, cs.SD, eess.AS
