Category archive: eess.AS

Visually grounded few-shot word learning in low-resource settings

Posted: June 21, 2023 | Author: jarxiv

Summary: Learning new words and their visual depictions from only a few word-image example pairs, we …

Categories: cs.CL, eess.AS

Timestamped Embedding-Matching Acoustic-to-Word CTC ASR

Posted: June 21, 2023 | Author: jarxiv

Summary: In this work, the word start times required by many real-world applications, and …

Categories: cs.CL, eess.AS

Recent Advances in Direct Speech-to-text Translation

Posted: June 21, 2023 | Author: jarxiv

Summary: Speech-to-text translation has recently attracted growing attention, and many studies are rapidly …

Categories: cs.CL, eess.AS

Align, Adapt and Inject: Sound-guided Unified Image Generation

Posted: June 21, 2023 | Author: jarxiv

Summary: Text-guided image generation has made unprecedented progress thanks to the development of diffusion models …

Categories: cs.CV, cs.GR, cs.SD, eess.AS

Correlation Clustering of Bird Sounds

Posted: June 19, 2023 | Author: jarxiv

Summary: Bird sound classification associates any audio recording with the bird species heard in that recording …

Categories: cs.LG, cs.SD, eess.AS

SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks

Posted: June 19, 2023 | Author: jarxiv

Summary: Spoken language understanding (SLU) tasks have been studied in the speech research community for decades …

Categories: cs.CL, eess.AS

GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Speech Emotion Recognition

Posted: June 19, 2023 | Author: jarxiv

Summary: Contrastive-learning-based pretraining methods have recently achieved remarkable … across a variety of fields …

Categories: cs.CL, cs.MM, cs.SD, eess.AS

Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody

Posted: June 19, 2023 | Author: jarxiv

Summary: In this paper, as a feature to aid speech-synthesis prosody, the … of a word in a given context …

Categories: cs.CL, eess.AS

On Data Sampling Strategies for Training Neural Network Speech Separation Models

Posted: June 19, 2023 | Author: jarxiv

Summary: Speech separation remains an important area of multi-speaker signal processing. Deep …

Categories: cs.AI, cs.LG, cs.NE, cs.SD, eess.AS

Evaluation of Speech Representations for MOS prediction

Posted: June 19, 2023 | Author: jarxiv

Summary: This paper evaluates feature extraction models for predicting speech quality. In addition, …

Categories: cs.AI, cs.SD, eess.AS
