Category Archives: eess.AS

Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

Posted on June 21, 2024 by jarxiv

Summary: Generating realistic audio for human interactions is … film and virtual re…

Categories: cs.AI, cs.CV, cs.SD, eess.AS

Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction

Posted on June 19, 2024 by jarxiv

Summary: Agents enhanced by large language models (LLMs) … between humans and AI …

Categories: cs.AI, cs.CL, cs.SD, eess.AS

1000 African Voices: Advancing inclusive multi-speaker multi-accent speech synthesis

Posted on June 18, 2024 by jarxiv

Summary: Thanks to recent advances in speech synthesis, Google Maps voice navigation, screen rea…

Categories: cs.CL, eess.AS

BirdSet: A Dataset and Benchmark for Classification in Avian Bioacoustics

Posted on June 18, 2024 by jarxiv

Summary: Deep learning (DL) models … avian … for assessing environmental health …

Categories: cs.AI, cs.SD, eess.AS

GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

Posted on June 18, 2024 by jarxiv

Summary: Perceiving and understanding non-speech sounds and non-verbal speech … the decis… that help us interact with our surroundings …

Categories: cs.AI, cs.CL, cs.SD, eess.AS

Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation

Posted on June 17, 2024 by jarxiv

Summary: DiffuseST, while translating from multiple source languages into English, … the input speaker's …

Categories: cs.LG, cs.SD, eess.AS

An efficient text augmentation approach for contextualized Mandarin speech recognition

Posted on June 17, 2024 by jarxiv

Summary: Contextualized automatic speech recognition (ASR) systems … the recognition of uncommon words …

Categories: cs.CL, cs.SD, eess.AS

Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection

Posted on June 17, 2024 by jarxiv

Summary: As a robust, large-scale multilingual speech recognition model, Whisper … many low-re…

Categories: cs.CL, cs.SD, eess.AS

Detecting the terminality of speech-turn boundary for spoken interactions in French TV and Radio content

Posted on June 17, 2024 by jarxiv

Summary: Transition relevance places are points at which an interlocutor can take the floor without interrupting the current speaker …

Categories: cs.CL, cs.HC, cs.SD, eess.AS

On the Evaluation of Speech Foundation Models for Spoken Language Understanding

Posted on June 17, 2024 by jarxiv

Summary: The Spoken Language Understanding Evaluation (SLUE) benchmark task suite … natural speech …

Categories: cs.CL, cs.SD, eess.AS
