Category archive: "cs.SD"

Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction

Posted on June 19, 2024 by jarxiv

Abstract: Large language model (LLM)-enhanced agents, in human-AI …

Categories: cs.AI, cs.CL, cs.SD, eess.AS

BirdSet: A Dataset and Benchmark for Classification in Avian Bioacoustics

Posted on June 18, 2024 by jarxiv

Abstract: Deep learning (DL) models, for assessing environmental health, avian …

Categories: cs.AI, cs.SD, eess.AS

GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

Posted on June 18, 2024 by jarxiv

Abstract: Perceiving and understanding non-speech sounds and non-verbal speech … helping us interact with our surroundings …

Categories: cs.AI, cs.CL, cs.SD, eess.AS

Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation

Posted on June 17, 2024 by jarxiv

Abstract: DiffuseST, while translating from multiple source languages into English, … the input speaker's …

Categories: cs.LG, cs.SD, eess.AS

An efficient text augmentation approach for contextualized Mandarin speech recognition

Posted on June 17, 2024 by jarxiv

Abstract: Contextualized automatic speech recognition (ASR) systems … the recognition of uncommon words …

Categories: cs.CL, cs.SD, eess.AS

Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection

Posted on June 17, 2024 by jarxiv

Abstract: Whisper, as a robust, large-scale multilingual speech recognition model, … many low- …

Categories: cs.CL, cs.SD, eess.AS

Detecting the terminality of speech-turn boundary for spoken interactions in French TV and Radio content

Posted on June 17, 2024 by jarxiv

Abstract: Transition-relevance places are … where an interlocutor can speak without interrupting the current speaker …

Categories: cs.CL, cs.HC, cs.SD, eess.AS

On the Evaluation of Speech Foundation Models for Spoken Language Understanding

Posted on June 17, 2024 by jarxiv

Abstract: The Spoken Language Understanding Evaluation (SLUE) suite of benchmark tasks … natural speech …

Categories: cs.CL, cs.SD, eess.AS

One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model

Posted on June 17, 2024 by jarxiv

Abstract: A novel one-pass multiple ASR … using an all-in-one neural model …

Categories: cs.AI, cs.SD, eess.AS

Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation

Posted on June 17, 2024 by jarxiv

Abstract: Audio-Visual Speech Recognition (AVSR …

Categories: cs.CV, cs.SD, eess.AS
