"eess.AS" Category Archive

Inclusive ASR for Disfluent Speech: Cascaded Large-Scale Self-Supervised Learning with Targeted Fine-Tuning and Data Augmentation

Posted: June 17, 2024 | Author: jarxiv

Abstract: Automatic speech recognition (ASR) systems, stuttering-related disfluencies (involuntary …

Categories: cs.CL, eess.AS, I.2

To what extent can ASV systems naturally defend against spoofing attacks?

Posted: June 17, 2024 | Author: jarxiv

Abstract: The current automatic speaker verification (ASV) task involves target and non-target, two …

Categories: cs.AI, eess.AS

One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model

Posted: June 17, 2024 | Author: jarxiv

Abstract: Using an all-in-one neural model, a novel one-pass multiple ASR …

Categories: cs.AI, cs.SD, eess.AS

COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning

Posted: June 17, 2024 | Author: jarxiv

Abstract: Integrating speech into a large language model (LLM), and as a result, instruction-following/context …

Categories: cs.AI, cs.CL, eess.AS

Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation

Posted: June 17, 2024 | Author: jarxiv

Abstract: Audio-Visual Speech Recognition (AVSR …

Categories: cs.CV, cs.SD, eess.AS

LASER: Learning by Aligning Self-supervised Representations of Speech for Improving Content-related Tasks

Posted: June 14, 2024 | Author: jarxiv

Abstract: Self-supervised learning (SSL)-based speech models, full-stack speech processing …

Categories: cs.CL, cs.SD, eess.AS

Diffusion Gaussian Mixture Audio Denoise

Posted: June 14, 2024 | Author: jarxiv

Abstract: Recent diffusion models, in audio denoising tasks, promising performance …

Categories: cs.CL, cs.SD, eess.AS

End-to-end Streaming model for Low-Latency Speech Anonymization

Posted: June 14, 2024 | Author: jarxiv

Abstract: Speaker anonymization, while preserving linguistic content, hides cues to the speaker's identity …

Categories: cs.CL, cs.LG, eess.AS

On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models

Posted: June 14, 2024 | Author: jarxiv

Abstract: Open Whisper-style Speech Model (OWSM …

Categories: cs.CL, cs.SD, eess.AS

Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech

Posted: June 14, 2024 | Author: jarxiv

Abstract: In this paper, spoken language identification (SLI) and multilingual broadcast and institutional speech …

Categories: cs.CL, cs.SD, eess.AS
