Category archive: eess.AS

GigaST: A 10,000-hour Pseudo Speech Translation Corpus

Posted on June 7, 2023 by jarxiv

Abstract: This paper introduces GigaST, a large-scale pseudo speech translation (ST) corpus …

Categories: cs.CL, cs.SD, eess.AS

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

Posted on June 7, 2023 by jarxiv

Abstract: Self-supervised learning (SSL), in the vision, text, and speech domains, … large-scale …

Categories: cs.AI, cs.CL, cs.LG, cs.SD, eess.AS

Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Posted on June 7, 2023 by jarxiv

Abstract: We … large language models (LLMs) … the visual and auditory content in videos …

Categories: cs.CL, cs.CV, cs.SD, eess.AS

Simultaneous or Sequential Training? How Speech Representations Cooperate in a Multi-Task Self-Supervised Learning System

Posted on June 6, 2023 by jarxiv

Abstract: Through speech representation learning with self-supervised algorithms, the performance of many downstream tasks …

Categories: cs.LG, eess.AS

On the Behavior of Intrusive and Non-intrusive Speech Enhancement Metrics in Predictive and Generative Settings

Posted on June 6, 2023 by jarxiv

Abstract: Since its inception, the field of deep speech enhancement … spectral mapping …

Categories: cs.LG, cs.SD, eess.AS

Multiple output samples for each input in a single-output Gaussian process

Posted on June 6, 2023 by jarxiv

Abstract: In a standard Gaussian process (GP), for each input in the training set …

Categories: cs.CL, cs.LG, cs.SD, eess.AS

Pre-training for Speech Translation: CTC Meets Optimal Transport

Posted on June 6, 2023 by jarxiv

Abstract: The gap between the speech and text modalities … speech-to-text translation (ST …

Categories: cs.CL, cs.LG, cs.SD, eess.AS

N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition

Posted on June 6, 2023 by jarxiv

Abstract: Whisper, a recently developed multilingual weakly supervised model, … in monolingual settings and …

Categories: cs.CL, cs.SD, eess.AS

PolyVoice: Language Models for Speech to Speech Translation

Posted on June 6, 2023 by jarxiv

Abstract: We … a framework for language model-based speech-to-speech translation (S2ST) systems …

Categories: cs.CL, eess.AS

Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Posted on June 6, 2023 by jarxiv

Abstract: Video-LLaMA … Large Language Models (LL …

Categories: cs.CL, cs.CV, cs.SD, eess.AS
