"eess.AS" Category Archive

Harnessing the Zero-Shot Power of Instruction-Tuned Large Language Model in End-to-End Speech Recognition

Posted: September 20, 2023 | Author: jarxiv

Summary: Instruction-tuned large language models (LLMs) and end-to-end automatic speech recognition …

Categories: cs.CL, cs.SD, eess.AS

Multimodal Modeling For Spoken Language Identification

Posted: September 20, 2023 | Author: jarxiv

Summary: Spoken language identification refers to the task of automatically predicting the spoken language in a given utterance …

Categories: cs.CL, cs.LG, cs.SD, eess.AS

Controllable Speaking Styles Using a Large Language Model

Posted: September 20, 2023 | Author: jarxiv

Summary: Reference-based Text-to-Speech (TTS) models, for the same target …

Categories: cs.CL, cs.SD, eess.AS

MelodyGLM: Multi-task Pre-training for Symbolic Melody Generation

Posted: September 20, 2023 | Author: jarxiv

Summary: Pre-trained language models, in a variety of music understanding and generation tasks, …

Categories: cs.AI, cs.CL, cs.IR, cs.MM, cs.SD, eess.AS

Learning Tri-modal Embeddings for Zero-Shot Soundscape Mapping

Posted: September 20, 2023 | Author: jarxiv

Summary: Involving the prediction of the sounds most likely to be perceived at a particular geographic location, we …

Categories: cs.CV, cs.MM, cs.SD, eess.AS

Sound Source Localization is All about Cross-Modal Alignment

Posted: September 20, 2023 | Author: jarxiv

Summary: Humans can easily perceive the direction of sound sources in a visual scene, termed sound source localization …

Categories: cs.AI, cs.CV, cs.MM, cs.SD, eess.AS

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models

Posted: September 20, 2023 | Author: jarxiv

Summary: Audio-visual representation learning uses the correlation between auditory and visual information toward human-like perception …

Categories: cs.CV, cs.MM, cs.SD, eess.AS

Watch the Speakers: A Hybrid Continuous Attribution Network for Emotion Recognition in Conversation With Emotion Disentanglement

Posted: September 20, 2023 | Author: jarxiv

Summary: Emotion recognition in conversation (ERC) has very high potential for practical application, and so natural language …

Categories: cs.CL, cs.SD, eess.AS

HypR: A comprehensive study for ASR hypothesis revising with a reference corpus

Posted: September 20, 2023 | Author: jarxiv

Summary: With the development of deep learning, automatic speech recognition (ASR) has made significant progress …

Categories: cs.CL, cs.SD, eess.AS

Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks

Posted: September 19, 2023 | Author: jarxiv

Summary: Covering four tasks (speech recognition, speech synthesis, text generation, and speech continuation), we …

Categories: cs.LG, cs.SD, eess.AS
