Category archive: eess.AS

SpeechAlign: Aligning Speech Generation to Human Preferences

Posted: April 9, 2024 | Author: jarxiv

Abstract: Speech language models have made significant progress in generating realistic speech, with neural …

Categories: cs.CL, cs.SD, eess.AS

MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition

Posted: April 9, 2024 | Author: jarxiv

Abstract: Automatic speech recognition (ASR) systems degrade significantly in noisy environments …

Categories: cs.AI, cs.SD, eess.AS

VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain

Posted: April 9, 2024 | Author: jarxiv

Abstract: Due to privacy restrictions, publicly available speech recognition datasets in the medical domain …

Categories: cs.AI, cs.CL, eess.AS

Africa-Centric Self-Supervised Pre-Training for Multilingual Speech Representation in a Sub-Saharan Context

Posted: April 8, 2024 | Author: jarxiv

Abstract: Trained exclusively on African speech, the first self-supervised multilingual …

Categories: cs.CL, cs.LG, cs.SD, eess.AS

As Good As A Coin Toss: Human detection of AI-generated images, videos, audio, and audiovisual stimuli

Posted: April 5, 2024 | Author: jarxiv

Abstract: As synthetic media becomes progressively more realistic and the barriers to using it continue to fall, …

Categories: 68T01, cs.AI, cs.HC, cs.SD, eess.AS, I.2

Analyzing Musical Characteristics of National Anthems in Relation to Global Indices

Posted: April 5, 2024 | Author: jarxiv

Abstract: Music plays a major role in shaping people's psychology and behavioral patterns. In this paper, …

Categories: cs.AI, cs.IR, cs.SD, eess.AS

The VoicePrivacy 2024 Challenge Evaluation Plan

Posted: April 4, 2024 | Author: jarxiv

Abstract: The task of the challenge is to conceal the speaker's voice identity while protecting linguistic content and emotional state …

Categories: cs.CL, cs.CR, eess.AS

Encoding of lexical tone in self-supervised models of spoken language

Posted: April 4, 2024 | Author: jarxiv

Abstract: Interpretability research indicates that, in self-supervised spoken language models (SLMs), acoustic, phonetic …

Categories: cs.CL, eess.AS

ART: The Alternating Reading Task Corpus for Speech Entrainment and Imitation

Posted: April 4, 2024 | Author: jarxiv

Abstract: To study entrainment and imitation behaviors in spoken communication, dyads …

Categories: cs.CL, eess.AS

UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions

Posted: April 4, 2024 | Author: jarxiv

Abstract: Recent studies leverage large language models with multi-task capabilities, and natural language …

Categories: cs.CL, cs.SD, eess.AS
