Category archive: "cs.SD"

acoupi: An Open-Source Python Framework for Deploying Bioacoustic AI Models on Edge Devices

Posted on January 30, 2025 by jarxiv

Abstract: 1. Passive acoustic monitoring (PAM) combined with artificial intelligence (AI) …

Categories: cs.LG, cs.SD, eess.AS, H.5.5

Fast Word Error Rate Estimation Using Self-Supervised Representations for Speech and Text

Posted on January 30, 2025 by jarxiv

Abstract: Word error rate (WER) estimation, without requiring ground-truth labels, automatic …

Categories: cs.CL, cs.SD, eess.AS

Cross-lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition

Posted on January 30, 2025 by jarxiv

Abstract: Particularly for low-resource languages, automatic speech recognition (ASR) decoding that improves multilingual performance …

Categories: cs.CL, cs.SD, eess.AS

LHGNN: Local-Higher Order Graph Neural Networks For Audio Classification and Tagging

Posted on January 30, 2025 by jarxiv

Abstract: Transformers have set new benchmarks in audio processing tasks, …

Categories: cs.AI, cs.SD, eess.AS

VoicePrompter: Robust Zero-Shot Voice Conversion with Voice Prompt and Conditional Flow Matching

Posted on January 30, 2025 by jarxiv

Abstract: Despite notable advances in recent voice conversion (VC) systems, zero-shot …

Categories: cs.AI, cs.SD, eess.AS, eess.SP

Yin-Yang: Developing Motifs With Long-Term Structure And Controllability

Posted on January 30, 2025 by jarxiv

Abstract: Transformer models generate symbolically represented music with local coherence …

Categories: cs.AI, cs.SC, cs.SD

MIDI-GPT: A Controllable Generative Model for Computer-Assisted Multitrack Music Composition

Posted on January 29, 2025 by jarxiv

Abstract: A Transformer architecture designed for computer-assisted music composition workflows …

Categories: cs.LG, cs.MM, cs.SD, eess.AS

Whispers of Sound-Enhancing Information Extraction from Depression Patients’ Unstructured Data through Audio and Text Emotion Recognition and Llama Fine-tuning

Posted on January 29, 2025 by jarxiv

Abstract: In this study, to improve the accuracy of depression classification, a teacher-student architecture …

Categories: cs.CL, cs.SD, eess.AS

Audio-Visual Deepfake Detection With Local Temporal Inconsistencies

Posted on January 29, 2025 by jarxiv

Abstract: In this paper, fine-grained temporal inconsistencies between the audio and visual modalities …

Categories: cs.CR, cs.CV, cs.MM, cs.SD, eess.AS

NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields

Posted on January 29, 2025 by jarxiv

Abstract: Sound plays a major role in human perception. In addition to vision, …

Categories: cs.CV, cs.SD, eess.AS
