Category Archives: cs.SD

A Modular-based Strategy for Mitigating Gradient Conflicts in Simultaneous Speech Translation

Posted on December 31, 2024 by jarxiv

Abstract: In simultaneous speech translation (SimulST), streaming speech input is continuously proc…

Categories: cs.CL, cs.SD, eess.AS

Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment

Posted on December 31, 2024 by jarxiv

Abstract: Multimodal emotion recognition (MER), which leverages speech and text, … human-com…

Categories: cs.CL, cs.SD, eess.AS

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization

Posted on December 31, 2024 by jarxiv

Abstract: TangoFlux, with 515 million parameters, is an efficient …

Categories: cs.AI, cs.CL, cs.SD, eess.AS

ETTA: Elucidating the Design Space of Text-to-Audio Models

Posted on December 30, 2024 by jarxiv

Abstract: In recent years, text-to-audio (TTA) synthesis has advanced significantly, and u…

Categories: cs.CL, cs.LG, cs.SD, eess.AS

Building a Taiwanese Mandarin Spoken Language Model: A First Attempt

Posted on December 30, 2024 by jarxiv

Abstract: This technical report … building a spoken large language model (LLM) for Taiwanese Mandarin …

Categories: cs.CL, cs.SD, eess.AS

Mamba for Streaming ASR Combined with Unimodal Aggregation

Posted on December 30, 2024 by jarxiv

Abstract: This paper addresses streaming automatic speech recognition (ASR). …

Categories: cs.CL, cs.SD, eess.AS

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

Posted on December 30, 2024 by jarxiv

Abstract: Self-supervised learning (SSL) … large-scale … in the vision, text, and speech domains …

Categories: cs.AI, cs.CL, cs.LG, cs.SD, eess.AS

Audio Array-Based 3D UAV Trajectory Estimation with LiDAR Pseudo-Labeling

Posted on December 25, 2024 by jarxiv

Abstract: As small unmanned aerial vehicles (UAVs) become increasingly widespread, … public safety and privacy …

Categories: cs.RO, cs.SD, eess.AS

Long-Form Speech Generation with Spoken Language Models

Posted on December 25, 2024 by jarxiv

Abstract: We … long-form multimedia generation and audio-native voice assistants …

Categories: cs.CL, cs.SD, eess.AS

How ‘Real’ is Your Real-Time Simultaneous Speech-to-Text Translation System?

Posted on December 25, 2024 by jarxiv

Abstract: Simultaneous speech-to-text translation (SimulST) … source-language … concurrently with the speaker's speech …

Categories: cs.AI, cs.CL, cs.SD, eess.AS
