"cs.SD" Category Archive

Long-Form Text-to-Music Generation with Adaptive Prompts: A Case of Study in Tabletop Role-Playing Games Soundtracks

Posted on January 23, 2025 by jarxiv

Abstract: In this paper, tabletop role-playing game (TRPG) …

Categories: cs.AI, cs.MM, cs.NE, cs.SD, eess.AS

FlanEC: Exploring Flan-T5 for Post-ASR Error Correction

Posted on January 23, 2025 by jarxiv

Abstract: In this paper, post-automatic speech recognition (ASR) generative speech error correction (GenSE …

Categories: cs.AI, cs.CL, cs.SD, eess.AS

Audio Array-Based 3D UAV Trajectory Estimation with LiDAR Pseudo-Labeling

Posted on January 22, 2025 by jarxiv

Abstract: As small unmanned aerial vehicles (UAVs) become more widespread, public safety and privacy …

Categories: cs.RO, cs.SD, eess.AS

Audio Texture Manipulation by Exemplar-Based Analogy

Posted on January 22, 2025 by jarxiv

Abstract: Audio texture manipulation involves adding, removing, or replacing auditory elements, among other specific …

Categories: cs.LG, cs.SD, eess.AS

An End-to-End Approach for Korean Wakeword Systems with Speaker Authentication

Posted on January 22, 2025 by jarxiv

Abstract: Wakeword detection, by which AI assistants listen to the user's voice and interact effectively …

Categories: cs.AI, cs.LG, cs.SD, eess.AS, I.2.7

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Posted on January 22, 2025 by jarxiv

Abstract: Recent multimodal large language models (MLLMs) typically, the visual modality …

Categories: cs.CV, cs.SD, eess.AS

How Redundant Is the Transformer Stack in Speech Representation Models?

Posted on January 20, 2025 by jarxiv

Abstract: Self-supervised speech representation models, especially those leveraging the Transformer architecture, …

Categories: cs.CL, cs.LG, cs.SD, eess.AS

Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores

Posted on January 20, 2025 by jarxiv

Abstract: The kNN-CTC model's effectiveness for monolingual automatic speech recognition (ASR) …

Categories: cs.CL, cs.SD, eess.AS

Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding

Posted on January 20, 2025 by jarxiv

Abstract: Spoken language understanding (SLU) is a structure prediction task in the field of speech. Recently …

Categories: cs.CL, cs.SD, eess.AS

Audio-Driven Reinforcement Learning for Head-Orientation in Naturalistic Environments

Posted on January 20, 2025 by jarxiv

Abstract: Deep reinforcement learning (DRL) approaches in audio signal processing have in recent years significantly …

Categories: cs.AI, cs.SD, eess.AS

