Category Archives: cs.SD

EventTrojan: Manipulating Non-Intrusive Speech Quality Assessment via Imperceptible Events

Posted on September 12, 2024 by jarxiv

Summary: Without requiring reference speech, non-intrusive speech quality assessment (NISQA) … the mean … Continue reading →

Categories: cs.AI, cs.SD, eess.AS | Comments closed

VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos

Posted on September 12, 2024 by jarxiv

Summary: Introduces a framework for learning to generate background music from video input … Continue reading →

Categories: cs.CV, cs.MM, cs.SD, eess.AS | Comments closed

Spectral oversubtraction? An approach for speech enhancement after robot ego speech filtering in semi-real-time

Posted on September 11, 2024 by jarxiv

Summary: Spectral subtraction is widely used because of its simplicity, and … while the robot is speaking … Continue reading →

Categories: 68T50, cs.RO, cs.SD, eess.AS | Comments closed

Soft Acoustic Curvature Sensor: Design and Development

Posted on September 11, 2024 by jarxiv

Summary: In this paper, a new Soft Acoustic Curvature (S … Continue reading →

Categories: cs.RO, cs.SD, eess.AS | Comments closed

Human-mimetic binaural ear design and sound source direction estimation for task realization of musculoskeletal humanoids

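Posted on September 11, 2024 by jarxiv

Summary: Human-like environment recognition by musculoskeletal humanoids … tasks in real, complex environments … Continue reading →

Categories: cs.RO, cs.SD, eess.AS | Comments closed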

Advancing Topic Segmentation of Broadcasted Speech with Multilingual Semantic Embeddings

Posted on September 11, 2024 by jarxiv

Summary: Owing to recent advances in speech-based topic segmentation, pre-train … Continue reading →

Categories: cs.CL, cs.SD, eess.AS | Comments closed

Enhancing Temporal Understanding in Audio Question Answering for Large Audio Language Models

Posted on September 11, 2024 by jarxiv

Summary: Audio question answering tasks include audio event classification, audio captioning, and open … Continue reading →

Categories: cs.CL, cs.SD, eess.AS | Comments closed

SpeechTaxi: On Multilingual Semantic Speech Classification

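Posted on September 11, 2024 by jarxiv

Summary: Owing to recent advances in multilingual speech encoding and transcription, semantic speech classification … Continue reading →

Categories: cs.CL, cs.SD, eess.AS | Comments closed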

LAST: Language Model Aware Speech Tokenization

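Posted on September 11, 2024 by jarxiv

Summary: Speech tokenization serves as the foundation of speech language models (LMs), and speech language mod … Continue reading →

Categories: cs.CL, cs.SD, eess.AS | Comments closed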

Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens

Posted on September 11, 2024 by jarxiv

Summary: Compared with existing end-to-end diarization models, we … Continue reading →

Categories: cs.CL, cs.LG, cs.SD, eess.AS | Comments closed
