Category archive: cs.SD

Do Prompts Really Prompt? Exploring the Prompt Understanding Capability of Whisper

Posted on July 10, 2024 by jarxiv

Abstract: In this study, prompt information and the high-performance speech recognition model Whisper … Continue reading →

Categories: cs.CL, cs.SD, eess.AS

Proceedings of The second international workshop on eXplainable AI for the Arts (XAIxArts)

Posted on July 10, 2024 by jarxiv

Abstract: Explainable AI for the Arts (XAIxArts … Continue reading →

Categories: cs.AI, cs.HC, cs.MM, cs.SD, eess.AS

Frieren: Efficient Video-to-Audio Generation with Rectified Flow Matching

Posted on July 10, 2024 by jarxiv

Abstract: In video-to-audio (V2A) generation, silent video … Continue reading →

Categories: cs.CV, cs.MM, cs.SD, eess.AS

Towards Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results

Posted on July 9, 2024 by jarxiv

Abstract: Humour is an important element of human social behaviour, emotion, and cognition. Through its automatic understanding … Continue reading →

Categories: cs.CL, cs.CV, cs.LG, cs.MM, cs.SD, eess.AS

MERGE — A Bimodal Dataset for Static Music Emotion Recognition

Posted on July 9, 2024 by jarxiv

Abstract: In the field of music emotion recognition (MER), feature engineering, machine learning, deep learning … Continue reading →

Categories: cs.AI, cs.IR, cs.LG, cs.MM, cs.SD

Romanization Encoding For Multilingual ASR

Posted on July 8, 2024 by jarxiv

Abstract: To optimize multilingual and code-switching automatic speech recognition (ASR) systems … Continue reading →

Categories: cs.CL, cs.SD, eess.AS

TokenVerse: Unifying Speech and NLP Tasks via Transducer-based ASR

Posted on July 8, 2024 by jarxiv

Abstract: In conventional conversational intelligence from speech, cascaded pipelines are used … Continue reading →

Categories: cs.CL, cs.SD, eess.AS

Controlling Whisper: Universal Acoustic Adversarial Attacks to Control Speech Foundation Models

Posted on July 8, 2024 by jarxiv

Abstract: Flexible speech-recognition-based systems and audio-prompted large language models (L … Continue reading →

Categories: cs.CL, cs.SD, eess.AS

Performance Analysis of Speech Encoders for Low-Resource SLU and ASR in Tunisian Dialect

Posted on July 8, 2024 by jarxiv

Abstract: Speech encoders pretrained with self-supervised learning (SSL) … Continue reading →

Categories: cs.CL, cs.SD, eess.AS

Real-time Timbre Remapping with Differentiable DSP

Posted on July 8, 2024 by jarxiv

Abstract: Timbre is a primary mode of expression in a variety of musical contexts. However, common … Continue reading →

Categories: cs.AI, cs.LG, cs.SD, eess.AS, eess.SP
