"cs.MM" Category Archive

MAVEN: Multi-modal Attention for Valence-Arousal Emotion Network

Posted: May 5, 2025 | Author: jarxiv

Summary: Emotional expressions are transient, and multimodal cues are temporally misaligned …

Categories: cs.AI, cs.CV, cs.LG, cs.MM

CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment

Posted: May 5, 2025 | Author: jarxiv

Summary: Recent advances in audio-visual learning, in learning representations across modalities …

Categories: cs.CV, cs.MM, cs.SD, eess.AS

FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing

Posted: May 5, 2025 | Author: jarxiv

Summary: Movie dubbing, while preserving the vocal timbre of a given short reference audio, …

Categories: cs.CV, cs.MM, cs.SD, eess.AS

ClassWise-CRF: Category-Specific Fusion for Enhanced Semantic Segmentation of Remote Sensing Imagery

Posted: May 1, 2025 | Author: jarxiv

Summary: A result-level, category-specific fusion architecture called ClassWise-CRF …

Categories: cs.AI, cs.CV, cs.LG, cs.MM

Addressing Emotion Bias in Music Emotion Recognition and Generation with Frechet Audio Distance

Posted: May 1, 2025 | Author: jarxiv

Summary: The complex nature of music emotion, particularly a single audio encoder, emotion classifier, or …

Categories: cs.CL, cs.MM, cs.SD, eess.AS

Semi-Supervised Cognitive State Classification from Speech with Multi-View Pseudo-Labeling

Posted: May 1, 2025 | Author: jarxiv

Summary: The lack of labeled data, in speech classification tasks, especially those such as cognitive state classification with extensive …

Categories: cs.AI, cs.CL, cs.MM, cs.SD, eess.AS

Revise, Reason, and Recognize: LLM-Based Emotion Recognition via Emotion-Specific Prompts and ASR Error Correction

Posted: May 1, 2025 | Author: jarxiv

Summary: Annotating and recognizing speech emotion using prompt engineering has recently, with large language …

Categories: cs.AI, cs.CL, cs.MM, cs.SD, eess.AS

Exploring Acoustic Similarity in Emotional Speech and Music via Self-Supervised Representations

Posted: May 1, 2025 | Author: jarxiv

Summary: Emotion recognition from speech and music shares similarities due to acoustic overlap, which …

Categories: cs.AI, cs.CL, cs.MM, cs.SD, eess.AS

Solving Copyright Infringement on Short Video Platforms: Novel Datasets and an Audio Restoration Deep Learning Pipeline

Posted: May 1, 2025 | Author: jarxiv

Summary: Short video platforms such as YouTube Shorts and TikTok …

Categories: cs.AI, cs.MM

AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation

Posted: April 30, 2025 | Author: jarxiv

Summary: In this paper, multiple input modalities (text, video, and reference audio …

Categories: cs.AI, cs.CV, cs.MM
