"cs.MM" Category Archive

Generalized Video Anomaly Event Detection: Systematic Taxonomy and Comparison of Deep Models

Posted: February 2, 2024 | Author: jarxiv

Summary: In intelligent surveillance systems, video anomaly detection (VAD) is extremely …

Categories: cs.CV, cs.MM

Conversation Understanding using Relational Temporal Graph Neural Networks with Auxiliary Cross-Modality Interaction

Posted: January 31, 2024 | Author: jarxiv

Summary: Emotion recognition is an important task for understanding human conversation. Language, speech, facial expressions …

Categories: cs.CL, cs.MM

A Proactive and Dual Prevention Mechanism against Illegal Song Covers empowered by Singing Voice Conversion

Posted: January 31, 2024 | Author: jarxiv

Summary: Singing voice conversion (SVC) maps one singer's singing voice, with the original lyrics and melody, to another …

Categories: cs.AI, cs.CR, cs.LG, cs.MM, cs.SD, eess.AS

An Open Software Suite for Event-Based Video

Posted: January 31, 2024 | Author: jarxiv

Summary: Traditional video representations are organized around discrete image frames, whereas event-based …

Categories: cs.CV, cs.MM

Find the Cliffhanger: Multi-Modal Trailerness in Soap Operas

Posted: January 30, 2024 | Author: jarxiv

Summary: Creating a trailer requires carefully selecting short, engaging moments from a long video and …

Categories: cs.CV, cs.MM

Cross-Modal Coordination Across a Diverse Set of Input Modalities

Posted: January 30, 2024 | Author: jarxiv

Summary: Cross-modal retrieval uses a query from a different modality to retrieve samples of a given modality …

Categories: cs.CV, cs.LG, cs.MM

Synchformer: Efficient Synchronization from Sparse Cues

Posted: January 30, 2024 | Author: jarxiv

Summary: Our objective centers on sources such as YouTube, where synchronization cues may be sparse …

Categories: cs.CV, cs.LG, cs.MM, cs.SD, eess.AS

Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning

Posted: January 29, 2024 | Author: jarxiv

Summary: In recent years, with the explosive growth of web videos, video filtering, recommendation, …

Categories: cs.CL, cs.CV, cs.MM

CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning

Posted: January 29, 2024 | Author: jarxiv

Summary: Multimodal large language models (MLLMs) have made remarkable progress, with strong …

Categories: cs.AI, cs.CL, cs.MM

CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning

Posted: January 26, 2024 | Author: jarxiv

Summary: Multimodal large language models (MLLMs) have made remarkable progress, with strong …

Categories: cs.AI, cs.CL, cs.MM
