Category Archives: cs.MM

Exploring Mutual Cross-Modal Attention for Context-Aware Human Affordance Generation

Posted on February 20, 2025 by jarxiv

Summary: In human affordance learning, the estimated pose … a valid human action in the scene …

Categories: cs.CV, cs.MM

Multimodal Fake News Video Explanation Generation: Dataset, Model, and Evaluation

Posted on February 20, 2025 by jarxiv

Summary: Existing methods treat fake news video detection as a classification problem, but …

Categories: cs.CV, cs.MM

Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention

Posted on February 20, 2025 by jarxiv

Summary: Understanding emotions is a fundamental aspect of human communication. Aud …

Categories: cs.CL, cs.CV, cs.LG, cs.MM, cs.SD, eess.AS, F.2.2

Multi-scale Attention Guided Pose Transfer

Posted on February 19, 2025 by jarxiv

Summary: Pose transfer refers to producing, from another image of a person in a different pose, a previously unsee …

Categories: cs.CV, cs.MM

TIPS: Text-Induced Pose Synthesis

Posted on February 19, 2025 by jarxiv

Summary: In computer vision, human pose synthesis and transfer, given the person's already availab …

Categories: cs.CV, cs.MM

Scene Aware Person Image Generation through Global Contextual Conditioning

Posted on February 19, 2025 by jarxiv

Summary: Person image generation is an intriguing yet challenging problem. However, under constrained situ …

Categories: cs.CV, cs.MM

Semantically Consistent Person Image Generation

Posted on February 19, 2025 by jarxiv

Summary: We propose a data-driven approach for context-aware person image generation …

Categories: cs.CV, cs.MM

Bridging Compressed Image Latents and Multimodal Large Language Models

Posted on February 18, 2025 by jarxiv

Summary: In this paper, adopting multimodal large language models (MLLMs) …

Categories: cs.CV, cs.LG, cs.MM

Token Communications: A Unified Framework for Cross-modal Context-aware Semantic Communications

Posted on February 18, 2025 by jarxiv

Summary: In this paper, generative semantic communications (GENS …

Categories: cs.CV, cs.IT, cs.MM, eess.SP, math.IT

Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination

Posted on February 18, 2025 by jarxiv

Summary: With the rapid progress of multimodal large language models (MLLMs), various m …

Categories: cs.AI, cs.CL, cs.CV, cs.MM
