Category archive: cs.MM

Leveraging Modality-specific Representations for Audio-visual Speech Recognition via Reinforcement Learning

Posted: February 3, 2023 | Author: jarxiv

Summary: Audio-visual speech recognition (AVSR), to improve the noise robustness of speech recognition, …

Categories: cs.CV, cs.MM, cs.SD, eess.AS

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

Posted: February 2, 2023 | Author: jarxiv

Summary: Recent years have seen a major convergence of language, vision, and multimodal pretraining …

Categories: cs.CL, cs.CV, cs.MM

Sport Task: Fine Grained Action Detection and Classification of Table Tennis Strokes from Videos for MediaEval 2022

Posted: February 1, 2023 | Author: jarxiv

Summary: Sports video analysis is a widely pursued research topic. Its applications …

Categories: cs.AI, cs.CV, cs.HC, cs.LG, cs.MM

Zero3D: Semantic-Driven Multi-Category 3D Shape Generation

Posted: February 1, 2023 | Author: jarxiv

Summary: Semantic-driven 3D shape generation deals with text-conditioned 3D …

Categories: cs.CV, cs.MM

M3FAS: An Accurate and Robust MultiModal Mobile Face Anti-Spoofing System

Posted: January 31, 2023 | Author: jarxiv

Summary: Face presentation attacks, also known as face spoofing, …

Categories: cs.CV, cs.MM

Inter-View Depth Consistency Testing in Depth Difference Subspace

Posted: January 30, 2023 | Author: jarxiv

Summary: Multiview depth images play an important role in free-viewpoint television. This technology …

Categories: cs.CV, cs.MM, eess.IV

On the Importance of Noise Scheduling for Diffusion Models

Posted: January 30, 2023 | Author: jarxiv

Summary: The effect of noise scheduling strategies for denoising diffusion generative models …

Categories: cs.CV, cs.GR, cs.LG, cs.MM

On the Importance of Noise Scheduling for Diffusion Models

Posted: January 27, 2023 | Author: jarxiv

Summary: The effect of noise scheduling strategies for denoising diffusion generative models …

Categories: cs.CV, cs.GR, cs.LG, cs.MM

Learning from Mistakes: Self-Regularizing Hierarchical Semantic Representations in Point Cloud Segmentation

Posted: January 27, 2023 | Author: jarxiv

Summary: Recent advances in autonomous robotics have increased the need for accurate environmental analysis …

Categories: cs.CV, cs.MM, stat.ML

Self-Supervised RGB-T Tracking with Cross-Input Consistency

Posted: January 27, 2023 | Author: jarxiv

Summary: This paper proposes a self-supervised RGB-T tracking method. For training, …

Categories: cs.AI, cs.CV, cs.MM
