"cs.MM" Category Archive

SurgSora: Object-Aware Diffusion Model for Controllable Surgical Video Generation

Posted on June 19, 2025 by jarxiv

Abstract: Surgical video generation has the potential to enhance medical education and research, but existing methods …

Categories: cs.AI, cs.CV, cs.MM, cs.RO

EgoBlind: Towards Egocentric Visual Assistance for the Blind

Posted on June 19, 2025 by jarxiv

Abstract: The first egocentric video dataset collected from visually impaired people, …

Categories: cs.AI, cs.CV, cs.MM

Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence

Posted on June 19, 2025 by jarxiv

Abstract: Today's AI agents are mostly siloed: they …

Categories: cs.AI, cs.CL, cs.CV, cs.MM, cs.RO

HKD4VLM: A Progressive Hybrid Knowledge Distillation Framework for Robust Multimodal Hallucination and Factuality Detection in VLMs

Posted on June 18, 2025 by jarxiv

Abstract: Driven by the rapid progress of vision-language models (VLMs), large-scale …

Categories: cs.CV, cs.MM

Quizzard@INOVA Challenge 2025 — Track A: Plug-and-Play Technique in Interleaved Multi-Image Model

Posted on June 16, 2025 by jarxiv

Abstract: This paper addresses two main objectives. First, …

Categories: cs.CL, cs.CV, cs.MM

PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications

Posted on June 16, 2025 by jarxiv

Abstract: Robust navigation across diverse environments and domains requires accurate state estimation and transparent …

Categories: cs.AI, cs.CV, cs.LG, cs.MM, cs.RO

PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications

Posted on June 13, 2025 by jarxiv

Abstract: Robust navigation across diverse environments and domains requires accurate state estimation and transparent …

Categories: cs.AI, cs.CV, cs.LG, cs.MM, cs.RO

A Unit Enhancement and Guidance Framework for Audio-Driven Avatar Video Generation

Posted on June 13, 2025 by jarxiv

Abstract: Audio-driven human animation technology …

Categories: cs.CV, cs.MM

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos

Posted on June 13, 2025 by jarxiv

Abstract: Built to evaluate the multi-step reasoning capabilities of large models, the first long …

Categories: cs.AI, cs.CV, cs.MM

Q-Ponder: A Unified Training Pipeline for Reasoning-based Visual Quality Assessment

Posted on June 13, 2025 by jarxiv

Abstract: According to recent studies, multimodal large language models (MLLMs) … interpretable assessment …

Categories: cs.AI, cs.CV, cs.MM
