"cs.CV" Category Archive

HOIDiNi: Human-Object Interaction through Diffusion Noise Optimization

Posted: June 19, 2025 | Author: jarxiv

Abstract: For realistic and plausible human-object interaction (HOI), we …

Category: cs.CV

Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

Posted: June 19, 2025 | Author: jarxiv

Abstract: To ensure the safety of machine learning systems, detecting out-of-distribution (OOD) samples …

Categories: cs.AI, cs.CV, cs.LG

FindingDory: A Benchmark to Evaluate Memory in Embodied Agents

Posted: June 19, 2025 | Author: jarxiv

Abstract: Recently, the impressive performance of large vision-language models on planning and control tasks …

Categories: cs.CV, cs.RO

Demystifying the Visual Quality Paradox in Multimodal Large Language Models

Posted: June 19, 2025 | Author: jarxiv

Abstract: Recent multimodal large language models (MLLMs), on benchmark vision-language …

Categories: cs.AI, cs.CV

Dual-Stage Value-Guided Inference with Margin-Based Reward Adjustment for Fast and Faithful VLM Captioning

Posted: June 19, 2025 | Author: jarxiv

Abstract: Despite significant advances in inference-time search for vision-language models (VLMs), existing …

Categories: cs.CV, cs.LG

Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models

Posted: June 19, 2025 | Author: jarxiv

Abstract: Collecting real-world data for safety-critical physical AI systems such as autonomous vehicles (AVs) …

Category: cs.CV

UniRelight: Learning Joint Decomposition and Synthesis for Video Relighting

Posted: June 19, 2025 | Author: jarxiv

Abstract: We address the challenge of relighting a single image or video. This task, for which accurate …

Category: cs.CV

Sekai: A Video Dataset towards World Exploration

Posted: June 19, 2025 | Author: jarxiv

Abstract: Video generation techniques have made remarkable progress, and the foundation for interactive world exploration …

Categories: cs.AI, cs.CV

Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence

Posted: June 19, 2025 | Author: jarxiv

Abstract: Today's AI agents are mostly siloed: they …

Categories: cs.AI, cs.CL, cs.CV, cs.MM, cs.RO

Particle-Grid Neural Dynamics for Learning Deformable Object Models from RGB-D Videos

Posted: June 19, 2025 | Author: jarxiv

Abstract: Modeling the dynamics of deformable objects, with their diverse physical …

Categories: cs.CV, cs.LG, cs.RO
