"cs.CV" Category Archive

VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity

Posted on March 17, 2025 by jarxiv

Abstract: Visual reasoning is central to human cognition, enabling individuals to interpret their environment and abstractly …

Categories: cs.CV

Disentangled Object-Centric Image Representation for Robotic Manipulation

Posted on March 17, 2025 by jarxiv

Abstract: Learning robotic manipulation skills from vision, generalizing broadly to real-world scenarios …

Categories: cs.CV, cs.RO

Affinity-VAE: incorporating prior knowledge in representation learning from scientific images

Posted on March 17, 2025 by jarxiv

Abstract: Learning compact, interpretable representations of data, in scientific image analysis …

Categories: cs.CV, cs.LG, q-bio.QM

Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations

Posted on March 17, 2025 by jarxiv

Abstract: Unified representation spaces for multimodal learning, text, images, audio …

Categories: cs.CV, cs.LG, stat.ML

RASA: Replace Anyone, Say Anything — A Training-Free Framework for Audio-Driven and Universal Portrait Video Editing

Posted on March 17, 2025 by jarxiv

Abstract: Portrait video editing, guided by audio or video streams, …

Categories: cs.AI, cs.CV

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Posted on March 17, 2025 by jarxiv

Abstract: Targeting end-to-end document conversion, an ultra-compact vision-language …

Categories: cs.CV

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers

Posted on March 17, 2025 by jarxiv

Abstract: For state-of-the-art transformer-based large multimodal models (LMMs), causal self …

Categories: cs.CV

Towards Few-Call Model Stealing via Active Self-Paced Knowledge Distillation and Diffusion-Based Image Generation

Posted on March 17, 2025 by jarxiv

Abstract: Diffusion models have shown strong capabilities in image synthesis, and many computer vision …

Categories: cs.CV, cs.LG

Pathology Image Compression with Pre-trained Autoencoders

Posted on March 17, 2025 by jarxiv

Abstract: Because the volume of high-resolution whole-slide images in digital histopathology is growing, significant …

Categories: cs.CV, eess.IV

Advancing 3D Gaussian Splatting Editing with Complementary and Consensus Information

Posted on March 17, 2025 by jarxiv

Abstract: We present a novel framework for enha …

Categories: cs.CV
