"cs.CV" Category Archive

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary

Posted on: June 10, 2025 | Author: jarxiv

Summary: Human daily activities, as sequences of routine events in video streams (e.g., …

Categories: cs.CV

DINeMo: Learning Neural Mesh Models with no 3D Annotations

Posted on: June 10, 2025 | Author: jarxiv

Summary: Toward comprehensive 3D scene understanding, category-level 3D/6D pose estimation is …

Categories: cs.CV

Speedy Deformable 3D Gaussian Splatting: Fast Rendering and Compression of Dynamic Scenes

Posted on: June 10, 2025 | Author: jarxiv

Summary: In recent extensions of 3D Gaussian Splatting (3DGS), neural networks …

Categories: cs.CV, cs.GR

A Comparative Study of U-Net Architectures for Change Detection in Satellite Images

Posted on: June 10, 2025 | Author: jarxiv

Summary: Change detection in remote sensing, for monitoring the Earth's ever-changing landscapes, …

Categories: cs.CV, cs.LG, eess.IV

ViVo: A Dataset for Volumetric Video Reconstruction and Compression

Posted on: June 10, 2025 | Author: jarxiv

Summary: As research on neural volumetric video reconstruction and compression flourishes, reconstruction models and compression …

Categories: cs.CV

RONA: Pragmatically Diverse Image Captioning with Coherence Relations

Posted on: June 10, 2025 | Author: jarxiv

Summary: Writing assistants (Grammarly, Microsoft Copi …

Categories: 68T50, cs.AI, cs.CL, cs.CV, I.2.10

Squeeze3D: Your 3D Generation Model is Secretly an Extreme Neural Compressor

Posted on: June 10, 2025 | Author: jarxiv

Summary: We propose Squeeze3D. It compresses 3D data at extremely high compression ratios …

Categories: cs.CV, cs.GR, cs.LG

Mimicking or Reasoning: Rethinking Multi-Modal In-Context Learning in Vision-Language Models

Posted on: June 10, 2025 | Author: jarxiv

Summary: Vision-language models (VLMs), with a property similar to that of their language-only counterparts, …

Categories: cs.AI, cs.CL, cs.CV, cs.LG

Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features

Posted on: June 10, 2025 | Author: jarxiv

Summary: Generative large multimodal models such as LLaVA and Qwen-VL (LMM …

Categories: cs.AI, cs.CL, cs.CV

Decoupling the Image Perception and Multimodal Reasoning for Reasoning Segmentation with Digital Twin Representations

Posted on: June 10, 2025 | Author: jarxiv

Summary: Reasoning segmentation (RS), based on implicit text queries, objects …

Categories: cs.AI, cs.CV
