"cs.CV" Category Archive

Rethinking Few-Shot Adaptation of Vision-Language Models in Two Stages

Posted: March 17, 2025 | Author: jarxiv

Abstract: An old-school recipe for training a c …

Categories: cs.CV, cs.LG, cs.MM

TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing

Posted: March 17, 2025 | Author: jarxiv

Abstract: We introduce TreeMeshGPT. TreeMeshGPT, for input point …

Categories: cs.CV, cs.GR, cs.MM

Seeing and Seeing Through the Glass: Real and Synthetic Data for Multi-Layer Depth Estimation

Posted: March 17, 2025 | Author: jarxiv

Abstract: Transparent objects are common in daily life, and transparent surfaces and the objects behind them …

Categories: cs.CV

Filter, Correlate, Compress: Training-Free Token Reduction for MLLM Acceleration

Posted: March 17, 2025 | Author: jarxiv

Abstract: The quadratic complexity of Multimodal Large Language Models (MLLMs) with respect to sequence length …

Categories: cs.CV

ReCamMaster: Camera-Controlled Generative Rendering from A Single Video

Posted: March 17, 2025 | Author: jarxiv

Abstract: Camera control has been actively studied in text- or image-conditioned video generation tasks …

Categories: cs.CV

Centaur: Robust End-to-End Autonomous Driving with Test-Time Training

Posted: March 17, 2025 | Author: jarxiv

Abstract: How can we rely on the complex decision-making system of an end-to-end autonomous vehicle during deployment …

Categories: cs.AI, cs.CV, cs.LG, cs.RO

VGGT: Visual Geometry Grounded Transformer

Posted: March 17, 2025 | Author: jarxiv

Abstract: VGGT, for camera parameters, point maps, depth maps, 3D point tr …

Categories: cs.CV

Bring Your Rear Cameras for Egocentric 3D Human Pose Estimation

Posted: March 17, 2025 | Author: jarxiv

Abstract: Egocentric 3D human pose estimation, with the front of a head-mounted device (HMD) …

Categories: cs.CV

Distilling Diversity and Control in Diffusion Models

Posted: March 17, 2025 | Author: jarxiv

Abstract: Distilled diffusion models suffer from a critical limitation: reduced sample diversity compared …

Categories: cs.CV, cs.GR

PEMF-VTO: Point-Enhanced Video Virtual Try-on via Mask-free Paradigm

Posted: March 17, 2025 | Author: jarxiv

Abstract: Video Virtual Try-on, visual fidelity and temporal consistency …

Categories: cs.AI, cs.CV
