Monthly Archive: March 2024

FutureDepth: Learning to Predict the Future Improves Video Depth Estimation

Posted: March 20, 2024 | Author: jarxiv

Summary: In this paper, a new video depth estimation approach, FutureDepth, …

Categories: cs.CV

GVGEN: Text-to-3D Generation with Volumetric Representation

Posted: March 20, 2024 | Author: jarxiv

Summary: In recent years, 3D Gaussian splatting, powerful for 3D reconstruction and generation, …

Categories: cs.CV

WHAC: World-grounded Humans and Cameras

Posted: March 20, 2024 | Author: jarxiv

Summary: Estimating human and camera trajectories with accurate scale in the world coordinate system from monocular video …

Categories: cs.AI, cs.CV, cs.GR, cs.LG, cs.RO

FaceXFormer: A Unified Transformer for Facial Analysis

Posted: March 20, 2024 | Author: jarxiv

Summary: In this work, face parsing, landmark detection, head pose estimation, attribute recognition, age, gender, …

Categories: cs.CV

TexTile: A Differentiable Metric for Texture Tileability

Posted: March 20, 2024 | Author: jarxiv

Summary: We … a texture image with itself, without introducing repeating artifacts, …

Categories: 68T07, 68U05, cs.AI, cs.CV, cs.GR, cs.LG, I.2.10

FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation

Posted: March 20, 2024 | Author: jarxiv

Summary: Owing to the remarkable efficacy of text-to-image diffusion models, the potential … in the video domain …

Categories: cs.CV

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

Posted: March 20, 2024 | Author: jarxiv

Summary: In this study, the generation of high-resolution images from pre-trained diffusion models …

Categories: cs.CV

Negative Yields Positive: Unified Dual-Path Adapter for Vision-Language Models

Posted: March 20, 2024 | Author: jarxiv

Summary: Recently, large-scale pre-trained vision-language models (VLMs), in open …

Categories: cs.CL, cs.CV

Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models

Posted: March 20, 2024 | Author: jarxiv

Summary: In the realm of vision-language understanding, a model's proficiency in interpreting and reasoning over visual content …

Categories: cs.CV

Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment

Posted: March 20, 2024 | Author: jarxiv

Summary: In this paper, Wear-Any-Way, a new virtual try-on …

Categories: cs.CV
