Category archive: cs.CV

Localizing Memorization in SSL Vision Encoders

Posted on December 13, 2024 by jarxiv

Abstract: Recent work on memorization in self-supervised learning (SSL) …

Categories: cs.CV, cs.LG

FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction

Posted on December 13, 2024 by jarxiv

Abstract: Existing sparse-view reconstruction models rely heavily on accurate known camera poses …

Categories: cs.CV

Neptune: The Long Orbit to Benchmarking Long Video Understanding

Posted on December 13, 2024 by jarxiv

Abstract: In this paper, challenging question, answer, and decoy sets for understanding long videos …

Categories: cs.AI, cs.CV, cs.LG

OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation

Posted on December 13, 2024 by jarxiv

Abstract: The standard approach to developing modern MLLMs is, starting from the vision encoder, …

Categories: cs.CV

Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders

Posted on December 13, 2024 by jarxiv

Abstract: Gaze target estimation, which aims to predict where a person is looking in a scene, …

Categories: cs.CV

Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion

Posted on December 13, 2024 by jarxiv

Abstract: Recovering the geometry and materials of an object from a single image, with constraints that …

Categories: cs.CV

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Posted on December 13, 2024 by jarxiv

Abstract: Creating AI systems that, like human cognition, can interact with their environment over long periods …

Categories: cs.AI, cs.CL, cs.CV

LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living

Posted on December 13, 2024 by jarxiv

Abstract: Current large language vision models (LLVMs) trained on web videos …

Categories: cs.CV, cs.LG

LiftImage3D: Lifting Any Single Image to 3D Gaussians with Video Generation Priors

Posted on December 13, 2024 by jarxiv

Abstract: Single-image 3D reconstruction, owing to inherent geometric ambiguity and limited viewpoint information, …

Categories: cs.CV, cs.GR

RatBodyFormer: Rodent Body Surface from Keypoints

Posted on December 13, 2024 by jarxiv

Abstract: Rat behavior modeling is central to many scientific studies, but texture …

Categories: cs.CV

