"cs.CV" Category Archive

xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs

Posted: October 22, 2024 | Author: jarxiv

Abstract: We introduce xGen-MM-Vid (BLIP-3-Video) …

Categories: cs.AI, cs.CL, cs.CV, cs.LG

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Posted: October 22, 2024 | Author: jarxiv

Abstract: Segment Anything Model 2 (SAM 2), in image …

Categories: cs.CV

MvDrag3D: Drag-based Creative 3D Editing via Multi-view Generation-Reconstruction Priors

Posted: October 22, 2024 | Author: jarxiv

Abstract: Drag-based editing, through the capabilities of image generation models, 2D content creation …

Categories: cs.CV

FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors

Posted: October 22, 2024 | Author: jarxiv

Abstract: Neural Radiance Fields (NeRF), primarily high-fidelity …

Categories: cs.CV

Toward Generalizing Visual Brain Decoding to Unseen Subjects

Posted: October 22, 2024 | Author: jarxiv

Abstract: Visual brain decoding aims to decode visual information from human brain activity …

Categories: cs.AI, cs.CV

Self Supervised Deep Learning for Robot Grasping

Posted: October 21, 2024 | Author: jarxiv

Abstract: Learning-based robot grasping currently involves the use of labeled data. …

Categories: cs.CV, cs.RO

Optimal DLT-based Solutions for the Perspective-n-Point

Posted: October 21, 2024 | Author: jarxiv

Abstract: With far better behavior than the conventional DLT, we … Perspective-n-Point (PnP) …

Categories: cs.CV, cs.RO

PAPL-SLAM: Principal Axis-Anchored Monocular Point-Line SLAM

Posted: October 21, 2024 | Author: jarxiv

Abstract: In point-line SLAM systems, the use of line structural information and the optimization of lines …

Categories: cs.CV, cs.RO

Learning autonomous driving from aerial imagery

Posted: October 21, 2024 | Author: jarxiv

Abstract: In this work, end-to-end perception for ground vehicle control from aerial imagery alone …

Categories: cs.CV, cs.RO

Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization

Posted: October 21, 2024 | Author: jarxiv

Abstract: Existing audio-driven facial animation methods suffer from expression leakage, ineffective subtle …

Categories: cs.CV
