VGGT: Visual Geometry Grounded Transformer

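Summary

VGGT is a feed-forward neural network that directly infers all the key 3D attributes of a scene, including camera parameters, point maps, depth maps, and 3D point tracks, from one, a few, or hundreds of views of that scene.
The approach is a step forward for 3D computer vision, where models have typically been constrained to, and specialized for, a single task.
It is also simple and efficient: it reconstructs images in under one second and still outperforms alternatives that require post-processing with visual geometry optimization techniques.
The network achieves state-of-the-art results on multiple 3D tasks, including camera parameter estimation, multi-view depth estimation, dense point cloud reconstruction, and 3D point tracking.
The authors also show that using a pretrained VGGT as a feature backbone significantly improves downstream tasks such as non-rigid point tracking and feed-forward novel view synthesis.
Code and models are publicly available at https://github.com/facebookresearch/vggt.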
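
The 3D attributes listed above are geometrically related: under a standard pinhole camera model, a depth map together with camera intrinsics determines a per-pixel map of 3D points in the camera frame. The sketch below illustrates only that relation. It is not VGGT's code or API (the network predicts these quantities directly), and the function names and the parameters `fx`, `fy`, `cx`, `cy` are hypothetical names chosen for this illustration.

```python
# Illustrative sketch only: pinhole back-projection of a depth map.
# Assumes depth is measured along the camera z-axis and that the
# intrinsics (fx, fy, cx, cy) are known; all names are hypothetical.

def unproject_pixel(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with the given depth to a camera-frame 3D point."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def depth_map_to_point_map(depth_map, fx, fy, cx, cy):
    """Convert an H x W depth map (a list of rows) to an H x W map of 3D points."""
    return [
        [unproject_pixel(u, v, d, fx, fy, cx, cy) for u, d in enumerate(row)]
        for v, row in enumerate(depth_map)
    ]


if __name__ == "__main__":
    # A tiny 2 x 2 depth map with made-up values.
    toy_depth = [[2.0, 2.0], [4.0, 4.0]]
    for row in depth_map_to_point_map(toy_depth, fx=2.0, fy=2.0, cx=0.0, cy=0.0):
        print(row)
```

Expressing the points in a shared world frame would additionally require each view's camera pose, which is omitted here to keep the example minimal.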

Abstract (original)

We present VGGT, a feed-forward neural network that directly infers all key 3D attributes of a scene, including camera parameters, point maps, depth maps, and 3D point tracks, from one, a few, or hundreds of its views. This approach is a step forward in 3D computer vision, where models have typically been constrained to and specialized for single tasks. It is also simple and efficient, reconstructing images in under one second, and still outperforming alternatives that require post-processing with visual geometry optimization techniques. The network achieves state-of-the-art results in multiple 3D tasks, including camera parameter estimation, multi-view depth estimation, dense point cloud reconstruction, and 3D point tracking. We also show that using pretrained VGGT as a feature backbone significantly enhances downstream tasks, such as non-rigid point tracking and feed-forward novel view synthesis. Code and models are publicly available at https://github.com/facebookresearch/vggt.

arXiv information

Authors	Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, David Novotny
Published	2025-03-14 17:59:47+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
