Visual Geometry Grounded Deep Structure From Motion

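Summary

Structure-from-Motion (SfM) is a long-standing problem in the computer vision community that aims to reconstruct the camera poses and 3D structure of a scene from a set of unconstrained 2D images.
Classical frameworks solve this problem incrementally by detecting and matching keypoints, registering images, triangulating 3D points, and running bundle adjustment.
Recent research has mostly focused on using deep learning to strengthen individual elements (such as keypoint matching), but it still builds on the original, non-differentiable pipeline.
Instead, the authors propose VGGSfM, a new deep pipeline in which every component is fully differentiable and which can therefore be trained end to end.
To this end, they introduce new mechanisms and simplifications.
First, they build on recent advances in deep 2D point tracking to extract reliable, pixel-accurate tracks, which removes the need to chain pairwise matches.
Furthermore, rather than registering cameras gradually, they recover all cameras simultaneously from the image and track features.
Finally, they optimise the cameras and triangulate the 3D points via a differentiable bundle adjustment layer.
The method attains state-of-the-art performance on three popular datasets: CO3D, IMC Phototourism, and ETH3D.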

Summary (original)

Structure-from-motion (SfM) is a long-standing problem in the computer vision community, which aims to reconstruct the camera poses and 3D structure of a scene from a set of unconstrained 2D images. Classical frameworks solve this problem in an incremental manner by detecting and matching keypoints, registering images, triangulating 3D points, and conducting bundle adjustment. Recent research efforts have predominantly revolved around harnessing the power of deep learning techniques to enhance specific elements (e.g., keypoint matching), but are still based on the original, non-differentiable pipeline. Instead, we propose a new deep pipeline VGGSfM, where each component is fully differentiable and thus can be trained in an end-to-end manner. To this end, we introduce new mechanisms and simplifications. First, we build on recent advances in deep 2D point tracking to extract reliable pixel-accurate tracks, which eliminates the need for chaining pairwise matches. Furthermore, we recover all cameras simultaneously based on the image and track features instead of gradually registering cameras. Finally, we optimise the cameras and triangulate 3D points via a differentiable bundle adjustment layer. We attain state-of-the-art performance on three popular datasets, CO3D, IMC Phototourism, and ETH3D.
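
As a rough illustration of two classical ingredients named in the abstract, triangulation and the reprojection error that bundle adjustment typically minimises, the following NumPy sketch triangulates one point from a two-view track with the linear (DLT) method. It is not the VGGSfM implementation, which relies on learned, differentiable components. The intrinsics K, the two camera matrices, the test point, and the function names project, triangulate_dlt, and reprojection_error are all assumptions made up for this example.

```python
import numpy as np


def project(P, X):
    """Project a 3D point X (shape (3,)) to pixels with a 3x4 camera matrix P."""
    x = P @ np.append(X, 1.0)  # homogeneous image point
    return x[:2] / x[2]


def triangulate_dlt(Ps, xs):
    """Linear (DLT) triangulation of a single point from N >= 2 views.

    Each observation (u, v) under camera P contributes two linear
    constraints on the homogeneous point X_h:
        (u * P[2] - P[0]) @ X_h = 0
        (v * P[2] - P[1]) @ X_h = 0
    """
    rows = []
    for P, (u, v) in zip(Ps, xs):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The least-squares solution is the right singular vector associated
    # with the smallest singular value of A.
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]
    return X_h[:3] / X_h[3]


def reprojection_error(Ps, xs, X):
    """Mean pixel distance between observed and reprojected points."""
    errors = [np.linalg.norm(project(P, X) - x) for P, x in zip(Ps, xs)]
    return float(np.mean(errors))


# Hypothetical two-camera setup (all values are illustrative).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
# Second camera centred at (1, 0, 0) with the same orientation: t = -R @ C.
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.2, -0.1, 4.0])
track = [project(P1, X_true), project(P2, X_true)]  # a noiseless 2-view track

X_est = triangulate_dlt([P1, P2], track)
err = reprojection_error([P1, P2], track, X_est)
```

With noiseless observations the estimate matches the true point up to numerical precision. In a full pipeline the tracks are noisy, and a bundle adjustment step would refine cameras and points jointly to reduce this error over all tracks.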

arXiv information

Authors	Jianyuan Wang, Nikita Karaev, Christian Rupprecht, David Novotny
Published	2023-12-07 18:59:52+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
