DEFOM-Stereo: Depth Foundation Model Based Stereo Matching

Summary

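Stereo matching is a key technique for metric depth estimation in computer vision and robotics.
Real-world challenges such as occlusion and textureless regions hinder accurate disparity estimation from binocular matching cues.
Recently, monocular relative depth estimation built on vision foundation models has shown remarkable generalization.
To make stereo matching more robust with monocular depth cues, we incorporate a robust monocular relative depth model into a recurrent stereo-matching framework, yielding DEFOM-Stereo, a new framework for depth foundation model (DEFOM) based stereo matching.
In the feature extraction stage, we build a combined context and matching feature encoder by integrating features from conventional CNNs and from DEFOM.
In the update stage, we initialize the recurrent disparity with the depth predicted by DEFOM and introduce a scale update module that refines the disparity at the correct scale.
On the Scene Flow dataset, DEFOM-Stereo performs comparably to state-of-the-art (SOTA) methods, and it shows notably stronger zero-shot generalization.
DEFOM-Stereo also achieves SOTA performance on the KITTI 2012, KITTI 2015, Middlebury, and ETH3D benchmarks, ranking first on many metrics.
In the joint evaluation under the Robust Vision Challenge, our model simultaneously outperforms previous models on the individual benchmarks.
Both results demonstrate the strong capabilities of the proposed model.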
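
The update-stage idea in the summary, starting the recurrent disparity from a monocular depth prediction whose scale is unknown, can be illustrated with a small sketch. This is not the paper's scale update module. It assumes the monocular output behaves like an inverse depth map that is proportional to disparity up to one global scale (any shift is ignored), and it fits that scale by least squares against a hypothetical reference disparity. The names `fit_scale` and `init_disparity` are made up for illustration.

```python
import numpy as np


def fit_scale(rel_inv_depth, ref_disparity, valid):
    """Least-squares scale s minimizing ||s * rel - ref||^2 over valid pixels.

    Assumption: rel_inv_depth is proportional to disparity up to a single
    global scale. This is an illustrative stand-in, not the paper's module.
    """
    r = rel_inv_depth[valid]
    d = ref_disparity[valid]
    denom = float(np.dot(r, r))
    if denom == 0.0:
        raise ValueError("relative depth is all zeros on the valid pixels")
    return float(np.dot(r, d)) / denom


def init_disparity(rel_inv_depth, scale):
    """Scale the relative prediction into disparity units, clamped at zero."""
    return np.clip(scale * rel_inv_depth, 0.0, None)


# Toy data: the reference disparity is exactly 50x the relative prediction.
rel = np.array([[0.2, 0.4], [0.6, 0.8]])
ref = 50.0 * rel
valid = np.ones_like(rel, dtype=bool)

scale = fit_scale(rel, ref, valid)
disp0 = init_disparity(rel, scale)
```

In a recurrent stereo network, an initial map like `disp0` would then be refined iteratively; a single global scale is the simplest possible choice and would likely be too coarse for real scenes.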

Summary (original)

Stereo matching is a key technique for metric depth estimation in computer vision and robotics. Real-world challenges like occlusion and non-texture hinder accurate disparity estimation from binocular matching cues. Recently, monocular relative depth estimation has shown remarkable generalization using vision foundation models. Thus, to facilitate robust stereo matching with monocular depth cues, we incorporate a robust monocular relative depth model into the recurrent stereo-matching framework, building a new framework for depth foundation model-based stereo-matching, DEFOM-Stereo. In the feature extraction stage, we construct the combined context and matching feature encoder by integrating features from conventional CNNs and DEFOM. In the update stage, we use the depth predicted by DEFOM to initialize the recurrent disparity and introduce a scale update module to refine the disparity at the correct scale. DEFOM-Stereo is verified to have comparable performance on the Scene Flow dataset with state-of-the-art (SOTA) methods and notably shows much stronger zero-shot generalization. Moreover, DEFOM-Stereo achieves SOTA performance on the KITTI 2012, KITTI 2015, Middlebury, and ETH3D benchmarks, ranking 1st on many metrics. In the joint evaluation under the robust vision challenge, our model simultaneously outperforms previous models on the individual benchmarks. Both results demonstrate the outstanding capabilities of the proposed model.
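
The abstract's opening point, that stereo matching yields metric depth, rests on the standard rectified-stereo relation Z = f * B / d, where f is the focal length in pixels, B the baseline, and d the disparity in pixels. The sketch below shows that conversion. The function name, the parameter names, and the sample camera values are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m, min_disp=1e-6):
    """Convert a disparity map (pixels) to metric depth (meters).

    Uses Z = f * B / d for a rectified stereo pair. Pixels whose disparity
    is at or below min_disp are treated as infinitely far away.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    ok = disparity > min_disp
    depth[ok] = focal_px * baseline_m / disparity[ok]
    return depth


# Hypothetical camera: 640 px focal length, 10 cm baseline.
depth = disparity_to_depth(
    [[64.0, 32.0], [0.0, 16.0]], focal_px=640.0, baseline_m=0.1
)
```

Because depth is inversely proportional to disparity, a fixed disparity error costs far more depth accuracy at long range, which is one reason getting the disparity scale right matters.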

arXiv information

Authors	Hualie Jiang, Zhiqiang Lou, Laiyan Ding, Rui Xu, Minglang Tan, Wenjie Jiang, Rui Huang
Published	2025-01-16 10:59:29+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
