Camera Height Doesn’t Change: Unsupervised Training for Metric Monocular Road-Scene Depth Estimation

要約

この論文では、単眼深度ネットワークに絶対スケールを学習させ、通常のトレーニングデータ、つまり走行ビデオだけから道路シーンの深さのメートル単位を推定させる新しいトレーニング方法を紹介します。
このトレーニングフレームワークを StableCamH と呼びます。
重要なアイデアは、路上で見つかった車両をスケール監視のソースとして活用し、トレーニングにしっかりと組み込むことです。
StableCamH は、フレーム内の車のサイズを検出して推定し、そこから抽出したスケール情報をカメラの高さ推定に集約し、ビデオシーケンス全体にわたる一貫性がスケール監視として強制されます。
これにより、スケールを意識しない単眼深度ネットワークの堅牢な教師なしトレーニングが実現され、補助センサーや追加の監視を必要とせずに、スケールを認識するだけでなく、メトリック精度も向上します。
KITTI および Cityscapes データセットに関する広範な実験により、StableCamH の有効性と、関連する手法と比較したその最先端の精度が示されています。
また、StableCamH により、異なるカメラ高さの混合データセットでのトレーニングが可能になり、これにより大規模なトレーニングが可能になり、より高い一般化が可能になることも示します。
メトリック深度の再構成は、あらゆる道路シーンのビジュアルモデリングに不可欠であり、StableCamH は、メトリック深度推定器として任意のモデルをトレーニングする手段を確立することで、その展開を民主化します。

要約(オリジナル)

In this paper, we introduce a novel training method for making any monocular depth network learn absolute scale and estimate metric road-scene depth just from regular training data, i.e., driving videos. We refer to this training framework as StableCamH. The key idea is to leverage cars found on the road as sources of scale supervision but to incorporate them in the training robustly. StableCamH detects and estimates the sizes of cars in the frame and aggregates scale information extracted from them into a camera height estimate whose consistency across the entire video sequence is enforced as scale supervision. This realizes robust unsupervised training of any, otherwise scale-oblivious, monocular depth network to become not only scale-aware but also metric-accurate without the need for auxiliary sensors and extra supervision. Extensive experiments on the KITTI and Cityscapes datasets show the effectiveness of StableCamH and its state-of-the-art accuracy compared with related methods. We also show that StableCamH enables training on mixed datasets of different camera heights, which leads to larger-scale training and thus higher generalization. Metric depth reconstruction is essential in any road-scene visual modeling, and StableCamH democratizes its deployment by establishing the means to train any model as a metric depth estimator.

arxiv情報

著者	Genki Kinoshita,Ko Nishino
発行日	2024-03-20 09:12:21+00:00
arxivサイト	arxiv_id(pdf)

提供元, 利用サービス

arxiv.jp, Google

Camera Height Doesn’t Change: Unsupervised Training for Metric Monocular Road-Scene Depth Estimation

要約

要約(オリジナル)

arxiv情報

提供元, 利用サービス

最近の投稿

最近のコメント

アーカイブ

カテゴリー