Camera Height Doesn’t Change: Unsupervised Training for Metric Monocular Road-Scene Depth Estimation

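Summary

This paper introduces a new training method that makes any monocular depth network learn absolute scale and estimate metric road-scene depth from ordinary training data alone, namely driving videos.
We call this training framework FUMET.
The key idea is to use cars found on the road as a source of scale supervision and to incorporate them robustly into network training.
FUMET detects cars in each frame and estimates their sizes, then aggregates the scale information extracted from them into an estimate of the camera height; the consistency of this estimate across the entire video sequence is enforced as scale supervision.
This enables robust unsupervised training of any monocular depth network that would otherwise be scale-oblivious, making it not only scale-aware but also metrically accurate, without auxiliary sensors or extra supervision.
Extensive experiments on the KITTI and Cityscapes datasets demonstrate the effectiveness of FUMET, which achieves state-of-the-art accuracy.
We also show that FUMET enables training on mixed datasets with different camera heights, which allows larger-scale training and better generalization.
Metric depth reconstruction is essential to any road-scene visual modeling, and FUMET democratizes its deployment by providing a means to convert any model into a metric depth estimator.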
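The summary above describes turning car sizes into a single camera-height estimate. The Python sketch below shows one simplified way such an aggregation could work. It is an illustration only and is not the authors' implementation.

It makes several assumptions:

- The camera is a pinhole camera with focal length `focal_px`, given in pixels.
- The depth network's scale-ambiguous output supplies each car's depth and the camera height in the same arbitrary units.
- A metric height for each detected car is already available from some size estimator.

The function name `camera_height_from_cars` and all of its inputs are hypothetical.

```python
import numpy as np


def camera_height_from_cars(car_pixel_heights, car_rel_depths, car_metric_heights,
                            rel_camera_height, focal_px):
    """Aggregate per-car scale cues into one metric camera-height estimate.

    Hypothetical sketch, assuming a pinhole camera and upright cars, so that a
    car's height in network units is roughly pixel_height * rel_depth / focal_px.
    """
    pix = np.asarray(car_pixel_heights, dtype=float)
    depth = np.asarray(car_rel_depths, dtype=float)
    metric = np.asarray(car_metric_heights, dtype=float)

    # Car heights expressed in the network's arbitrary (scale-ambiguous) units.
    rel_heights = pix * depth / focal_px
    # One scale factor (metres per network unit) per detected car.
    scales = metric / rel_heights
    # Each car yields its own guess of the metric camera height.
    per_car_cam_height = scales * rel_camera_height
    # The median resists outliers such as a badly sized or misdetected car.
    return float(np.median(per_car_cam_height))


if __name__ == "__main__":
    # Made-up numbers: the third car's size estimate (4.0 m) is an outlier.
    h = camera_height_from_cars(
        car_pixel_heights=[100, 50, 70],
        car_rel_depths=[1.0, 2.0, 1.5],
        car_metric_heights=[1.5, 1.5, 4.0],
        rel_camera_height=0.16,
        focal_px=700.0,
    )
    print(f"{h:.2f}")  # 1.68
```

With these illustrative inputs, a mean over the three per-car estimates would be pulled to about 2.54 by the outlier. The median stays at 1.68, which is why a robust aggregate is a natural choice here.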

Summary (original)

In this paper, we introduce a novel training method for making any monocular depth network learn absolute scale and estimate metric road-scene depth just from regular training data, i.e., driving videos. We refer to this training framework as FUMET. The key idea is to leverage cars found on the road as sources of scale supervision and to incorporate them in network training robustly. FUMET detects and estimates the sizes of cars in a frame and aggregates scale information extracted from them into an estimate of the camera height whose consistency across the entire video sequence is enforced as scale supervision. This realizes robust unsupervised training of any, otherwise scale-oblivious, monocular depth network so that they become not only scale-aware but also metric-accurate without the need for auxiliary sensors and extra supervision. Extensive experiments on the KITTI and the Cityscapes datasets show the effectiveness of FUMET, which achieves state-of-the-art accuracy. We also show that FUMET enables training on mixed datasets of different camera heights, which leads to larger-scale training and better generalization. Metric depth reconstruction is essential in any road-scene visual modeling, and FUMET democratizes its deployment by establishing the means to convert any model into a metric depth estimator.
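The camera height is fixed within one video sequence but may differ between sequences, for example when KITTI and Cityscapes are mixed. Per-frame estimates can therefore be pooled per sequence and used as a consistency target.

The NumPy sketch below illustrates that idea under two assumptions:

- The pooling is a per-sequence median.
- The penalty is an L1 deviation from that median.

Both choices, and the function names `sequence_height_targets` and `height_consistency_loss`, are hypothetical. They are not FUMET's actual loss. In real training, `predicted_heights` would be differentiable tensors produced by the depth network rather than plain floats.

```python
from collections import defaultdict

import numpy as np


def sequence_height_targets(frame_seq_ids, frame_height_estimates):
    """Pool per-frame camera-height estimates into one target per sequence.

    Hypothetical choice: the median, so a few bad frames do not shift the target.
    """
    groups = defaultdict(list)
    for seq_id, h in zip(frame_seq_ids, frame_height_estimates):
        groups[seq_id].append(h)
    return {seq_id: float(np.median(hs)) for seq_id, hs in groups.items()}


def height_consistency_loss(frame_seq_ids, predicted_heights, targets):
    """Mean absolute deviation of each frame's camera height from its sequence target.

    Each frame is compared only with its own sequence, so sequences recorded at
    different camera heights (mixed datasets) do not conflict.
    """
    errs = [abs(p - targets[s]) for s, p in zip(frame_seq_ids, predicted_heights)]
    return float(np.mean(errs))


if __name__ == "__main__":
    # Made-up per-frame estimates for two sequences with different camera heights.
    seq_ids = ["seq_a"] * 4 + ["seq_b"] * 3
    per_frame = [1.64, 1.66, 1.70, 2.40, 1.20, 1.22, 1.18]
    targets = sequence_height_targets(seq_ids, per_frame)
    print({k: round(v, 2) for k, v in targets.items()})  # {'seq_a': 1.68, 'seq_b': 1.2}

    loss = height_consistency_loss(["seq_a", "seq_a", "seq_b"], [1.68, 1.78, 1.10], targets)
    print(f"{loss:.4f}")  # 0.0667
```

Keying the targets by sequence, rather than using one global value, is what lets differently mounted cameras share a training run in this sketch.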

arXiv information

Authors	Genki Kinoshita, Ko Nishino
Published	2024-10-01 16:12:51+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
