3D Hierarchical Refinement and Augmentation for Unsupervised Learning of Depth and Pose from Monocular Video

Summary

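Depth and ego-motion estimation are essential for the localization and navigation of autonomous robots and for autonomous driving. Recent studies have made it possible to learn per-pixel depth and ego-motion from unlabeled monocular video. We propose a novel unsupervised learning framework with 3D hierarchical refinement and augmentation that uses explicit 3D geometry. In this framework, depth estimation and pose estimation are hierarchically and mutually coupled, and the estimated pose is refined layer by layer. An intermediate view image is synthesized by warping the pixels of an image with the estimated depth and the coarse pose. A residual pose transformation is then estimated from this new view image and the image of the adjacent frame, and it is used to refine the coarse pose. The iterative refinement is implemented in a differentiable manner, so the whole framework is optimized jointly. In addition, a new image augmentation method for pose estimation is proposed. It synthesizes a new view image, which augments the pose in 3D space while producing a new augmented 2D image. Experiments on KITTI show that our depth estimation achieves state-of-the-art performance and even outperforms recent approaches that use other auxiliary tasks. Our visual odometry outperforms all recent unsupervised monocular learning-based methods. It also achieves performance competitive with the geometry-based method ORB-SLAM2 with back-end optimization.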
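
The intermediate-view synthesis described above can be sketched with standard inverse warping. Each target pixel is back-projected with its depth, moved by a 4x4 rigid pose, reprojected with intrinsics K, and used to bilinearly sample the source image. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The function names, the grayscale (H, W) input, the shared intrinsics K, the target-to-source pose convention, and zero padding outside the image are all illustrative choices. Because the paper's refinement is differentiable, a training implementation would express the same steps in an autodiff framework rather than in plain NumPy.

```python
import numpy as np


def backproject(depth, K):
    """Lift every pixel (u, v) with depth d to the 3D point d * K^-1 [u, v, 1]^T."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(np.float64)
    rays = np.linalg.inv(K) @ pix            # (3, H*W) rays at unit depth
    return rays * depth.reshape(1, -1)       # (3, H*W) camera-frame points


def project(points, K):
    """Project camera-frame 3D points (3, N) to pixel coordinates (2, N)."""
    p = K @ points
    z = np.clip(p[2:3], 1e-6, None)          # guard against division by zero
    return p[:2] / z


def bilinear_sample(img, coords):
    """Bilinearly sample an (H, W) image at float pixel coords (2, N); outside pixels count as 0."""
    h, w = img.shape
    # Clipping only affects points that are already fully outside the image.
    x = np.clip(coords[0], -2.0, w + 1.0)
    y = np.clip(coords[1], -2.0, h + 1.0)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    wx, wy = x - x0, y - y0
    out = np.zeros(x.shape, dtype=np.float64)
    corners = [(x0, y0, (1 - wx) * (1 - wy)), (x0 + 1, y0, wx * (1 - wy)),
               (x0, y0 + 1, (1 - wx) * wy), (x0 + 1, y0 + 1, wx * wy)]
    for xi, yi, weight in corners:
        valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
        out[valid] += weight[valid] * img[yi[valid], xi[valid]]
    return out


def synthesize_view(src_img, tgt_depth, T_tgt_to_src, K):
    """Render the source image from the target viewpoint (inverse warping)."""
    h, w = tgt_depth.shape
    pts = backproject(tgt_depth, K)
    pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])   # homogeneous coordinates
    pts_src = (T_tgt_to_src @ pts_h)[:3]                    # move points into the source frame
    coords = project(pts_src, K)
    return bilinear_sample(src_img, coords).reshape(h, w)
```

Usage: with an identity pose and constant depth, `synthesize_view` reproduces the source image. With focal length 10, constant depth 5, and a translation of 0.5 along x, every target pixel samples the source one column to the right. The rightmost column falls outside the image and becomes 0.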

Summary (original)

Depth and ego-motion estimation are essential for the localization and navigation of autonomous robots and for autonomous driving. Recent studies make it possible to learn per-pixel depth and ego-motion from unlabeled monocular video. A novel unsupervised training framework is proposed with 3D hierarchical refinement and augmentation using explicit 3D geometry. In this framework, the depth and pose estimations are hierarchically and mutually coupled to refine the estimated pose layer by layer. The intermediate view image is proposed and synthesized by warping the pixels in an image with the estimated depth and coarse pose. Then, the residual pose transformation can be estimated from the new view image and the image of the adjacent frame to refine the coarse pose. The iterative refinement is implemented in a differentiable manner in this paper, so the whole framework is optimized uniformly. Meanwhile, a new image augmentation method is proposed for the pose estimation by synthesizing a new view image, which creatively augments the pose in 3D space while producing a new augmented 2D image. The experiments on KITTI demonstrate that our depth estimation achieves state-of-the-art performance and even surpasses recent approaches that utilize other auxiliary tasks. Our visual odometry outperforms all recent unsupervised monocular learning-based methods and achieves performance competitive with the geometry-based method ORB-SLAM2 with back-end optimization.
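
Two pieces of pose bookkeeping can be sketched with 4x4 rigid transforms: the layer-by-layer refinement of the coarse pose, and the pose label for the 3D augmentation. Several assumptions here are not stated in the abstract:

- The pose network outputs a 6-DoF axis-angle plus translation vector.
- Each level's residual is left-multiplied onto the current estimate.
- `estimate_residual` is a hypothetical stand-in for the per-level network. In the paper's setting, it would compare the intermediate view synthesized with the current pose against the adjacent frame.

`augmented_pose_label` shows one geometrically consistent way to update the pose target when frame t is re-rendered from a virtual camera. The paper's exact augmentation recipe may differ.

```python
import numpy as np


def se3_from_vec(vec):
    """Map [rx, ry, rz, tx, ty, tz] (axis-angle rotation + translation) to a 4x4 transform."""
    vec = np.asarray(vec, dtype=np.float64)
    r, t = vec[:3], vec[3:]
    theta = np.linalg.norm(r)
    R = np.eye(3)
    if theta > 1e-12:
        k = r / theta
        k_hat = np.array([[0.0, -k[2], k[1]],
                          [k[2], 0.0, -k[0]],
                          [-k[1], k[0], 0.0]])
        # Rodrigues' formula: R = I + sin(theta) K + (1 - cos(theta)) K^2
        R = R + np.sin(theta) * k_hat + (1.0 - np.cos(theta)) * (k_hat @ k_hat)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def refine_pose(T_coarse, estimate_residual, num_levels):
    """Refine a coarse pose level by level.

    estimate_residual(T, level) is a hypothetical stand-in for the per-level pose
    network and must return a 6-DoF residual vector.
    """
    T = np.array(T_coarse, dtype=np.float64)
    for level in range(num_levels):
        delta = se3_from_vec(estimate_residual(T, level))
        T = delta @ T  # assumed convention: residual applied on the left
    return T


def augmented_pose_label(T_t_to_s, P):
    """Relative pose of the pair (t', s) when frame t is re-rendered from a virtual
    camera t' with p_t' = P p_t:  p_s = T_t_to_s p_t = T_t_to_s P^-1 p_t'."""
    return T_t_to_s @ np.linalg.inv(P)


# Toy usage: a fake estimator that recovers half of the remaining z-translation error.
T_true = se3_from_vec([0, 0, 0, 0, 0, 1.0])


def half_step(T, level):
    err = T_true[:3, 3] - T[:3, 3]
    return np.concatenate([np.zeros(3), 0.5 * err])


T_refined = refine_pose(se3_from_vec([0, 0, 0, 0, 0, 0.6]), half_step, num_levels=2)
print(round(T_refined[2, 3], 6))  # 0.9
```

In the toy run, the z-translation moves from 0.6 to 0.8 to 0.9 over two levels. Left- versus right-multiplying the residual only changes which camera frame the correction is expressed in. Whichever convention is chosen has to match how the intermediate view is warped.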

arXiv information

Authors	Guangming Wang, Jiquan Zhong, Shijie Zhao, Wenhua Wu, Zhe Liu, Hesheng Wang
Published	2022-06-08 04:42:56+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, DeepL
