VPOcc: Exploiting Vanishing Point for Monocular 3D Semantic Occupancy Prediction

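Abstract

Monocular 3D semantic occupancy prediction is gaining importance in robot vision because it needs only a single, compact RGB camera.
However, existing methods often fail to account adequately for the camera's perspective geometry, which leaves information unevenly distributed along the depth range of the image.
To address this problem, we propose VPOcc, a vanishing point (VP) guided framework for monocular 3D semantic occupancy prediction.
The framework consists of three novel modules that exploit the VP.
First, the VPZoomer module uses the VP at the feature extraction stage: it generates a zoom-in image based on the VP so that features are extracted with balanced information across the scene.
Second, the VP-guided cross-attention (VPCA) module samples points toward the VP to perform perspective-geometry-aware feature aggregation.
Finally, the balanced feature volume fusion (BVFV) module fuses the original and zoom-in voxel feature volumes into an information-balanced feature volume.
Experiments show that our method achieves state-of-the-art performance in both IoU and mIoU on SemanticKITTI and SSCBench-KITTI360.
These gains come from using the VP to address the information imbalance in images.
Our code will be available at www.github.com/anonymous.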
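
The abstract gives no implementation details, so the following is only a rough geometric sketch of two ideas it names: choosing a zoom-in region around the VP and sampling image points toward the VP. Everything here is an assumption made for illustration: the function names `zoom_window` and `sample_toward_vp`, the `scale` and `num_points` parameters, the use of a plain axis-aligned crop in place of whatever zoom operation the authors use, and the even spacing of samples. It is not the VPZoomer or VPCA module.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


def zoom_window(vp: Point, width: int, height: int, scale: float) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a crop centred on the VP.

    Hypothetical helper: the crop is (width / scale) x (height / scale)
    pixels and is shifted, if needed, so that it stays inside the image.
    """
    if scale < 1.0:
        raise ValueError("scale must be >= 1.0 for a zoom-in crop")
    crop_w = int(round(width / scale))
    crop_h = int(round(height / scale))
    left = int(round(vp[0] - crop_w / 2))
    top = int(round(vp[1] - crop_h / 2))
    # Clamp the window to the image bounds.
    left = max(0, min(left, width - crop_w))
    top = max(0, min(top, height - crop_h))
    return left, top, left + crop_w, top + crop_h


def sample_toward_vp(ref: Point, vp: Point, num_points: int) -> List[Point]:
    """Return num_points locations evenly spaced on the segment from ref to vp.

    Hypothetical helper: the first sample is ref itself and the VP
    endpoint is excluded, so all samples lie strictly before the VP.
    """
    if num_points < 1:
        raise ValueError("num_points must be at least 1")
    (rx, ry), (vx, vy) = ref, vp
    samples = []
    for k in range(num_points):
        t = k / num_points  # fraction of the way from ref to vp, in [0, 1)
        samples.append((rx + t * (vx - rx), ry + t * (vy - ry)))
    return samples
```

In a learned model the sampled locations would typically be used to read features from an image feature map; that step is omitted because the abstract does not describe it.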

Abstract (original)

Monocular 3D semantic occupancy prediction is becoming important in robot vision due to the compactness of using a single RGB camera. However, existing methods often do not adequately account for camera perspective geometry, resulting in information imbalance along the depth range of the image. To address this issue, we propose a vanishing point (VP) guided monocular 3D semantic occupancy prediction framework named VPOcc. Our framework consists of three novel modules utilizing VP. First, in the VPZoomer module, we initially utilize VP in feature extraction to achieve information balanced feature extraction across the scene by generating a zoom-in image based on VP. Second, we perform perspective geometry-aware feature aggregation by sampling points towards VP using a VP-guided cross-attention (VPCA) module. Finally, we create an information-balanced feature volume by effectively fusing original and zoom-in voxel feature volumes with a balanced feature volume fusion (BVFV) module. Experiments demonstrate that our method achieves state-of-the-art performance for both IoU and mIoU on SemanticKITTI and SSCBench-KITTI360. These results are obtained by effectively addressing the information imbalance in images through the utilization of VP. Our code will be available at www.github.com/anonymous.
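
The two reported metrics can be illustrated with a small sketch, assuming the convention commonly used for semantic scene completion benchmarks: IoU scores occupancy only (occupied versus empty, ignoring class), while mIoU averages the per-class IoU over the semantic classes. The label layout (`EMPTY = 0`, classes `1..num_classes-1`), the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions for illustration; handling of unlabeled or invalid voxels is left out.

```python
from typing import Sequence

EMPTY = 0  # assumed label for free space


def occupancy_iou(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Class-agnostic IoU between occupied voxels in pred and gt."""
    inter = sum(1 for p, g in zip(pred, gt) if p != EMPTY and g != EMPTY)
    union = sum(1 for p, g in zip(pred, gt) if p != EMPTY or g != EMPTY)
    return inter / union if union else 0.0


def mean_iou(pred: Sequence[int], gt: Sequence[int], num_classes: int) -> float:
    """Mean of per-class IoU over semantic classes 1..num_classes-1.

    Classes that appear in neither pred nor gt are skipped (an assumption).
    """
    ious = []
    for c in range(1, num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with six voxels and two semantic classes.
    pred = [0, 1, 1, 2, 0, 2]
    gt = [0, 1, 2, 2, 1, 0]
    print(occupancy_iou(pred, gt))  # 3 shared occupied voxels out of 5 -> 0.6
    print(mean_iou(pred, gt, num_classes=3))
```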

arXiv information

Authors	Junsu Kim, Junhee Lee, Ukcheol Shin, Jean Oh, Kyungdon Joo
Published	2024-08-07 05:23:52+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
