Deep Height Decoupling for Precise Vision-based 3D Occupancy Prediction

Summary

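Vision-based 3D occupancy prediction aims to reconstruct 3D geometry and estimate its semantic classes from 2D color images, and the 2D-to-3D view transformation is an indispensable step in this task.
Most previous methods use forward projection, such as BEVPooling and VoxelPooling, which map 2D image features into 3D grids.
However, a grid that represents features within a certain height range usually also picks up many confusing features that belong to other height ranges.
To address this problem, the authors present Deep Height Decoupling (DHD), a new framework that incorporates an explicit height prior to filter out the confusing features.
Specifically, DHD first predicts height maps under explicit supervision.
Based on height distribution statistics, DHD designs Mask Guided Height Sampling (MGHS) to adaptively decouple the height map into multiple binary masks.
MGHS projects the 2D image features into multiple subspaces, so that each grid contains only features within a reasonable height range.
Finally, a Synergistic Feature Aggregation (SFA) module enhances the feature representation through channel and spatial affinities, enabling further occupancy refinement.
On the popular Occ3D-nuScenes benchmark, the method achieves state-of-the-art performance even with minimal input frames.
Source code is released at https://github.com/yanzq95/DHD.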

Abstract (original)

The task of vision-based 3D occupancy prediction aims to reconstruct 3D geometry and estimate its semantic classes from 2D color images, where the 2D-to-3D view transformation is an indispensable step. Most previous methods conduct forward projection, such as BEVPooling and VoxelPooling, both of which map the 2D image features into 3D grids. However, the current grid representing features within a certain height range usually introduces many confusing features that belong to other height ranges. To address this challenge, we present Deep Height Decoupling (DHD), a novel framework that incorporates explicit height prior to filter out the confusing features. Specifically, DHD first predicts height maps via explicit supervision. Based on the height distribution statistics, DHD designs Mask Guided Height Sampling (MGHS) to adaptively decouple the height map into multiple binary masks. MGHS projects the 2D image features into multiple subspaces, where each grid contains features within reasonable height ranges. Finally, a Synergistic Feature Aggregation (SFA) module is deployed to enhance the feature representation through channel and spatial affinities, enabling further occupancy refinement. On the popular Occ3D-nuScenes benchmark, our method achieves state-of-the-art performance even with minimal input frames. Source code is released at https://github.com/yanzq95/DHD.
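The decoupling step described in the abstract, splitting a height map into several binary masks and keeping only the features that fall in each height range, can be illustrated with a small sketch. This is not the authors' implementation: the function names `decouple_height_map` and `apply_mask`, the fixed `bin_edges`, and the toy single-channel feature map are all assumptions made for illustration. In the paper the height map is predicted by the network and the ranges are derived from height distribution statistics.

```python
from typing import List, Sequence

Grid = List[List[float]]


def decouple_height_map(height_map: Grid, bin_edges: Sequence[float]) -> List[List[List[int]]]:
    """Split a per-pixel height map into one binary mask per height range.

    Range i covers [bin_edges[i], bin_edges[i + 1]). The edges here are
    hypothetical; they are not the values used in the paper.
    """
    masks = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        masks.append([[1 if lo <= h < hi else 0 for h in row] for row in height_map])
    return masks


def apply_mask(feature_map: Grid, mask: List[List[int]]) -> Grid:
    """Zero out every feature whose pixel lies outside the mask's height range."""
    return [[f * m for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(feature_map, mask)]


# Toy 2x2 example: heights in metres (assumed) and a single feature channel.
height_map = [[0.2, 1.5], [3.0, 0.9]]
feature_map = [[10.0, 20.0], [30.0, 40.0]]

masks = decouple_height_map(height_map, bin_edges=[0.0, 1.0, 2.0, 4.0])
masked_features = [apply_mask(feature_map, m) for m in masks]

for i, feats in enumerate(masked_features):
    print(f"height range {i}: {feats}")
```

Each masked feature map would then be projected into its own 3D subspace, so a grid cell in a given height range no longer receives features from pixels whose predicted height lies elsewhere.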

arXiv information

Authors	Yuan Wu, Zhiqiang Yan, Zhengxue Wang, Xiang Li, Le Hui, Jian Yang
Published	2025-02-06 12:30:21+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
