RoboOcc: Enhancing the Geometric and Semantic Scene Understanding for Robots

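Abstract

3D occupancy prediction lets robots obtain the fine-grained spatial geometry and semantics of the surrounding scene, and it has become an essential task for embodied perception.
Existing methods that use 3D Gaussians instead of dense voxels do not effectively exploit the geometry and opacity properties of the Gaussians, which limits both the network's estimation of complex environments and how well the 3D Gaussians can describe the scene.
This paper proposes RoboOcc, a 3D occupancy prediction method that enhances geometric and semantic scene understanding for robots.
It uses an Opacity-guided Self-Encoder (OSE) to alleviate the semantic ambiguity of overlapping Gaussians and a Geometry-aware Cross-Encoder (GCE) to achieve fine-grained geometric modeling of the surrounding scene.
Extensive experiments on the Occ-ScanNet and EmbodiedOcc-ScanNet datasets show that RoboOcc achieves state-of-the-art performance in both local and global camera settings.
Further, in ablation studies of Gaussian parameters, RoboOcc outperforms state-of-the-art methods by large margins of 8.47 in IoU and 6.27 in mIoU.
The code will be released soon.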

Abstract (original)

3D occupancy prediction enables the robots to obtain spatial fine-grained geometry and semantics of the surrounding scene, and has become an essential task for embodied perception. Existing methods based on 3D Gaussians instead of dense voxels do not effectively exploit the geometry and opacity properties of Gaussians, which limits the network’s estimation of complex environments and also limits the description of the scene by 3D Gaussians. In this paper, we propose a 3D occupancy prediction method which enhances the geometric and semantic scene understanding for robots, dubbed RoboOcc. It utilizes the Opacity-guided Self-Encoder (OSE) to alleviate the semantic ambiguity of overlapping Gaussians and the Geometry-aware Cross-Encoder (GCE) to accomplish the fine-grained geometric modeling of the surrounding scene. We conduct extensive experiments on Occ-ScanNet and EmbodiedOcc-ScanNet datasets, and our RoboOcc achieves state-of-the-art performance in both local and global camera settings. Further, in ablation studies of Gaussian parameters, the proposed RoboOcc outperforms the state-of-the-art methods by a large margin of (8.47, 6.27) in IoU and mIoU metrics, respectively. The codes will be released soon.
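
The IoU and mIoU figures quoted in the abstract are the usual occupancy metrics. As a rough illustration only, the sketch below computes them for flattened voxel labels. It assumes a common convention that the abstract does not spell out: label 0 means empty space, IoU is the binary occupied-versus-empty score, and mIoU averages per-class IoU over the non-empty classes. The names `EMPTY`, `occupancy_iou`, and `mean_iou` are hypothetical and are not taken from the RoboOcc code.

```python
from typing import Sequence

EMPTY = 0  # assumed label for free space (hypothetical convention)


def occupancy_iou(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Binary IoU: a voxel counts as occupied when its label is not EMPTY."""
    inter = sum(1 for p, g in zip(pred, gt) if p != EMPTY and g != EMPTY)
    union = sum(1 for p, g in zip(pred, gt) if p != EMPTY or g != EMPTY)
    return inter / union if union else float("nan")


def mean_iou(pred: Sequence[int], gt: Sequence[int], num_classes: int) -> float:
    """Mean of per-class IoU over semantic classes 1..num_classes-1.

    Classes absent from both prediction and ground truth are skipped.
    """
    ious = []
    for c in range(1, num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: six voxels, two semantic classes plus empty.
gt = [0, 1, 1, 2, 2, 0]
pred = [0, 1, 2, 2, 0, 1]
print(occupancy_iou(pred, gt))  # 3 shared occupied voxels out of 5 -> 0.6
print(mean_iou(pred, gt, 3))    # each class scores 1/3, so the mean is 1/3
```

Benchmarks often report both scores as percentages, which is presumably the scale of the (8.47, 6.27) margins above.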

arXiv information

Authors	Zhang Zhang, Qiang Zhang, Wei Cui, Shuai Shi, Yijie Guo, Gang Han, Wen Zhao, Hengle Ren, Renjing Xu, Jian Tang
Published	2025-04-20 13:06:23+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
