Scene as Occupancy

Abstract

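Human drivers can readily describe complex traffic scenes through their visual system. This capacity for precise perception is essential to a driver's planning. To achieve it, a geometry-aware representation is desirable: one that quantizes the physical 3D scene into a structured grid map with a semantic label per cell, termed 3D occupancy. Compared with bounding boxes, the key insight behind occupancy is that it can capture the fine-grained details of critical obstacles in the scene and thereby facilitate downstream tasks. Prior and concurrent work concentrates mainly on a single scene completion task, whereas we argue that the occupancy representation could have a broader impact. In this paper, we propose OccNet, a multi-view, vision-centric pipeline with a cascade and temporal voxel decoder that reconstructs 3D occupancy. At the core of OccNet is a general occupancy embedding that represents the 3D physical world. Such a descriptor can be applied to a wide range of driving tasks, including detection, segmentation, and planning. To validate the new representation and our proposed algorithm, we propose OpenOcc, the first dense, high-quality 3D occupancy benchmark built on top of nuScenes. Experiments show evident performance gains across multiple tasks; for example, motion planning sees a collision rate reduction of 15% to 58%, demonstrating the superiority of our method.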
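To make the idea of a grid map with one semantic label per cell concrete, the following is a minimal sketch of the representation only. It is not OccNet, which predicts occupancy from multi-view camera images; here labeled 3D points are simply binned into voxels and each voxel takes the majority label. The function name `voxelize`, the label strings, and the grid parameters are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import floor

# Hypothetical marker for empty cells; not taken from the paper.
FREE = "free"


def voxelize(points, origin, voxel_size, grid_shape):
    """Quantize labeled 3D points into a dense semantic occupancy grid.

    points: iterable of (x, y, z, label) tuples, coordinates in metres.
    origin: (x, y, z) of the grid's minimum corner.
    Returns a nested list grid[i][j][k] holding a label string or FREE.
    """
    nx, ny, nz = grid_shape
    votes = {}
    for x, y, z, label in points:
        # Map each coordinate to an integer cell index.
        idx = tuple(floor((c - o) / voxel_size) for c, o in zip((x, y, z), origin))
        # Ignore points that fall outside the grid.
        if all(0 <= i < n for i, n in zip(idx, grid_shape)):
            votes.setdefault(idx, Counter())[label] += 1

    grid = [[[FREE] * nz for _ in range(ny)] for _ in range(nx)]
    for (i, j, k), counter in votes.items():
        # Each occupied cell takes the most frequent label among its points.
        grid[i][j][k] = counter.most_common(1)[0][0]
    return grid


points = [
    (0.2, 0.3, 0.1, "car"),
    (0.4, 0.1, 0.2, "car"),
    (0.3, 0.3, 0.3, "pedestrian"),
    (1.6, 0.2, 0.1, "pedestrian"),
    (9.0, 9.0, 9.0, "car"),  # outside the grid, ignored
]
grid = voxelize(points, origin=(0.0, 0.0, 0.0), voxel_size=0.5, grid_shape=(4, 4, 2))
occupied = sum(cell != FREE for plane in grid for row in plane for cell in row)
```

Unlike a bounding box, which would summarize an object with a single cuboid, the grid records which individual cells are occupied and by what, which is the fine-grained detail the abstract refers to.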

Abstract (original)

Human driver can easily describe the complex traffic scene by visual system. Such an ability of precise perception is essential for driver’s planning. To achieve this, a geometry-aware representation that quantizes the physical 3D scene into structured grid map with semantic labels per cell, termed as 3D Occupancy, would be desirable. Compared to the form of bounding box, a key insight behind occupancy is that it could capture the fine-grained details of critical obstacles in the scene, and thereby facilitate subsequent tasks. Prior or concurrent literature mainly concentrate on a single scene completion task, where we might argue that the potential of this occupancy representation might obsess broader impact. In this paper, we propose OccNet, a multi-view vision-centric pipeline with a cascade and temporal voxel decoder to reconstruct 3D occupancy. At the core of OccNet is a general occupancy embedding to represent 3D physical world. Such a descriptor could be applied towards a wide span of driving tasks, including detection, segmentation and planning. To validate the effectiveness of this new representation and our proposed algorithm, we propose OpenOcc, the first dense high-quality 3D occupancy benchmark built on top of nuScenes. Empirical experiments show that there are evident performance gain across multiple tasks, e.g., motion planning could witness a collision rate reduction by 15%-58%, demonstrating the superiority of our method.

arXiv information

Authors	Wenwen Tong, Chonghao Sima, Tai Wang, Silei Wu, Hanming Deng, Li Chen, Yi Gu, Lewei Lu, Ping Luo, Dahua Lin, Hongyang Li
Published	2023-06-05 13:01:38+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
