Rooms from Motion: Un-posed Indoor 3D Object Detection as Localization and Mapping

Abstract

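We revisit scene-level 3D object detection as the output of an object-centric framework that performs both localization and mapping, with oriented 3D boxes as the underlying geometric primitive.
Existing 3D object detection approaches operate globally and implicitly assume that metric camera poses are available a priori, whereas our method, Rooms from Motion (RfM), operates on a collection of un-posed images.
By replacing the standard 2D keypoint-based matcher of structure-from-motion with an object-centric matcher based on image-derived 3D boxes, we estimate metric camera poses and object tracks, and finally produce a global, semantic 3D object map.
When a priori poses are available, map quality can be improved significantly by optimizing the global 3D boxes against individual observations.
RfM shows strong localization performance and subsequently produces higher-quality maps than leading point-based and multi-view 3D object detection methods on CA-1M and ScanNet++, even though those global methods rely on overparameterization through point clouds or dense volumes.
Rooms from Motion not only extends Cubify Anything to full scenes but also yields a general, object-centric representation that enables inherently sparse localization and parametric mapping proportional to the number of objects in the scene.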

Abstract (original)

We revisit scene-level 3D object detection as the output of an object-centric framework capable of both localization and mapping using 3D oriented boxes as the underlying geometric primitive. While existing 3D object detection approaches operate globally and implicitly rely on the a priori existence of metric camera poses, our method, Rooms from Motion (RfM), operates on a collection of un-posed images. By replacing the standard 2D keypoint-based matcher of structure-from-motion with an object-centric matcher based on image-derived 3D boxes, we estimate metric camera poses, object tracks, and finally produce a global, semantic 3D object map. When a priori pose is available, we can significantly improve map quality through optimization of global 3D boxes against individual observations. RfM shows strong localization performance and subsequently produces maps of higher quality than leading point-based and multi-view 3D object detection methods on CA-1M and ScanNet++, despite these global methods relying on overparameterization through point clouds or dense volumes. Rooms from Motion achieves a general, object-centric representation which not only extends the work of Cubify Anything to full scenes but also allows for inherently sparse localization and parametric mapping proportional to the number of objects in a scene.
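
To make the idea of object-based localization concrete, the sketch below shows one way metric 3D boxes matched across two views could yield a relative camera pose. The abstract does not say how RfM computes poses, so this is only an illustration under stated assumptions: the box centers are already matched one-to-one, each set is expressed in its own camera frame in metric units, and a standard Kabsch (SVD) rigid alignment is used. The function name `estimate_relative_pose` is hypothetical.

```python
import numpy as np


def estimate_relative_pose(centers_a, centers_b):
    """Estimate (R, t) such that centers_b ~= centers_a @ R.T + t.

    Illustrative only: assumes matched (N, 3) arrays of 3D box centers,
    with N >= 3 and the centers not all collinear. Uses the Kabsch
    algorithm, not a method taken from the paper.
    """
    a = np.asarray(centers_a, dtype=float)
    b = np.asarray(centers_b, dtype=float)
    mean_a = a.mean(axis=0)
    mean_b = b.mean(axis=0)

    # 3x3 cross-covariance of the centered point sets.
    H = (a - mean_a).T @ (b - mean_b)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation
    # (determinant +1) rather than a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    t = mean_b - R @ mean_a
    return R, t
```

Because the boxes are metric, the recovered translation carries real scale, which a purely 2D keypoint match between two views cannot provide on its own. A practical system would presumably also use box orientation and size and reject bad matches; none of that is modeled here.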

arXiv information

Authors	Justin Lazarow, Kai Kang, Afshin Dehghan
Published	2025-05-29 17:59:45+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
