PicoPose: Progressive Pixel-to-Pixel Correspondence Learning for Novel Object Pose Estimation

Abstract

Novel object pose estimation from RGB images presents a significant challenge for zero-shot generalization, as it involves estimating the relative 6D transformation between an RGB observation and a CAD model of an object that was not seen during training. In this paper, we introduce PicoPose, a novel framework designed to tackle this task using a three-stage pixel-to-pixel correspondence learning process. Firstly, PicoPose matches features from the RGB observation with those from rendered object templates, identifying the best-matched template and establishing coarse correspondences. Secondly, PicoPose smooths the correspondences by globally regressing a 2D affine transformation, including in-plane rotation, scale, and 2D translation, from the coarse correspondence map. Thirdly, PicoPose applies the affine transformation to the feature map of the best-matched template and learns correspondence offsets within local regions to achieve fine-grained correspondences. By progressively refining the correspondences, PicoPose significantly improves the accuracy of object poses computed via PnP/RANSAC. PicoPose achieves state-of-the-art performance on the seven core datasets of the BOP benchmark, demonstrating exceptional generalization to novel objects represented by CAD models or object reference images. Code and models are available at https://github.com/foollh/PicoPose.
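
The second and third stages described above can be illustrated with a small, self-contained sketch. This is not the authors' implementation (that is in the linked repository). It only shows the geometry the abstract names: a 2D transform built from in-plane rotation, scale, and 2D translation is applied to template pixel coordinates, and per-pixel offsets are then added. All function names, the sample pixel coordinates, and the offset values are hypothetical; in PicoPose the transform and offsets are predicted by learned networks and the transform is applied to a feature map rather than to raw coordinates.

```python
import math


def similarity_matrix(theta, scale, tx, ty):
    """Build a 2x3 matrix from in-plane rotation (radians), uniform scale, and 2D translation."""
    c, s = math.cos(theta), math.sin(theta)
    return [[scale * c, -scale * s, tx],
            [scale * s, scale * c, ty]]


def apply_affine(matrix, points):
    """Map each (x, y) template pixel into observation-image coordinates."""
    (a, b, tx), (c, d, ty) = matrix
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


def refine(points, offsets):
    """Add per-pixel offsets (stand-ins for a learned stage-3 output) to the mapped points."""
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(points, offsets)]


# Hypothetical template pixels and transform parameters, chosen for illustration only.
template_pixels = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
M = similarity_matrix(math.pi / 2, 2.0, 10.0, 5.0)

smoothed = apply_affine(M, template_pixels)  # globally consistent correspondences
fine = refine(smoothed, [(0.5, -0.5), (0.0, 0.0), (0.0, 0.0)])  # locally corrected
```

Because each template pixel corresponds to a known 3D point on the rendered CAD model, the refined 2D positions give 2D-3D correspondences, which a PnP/RANSAC solver (for example OpenCV's `solvePnPRansac`) can turn into a 6D pose.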

arXiv information

Authors	Lihua Liu, Jiehong Lin, Zhenxin Liu, Kui Jia
Published	2025-04-03 14:16:41+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
