Instance-Adaptive Keypoint Learning with Local-to-Global Geometric Aggregation for Category-Level Object Pose Estimation

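Summary

Category-level object pose estimation aims to predict the 6D pose and size of previously unseen instances from predefined categories, which requires strong generalization across diverse object instances.
Although many previous methods attempt to mitigate intra-class variation, they often struggle with instances that have complex geometries or deviate significantly from canonical shapes.
To address this issue, we propose INKL-Pose, a novel category-level object pose estimation framework that enables instance-adaptive keypoint learning with local-to-global geometric aggregation.
Specifically, our method first predicts semantically consistent and geometrically informative keypoints using an Instance-Adaptive Keypoint Detector, then refines them with (1) a Local Keypoint Feature Aggregator that captures fine-grained geometries and (2) a Global Keypoint Feature Aggregator that uses bidirectional Mamba for structural consistency.
To enable bidirectional modeling in Mamba, we introduce a simple yet effective Feature Sequence Flipping strategy that preserves spatial coherence while constructing the backward feature sequence.
Additionally, we design a surface loss and a separation loss to encourage uniform coverage and spatial diversity in the keypoint distribution.
The resulting keypoints are mapped to a canonical space for 6D pose and size regression.
Extensive experiments on CAMERA25, REAL275, and HouseCat6D show that INKL-Pose achieves state-of-the-art performance with 16.7M parameters and runs at 36 FPS on an NVIDIA RTX 4090D GPU.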
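
The Feature Sequence Flipping strategy is described only briefly in the summary. Below is a minimal sketch of the general idea, assuming the backward sequence is obtained by reversing the spatially ordered keypoint-feature sequence along the token axis. The toy linear recurrence `linear_scan` is a hypothetical stand-in for Mamba's selective state-space scan, which it is not. `bidirectional_scan` and the summation used to merge the two directions are likewise illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def linear_scan(x: np.ndarray, decay: float = 0.9) -> np.ndarray:
    """Causal recurrence h_t = decay * h_{t-1} + x_t over a (K, C) sequence.

    Hypothetical stand-in for a Mamba block: it only mimics the fact that
    token t sees tokens 0..t. It is NOT Mamba's selective SSM.
    """
    x = np.asarray(x, dtype=float)
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = decay * h + x[t]
        out[t] = h
    return out


def bidirectional_scan(feats: np.ndarray, decay: float = 0.9) -> np.ndarray:
    """Merge a forward scan with a scan of the flipped (backward) sequence.

    feats: (K, C) keypoint features already arranged in a spatial order.
    Reversing the token axis keeps every token's sequence neighbours the
    same, so the backward pass sees the same local structure in the
    opposite direction.
    """
    forward = linear_scan(feats, decay)
    # Scan the reversed sequence, then flip the result back so row t of
    # `backward` again corresponds to token t.
    backward = linear_scan(feats[::-1], decay)[::-1]
    # Assumption: the two directions are merged by summation.
    return forward + backward
```

With `np.eye(3)` as input and `decay=0.5`, every output row of `bidirectional_scan` mixes information from all three tokens. By contrast, the forward scan alone leaves the first token unaware of the other two. This one-sided view is the gap a backward sequence is meant to close.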

Summary (original)

Category-level object pose estimation aims to predict the 6D pose and size of previously unseen instances from predefined categories, requiring strong generalization across diverse object instances. Although many previous methods attempt to mitigate intra-class variations, they often struggle with instances exhibiting complex geometries or significant deviations from canonical shapes. To address this issue, we propose INKL-Pose, a novel category-level object pose estimation framework that enables INstance-adaptive Keypoint Learning with local-to-global geometric aggregation. Specifically, our method first predicts semantically consistent and geometrically informative keypoints using an Instance-Adaptive Keypoint Detector, then refines them: (1) a Local Keypoint Feature Aggregator capturing fine-grained geometries, and (2) a Global Keypoint Feature Aggregator using bidirectional Mamba for structural consistency. To enable bidirectional modeling in Mamba, we introduce a simple yet effective Feature Sequence Flipping strategy that preserves spatial coherence while constructing the backward feature sequence. Additionally, we design a surface loss and a separation loss to encourage uniform coverage and spatial diversity in keypoint distribution. The resulting keypoints are mapped to a canonical space for 6D pose and size regression. Extensive experiments on CAMERA25, REAL275, and HouseCat6D show that INKL-Pose achieves state-of-the-art performance with 16.7M parameters and runs at 36 FPS on an NVIDIA RTX 4090D GPU.
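
The abstract does not give formulas for the surface and separation losses. Below is a minimal sketch of one plausible reading. It assumes the surface loss pulls each predicted keypoint toward its nearest observed point, and that the separation loss is a hinge penalty on keypoint pairs closer than a margin. `surface_loss`, `separation_loss`, and `margin` are hypothetical names, and the paper's actual definitions may differ.

```python
import numpy as np


def surface_loss(keypoints: np.ndarray, points: np.ndarray) -> float:
    """Mean distance from each keypoint (K, 3) to its nearest observed point (N, 3).

    Hypothetical formulation: small values mean the keypoints lie on the
    observed object surface.
    """
    d = np.linalg.norm(keypoints[:, None, :] - points[None, :, :], axis=-1)  # (K, N)
    return float(d.min(axis=1).mean())


def separation_loss(keypoints: np.ndarray, margin: float) -> float:
    """Mean hinge penalty max(0, margin - ||k_i - k_j||) over all pairs i < j.

    Hypothetical formulation: pairs farther apart than `margin` cost
    nothing, so the loss pushes clustered keypoints apart without pulling
    distant ones together.
    """
    d = np.linalg.norm(keypoints[:, None, :] - keypoints[None, :, :], axis=-1)  # (K, K)
    i, j = np.triu_indices(len(keypoints), k=1)  # each unordered pair once, diagonal excluded
    return float(np.maximum(0.0, margin - d[i, j]).mean())
```

Used together, the first term keeps keypoints on the object and the second keeps them from collapsing onto the same region, which matches the stated goals of uniform coverage and spatial diversity. The abstract does not state how the two terms are weighted against the pose and size objectives.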

arXiv information

Authors	Xiao Zhang, Lu Zou, Tao Lu, Yuan Yao, Zhangjin Huang, Guoping Wang
Published	2025-06-18 13:21:45+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
