MonoDiff9D: Monocular Category-Level 9D Object Pose Estimation via Diffusion Model

Summary

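Object pose estimation is a core means by which robots understand and interact with their environment.
For this task, monocular category-level methods are attractive because they need only a single RGB camera.
However, current methods depend on shape priors or CAD models of known objects within the class.
We propose MonoDiff9D, a diffusion-based monocular category-level 9D object pose generation method.
Our motivation is to exploit the probabilistic nature of diffusion models to reduce the need for shape priors, CAD models, or depth sensors when estimating the pose of unknown objects within a class.
First, we estimate coarse depth from the monocular image via DINOv2 in a zero-shot manner and convert it into a point cloud.
Next, we fuse the global features of the point cloud with the input image and condition MonoDiff9D on the fused features together with the encoded time step.
Finally, we design a transformer-based denoiser to recover the object pose from Gaussian noise.
Extensive experiments on two popular benchmark datasets show that MonoDiff9D achieves state-of-the-art monocular category-level 9D object pose estimation accuracy without requiring shape priors or CAD models at any stage.
Our code will be made public at https://github.com/CNJianLiu/MonoDiff9D.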
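
The depth-to-point-cloud step mentioned above can be illustrated with a standard pinhole back-projection. This is a minimal sketch, not the authors' implementation: the function name `depth_to_point_cloud` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions made for illustration, and the summary does not say how MonoDiff9D performs the conversion.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def depth_to_point_cloud(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Back-project a depth map into camera-frame 3D points.

    Assumes a pinhole camera (hypothetical intrinsics fx, fy, cx, cy)
    and that depth[v][u] is the z-distance at pixel column u, row v.
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:
                continue  # no valid depth at this pixel
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 depth map with one invalid pixel.
depth_map = [[1.0, 2.0], [0.0, 4.0]]
cloud = depth_to_point_cloud(depth_map, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

In practice the depth would come from a learned estimator and the intrinsics from camera calibration; the nested loops would normally be replaced by vectorized array operations.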

Abstract (original)

Object pose estimation is a core means for robots to understand and interact with their environment. For this task, monocular category-level methods are attractive as they require only a single RGB camera. However, current methods rely on shape priors or CAD models of the intra-class known objects. We propose a diffusion-based monocular category-level 9D object pose generation method, MonoDiff9D. Our motivation is to leverage the probabilistic nature of diffusion models to alleviate the need for shape priors, CAD models, or depth sensors for intra-class unknown object pose estimation. We first estimate coarse depth via DINOv2 from the monocular image in a zero-shot manner and convert it into a point cloud. We then fuse the global features of the point cloud with the input image and use the fused features along with the encoded time step to condition MonoDiff9D. Finally, we design a transformer-based denoiser to recover the object pose from Gaussian noise. Extensive experiments on two popular benchmark datasets show that MonoDiff9D achieves state-of-the-art monocular category-level 9D object pose estimation accuracy without the need for shape priors or CAD models at any stage. Our code will be made public at https://github.com/CNJianLiu/MonoDiff9D.
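
To make "recover the object pose from Gaussian noise" concrete, the following is a minimal sketch of generic DDPM ancestral sampling over a 9-dimensional vector. It is an illustration under stated assumptions, not the paper's sampler: the linear noise schedule, the step count, the flat 9-number pose layout, the `sample_pose` name, and the placeholder `zero_denoiser` (standing in for a trained transformer denoiser that predicts noise from the noisy pose, the time step, and the conditioning features) are all hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical signature: (noisy pose, time step, condition) -> predicted noise.
Denoiser = Callable[[Sequence[float], int, Sequence[float]], List[float]]


def linear_betas(num_steps: int, start: float = 1e-4, end: float = 0.02) -> List[float]:
    """Linear noise schedule (an assumed choice, common in DDPM examples)."""
    if num_steps == 1:
        return [start]
    step = (end - start) / (num_steps - 1)
    return [start + i * step for i in range(num_steps)]


def sample_pose(
    denoiser: Denoiser,
    condition: Sequence[float],
    num_steps: int = 50,
    dim: int = 9,
    seed: int = 0,
) -> List[float]:
    """Run DDPM ancestral sampling from Gaussian noise to a pose vector."""
    rng = random.Random(seed)
    betas = linear_betas(num_steps)
    alphas = [1.0 - b for b in betas]
    alpha_bars: List[float] = []
    running = 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)

    # Start from pure Gaussian noise.
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(num_steps)):
        eps = denoiser(x, t, condition)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added at the final step
    return x


def zero_denoiser(x: Sequence[float], t: int, condition: Sequence[float]) -> List[float]:
    """Placeholder that predicts zero noise; a trained network would go here."""
    return [0.0] * len(x)


pose = sample_pose(zero_denoiser, condition=[0.0] * 4)
```

With the placeholder denoiser the output is meaningless as a pose; the sketch only shows where the fused image and point-cloud features (`condition`) and the time step `t` would enter the reverse process.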

arXiv information

Authors	Jian Liu, Wei Sun, Hui Yang, Jin Zheng, Zichen Geng, Hossein Rahmani, Ajmal Mian
Published	2025-04-14 17:21:10+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
