Locate n’ Rotate: Two-stage Openable Part Detection with Foundation Model Priors

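Summary

Detecting the openable parts of articulated objects is crucial for downstream applications in intelligent robotics, such as pulling out a drawer.
The task requires understanding both object categories and motion, which makes it a multitask problem.
Most existing methods are either category-specific or trained on specific datasets, and they generalize poorly to unseen environments and objects.
This paper proposes a Transformer-based Openable Part Detection (OPD) framework called Multi-feature Openable Part Detection (MOPD), which incorporates perceptual grouping and geometric priors and outperforms previous methods.
In the first stage of the framework, a perceptual grouping feature model provides perceptual grouping feature priors for openable part detection, and a cross-attention mechanism uses them to enhance the detection results.
In the second stage, a geometric understanding feature model provides geometric feature priors for predicting motion parameters.
Compared with existing methods, the proposed approach performs better in both detection and motion parameter prediction.
The code and models are publicly available at https://github.com/lisiqi-zju/MOPD.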
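
The summary says only that the first stage enhances detection through a cross-attention mechanism and gives no further detail. As a rough illustration of what cross-attention between detection features and prior features can look like, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names `softmax` and `cross_attention`, the omission of learned query/key/value projections, and the residual addition are all assumptions made for this example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, prior_feats):
    """Fuse prior features into query features (hypothetical helper).

    queries:     list of N detection feature vectors, each of length d
    prior_feats: list of M prior feature vectors, each of length d,
                 used here as both keys and values (assumption: no
                 learned projections, unlike a real Transformer layer)
    Returns N fused vectors: query + attention-weighted sum of priors.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in queries:
        # Scaled dot-product score between this query and every prior.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k))
                  for k in prior_feats]
        weights = softmax(scores)
        # Weighted sum of the prior vectors, one output dimension at a time.
        attended = [sum(w * v[j] for w, v in zip(weights, prior_feats))
                    for j in range(d)]
        # Residual connection keeps the original detection feature.
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused


if __name__ == "__main__":
    detection = [[0.5, 0.5], [1.0, 0.0]]
    priors = [[1.0, 2.0], [0.0, 1.0], [2.0, 0.0]]
    for row in cross_attention(detection, priors):
        print(row)
```

In a real model the queries, keys, and values would pass through learned linear projections and usually multiple heads; the sketch keeps only the scoring, normalization, and weighted-sum steps.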

Summary (original)

Detecting the openable parts of articulated objects is crucial for downstream applications in intelligent robotics, such as pulling a drawer. This task poses a multitasking challenge due to the necessity of understanding object categories and motion. Most existing methods are either category-specific or trained on specific datasets, lacking generalization to unseen environments and objects. In this paper, we propose a Transformer-based Openable Part Detection (OPD) framework named Multi-feature Openable Part Detection (MOPD) that incorporates perceptual grouping and geometric priors, outperforming previous methods in performance. In the first stage of the framework, we introduce a perceptual grouping feature model that provides perceptual grouping feature priors for openable part detection, enhancing detection results through a cross-attention mechanism. In the second stage, a geometric understanding feature model offers geometric feature priors for predicting motion parameters. Compared to existing methods, our proposed approach shows better performance in both detection and motion parameter prediction. Codes and models are publicly available at https://github.com/lisiqi-zju/MOPD

arXiv information

Authors	Siqi Li, Xiaoxue Chen, Haoyu Cheng, Guyue Zhou, Hao Zhao, Guanzhong Tian
Published	2024-12-17 18:52:30+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
