RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training

Abstract

Vision-based pose estimation of articulated robots with unknown joint angles has applications in collaborative robotics and human-robot interaction tasks. Current frameworks use neural network encoders to extract image features and downstream layers to predict joint angles and robot pose. While images of robots inherently contain rich information about the robot’s physical structure, existing methods often fail to leverage it fully, which limits performance under occlusions and truncations. To address this, we introduce RoboPEPP, a method that fuses information about the robot’s physical model into the encoder using a masking-based self-supervised embedding-predictive architecture. Specifically, we mask the robot’s joints and pre-train an encoder-predictor model to infer the joints’ embeddings from the surrounding unmasked regions, enhancing the encoder’s understanding of the robot’s physical model. The pre-trained encoder-predictor pair, along with joint angle and keypoint prediction networks, is then fine-tuned for pose and joint angle estimation. Random masking of the input during fine-tuning and keypoint filtering during evaluation further improve robustness. Our method, evaluated on several datasets, achieves the best results in robot pose and joint angle estimation while being the least sensitive to occlusions and requiring the lowest execution time.
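
The joint-masking idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the patch size, the masking rule, or the loss, so the square patch grid, the "mask every patch that contains a joint" rule, the L1 loss, and the function names `joint_patch_indices` and `masked_embedding_loss` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def joint_patch_indices(
    joints_px: Sequence[Tuple[float, float]],
    image_size: int,
    patch_size: int,
) -> List[int]:
    """Return sorted flat indices of patches containing at least one joint.

    Assumes a square image split into a regular grid of square patches
    (hypothetical setup; the abstract does not specify this).
    """
    patches_per_row = image_size // patch_size
    masked = set()
    for x, y in joints_px:
        col = int(x) // patch_size
        row = int(y) // patch_size
        # Joints projected outside the image are ignored.
        if 0 <= row < patches_per_row and 0 <= col < patches_per_row:
            masked.add(row * patches_per_row + col)
    return sorted(masked)


def masked_embedding_loss(
    predicted: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
    masked: Sequence[int],
) -> float:
    """Mean absolute error between predicted and target patch embeddings,
    computed over the masked patches only (L1 is an assumed choice)."""
    total = 0.0
    count = 0
    for idx in masked:
        for p, t in zip(predicted[idx], target[idx]):
            total += abs(p - t)
            count += 1
    return total / count if count else 0.0


# Example: a 224x224 image with 16x16 patches gives a 14x14 grid.
joints = [(8, 8), (100, 40), (300, 10)]  # the last joint is off-image
mask = joint_patch_indices(joints, image_size=224, patch_size=16)
```

In an embedding-predictive setup, the predictor would receive the encoder's embeddings of the unmasked patches and be trained to reproduce the target embeddings at the indices in `mask`; the loss above scores only those positions.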

arXiv information

Authors	Raktim Gautam Goswami, Prashanth Krishnamurthy, Yann LeCun, Farshad Khorrami
Published	2025-05-02 17:36:47+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
