Multi-Keypoint Affordance Representation for Functional Dexterous Grasping

Summary

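Functional dexterous grasping requires precise hand-object interaction and goes beyond simple gripping.
Existing affordance-based methods mainly predict coarse interaction regions and cannot directly constrain the grasp posture, which leaves visual perception disconnected from manipulation.
To address this problem, we propose a multi-keypoint affordance representation for functional dexterous grasping that directly encodes task-driven grasp configurations by localizing functional contact points.
Our method introduces Contact-guided Multi-Keypoint Affordance (CMKA), which uses human grasping experience images as weak supervision and combines them with Large Vision Models for fine-grained affordance feature extraction, achieving generalization without manual keypoint annotation.
We also present a Keypoint-based Grasp matrix Transformation (KGT) method that ensures spatial consistency between hand keypoints and object contact points, providing a direct link between visual perception and dexterous grasping actions.
Experiments on public real-world FAH datasets, IsaacGym simulation, and challenging robotic tasks show that the method significantly improves affordance localization accuracy, grasp consistency, and generalization to unseen tools and tasks, bridging the gap between visual affordance learning and dexterous robotic manipulation.
The source code and demo videos will be publicly available at https://github.com/PopeyePxx/MKA.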

Summary (original)

Functional dexterous grasping requires precise hand-object interaction, going beyond simple gripping. Existing affordance-based methods primarily predict coarse interaction regions and cannot directly constrain the grasping posture, leading to a disconnection between visual perception and manipulation. To address this issue, we propose a multi-keypoint affordance representation for functional dexterous grasping, which directly encodes task-driven grasp configurations by localizing functional contact points. Our method introduces Contact-guided Multi-Keypoint Affordance (CMKA), leveraging human grasping experience images for weak supervision combined with Large Vision Models for fine affordance feature extraction, achieving generalization while avoiding manual keypoint annotations. Additionally, we present a Keypoint-based Grasp matrix Transformation (KGT) method, ensuring spatial consistency between hand keypoints and object contact points, thus providing a direct link between visual perception and dexterous grasping actions. Experiments on public real-world FAH datasets, IsaacGym simulation, and challenging robotic tasks demonstrate that our method significantly improves affordance localization accuracy, grasp consistency, and generalization to unseen tools and tasks, bridging the gap between visual affordance learning and dexterous robotic manipulation. The source code and demo videos will be publicly available at https://github.com/PopeyePxx/MKA.
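
The abstract does not spell out how KGT computes its transformation, so the following is only an illustrative sketch of the general idea of bringing hand keypoints into spatial agreement with object contact points. It assumes matched 3D point pairs and uses a standard least-squares rigid alignment (the Kabsch algorithm) as a stand-in; the function names `rigid_align` and `to_homogeneous` are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def rigid_align(hand_pts, contact_pts):
    """Least-squares rigid alignment (Kabsch algorithm).

    Assumption: hand_pts[i] corresponds to contact_pts[i]; both are (N, 3).
    Returns (R, t) minimizing sum_i ||R @ hand_pts[i] + t - contact_pts[i]||^2.
    This is a generic stand-in, not the paper's KGT formulation.
    """
    hand_pts = np.asarray(hand_pts, dtype=float)
    contact_pts = np.asarray(contact_pts, dtype=float)

    # Center both point sets on their centroids.
    h_mean = hand_pts.mean(axis=0)
    c_mean = contact_pts.mean(axis=0)

    # 3x3 cross-covariance between the centered sets.
    H = (hand_pts - h_mean).T @ (contact_pts - c_mean)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_mean - R @ h_mean
    return R, t


def to_homogeneous(R, t):
    """Pack a rotation and translation into a 4x4 transformation matrix."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T
```

Given at least three non-collinear keypoint pairs, `to_homogeneous(*rigid_align(hand, contacts))` yields a single 4x4 pose that maps the hand keypoints onto the predicted contact points in the least-squares sense. A real dexterous hand would additionally need finger joint angles, which this sketch does not address.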

arXiv information

Authors	Fan Yang, Dongsheng Luo, Wenrui Chen, Jiacheng Lin, Junjie Cai, Kailun Yang, Zhiyong Li, Yaonan Wang
Published	2025-02-27 11:54:53+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
