Occlusion-Aware 3D Hand-Object Pose Estimation with Masked AutoEncoders

Summary

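Estimating hand-object pose from monocular RGB images remains a significant challenge, mainly because of the severe occlusions inherent in hand-object interactions.
Existing methods do not sufficiently explore global structural perception and reasoning, which limits their effectiveness on occluded hand-object interactions.
To address this challenge, we propose HOMAE, an occlusion-aware hand-object pose estimation method based on masked autoencoders.
Specifically, we propose a target-focused masking strategy that imposes structured occlusion on regions of hand-object interaction, encouraging the model to learn context-aware features and to reason about the occluded structures.
We further integrate multi-scale features extracted from the decoder to predict a signed distance field (SDF), capturing both global context and fine-grained geometry.
To enhance geometric perception, we combine the implicit SDF with an explicit point cloud derived from it, leveraging the complementary strengths of both representations.
This fusion enables more robust handling of occluded regions by combining the global context from the SDF with the precise local geometry provided by the point cloud.
Extensive experiments on the challenging DexYCB and HO3Dv2 benchmarks show that HOMAE achieves state-of-the-art performance in hand-object pose estimation.
We will release our code and model.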
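
The summary does not spell out how the target-focused masking is implemented, so the following is only a minimal sketch of the general idea: when choosing which image patches to hide from a masked autoencoder, spend most of the masking budget on patches that overlap the hand-object region. The function names, the bounding-box input, and the `ratio` and `focus` parameters are all assumptions made for illustration and are not taken from the paper.

```python
import random

def patches_in_box(grid, patch, box):
    """Row-major indices of patches that overlap the pixel box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    hits = []
    for r in range(grid):
        for c in range(grid):
            px0, py0 = c * patch, r * patch
            # Axis-aligned overlap test between the patch and the box.
            if px0 < x1 and px0 + patch > x0 and py0 < y1 and py0 + patch > y0:
                hits.append(r * grid + c)
    return hits

def target_focused_mask(grid, patch, box, ratio=0.5, focus=0.8, seed=0):
    """Pick patches to mask, biased toward the (hypothetical) interaction box.

    ratio: fraction of all patches to mask.
    focus: fraction of the masked patches drawn from inside the box.
    """
    rng = random.Random(seed)
    total = grid * grid
    inside = patches_in_box(grid, patch, box)
    inside_set = set(inside)
    outside = [i for i in range(total) if i not in inside_set]
    n_mask = round(ratio * total)
    # Cap each share by the number of patches actually available.
    n_in = min(len(inside), round(focus * n_mask))
    n_out = min(len(outside), n_mask - n_in)
    return set(rng.sample(inside, n_in)) | set(rng.sample(outside, n_out))

if __name__ == "__main__":
    # 224x224 image split into a 14x14 grid of 16-pixel patches.
    mask = target_focused_mask(14, 16, box=(32, 32, 192, 192))
    print(len(mask), "patches masked")
```

Compared with uniformly random masking, a scheme like this forces the encoder to reconstruct the interaction region from surrounding context, which is the behaviour the summary attributes to structured occlusion.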

Abstract (original)

Hand-object pose estimation from monocular RGB images remains a significant challenge mainly due to the severe occlusions inherent in hand-object interactions. Existing methods do not sufficiently explore global structural perception and reasoning, which limits their effectiveness in handling occluded hand-object interactions. To address this challenge, we propose an occlusion-aware hand-object pose estimation method based on masked autoencoders, termed as HOMAE. Specifically, we propose a target-focused masking strategy that imposes structured occlusion on regions of hand-object interaction, encouraging the model to learn context-aware features and reason about the occluded structures. We further integrate multi-scale features extracted from the decoder to predict a signed distance field (SDF), capturing both global context and fine-grained geometry. To enhance geometric perception, we combine the implicit SDF with an explicit point cloud derived from the SDF, leveraging the complementary strengths of both representations. This fusion enables more robust handling of occluded regions by combining the global context from the SDF with the precise local geometry provided by the point cloud. Extensive experiments on challenging DexYCB and HO3Dv2 benchmarks demonstrate that HOMAE achieves state-of-the-art performance in hand-object pose estimation. We will release our code and model.
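
To make the idea of "an explicit point cloud derived from the SDF" concrete, here is a small sketch under stated assumptions. An analytic sphere stands in for the learned SDF, and the extraction rule (keep grid samples near the zero level set, then project each along the SDF gradient) is one common way to obtain surface points from a signed distance function. The abstract does not say which procedure HOMAE uses, so every name and parameter below is hypothetical.

```python
import math

def sphere_sdf(p, radius=0.5):
    """Stand-in SDF: signed distance to a sphere at the origin (negative inside)."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius

def sdf_gradient(sdf, p, h=1e-4):
    """Central finite-difference gradient of the SDF at point p."""
    grad = []
    for axis in range(3):
        hi, lo = list(p), list(p)
        hi[axis] += h
        lo[axis] -= h
        grad.append((sdf(hi) - sdf(lo)) / (2 * h))
    return grad

def sdf_to_points(sdf, resolution=16, band=0.05):
    """Sample a grid in [-1, 1]^3 and project near-surface samples onto the surface."""
    points = []
    step = 2.0 / (resolution - 1)
    for i in range(resolution):
        for j in range(resolution):
            for k in range(resolution):
                p = [-1.0 + i * step, -1.0 + j * step, -1.0 + k * step]
                d = sdf(p)
                if abs(d) > band:
                    continue  # too far from the zero level set
                g = sdf_gradient(sdf, p)
                norm = math.sqrt(sum(c * c for c in g)) or 1.0
                # Move the sample by its signed distance along the unit gradient.
                points.append([p[a] - d * g[a] / norm for a in range(3)])
    return points

if __name__ == "__main__":
    cloud = sdf_to_points(sphere_sdf)
    print(len(cloud), "surface points")
```

The resulting points lie on the surface that the SDF encodes implicitly, which illustrates why the two representations are complementary: the SDF is defined everywhere in space, while the point cloud concentrates samples on the local geometry.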

arXiv information

Authors	Hui Yang, Wei Sun, Jian Liu, Jin Zheng, Jian Xiao, Ajmal Mian
Published	2025-06-12 15:30:47+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

