Reconstructing Hands in 3D with Transformers

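Summary

We present an approach that reconstructs hands in 3D from monocular input.
HaMeR, our approach to Hand Mesh Recovery, follows a fully transformer-based architecture and analyzes hands with significantly higher accuracy and robustness than previous work.
The key to HaMeR's success lies in scaling up both the data used for training and the capacity of the deep network used for hand reconstruction.
For training data, we combine multiple datasets that contain 2D or 3D hand annotations.
For the deep model, we use a large-scale Vision Transformer architecture.
Our final model consistently outperforms previous baselines on popular 3D hand pose benchmarks.
To further evaluate the effect of our design in non-controlled settings, we annotate existing in-the-wild datasets with 2D hand keypoint annotations.
On this newly collected dataset of annotations, HInt, we demonstrate significant improvements over existing baselines.
Code, data, and models are available on the project website (https://geopavlakos.github.io/hamer/).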
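
The summary says the training data mixes datasets that carry 2D or 3D hand annotations. One common way to train on such a mix is to mask each loss term so that a sample only contributes where it has labels. The following is a minimal sketch of that idea, not the authors' implementation: the function name `mixed_supervision_loss`, the per-sample flags `has_3d` and `has_2d`, the L1 error, and the use of 21 keypoints per hand are all assumptions made for illustration.

```python
import numpy as np


def mixed_supervision_loss(pred_3d, pred_2d, gt_3d, gt_2d, has_3d, has_2d):
    """Sum of masked L1 keypoint errors (hypothetical helper, not from HaMeR).

    pred_3d, gt_3d: arrays of shape (B, K, 3)
    pred_2d, gt_2d: arrays of shape (B, K, 2)
    has_3d, has_2d: boolean arrays of shape (B,), marking which samples
                    actually carry 3D or 2D annotations (assumed metadata).
    """

    def masked_l1(pred, gt, mask):
        # Skip the term entirely when no sample in the batch has this label.
        if not mask.any():
            return 0.0
        # Boolean indexing keeps only the annotated samples.
        return float(np.abs(pred[mask] - gt[mask]).mean())

    return masked_l1(pred_3d, gt_3d, has_3d) + masked_l1(pred_2d, gt_2d, has_2d)


if __name__ == "__main__":
    B, K = 4, 21  # batch size and keypoints per hand (assumed)
    rng = np.random.default_rng(0)
    gt_3d = rng.normal(size=(B, K, 3))
    gt_2d = rng.normal(size=(B, K, 2))
    pred_3d = gt_3d + 0.1 * rng.normal(size=(B, K, 3))
    pred_2d = gt_2d + 0.1 * rng.normal(size=(B, K, 2))
    has_3d = np.array([True, True, False, False])  # only half have 3D labels
    has_2d = np.array([True, True, True, True])
    print(mixed_supervision_loss(pred_3d, pred_2d, gt_3d, gt_2d, has_3d, has_2d))
```

Masking per sample lets a 2D-only dataset still supervise the projected keypoints while leaving the 3D term untouched for those samples.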

Abstract (original)

We present an approach that can reconstruct hands in 3D from monocular input. Our approach for Hand Mesh Recovery, HaMeR, follows a fully transformer-based architecture and can analyze hands with significantly increased accuracy and robustness compared to previous work. The key to HaMeR’s success lies in scaling up both the data used for training and the capacity of the deep network for hand reconstruction. For training data, we combine multiple datasets that contain 2D or 3D hand annotations. For the deep model, we use a large scale Vision Transformer architecture. Our final model consistently outperforms the previous baselines on popular 3D hand pose benchmarks. To further evaluate the effect of our design in non-controlled settings, we annotate existing in-the-wild datasets with 2D hand keypoint annotations. On this newly collected dataset of annotations, HInt, we demonstrate significant improvements over existing baselines. We make our code, data and models available on the project website: https://geopavlakos.github.io/hamer/.

arXiv information

Authors	Georgios Pavlakos, Dandan Shan, Ilija Radosavovic, Angjoo Kanazawa, David Fouhey, Jitendra Malik
Published	2023-12-08 18:59:07+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
