PoseLess: Depth-Free Vision-to-Joint Control via Direct Image Mapping with VLM

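Abstract

This paper introduces PoseLess, a new framework for robot hand control that removes the need for explicit pose estimation by mapping 2D images directly to joint angles using tokenized representations.
Our approach uses synthetic training data generated from randomized joint configurations, enabling zero-shot generalization to real-world scenarios and cross-morphology transfer from robotic hands to human hands.
By tokenizing the visual input and using a transformer-based decoder, PoseLess achieves robust, low-latency control while addressing challenges such as depth ambiguity and data scarcity.
Experimental results show competitive joint angle prediction accuracy without relying on any human-labelled dataset.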

Abstract (original)

This paper introduces PoseLess, a novel framework for robot hand control that eliminates the need for explicit pose estimation by directly mapping 2D images to joint angles using tokenized representations. Our approach leverages synthetic training data generated through randomized joint configurations, enabling zero-shot generalization to real-world scenarios and cross-morphology transfer from robotic to human hands. By tokenizing visual inputs and employing a transformer-based decoder, PoseLess achieves robust, low-latency control while addressing challenges such as depth ambiguity and data scarcity. Experimental results demonstrate competitive performance in joint angle prediction accuracy without relying on any human-labelled dataset.
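
The abstract mentions synthetic training data generated through randomized joint configurations but gives no details. As a minimal sketch of what such sampling could look like, the Python below draws joint angles uniformly within per-joint limits. The joint names, limits, and function names are hypothetical and are not taken from the paper; rendering each configuration to an image (for example in a simulator) is assumed and not shown.

```python
import random

# Hypothetical joint limits in degrees. These names and ranges are
# illustrative assumptions, not values from the PoseLess paper.
JOINT_LIMITS = {
    "thumb_flex": (0.0, 90.0),
    "index_flex": (0.0, 100.0),
    "middle_flex": (0.0, 100.0),
    "ring_flex": (0.0, 100.0),
    "pinky_flex": (0.0, 100.0),
}


def sample_joint_config(limits, rng):
    """Draw one joint configuration uniformly within each joint's limits."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in limits.items()}


def make_joint_dataset(n, limits, seed=0):
    """Return n randomized joint configurations, reproducible for a given seed."""
    rng = random.Random(seed)
    return [sample_joint_config(limits, rng) for _ in range(n)]


if __name__ == "__main__":
    for config in make_joint_dataset(3, JOINT_LIMITS):
        print(config)
```

Under these assumptions, each sampled configuration would be paired with a rendered image of the hand in that pose, giving (image, joint angles) training pairs without any human labelling.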

arXiv information

Authors	Alan Dao, Dinh Bach Vu, Tuan Le Duc Anh, Bui Quang Huy
Published	2025-03-10 09:34:05+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
