Unifying Scene Representation and Hand-Eye Calibration with 3D Foundation Models

Summary

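Representing the environment is a central challenge in robotics and is essential for effective decision-making.
Traditionally, before capturing images with a manipulator-mounted camera, users have had to calibrate the camera using a dedicated external marker such as a checkerboard or an AprilTag.
However, recent advances in computer vision have led to the development of \emph{3D foundation models}.
These are large, pre-trained neural networks that can establish fast and accurate multi-view correspondences from very few images, even in the absence of rich visual features.
This paper advocates integrating 3D foundation models into scene representation approaches for robotic systems equipped with manipulator-mounted RGB cameras.
Specifically, we propose the Joint Calibration and Representation (JCR) method.
JCR uses RGB images captured by a manipulator-mounted camera to simultaneously construct an environment representation and calibrate the camera relative to the robot's end-effector, without any dedicated calibration markers.
The resulting 3D environment representation is aligned with the robot's coordinate frame and maintains physically accurate scale.
We demonstrate that JCR can build effective scene representations using a low-cost RGB camera attached to a manipulator, without prior calibration.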

Summary (original)

Representing the environment is a central challenge in robotics, and is essential for effective decision-making. Traditionally, before capturing images with a manipulator-mounted camera, users need to calibrate the camera using a specific external marker, such as a checkerboard or AprilTag. However, recent advances in computer vision have led to the development of \emph{3D foundation models}. These are large, pre-trained neural networks that can establish fast and accurate multi-view correspondences with very few images, even in the absence of rich visual features. This paper advocates for the integration of 3D foundation models into scene representation approaches for robotic systems equipped with manipulator-mounted RGB cameras. Specifically, we propose the Joint Calibration and Representation (JCR) method. JCR uses RGB images, captured by a manipulator-mounted camera, to simultaneously construct an environmental representation and calibrate the camera relative to the robot’s end-effector, in the absence of specific calibration markers. The resulting 3D environment representation is aligned with the robot’s coordinate frame and maintains physically accurate scales. We demonstrate that JCR can build effective scene representations using a low-cost RGB camera attached to a manipulator, without prior calibration.
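To make the calibration half of the abstract concrete, here is a minimal sketch of marker-free hand-eye calibration. It is not the authors' JCR implementation. It assumes the classical AX = XB formulation, where each A is a relative end-effector motion taken from the robot's kinematics, each B is the matching relative camera motion, and X is the unknown camera pose in the end-effector frame. It further assumes (our assumption, not stated in the abstract) that the camera motions come from an image-based reconstruction whose translations are known only up to one global scale factor, so the scale is solved for together with X. All function names are hypothetical.

```python
import numpy as np


def rotvec_to_matrix(v):
    """Rodrigues formula: axis-angle vector -> 3x3 rotation matrix."""
    angle = np.linalg.norm(v)
    if angle < 1e-12:
        return np.eye(3)
    k = v / angle
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)


def matrix_to_rotvec(R):
    """Rotation matrix -> axis-angle vector; assumes the angle is below pi."""
    angle = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    if angle < 1e-12:
        return np.zeros(3)
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return angle * axis / (2 * np.sin(angle))


def solve_hand_eye_with_scale(A_list, B_list):
    """Solve A_i X = X B_i for the 4x4 transform X and a global scale s.

    A_i: relative end-effector motions (metric, from robot kinematics).
    B_i: relative camera motions whose translations are in arbitrary units,
         so that s * t_B is the metric translation.
    Needs at least two motions with non-parallel rotation axes.
    """
    # Rotation: R_A = R_X R_B R_X^T implies rotvec(A) = R_X rotvec(B),
    # so R_X is the rotation that best aligns the two sets of vectors.
    alphas = np.array([matrix_to_rotvec(A[:3, :3]) for A in A_list])
    betas = np.array([matrix_to_rotvec(B[:3, :3]) for B in B_list])
    U, _, Vt = np.linalg.svd(betas.T @ alphas)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R_X = Vt.T @ D @ U.T

    # Translation and scale: (R_A - I) t_X - s R_X t_B = -t_A, stacked
    # over all motions and solved by linear least squares.
    rows = [
        np.hstack([A[:3, :3] - np.eye(3), -(R_X @ B[:3, 3])[:, None]])
        for A, B in zip(A_list, B_list)
    ]
    rhs = np.concatenate([-A[:3, 3] for A in A_list])
    sol, *_ = np.linalg.lstsq(np.vstack(rows), rhs, rcond=None)

    X = np.eye(4)
    X[:3, :3] = R_X
    X[:3, 3] = sol[:3]
    return X, sol[3]


def make_synthetic_motions(X_true, scale_true, n=5, seed=0):
    """Generate noise-free test motions consistent with X_true and scale_true."""
    rng = np.random.default_rng(seed)
    A_list, B_list = [], []
    for _ in range(n):
        A = np.eye(4)
        A[:3, :3] = rotvec_to_matrix(rng.uniform(-1, 1, 3))
        A[:3, 3] = rng.uniform(-0.5, 0.5, 3)
        B = np.linalg.inv(X_true) @ A @ X_true
        B[:3, 3] /= scale_true  # express camera translations in arbitrary units
        A_list.append(A)
        B_list.append(B)
    return A_list, B_list
```

With noise-free synthetic motions from `make_synthetic_motions`, the solver should recover the transform and scale used to generate them. On real data the rotation and translation estimates would normally be refined jointly, and degenerate motions (pure translations, or rotations about a single axis) leave X underdetermined.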

arXiv information

Authors	Weiming Zhi, Haozhan Tang, Tianyi Zhang, Matthew Johnson-Roberson
Published	2024-04-17 18:29:32+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
