Neural Assembler: Learning to Generate Fine-Grained Robotic Assembly Instructions from Multi-View Images

Summary

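Image-guided object assembly is a fast-growing research topic in computer vision.
This paper introduces a new task: converting multi-view images of a structural 3D model (for example, a model built from building blocks drawn from a 3D-object library) into a detailed sequence of assembly instructions that a robotic arm can execute.
Given multi-view images of the target 3D model to be replicated, a model designed for this task must handle several sub-tasks: recognizing the individual components used to build the 3D model, estimating the geometric pose of each component, and deducing a feasible assembly order that obeys physical rules.
Establishing accurate 2D-3D correspondence between the multi-view images and the 3D objects is technically difficult.
To address this, the authors propose an end-to-end model called the Neural Assembler.
The model learns an object graph in which each vertex represents a component recognized from the images and the edges specify the topology of the 3D model, which makes it possible to derive an assembly plan.
The authors establish benchmarks for this task and carry out a comprehensive empirical evaluation of the Neural Assembler against alternative solutions.
The reported experiments show the Neural Assembler outperforming those alternatives.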

Abstract (original)

Image-guided object assembly represents a burgeoning research topic in computer vision. This paper introduces a novel task: translating multi-view images of a structural 3D model (for example, one constructed with building blocks drawn from a 3D-object library) into a detailed sequence of assembly instructions executable by a robotic arm. Fed with multi-view images of the target 3D model for replication, the model designed for this task must address several sub-tasks, including recognizing individual components used in constructing the 3D model, estimating the geometric pose of each component, and deducing a feasible assembly order adhering to physical rules. Establishing accurate 2D-3D correspondence between multi-view images and 3D objects is technically challenging. To tackle this, we propose an end-to-end model known as the Neural Assembler. This model learns an object graph where each vertex represents recognized components from the images, and the edges specify the topology of the 3D model, enabling the derivation of an assembly plan. We establish benchmarks for this task and conduct comprehensive empirical evaluations of Neural Assembler and alternative solutions. Our experiments clearly demonstrate the superiority of Neural Assembler.
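The abstract does not say how an assembly plan is read off the object graph. As an illustration only, and not the authors' method, one simple way to turn a graph of "rests on" relations into a feasible order is a topological sort. The component names and the `supports` dictionary below are hypothetical, and the sketch assumes the edges encode which components must be placed before others.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical object graph: each key is a recognized component and its
# value is the set of components it rests on (these must be placed first).
supports = {
    "base_plate": set(),
    "brick_a": {"base_plate"},
    "brick_b": {"base_plate"},
    "roof": {"brick_a", "brick_b"},
}


def assembly_order(graph):
    """Return one placement order in which every support precedes what it carries."""
    # static_order() yields each node only after all of its predecessors;
    # it raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(graph).static_order())


order = assembly_order(supports)
print(order)
```

Any order returned this way places a component only after everything beneath it, which is one reading of "adhering to physical rules"; a real planner would also need poses and collision checks, which this sketch ignores.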

arXiv information

Authors	Hongyu Yan, Yadong Mu
Published	2024-04-25 08:53:23+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
