D$^3$Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Robotic Manipulation

Summary

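Scene representation is a crucial design choice in robotic manipulation systems.
An ideal representation should be 3D, dynamic, and semantic to meet the demands of diverse manipulation tasks.
However, prior work rarely provides all three properties at once.
This work introduces D$^3$Fields (dynamic 3D descriptor fields).
These fields capture the dynamics of the underlying 3D environment and encode both semantic features and instance masks.
Specifically, arbitrary 3D points in the workspace are projected onto multi-view 2D visual observations, and features derived from foundation models are interpolated at the projected locations.
The resulting fused descriptor fields allow goals to be specified flexibly with 2D images that differ in context, style, and instance.
To evaluate the effectiveness of these descriptor fields, the representation is applied to a wide range of robotic manipulation tasks in a zero-shot manner.
Extensive evaluation in both real-world scenarios and simulation shows that D$^3$Fields are generalizable and effective for zero-shot robotic manipulation tasks.
In quantitative comparisons with state-of-the-art dense descriptors such as Dense Object Nets and DINO, D$^3$Fields show significantly better generalization and manipulation accuracy.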
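
The projection-and-interpolation step described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`project`, `bilinear`, `fuse_descriptor`), the dictionary layout of a view, and the plain averaging across views are all assumptions made for illustration, and occlusion handling and instance masks are left out. It assumes a pinhole camera with intrinsics `K` and world-to-camera extrinsics `R`, `t`, and a per-view feature map stored as nested lists.

```python
from math import floor


def project(point, K, R, t):
    """Project a world-frame 3D point with a pinhole camera; returns (u, v) or None."""
    # Camera-frame coordinates: x_cam = R @ point + t
    x_cam = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if x_cam[2] <= 0:  # point is behind the camera
        return None
    u = K[0][0] * x_cam[0] / x_cam[2] + K[0][2]
    v = K[1][1] * x_cam[1] / x_cam[2] + K[1][2]
    return u, v


def bilinear(feat, u, v):
    """Interpolate an H x W x C nested-list feature map (H, W >= 2) at pixel (u, v)."""
    h, w = len(feat), len(feat[0])
    if not (0 <= u <= w - 1 and 0 <= v <= h - 1):
        return None  # projection falls outside the image
    # Clamp the base pixel so that the +1 neighbours stay inside the map.
    u0, v0 = min(floor(u), w - 2), min(floor(v), h - 2)
    du, dv = u - u0, v - v0
    out = []
    for c in range(len(feat[0][0])):
        top = feat[v0][u0][c] * (1 - du) + feat[v0][u0 + 1][c] * du
        bottom = feat[v0 + 1][u0][c] * (1 - du) + feat[v0 + 1][u0 + 1][c] * du
        out.append(top * (1 - dv) + bottom * dv)
    return out


def fuse_descriptor(point, views):
    """Average the interpolated features over every view that sees the point.

    Plain averaging is an assumption of this sketch, not the paper's fusion rule.
    """
    total, count = None, 0
    for view in views:
        pixel = project(point, view["K"], view["R"], view["t"])
        if pixel is None:
            continue
        f = bilinear(view["feat"], pixel[0], pixel[1])
        if f is None:
            continue
        total = f if total is None else [a + b for a, b in zip(total, f)]
        count += 1
    return None if count == 0 else [x / count for x in total]
```

In practice the feature maps would come from a 2D foundation model and the per-view contributions would be weighted by visibility; the sketch only shows how one 3D query point turns into one fused feature vector.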

Abstract (original)

Scene representation has been a crucial design choice in robotic manipulation systems. An ideal representation should be 3D, dynamic, and semantic to meet the demands of diverse manipulation tasks. However, previous works often lack all three properties simultaneously. In this work, we introduce D$^3$Fields – dynamic 3D descriptor fields. These fields capture the dynamics of the underlying 3D environment and encode both semantic features and instance masks. Specifically, we project arbitrary 3D points in the workspace onto multi-view 2D visual observations and interpolate features derived from foundational models. The resulting fused descriptor fields allow for flexible goal specifications using 2D images with varied contexts, styles, and instances. To evaluate the effectiveness of these descriptor fields, we apply our representation to a wide range of robotic manipulation tasks in a zero-shot manner. Through extensive evaluation in both real-world scenarios and simulations, we demonstrate that D$^3$Fields are both generalizable and effective for zero-shot robotic manipulation tasks. In quantitative comparisons with state-of-the-art dense descriptors, such as Dense Object Nets and DINO, D$^3$Fields exhibit significantly better generalization abilities and manipulation accuracy.

arXiv information

Authors	Yixuan Wang, Zhuoran Li, Mingtong Zhang, Katherine Driggs-Campbell, Jiajun Wu, Li Fei-Fei, Yunzhu Li
Published	2023-09-28 02:50:16+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
