MSGField: A Unified Scene Representation Integrating Motion, Semantics, and Geometry for Robotic Manipulation

Summary

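Combining accurate geometry with rich semantics has been shown to be highly effective for language-guided robotic manipulation.
Existing methods for dynamic scenes either fail to update in real time or rely on additional depth sensors for simple scene editing, which limits their real-world applicability.
This paper introduces MSGField, a representation that uses a collection of 2D Gaussians for high-quality reconstruction and augments them with attributes that encode semantic and motion information.
In particular, the motion field is represented compactly by decomposing each primitive's motion into a combination of a limited set of motion bases.
By leveraging the differentiable real-time rendering of Gaussian splatting, object motion, even complex non-rigid motion, can be optimized quickly using image supervision from only two camera views.
In addition, we designed a pipeline that uses object priors to obtain well-defined semantics efficiently.
On our challenging dataset, which includes flexible and extremely small objects, our method achieves success rates of 79.2% in static environments and 63.3% in dynamic environments for language-guided manipulation.
For grasping a specified object, it achieves a 90% success rate, on par with point cloud-based methods.
Code and dataset will be released at https://shengyu724.github.io/MSGField.github.io.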
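The motion-basis decomposition described above can be illustrated with a small sketch. It assumes each basis is a per-frame 3D translation trajectory shared by all primitives, and that each primitive stores one weight per basis, so its position at frame t is its canonical center plus the weighted sum of the basis offsets. The names `Gaussian2D`, `MotionBases`, `position_at`, and `parameter_counts`, the `semantic` field, and the translation-only bases are all hypothetical; the paper's actual bases (for example, rigid transforms) and parameterization may differ, and the differentiable-rendering optimization of the bases and weights is not shown.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = List[float]


@dataclass
class Gaussian2D:
    """Hypothetical per-primitive record: geometry plus semantic and motion attributes.

    The 2D Gaussian shape parameters (tangent axes, scales, opacity, color)
    are omitted for brevity.
    """
    center: Vec3                 # canonical-frame position
    semantic: List[float]        # hypothetical semantic feature vector
    motion_weights: List[float]  # one coefficient per shared motion basis


@dataclass
class MotionBases:
    """K shared bases (assumption: translation-only); offsets[k][t] is basis k's 3D offset at frame t."""
    offsets: List[List[Vec3]]

    def displacement(self, weights: Sequence[float], t: int) -> Vec3:
        """Weighted sum of the basis offsets at frame t."""
        assert len(weights) == len(self.offsets), "need one weight per basis"
        d = [0.0, 0.0, 0.0]
        for w, basis in zip(weights, self.offsets):
            for j in range(3):
                d[j] += w * basis[t][j]
        return d


def position_at(g: Gaussian2D, bases: MotionBases, t: int) -> Vec3:
    """Canonical center moved by the primitive's blend of shared bases."""
    d = bases.displacement(g.motion_weights, t)
    return [c + dj for c, dj in zip(g.center, d)]


def parameter_counts(num_prims: int, num_bases: int, num_frames: int) -> Tuple[int, int]:
    """Motion parameters for free per-primitive trajectories vs. the basis form."""
    per_primitive = num_prims * num_frames * 3
    basis_form = num_prims * num_bases + num_bases * num_frames * 3
    return per_primitive, basis_form


# Usage: two bases over three frames, one moving along x and one along z.
bases = MotionBases(offsets=[
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 2.0]],
])
g = Gaussian2D(center=[0.0, 0.0, 0.0], semantic=[1.0, 0.0], motion_weights=[0.5, 1.0])
print(position_at(g, bases, 2))
print(parameter_counts(100_000, 8, 300))
```

The parameter count suggests why such a decomposition is compact: with 100,000 primitives, 8 bases, and 300 frames, free per-primitive trajectories need 90,000,000 values, while the basis form needs 807,200. These inputs are illustrative only and are not taken from the paper.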

Summary (original)

Combining accurate geometry with rich semantics has been proven to be highly effective for language-guided robotic manipulation. Existing methods for dynamic scenes either fail to update in real time or rely on additional depth sensors for simple scene editing, limiting their applicability in the real world. In this paper, we introduce MSGField, a representation that uses a collection of 2D Gaussians for high-quality reconstruction, further enhanced with attributes to encode semantic and motion information. Specifically, we represent the motion field compactly by decomposing each primitive's motion into a combination of a limited set of motion bases. Leveraging the differentiable real-time rendering of Gaussian splatting, we can quickly optimize object motion, even for complex non-rigid motions, with image supervision from only two camera views. Additionally, we designed a pipeline that utilizes object priors to efficiently obtain well-defined semantics. On our challenging dataset, which includes flexible and extremely small objects, our method achieves a success rate of 79.2% in static and 63.3% in dynamic environments for language-guided manipulation. For specified object grasping, we achieve a success rate of 90%, on par with point cloud-based methods. Code and dataset will be released at: https://shengyu724.github.io/MSGField.github.io.

arXiv information

Authors	Yu Sheng, Runfeng Lin, Lidian Wang, Quecheng Qiu, YanYong Zhang, Yu Zhang, Bei Hua, Jianmin Ji
Published	2024-10-21 07:46:56+00:00
arXiv page	arxiv_id (PDF)

Sources and services used

arxiv.jp, Google
