Learning Object Manipulation Skills from Video via Approximate Differentiable Physics

Abstract

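We aim to teach a robot simple object manipulation tasks by having it watch a single video demonstration. Toward this goal, we propose an optimization approach that outputs a coarse, temporally evolving 3D scene that mimics the action demonstrated in the input video. As in prior work, a differentiable renderer ensures perceptual fidelity between the 3D scene and the 2D video. The distinguishing feature of our approach is a differentiable method for solving ordinary differential equations (ODEs) that approximately model physical laws such as gravity, friction, and hand-object and object-object interactions. This not only dramatically improves the quality of the estimated hand and object states, but also yields physically admissible trajectories that can be transferred directly to a robot without costly reinforcement learning. We evaluate our approach on a 3D reconstruction task consisting of 54 video demonstrations covering 9 actions, such as pulling something from right to left or putting something in front of something. Our approach improves on the previous state of the art by almost 30% and shows superior quality especially on difficult actions that involve physical interaction between two objects, such as putting something onto something. Finally, we showcase the learned skills on a Franka Emika Panda robot.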

Abstract (original)

We aim to teach robots to perform simple object manipulation tasks by watching a single video demonstration. Towards this goal, we propose an optimization approach that outputs a coarse and temporally evolving 3D scene to mimic the action demonstrated in the input video. Similar to previous work, a differentiable renderer ensures perceptual fidelity between the 3D scene and the 2D video. Our key novelty lies in the inclusion of a differentiable approach to solve a set of Ordinary Differential Equations (ODEs) that allows us to approximately model laws of physics such as gravity, friction, and hand-object or object-object interactions. This not only enables us to dramatically improve the quality of estimated hand and object states, but also produces physically admissible trajectories that can be directly translated to a robot without the need for costly reinforcement learning. We evaluate our approach on a 3D reconstruction task that consists of 54 video demonstrations sourced from 9 actions such as pull something from right to left or put something in front of something. Our approach improves over previous state-of-the-art by almost 30%, demonstrating superior quality on especially challenging actions involving physical interactions of two objects such as put something onto something. Finally, we showcase the learned skills on a Franka Emika Panda robot.
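The idea of differentiating through an ODE solve can be illustrated with a toy example. The sketch below is not the paper's model: it assumes a hypothetical 1D block driven by a constant force, with friction approximated by a viscous damping term, integrated with explicit Euler. The gradient of the final position with respect to the force is obtained by integrating forward sensitivities alongside the state, and gradient descent then recovers the force that reaches a target position. All names and parameter values (`simulate`, `fit_force`, `damping`, and so on) are assumptions made for illustration.

```python
# Toy sketch only: a hypothetical 1D system, not the model used in the paper.


def simulate(force, steps=100, dt=0.01, mass=1.0, damping=0.5):
    """Integrate m * dv/dt = force - damping * v and dx/dt = v from rest.

    Returns the final position x and its derivative with respect to force,
    computed by stepping the sensitivity equations together with the state.
    """
    x, v = 0.0, 0.0
    dx, dv = 0.0, 0.0  # sensitivities: d x / d force and d v / d force
    for _ in range(steps):
        a = (force - damping * v) / mass   # acceleration
        da = (1.0 - damping * dv) / mass   # d a / d force
        # Explicit Euler: both updates use the values from the previous step.
        x, v = x + dt * v, v + dt * a
        dx, dv = dx + dt * dv, dv + dt * da
    return x, dx


def fit_force(target_x, lr=1.0, iters=200):
    """Gradient descent on (x_final - target_x)^2 with respect to the force."""
    force = 0.0
    for _ in range(iters):
        x, dx = simulate(force)
        grad = 2.0 * (x - target_x) * dx  # chain rule through the solver
        force -= lr * grad
    return force
```

Calling `fit_force(0.3)` returns a force for which `simulate` ends close to position 0.3. In practice an automatic differentiation library would replace the hand-written sensitivity updates; they are spelled out here only to keep the example dependency-free.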

arXiv information

Authors	Vladimir Petrik, Mohammad Nomaan Qureshi, Josef Sivic, Makarand Tapaswi
Published	2022-08-03 10:21:47+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
