Learning to Act from Actionless Videos through Dense Correspondences

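Summary

In this work, we present an approach for building a video-based robot policy that reliably performs diverse tasks across different robots and environments, learned from only a few video demonstrations and without any action annotations.
Our method uses images as a task-agnostic representation that encodes both state and action information, and text as a general representation for specifying robot goals.
By synthesizing videos that "hallucinate" the robot executing actions and combining them with dense correspondences between frames, our approach infers, in closed form, the actions to execute in the environment, without requiring any explicit action labels.
This capability lets us train the policy solely on RGB videos and deploy the learned policy to a variety of robotic tasks.
We demonstrate the effectiveness of the approach by learning policies for tabletop manipulation and navigation tasks.
In addition, we provide an open-source framework for efficient video modeling that makes it possible to train high-fidelity policy models on four GPUs within a single day.

Summary (original)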

In this work, we present an approach to construct a video-based robot policy capable of reliably executing diverse tasks across different robots and environments from few video demonstrations without using any action annotations. Our method leverages images as a task-agnostic representation, encoding both the state and action information, and text as a general representation for specifying robot goals. By synthesizing videos that “hallucinate” robot executing actions and in combination with dense correspondences between frames, our approach can infer the closed-formed action to execute to an environment without the need of any explicit action labels. This unique capability allows us to train the policy solely based on RGB videos and deploy learned policies to various robotic tasks. We demonstrate the efficacy of our approach in learning policies on table-top manipulation and navigation tasks. Additionally, we contribute an open-source framework for efficient video modeling, enabling the training of high-fidelity policy models with four GPUs within a single day.
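The abstract states that actions are recovered in closed form from dense correspondences between synthesized frames. The sketch below illustrates that idea in a deliberately simplified setting and is not the authors' implementation: it assumes dense optical flow between two consecutive generated frames is already available as `flow[y][x] = (u, v)`, that an object mask is given, and that the object moves rigidly in the image plane. Under those assumptions, the least-squares rotation and translation have a closed-form solution (2D Procrustes/Kabsch alignment). The function names `correspondences_from_flow` and `estimate_planar_rigid_motion` are hypothetical, and the paper's own procedure may differ (for example, by operating on 3D object poses).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def correspondences_from_flow(
    flow: Sequence[Sequence[Tuple[float, float]]],
    mask: Sequence[Sequence[bool]],
) -> Tuple[List[Point], List[Point]]:
    """Turn a dense flow field (hypothetical format: flow[y][x] = (u, v))
    into matched point pairs, keeping only pixels where mask[y][x] is True."""
    src: List[Point] = []
    dst: List[Point] = []
    for y, row in enumerate(flow):
        for x, (u, v) in enumerate(row):
            if mask[y][x]:
                src.append((float(x), float(y)))
                dst.append((x + u, y + v))
    return src, dst


def estimate_planar_rigid_motion(
    src: List[Point], dst: List[Point]
) -> Tuple[float, float, float]:
    """Closed-form least-squares 2D rigid transform (theta, tx, ty)
    such that dst ~= R(theta) * src + t."""
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matched point pairs")
    n = len(src)
    # Centroids of both point sets.
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    # Accumulate dot and cross terms of the centered pairs.
    dot = 0.0
    cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy
        dot += px * qx + py * qy
        cross += px * qy - py * qx
    # The angle maximizing sum(q . R p) is atan2(cross, dot).
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, tx, ty


if __name__ == "__main__":
    # Synthetic 4x4 flow field produced by a known motion: 30 degrees, t = (5, -2).
    true_theta = math.radians(30.0)
    c, s = math.cos(true_theta), math.sin(true_theta)
    flow = [[(c * x - s * y + 5.0 - x, s * x + c * y - 2.0 - y)
             for x in range(4)] for y in range(4)]
    mask = [[True] * 4 for _ in range(4)]
    theta, tx, ty = estimate_planar_rigid_motion(*correspondences_from_flow(flow, mask))
    print(round(math.degrees(theta), 3), round(tx, 3), round(ty, 3))
```

Because the solution is analytic, no action labels or learned action head are needed: the motion follows directly from the correspondences. A real system would still have to map such an object motion to robot commands (e.g., grasp and move), which this sketch does not attempt.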

arXiv information

Authors	Po-Chen Ko, Jiayuan Mao, Yilun Du, Shao-Hua Sun, Joshua B. Tenenbaum
Published	2023-10-12 17:59:23+00:00
arXiv site	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
