Keypoint Action Tokens Enable In-Context Imitation Learning in Robotics

Summary

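We show that off-the-shelf text-based Transformers, with no additional training, can perform few-shot in-context visual imitation learning, mapping visual observations to action sequences that emulate the demonstrator's behaviour.
We achieve this through a framework called Keypoint Action Tokens (KAT), which converts visual observations (inputs) and action trajectories (outputs) into sequences of tokens that a text-pretrained Transformer (GPT-4 Turbo) can ingest and generate.
Although trained only on language, these Transformers excel at translating tokenised visual keypoint observations into action trajectories, performing on par with or better than state-of-the-art imitation learning (diffusion policies) in the low-data regime on a suite of real-world, everyday tasks.
Rather than operating in the language domain as is typical, KAT uses text-based Transformers to operate in the vision and action domains, learning general patterns in demonstration data for highly efficient imitation learning. This indicates a promising new avenue for repurposing natural language models for embodied tasks.
Videos are available at https://www.robot-learning.uk/keypoint-action-tokens.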

Summary (original)

We show that off-the-shelf text-based Transformers, with no additional training, can perform few-shot in-context visual imitation learning, mapping visual observations to action sequences that emulate the demonstrator’s behaviour. We achieve this by transforming visual observations (inputs) and trajectories of actions (outputs) into sequences of tokens that a text-pretrained Transformer (GPT-4 Turbo) can ingest and generate, via a framework we call Keypoint Action Tokens (KAT). Despite being trained only on language, we show that these Transformers excel at translating tokenised visual keypoint observations into action trajectories, performing on par or better than state-of-the-art imitation learning (diffusion policies) in the low-data regime on a suite of real-world, everyday tasks. Rather than operating in the language domain as is typical, KAT leverages text-based Transformers to operate in the vision and action domains to learn general patterns in demonstration data for highly efficient imitation learning, indicating promising new avenues for repurposing natural language models for embodied tasks. Videos are available at https://www.robot-learning.uk/keypoint-action-tokens.
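
The abstract describes turning keypoint observations and action trajectories into token sequences that a text-pretrained Transformer can read and write, but it does not give the exact token format. The following is a minimal sketch of that idea under stated assumptions: 3D points in metres are rendered as space-separated integer centimetres, and demonstrations are laid out as "Input:"/"Output:" pairs. The function names, the scale factor, and the prompt layout are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # assumed: one 3D point in metres


def tokenize_points(points: Sequence[Point], scale: int = 100) -> str:
    """Render 3D points as a flat string of integers (assumed unit: cm)."""
    return " ".join(str(round(c * scale)) for p in points for c in p)


def detokenize_points(text: str, scale: int = 100) -> List[Point]:
    """Parse a string of integers back into 3D points (inverse of tokenize_points)."""
    values = [int(tok) / scale for tok in text.split()]
    if len(values) % 3 != 0:
        raise ValueError("expected a multiple of 3 numbers")
    return [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]


def build_prompt(
    demos: Sequence[Tuple[Sequence[Point], Sequence[Point]]],
    new_observation: Sequence[Point],
) -> str:
    """Lay out demonstrations as few-shot examples, ending with the new observation.

    Each demo is (observed keypoints, demonstrated action waypoints).
    The "Input:"/"Output:" layout is an illustrative assumption.
    """
    lines = []
    for keypoints, actions in demos:
        lines.append("Input: " + tokenize_points(keypoints))
        lines.append("Output: " + tokenize_points(actions))
    lines.append("Input: " + tokenize_points(new_observation))
    lines.append("Output:")  # the language model would complete this line
    return "\n".join(lines)


if __name__ == "__main__":
    demo = ([(0.1, 0.2, 0.3)], [(0.0, 0.0, 0.5)])
    print(build_prompt([demo], [(0.4, 0.0, 0.0)]))
```

In such a setup, the text completed after the final "Output:" would be passed to `detokenize_points` to recover an action trajectory. The call to the language model itself is deliberately left out of the sketch.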

arXiv information

Authors	Norman Di Palo, Edward Johns
Published	2024-10-17 19:31:39+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
