A Backbone for Long-Horizon Robot Task Understanding

Summary

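End-to-end robot learning, especially for long-horizon tasks, often produces unpredictable outcomes and generalizes poorly.
To address these challenges, we propose a novel Therblig-based Backbone Framework (TBBF) that improves robot task understanding and transferability.
The framework uses therbligs (basic action elements) as a backbone to decompose high-level robot tasks into elemental robot configurations, which are then integrated with current foundation models to improve task understanding.
The approach consists of two stages: offline training and online testing.
In the offline training stage, we developed the Meta-RGate SynerFusion (MGSF) network, which achieves accurate therblig segmentation across a variety of tasks.
In the online testing stage, after a one-shot demonstration of a new task is collected, the MGSF network extracts high-level knowledge, which is then encoded into the image using Action Registration (ActionREG).
In addition, the Large Language Model (LLM)-Alignment Policy for Visual Correction (LAP-VC) is employed to ensure precise action execution and to facilitate trajectory transfer in novel robot scenarios.
Experimental results validate these methods: therblig segmentation reaches 94.37% recall, and real-world online robot testing achieves success rates of 94.4% and 80% for simple and complex scenarios, respectively.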
Supplementary material is available at: https://sites.google.com/view/therbligsbasedbackbone/home
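The framework treats a long-horizon task as a sequence of therbligs. As a minimal sketch, assuming a segmentation network emits one therblig label per video frame, the per-frame output can be collapsed into contiguous therblig segments, which is the kind of backbone structure later stages would consume. The `Segment` type, the `frames_to_segments` helper, and the label names (borrowed from the classic Gilbreth therbligs) are hypothetical illustrations; the abstract does not specify MGSF's output format or label set.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Segment:
    """One contiguous therblig segment (hypothetical representation)."""
    therblig: str
    start: int  # first frame index, inclusive
    end: int    # last frame index, exclusive


def frames_to_segments(labels: list[str]) -> list[Segment]:
    """Collapse per-frame therblig labels into contiguous segments."""
    segments: list[Segment] = []
    t = 0
    # groupby yields one (label, run) pair per maximal run of equal labels
    for label, run in groupby(labels):
        n = sum(1 for _ in run)  # length of this run in frames
        segments.append(Segment(label, t, t + n))
        t += n
    return segments


# Hypothetical per-frame labels for a short pick-and-place demonstration.
frame_labels = (["TransportEmpty"] * 3 + ["Grasp"] * 2
                + ["TransportLoaded"] * 4 + ["ReleaseLoad"] * 1)

for seg in frames_to_segments(frame_labels):
    print(seg.therblig, seg.start, seg.end)
```

Run as-is, this prints four segments (TransportEmpty 0 3, Grasp 3 5, TransportLoaded 5 9, ReleaseLoad 9 10). Half-open frame ranges are used so that consecutive segments share boundaries without overlap.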

Summary (original)

End-to-end robot learning, particularly for long-horizon tasks, often results in unpredictable outcomes and poor generalization. To address these challenges, we propose a novel Therblig-based Backbone Framework (TBBF) to enhance robot task understanding and transferability. This framework uses therbligs (basic action elements) as the backbone to decompose high-level robot tasks into elemental robot configurations, which are then integrated with current foundation models to improve task understanding. The approach consists of two stages: offline training and online testing. During the offline training stage, we developed the Meta-RGate SynerFusion (MGSF) network for accurate therblig segmentation across various tasks. In the online testing stage, after a one-shot demonstration of a new task is collected, our MGSF network extracts high-level knowledge, which is then encoded into the image using Action Registration (ActionREG). Additionally, the Large Language Model (LLM)-Alignment Policy for Visual Correction (LAP-VC) is employed to ensure precise action execution, facilitating trajectory transfer in novel robot scenarios. Experimental results validate these methods, achieving 94.37% recall in therblig segmentation and success rates of 94.4% and 80% in real-world online robot testing for simple and complex scenarios, respectively. Supplementary material is available at: https://sites.google.com/view/therbligsbasedbackbone/home
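The abstract reports 94.37% recall for therblig segmentation but does not say how recall is computed. One common choice for temporal segmentation is frame-wise recall averaged over classes; the sketch below computes that metric under this assumption. The `macro_recall` function and the example sequences are hypothetical and are not the authors' evaluation code.

```python
from collections import Counter


def macro_recall(y_true: list[str], y_pred: list[str]) -> float:
    """Frame-wise recall per therblig class, averaged over classes in y_true.

    Assumed metric for illustration; the paper's exact definition may differ.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("sequences must have equal length")
    if not y_true:
        raise ValueError("sequences must be non-empty")
    support = Counter(y_true)  # ground-truth frames per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct frames per class
    per_class = [hits[c] / support[c] for c in support]
    return sum(per_class) / len(per_class)


truth = ["Grasp", "Grasp", "TransportLoaded", "TransportLoaded", "ReleaseLoad"]
pred = ["Grasp", "TransportLoaded", "TransportLoaded", "TransportLoaded", "ReleaseLoad"]
print(round(macro_recall(truth, pred), 4))  # Grasp 0.5, others 1.0 -> 0.8333
```

Averaging over classes keeps short therbligs such as a grasp from being swamped by long transport phases, which a plain frame-level accuracy would tend to hide.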

arXiv information

Authors	Xiaoshuai Chen, Wei Chen, Dongmyoung Lee, Yukun Ge, Nicolas Rojas, Petar Kormushev
Published	2024-08-07 12:01:58+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
