Learning Task Planning from Multi-Modal Demonstration for Multi-Stage Contact-Rich Manipulation

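Summary

Large language models (LLMs) have gained popularity in task planning for long-horizon manipulation tasks.
To improve the validity of LLM-generated plans, visual demonstrations and online videos are widely used to guide the planning process.
However, for manipulation tasks that involve subtle movements but rich contact interactions, visual perception alone may not be enough for the LLM to fully interpret the demonstration.
In addition, visual data provides only limited information on force-related parameters and conditions, which are crucial for effective execution on real robots.
This paper introduces an in-context learning framework that incorporates tactile and force-torque information from human demonstrations to strengthen an LLM's ability to generate plans for new task scenarios.
We propose a bootstrapped reasoning pipeline that sequentially integrates each modality into a comprehensive task plan.
This task plan is then used as a reference for planning in new task configurations.
Real-world experiments on two different sequential manipulation tasks demonstrate the framework's effectiveness in improving the LLM's understanding of multi-modal demonstrations and in enhancing overall planning performance.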

Summary (original)

Large Language Models (LLMs) have gained popularity in task planning for long-horizon manipulation tasks. To enhance the validity of LLM-generated plans, visual demonstrations and online videos have been widely employed to guide the planning process. However, for manipulation tasks involving subtle movements but rich contact interactions, visual perception alone may be insufficient for the LLM to fully interpret the demonstration. Additionally, visual data provides limited information on force-related parameters and conditions, which are crucial for effective execution on real robots. In this paper, we introduce an in-context learning framework that incorporates tactile and force-torque information from human demonstrations to enhance LLMs’ ability to generate plans for new task scenarios. We propose a bootstrapped reasoning pipeline that sequentially integrates each modality into a comprehensive task plan. This task plan is then used as a reference for planning in new task configurations. Real-world experiments on two different sequential manipulation tasks demonstrate the effectiveness of our framework in improving LLMs’ understanding of multi-modal demonstrations and enhancing the overall planning performance.

arXiv information

Authors	Kejia Chen, Zheng Shen, Yue Zhang, Lingyun Chen, Fan Wu, Zhenshan Bing, Sami Haddadin, Alois Knoll
Published	2024-09-18 10:36:47+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
