Language-Conditioned Robotic Manipulation with Fast and Slow Thinking

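Summary

Language-conditioned robotic manipulation aims to translate natural language instructions into executable actions, ranging from simple pick-and-place to tasks that require intent recognition and visual reasoning.
Inspired by the dual process theory in cognitive science, which suggests that human decision-making relies on two parallel systems of fast and slow thinking, we introduce Robotics with Fast and Slow Thinking (RFST), a framework that mimics human cognitive architecture to classify tasks and decides between the two systems based on the instruction type.
Our RFST consists of two key components: 1) an instruction discriminator that determines which system to activate based on the current user instruction, and 2) a slow-thinking system, composed of a vision-language model fine-tuned to align with the policy networks, which enables the robot to recognize user intent or perform reasoning tasks.
To evaluate our methodology, we built a dataset of real-world trajectories capturing actions that range from spontaneous impulses to tasks requiring deliberate contemplation.
Our results in both simulation and real-world scenarios confirm that our approach handles complex tasks requiring intent recognition and reasoning.
The project is available at https://jlm-z.github.io/RSFT/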

Summary (original)

Language-conditioned robotic manipulation aims to transfer natural language instructions into executable actions, from simple pick-and-place to tasks requiring intent recognition and visual reasoning. Inspired by the dual process theory in cognitive science, which suggests two parallel systems of fast and slow thinking in human decision-making, we introduce Robotics with Fast and Slow Thinking (RFST), a framework that mimics human cognitive architecture to classify tasks and makes decisions on two systems based on instruction types. Our RFST consists of two key components: 1) an instruction discriminator to determine which system should be activated based on the current user instruction, and 2) a slow-thinking system that is composed of a fine-tuned vision language model aligned with the policy networks, which allows the robot to recognize user intention or perform reasoning tasks. To assess our methodology, we built a dataset featuring real-world trajectories, capturing actions ranging from spontaneous impulses to tasks requiring deliberate contemplation. Our results, both in simulation and real-world scenarios, confirm that our approach adeptly manages intricate tasks that demand intent recognition and reasoning. The project is available at https://jlm-z.github.io/RSFT/
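The routing idea described in the abstract (an instruction discriminator that either passes a command straight to the policy or first hands it to a slow-thinking system) can be illustrated with a small sketch. This is not the RFST implementation: the keyword-based `discriminate` function, the `SLOW_CUES` list, the `Decision` type, and `toy_slow_system` are hypothetical stand-ins for the paper's learned instruction discriminator and fine-tuned vision-language model, and the policy network is reduced to the command string it would receive.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    system: str   # "fast" or "slow"
    command: str  # explicit instruction that would be passed to the policy network


# Hypothetical cue phrases hinting that intent recognition or reasoning is needed.
# In RFST the instruction discriminator is a model component; a keyword list is only a stand-in.
SLOW_CUES = ("i am ", "i'm ", "something", "which", "why", "the one that")


def discriminate(instruction: str) -> str:
    """Return "slow" if the instruction looks like it needs deliberation, else "fast"."""
    text = instruction.lower()
    return "slow" if any(cue in text for cue in SLOW_CUES) else "fast"


def route(instruction: str, slow_system: Callable[[str], str]) -> Decision:
    """Send direct commands straight to the policy; rewrite the others via the slow system first."""
    if discriminate(instruction) == "fast":
        return Decision("fast", instruction)
    return Decision("slow", slow_system(instruction))


def toy_slow_system(instruction: str) -> str:
    """Hypothetical stand-in for the fine-tuned VLM; a real one would also look at the camera image."""
    if "thirsty" in instruction.lower():
        return "pick up the bottle of water"
    return "ask the user to clarify"


if __name__ == "__main__":
    print(route("Put the red block in the bowl", toy_slow_system))
    print(route("I'm thirsty", toy_slow_system))
```

The first call is a direct pick-and-place command and is routed to the fast system unchanged, while the second expresses a need rather than an action, so the slow system is asked to infer an explicit command first. Keeping the slow system behind a plain callable makes it easy to swap the toy function for a real model without touching the routing logic.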

arXiv information

Authors	Minjie Zhu, Yichen Zhu, Jinming Li, Junjie Wen, Zhiyuan Xu, Zhengping Che, Chaomin Shen, Yaxin Peng, Dong Liu, Feifei Feng, Jian Tang
Published	2024-01-08 19:00:32+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
