Multi-Level Compositional Reasoning for Interactive Instruction Following

Summary

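Robotic agents that perform household chores from natural language directives must master the complex job of navigating an environment and interacting with the objects in it.
The tasks given to such agents are often composite (for example, "bring a cup of coffee"), and they are challenging because completing them requires reasoning about multiple subtasks.
To address this challenge, we propose a divide-and-conquer approach: break the task into multiple subgoals and attend to each one individually for better navigation and interaction.
We call this the Multi-level Compositional Reasoning Agent (MCR-Agent).
Specifically, we learn a three-level action policy.
At the highest level, a high-level policy composition controller infers, from the language instructions, a sequence of human-interpretable subgoals to be executed.
At the middle level, a master policy discriminatively controls the agent's navigation by alternating between a navigation policy and various independent interaction policies.
Finally, at the lowest level, the appropriate interaction policy infers manipulation actions together with the corresponding object masks.
Our approach not only generates human-interpretable subgoals but also achieves a 2.03% absolute gain over comparable state-of-the-art methods in the efficiency metric (PLWSR on the unseen set), without using rule-based planning or a semantic spatial memory.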
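The three-level structure described above can be illustrated with a small toy sketch. This is not the authors' implementation: in the paper every level is a learned policy, whereas here each level is a hard-coded stand-in, and the master policy is reduced to a simple dispatch on the subgoal type. All names (`plan_subgoals`, `master_policy`, `navigation_policy`, `INTERACTION_POLICIES`, the action names, and the string "masks") are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# (subgoal type, target object), e.g. ("Pickup", "Mug"). Hypothetical format.
Subgoal = Tuple[str, str]


@dataclass
class Action:
    name: str
    mask: Optional[str] = None  # string stand-in for a predicted object mask


# Toy lookup standing in for the learned policy composition controller.
TOY_PLANS: Dict[str, List[Subgoal]] = {
    "bring a cup of coffee": [
        ("Navigate", "Mug"),
        ("Pickup", "Mug"),
        ("Navigate", "CoffeeMachine"),
        ("Put", "CoffeeMachine"),
    ],
}


def plan_subgoals(instruction: str) -> List[Subgoal]:
    """Highest level: map an instruction to a human-readable subgoal sequence."""
    return TOY_PLANS.get(instruction.strip().lower(), [])


def navigation_policy(target: str) -> List[Action]:
    """Toy navigation: a single step; navigation actions carry no object mask."""
    return [Action("MoveAhead")]


def make_interaction_policy(verb: str) -> Callable[[str], List[Action]]:
    """Lowest level: one independent policy per interaction type."""
    def policy(target: str) -> List[Action]:
        # A manipulation action is emitted together with its object mask.
        return [Action(f"{verb}Object", mask=f"mask:{target}")]
    return policy


INTERACTION_POLICIES = {
    verb: make_interaction_policy(verb) for verb in ("Pickup", "Put")
}


def master_policy(subgoal_type: str) -> Callable[[str], List[Action]]:
    """Middle level (simplified): choose navigation or an interaction policy."""
    if subgoal_type == "Navigate":
        return navigation_policy
    return INTERACTION_POLICIES[subgoal_type]


def run_episode(instruction: str) -> List[Action]:
    """Run the three levels in order and collect the low-level actions."""
    actions: List[Action] = []
    for subgoal_type, target in plan_subgoals(instruction):
        actions.extend(master_policy(subgoal_type)(target))
    return actions
```

Calling `run_episode("bring a cup of coffee")` walks the four toy subgoals in order, alternating between the navigation stand-in and the matching interaction stand-in. Keeping one interaction policy per subgoal type mirrors the "various independent interaction policies" mentioned in the summary, under the assumptions stated above.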

Summary (original)

Robotic agents performing domestic chores by natural language directives are required to master the complex job of navigating environment and interacting with objects in the environments. The tasks given to the agents are often composite thus are challenging as completing them require to reason about multiple subtasks, e.g., bring a cup of coffee. To address the challenge, we propose to divide and conquer it by breaking the task into multiple subgoals and attend to them individually for better navigation and interaction. We call it Multi-level Compositional Reasoning Agent (MCR-Agent). Specifically, we learn a three-level action policy. At the highest level, we infer a sequence of human-interpretable subgoals to be executed based on language instructions by a high-level policy composition controller. At the middle level, we discriminatively control the agent’s navigation by a master policy by alternating between a navigation policy and various independent interaction policies. Finally, at the lowest level, we infer manipulation actions with the corresponding object masks using the appropriate interaction policy. Our approach not only generates human interpretable subgoals but also achieves 2.03% absolute gain to comparable state of the arts in the efficiency metric (PLWSR in unseen set) without using rule-based planning or a semantic spatial memory.

arXiv information

Authors	Suvaansh Bhambri, Byeonghwi Kim, Jonghyun Choi
Published	2024-03-13 02:37:47+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
