REI-Bench: Can Embodied Agents Understand Vague Human Instructions in Task Planning?

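Summary

Robot task planning decomposes human instructions into executable action sequences that allow a robot to complete a series of complex tasks.
Recent large language model (LLM)-based task planners perform impressively, but they assume that human instructions are clear and straightforward.
In practice, users are not experts, and their instructions to robots often contain considerable vagueness.
Linguists suggest that this vagueness frequently arises from referring expressions (REs), whose meaning depends heavily on the dialogue context and the environment.
It is even more common among the elderly and children, whom robots should serve more.
This paper studies how such vagueness in REs within human instructions affects LLM-based robot task planning, and how to overcome the problem.
To that end, we propose REI-Bench, the first robot task planning benchmark with vague REs, and find that vague REs can severely degrade planning performance, with success rates dropping by up to 77.9%.
We also observe that most failure cases stem from missing objects in planners.
To mitigate the RE problem, we propose a simple yet effective approach, task-oriented context cognition, which generates clear instructions for the robot and achieves state-of-the-art performance compared with aware prompt and chain of thought.
This work contributes to the human-robot interaction (HRI) research community by making robot task planning more practical, particularly for non-expert users such as the elderly and children.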

Abstract (original)

Robot task planning decomposes human instructions into executable action sequences that enable robots to complete a series of complex tasks. Although recent large language model (LLM)-based task planners achieve amazing performance, they assume that human instructions are clear and straightforward. However, real-world users are not experts, and their instructions to robots often contain significant vagueness. Linguists suggest that such vagueness frequently arises from referring expressions (REs), whose meanings depend heavily on dialogue context and environment. This vagueness is even more prevalent among the elderly and children, who robots should serve more. This paper studies how such vagueness in REs within human instructions affects LLM-based robot task planning and how to overcome this issue. To this end, we propose the first robot task planning benchmark with vague REs (REI-Bench), where we discover that the vagueness of REs can severely degrade robot planning performance, leading to success rate drops of up to 77.9%. We also observe that most failure cases stem from missing objects in planners. To mitigate the REs issue, we propose a simple yet effective approach: task-oriented context cognition, which generates clear instructions for robots, achieving state-of-the-art performance compared to aware prompt and chains of thought. This work contributes to the research community of human-robot interaction (HRI) by making robot task planning more practical, particularly for non-expert users, e.g., the elderly and children.

arXiv information

Authors	Chenxi Jiang, Chuhao Zhou, Jianfei Yang
Published	2025-05-16 05:27:15+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
