Infer Human’s Intentions Before Following Natural Language Instructions

Summary

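To be helpful to humans, AI agents need to follow natural language instructions and complete everyday cooperative tasks in human environments.
However, real human instructions are inherently ambiguous, because human speakers assume the listener already has sufficient prior knowledge of their hidden goals and intentions.
Standard language grounding and planning methods cannot handle this ambiguity, because they do not model the human's internal goals as additional partially observable factors in the environment.
We propose Follow Instructions with Social and Embodied Reasoning (FISER), a new framework aimed at better natural language instruction following in collaborative embodied tasks.
The framework explicitly infers human goals and intentions as intermediate reasoning steps.
We implement a set of Transformer-based models and evaluate them on HandMeThat, a challenging benchmark.
We show empirically that using social reasoning to explicitly infer human intentions before planning actions outperforms purely end-to-end approaches.
We also compare our implementation against strong baselines, including Chain of Thought prompting on the largest available pre-trained language models, and find that FISER performs better on the embodied social reasoning tasks under investigation, reaching the state of the art on HandMeThat.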

Summary (original)

For AI agents to be helpful to humans, they should be able to follow natural language instructions to complete everyday cooperative tasks in human environments. However, real human instructions inherently possess ambiguity, because the human speakers assume sufficient prior knowledge about their hidden goals and intentions. Standard language grounding and planning methods fail to address such ambiguities because they do not model human internal goals as additional partially observable factors in the environment. We propose a new framework, Follow Instructions with Social and Embodied Reasoning (FISER), aiming for better natural language instruction following in collaborative embodied tasks. Our framework makes explicit inferences about human goals and intentions as intermediate reasoning steps. We implement a set of Transformer-based models and evaluate them over a challenging benchmark, HandMeThat. We empirically demonstrate that using social reasoning to explicitly infer human intentions before making action plans surpasses purely end-to-end approaches. We also compare our implementation with strong baselines, including Chain of Thought prompting on the largest available pre-trained language models, and find that FISER provides better performance on the embodied social reasoning tasks under investigation, reaching the state-of-the-art on HandMeThat.
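
The two-stage idea in the abstract (infer the human's goal first, then plan an action) can be illustrated with a very small toy sketch. This is not the authors' implementation: FISER uses Transformer-based models, whereas the sketch below swaps in a hand-written overlap score. The goal table and every function name (`GOALS`, `infer_goal`, `plan_action`, `follow_instruction`) are assumptions made up for illustration.

```python
# Toy illustration only: GOALS, infer_goal, plan_action and follow_instruction
# are hypothetical names, not part of FISER or the HandMeThat benchmark.

# Each candidate goal is described by the set of objects it requires.
GOALS = {
    "make_tea": {"kettle", "cup", "teabag"},
    "make_salad": {"bowl", "knife", "lettuce"},
}


def infer_goal(used_objects):
    """Stage 1: pick the goal that best explains the objects the human already used."""
    scores = {goal: len(needed & used_objects) for goal, needed in GOALS.items()}
    return max(scores, key=scores.get)


def plan_action(goal, used_objects, reachable):
    """Stage 2: choose a reachable object that the inferred goal still needs."""
    candidates = sorted((GOALS[goal] - used_objects) & reachable)
    if candidates:
        return ("hand_over", candidates[0])
    return ("ask_clarification", None)


def follow_instruction(instruction, used_objects, reachable):
    """Resolve an instruction, falling back to goal inference when it is ambiguous."""
    # If the instruction names a reachable object outright, no inference is needed.
    named = sorted(obj for obj in reachable if obj in instruction)
    if named:
        return ("hand_over", named[0])
    # Otherwise infer the hidden goal first, then plan with it.
    goal = infer_goal(used_objects)
    return plan_action(goal, used_objects, reachable)


if __name__ == "__main__":
    # "that" is ambiguous; the kettle and cup already in use point to tea.
    print(follow_instruction("hand me that", {"kettle", "cup"}, {"teabag", "knife"}))
    # prints ('hand_over', 'teabag')
```

The point of the split is that the goal estimate is an explicit intermediate value that the planner consumes, rather than something an end-to-end mapping from instruction to action would have to recover implicitly.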

arXiv information

Authors	Yanming Wan, Yue Wu, Yiping Wang, Jiayuan Mao, Natasha Jaques
Published	2024-09-26 17:19:49+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
