Bridging Language and Action: A Survey of Language-Conditioned Robot Manipulation

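Summary

Language-conditioned robot manipulation is an emerging field that aims to enable seamless communication and cooperation between humans and robotic agents by teaching robots to understand and execute instructions conveyed in natural language.
This interdisciplinary field integrates scene understanding, language processing, and policy learning to bridge the gap between human instructions and robot actions.
In this comprehensive survey, we systematically examine recent advances in language-conditioned robot manipulation.
We categorize existing methods into language-conditioned reward shaping, language-conditioned policy learning, neuro-symbolic artificial intelligence, and the use of foundation models (FMs) such as large language models (LLMs) and vision-language models (VLMs).
Specifically, we analyze state-of-the-art techniques for semantic information extraction, environments and evaluation, auxiliary tasks, and task representation strategies.
Through a comparative analysis, we highlight the strengths and limitations of current approaches to bridging language instructions and robot actions.
Finally, we discuss open challenges and future research directions, focusing on improving generalization and addressing safety issues in language-conditioned robot manipulators.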

Summary (original)

Language-conditioned robot manipulation is an emerging field aimed at enabling seamless communication and cooperation between humans and robotic agents by teaching robots to comprehend and execute instructions conveyed in natural language. This interdisciplinary area integrates scene understanding, language processing, and policy learning to bridge the gap between human instructions and robotic actions. In this comprehensive survey, we systematically explore recent advancements in language-conditioned robotic manipulation. We categorize existing methods into language-conditioned reward shaping, language-conditioned policy learning, neuro-symbolic artificial intelligence, and the utilization of foundational models (FMs) such as large language models (LLMs) and vision-language models (VLMs). Specifically, we analyze state-of-the-art techniques concerning semantic information extraction, environment and evaluation, auxiliary tasks, and task representation strategies. By conducting a comparative analysis, we highlight the strengths and limitations of current approaches in bridging language instructions with robot actions. Finally, we discuss open challenges and future research directions, focusing on potentially enhancing generalization capabilities and addressing safety issues in language-conditioned robot manipulators.

arXiv information

Authors: Hongkuan Zhou, Xiangtong Yao, Oier Mees, Yuan Meng, Ted Xiao, Yonatan Bisk, Jean Oh, Edward Johns, Mohit Shridhar, Dhruv Shah, Jesse Thomason, Kai Huang, Joyce Chai, Zhenshan Bing, Alois Knoll
Published: 2025-02-17 10:45:25+00:00
arXiv page: arxiv_id (pdf)

Provider, services used

arxiv.jp, Google
