Details Make a Difference: Object State-Sensitive Neurorobotic Task Planning

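Summary

An object's state reflects its current status or condition and is important for robot task planning and manipulation.
However, detecting object states and generating state-sensitive plans for robots is difficult.
Recently, pre-trained large language models (LLMs) and vision-language models (VLMs) have shown impressive capabilities in plan generation.
To the best of our knowledge, though, whether LLMs or VLMs can generate object state-sensitive plans has hardly been investigated.
To study this, we introduce the Object State-Sensitive Agent (OSSA), a task-planning agent empowered by pre-trained neural networks.
We propose two methods for OSSA: (i) a modular model consisting of a pre-trained vision processing module (a dense captioning model, DCM) and a natural language processing model (an LLM), and (ii) a monolithic model consisting only of a VLM.
To evaluate the performance of the two methods quantitatively, we use tabletop scenarios in which the task is to clear the table.
We provide a multimodal benchmark dataset that takes object states into account.
Our results show that both methods can be used for object state-sensitive tasks, but the monolithic approach outperforms the modular one.
The OSSA code is available at https://github.com/Xiao-wen-Sun/OSSA.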

Summary (original)

The state of an object reflects its current status or condition and is important for a robot’s task planning and manipulation. However, detecting an object’s state and generating a state-sensitive plan for robots is challenging. Recently, pre-trained Large Language Models (LLMs) and Vision-Language Models (VLMs) have shown impressive capabilities in generating plans. However, to the best of our knowledge, there is hardly any investigation on whether LLMs or VLMs can also generate object state-sensitive plans. To study this, we introduce an Object State-Sensitive Agent (OSSA), a task-planning agent empowered by pre-trained neural networks. We propose two methods for OSSA: (i) a modular model consisting of a pre-trained vision processing module (dense captioning model, DCM) and a natural language processing model (LLM), and (ii) a monolithic model consisting only of a VLM. To quantitatively evaluate the performances of the two methods, we use tabletop scenarios where the task is to clear the table. We contribute a multimodal benchmark dataset that takes object states into consideration. Our results show that both methods can be used for object state-sensitive tasks, but the monolithic approach outperforms the modular approach. The code for OSSA is available at https://github.com/Xiao-wen-Sun/OSSA
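
The difference between the two OSSA variants described in the abstract can be sketched in a few lines of Python. This is only an illustrative outline under stated assumptions, not the authors' implementation: the names `Captioner`, `TextPlanner`, `VisionPlanner`, `build_prompt`, `modular_plan`, and `monolithic_plan` are hypothetical, the prompt wording is invented, and the `toy_*` functions are rule-based stand-ins for real DCM, LLM, and VLM calls.

```python
from typing import Callable, List

# Hypothetical interfaces; none of these names come from the OSSA repository.
Captioner = Callable[[bytes], List[str]]           # image -> captions (stand-in for a DCM)
TextPlanner = Callable[[str], List[str]]           # prompt -> plan steps (stand-in for an LLM)
VisionPlanner = Callable[[bytes, str], List[str]]  # image + task -> plan steps (stand-in for a VLM)

TASK = "Clear the table."


def build_prompt(task: str, captions: List[str]) -> str:
    """Turn captions into a text prompt (the wording is an assumption)."""
    lines = [f"Task: {task}", "Observed objects and their states:"]
    lines += [f"- {caption}" for caption in captions]
    return "\n".join(lines)


def modular_plan(image: bytes, captioner: Captioner, llm: TextPlanner,
                 task: str = TASK) -> List[str]:
    """Modular variant: vision module first, then a text-only planner."""
    captions = captioner(image)
    return llm(build_prompt(task, captions))


def monolithic_plan(image: bytes, vlm: VisionPlanner,
                    task: str = TASK) -> List[str]:
    """Monolithic variant: one model sees the image and the task together."""
    return vlm(image, task)


# Toy stand-ins so the sketch runs without any model.
def toy_captioner(image: bytes) -> List[str]:
    return ["cup half full of coffee", "empty plate"]


def toy_llm(prompt: str) -> List[str]:
    steps = []
    for line in prompt.splitlines():
        if not line.startswith("- "):
            continue
        obj = line[2:]
        if "full" in obj:  # state-sensitive step: empty the object first
            steps.append(f"pour out: {obj}")
        steps.append(f"put away: {obj}")
    return steps


def toy_vlm(image: bytes, task: str) -> List[str]:
    # A real VLM would read the pixels; this toy reuses the rules above.
    return toy_llm(build_prompt(task, toy_captioner(image)))


if __name__ == "__main__":
    print(modular_plan(b"", toy_captioner, toy_llm))
    print(monolithic_plan(b"", toy_vlm))
```

In the modular sketch, object state reaches the planner only through the caption text, so anything the captioner omits is invisible to the planner. In the monolithic sketch the planner receives the image directly.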

arXiv information

Authors	Xiaowen Sun, Xufeng Zhao, Jae Hee Lee, Wenhao Lu, Matthias Kerzel, Stefan Wermter
Published	2024-10-16 14:48:38+00:00

Source, services used

arxiv.jp, Google
