Vision-Language Interpreter for Robot Task Planning

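Summary

Large language models (LLMs) are accelerating the development of language-guided robot planners.
Symbolic planners, meanwhile, have the advantage of being interpretable.
This paper proposes a new task that bridges these two trends: multimodal planning problem specification.
The goal is to generate a problem description (PD), a machine-readable file that a planner uses to find a plan.
By generating PDs from a language instruction and a scene observation, symbolic planners can be driven within a language-guided framework.
The authors propose the Vision-Language Interpreter (ViLaIn), a new framework that generates PDs using state-of-the-art LLMs and vision-language models.
ViLaIn can refine a generated PD through error-message feedback from the symbolic planner.
The aim is to answer the question: how accurately can ViLaIn and the symbolic planner generate valid robot plans?
To evaluate ViLaIn, the authors introduce a new dataset called the problem description generation (ProDG) dataset.
The framework is evaluated with four new evaluation metrics.
Experimental results show that ViLaIn generates syntactically correct problems with more than 99% accuracy and valid plans with more than 58% accuracy.
The code and dataset are available at https://github.com/omron-sinicx/ViLaIn.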
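
The refinement step described above (generate a PD, run the symbolic planner, feed any error message back into generation) can be pictured as a small loop. The following is only a minimal sketch under stated assumptions, not the code from the ViLaIn repository: the names `refine_pd`, `fake_generate`, and `fake_plan`, their signatures, the `max_rounds` limit, and the use of plain strings for the scene observation and the PD are all hypothetical and chosen for illustration.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (assumptions, not the ViLaIn API):
#   generate(instruction, observation, previous_pd, error) -> new PD text
#   plan(pd) -> (plan steps or None, error message or None)
Generate = Callable[[str, str, Optional[str], Optional[str]], str]
Plan = Callable[[str], Tuple[Optional[List[str]], Optional[str]]]


def refine_pd(
    instruction: str,
    observation: str,
    generate: Generate,
    plan: Plan,
    max_rounds: int = 3,
) -> Tuple[str, Optional[List[str]]]:
    """Generate a PD, then regenerate it with planner feedback until a plan is found."""
    pd = generate(instruction, observation, None, None)
    steps, error = plan(pd)
    rounds = 0
    while steps is None and rounds < max_rounds:
        # Pass the failed PD and the planner's error message back to the generator.
        pd = generate(instruction, observation, pd, error)
        steps, error = plan(pd)
        rounds += 1
    return pd, steps


# Stand-ins for the model and the planner, used only to exercise the loop.
def fake_generate(instruction, observation, previous_pd, error):
    # First attempt is deliberately broken; any feedback "repairs" it.
    return "bad-pd" if error is None else "good-pd"


def fake_plan(pd):
    if pd == "good-pd":
        return ["pick", "place"], None
    return None, "syntax error: missing goal"


if __name__ == "__main__":
    print(refine_pd("put the block on the plate", "scene.png", fake_generate, fake_plan))
```

Capping the number of rounds keeps the loop from running indefinitely when the generator cannot repair the PD; in that case the function returns the last PD together with `None` in place of a plan.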

Summary (original)

Large language models (LLMs) are accelerating the development of language-guided robot planners. Meanwhile, symbolic planners offer the advantage of interpretability. This paper proposes a new task that bridges these two trends, namely, multimodal planning problem specification. The aim is to generate a problem description (PD), a machine-readable file used by the planners to find a plan. By generating PDs from language instruction and scene observation, we can drive symbolic planners in a language-guided framework. We propose a Vision-Language Interpreter (ViLaIn), a new framework that generates PDs using state-of-the-art LLM and vision-language models. ViLaIn can refine generated PDs via error message feedback from the symbolic planner. Our aim is to answer the question: How accurately can ViLaIn and the symbolic planner generate valid robot plans? To evaluate ViLaIn, we introduce a novel dataset called the problem description generation (ProDG) dataset. The framework is evaluated with four new evaluation metrics. Experimental results show that ViLaIn can generate syntactically correct problems with more than 99% accuracy and valid plans with more than 58% accuracy. Our code and dataset are available at https://github.com/omron-sinicx/ViLaIn.

arXiv information

Authors	Keisuke Shirai, Cristian C. Beltran-Hernandez, Masashi Hamaya, Atsushi Hashimoto, Shohei Tanaka, Kento Kawaharazuka, Kazutoshi Tanaka, Yoshitaka Ushiku, Shinsuke Mori
Published	2024-02-20 03:13:30+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
