ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation

Summary

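The ability to accurately locate and navigate to a specific object is essential for embodied agents that operate in the real world and interact with objects to complete tasks.
Such object navigation tasks usually require large-scale training in visual environments with labeled objects, and the resulting models generalize poorly to novel objects in unknown environments.
This work introduces Exploration with Soft Commonsense constraints (ESC), a new zero-shot object navigation method that transfers the commonsense knowledge of pre-trained models to open-world object navigation without any navigation experience or other training on the visual environments.
First, ESC uses a pre-trained vision-and-language model for open-world, prompt-based grounding and a pre-trained commonsense language model for room and object reasoning.
Next, ESC converts commonsense knowledge into navigation actions by modeling it as soft logic predicates for efficient exploration.
Extensive experiments on the MP3D, HM3D, and RoboTHOR benchmarks show that ESC improves substantially over baselines and achieves new state-of-the-art results for zero-shot object navigation (for example, a 225% relative success rate improvement over CoW on MP3D).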

Abstract (original)

The ability to accurately locate and navigate to a specific object is a crucial capability for embodied agents that operate in the real world and interact with objects to complete tasks. Such object navigation tasks usually require large-scale training in visual environments with labeled objects, which generalizes poorly to novel objects in unknown environments. In this work, we present a novel zero-shot object navigation method, Exploration with Soft Commonsense constraints (ESC), that transfers commonsense knowledge in pre-trained models to open-world object navigation without any navigation experience nor any other training on the visual environments. First, ESC leverages a pre-trained vision and language model for open-world prompt-based grounding and a pre-trained commonsense language model for room and object reasoning. Then ESC converts commonsense knowledge into navigation actions by modeling it as soft logic predicates for efficient exploration. Extensive experiments on MP3D, HM3D, and RoboTHOR benchmarks show that our ESC method improves significantly over baselines, and achieves new state-of-the-art results for zero-shot object navigation (e.g., 225% relative Success Rate improvement than CoW on MP3D).
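
The abstract says that commonsense knowledge is modeled as soft logic predicates to guide exploration. The following is only a minimal sketch of that general idea, not the authors' implementation: it assumes exploration proceeds by choosing among candidate frontiers, uses the Lukasiewicz t-norm as the soft conjunction, and scores each frontier with hand-picked weights. The `Frontier` class, the function names, the weights, and all prior values are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def luk_and(a: float, b: float) -> float:
    """Lukasiewicz t-norm: soft conjunction of two truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


@dataclass
class Frontier:
    """A candidate exploration target (hypothetical structure)."""
    name: str
    distance: float  # distance from the agent, in metres
    near_objects: Dict[str, float] = field(default_factory=dict)  # object -> closeness in [0, 1]
    near_rooms: Dict[str, float] = field(default_factory=dict)    # room -> closeness in [0, 1]


Prior = Dict[Tuple[str, str], float]  # (context, goal) -> co-occurrence belief in [0, 1]


def score_frontier(f: Frontier, goal: str, obj_prior: Prior, room_prior: Prior,
                   w_obj: float = 1.0, w_room: float = 1.0,
                   w_dist: float = 0.5, max_dist: float = 10.0) -> float:
    """Weighted sum of soft rule supports minus a distance penalty.

    Soft rules (assumed for this sketch):
      near(frontier, obj)  AND cooccur(obj, goal)  -> choose(frontier)
      near(frontier, room) AND cooccur(room, goal) -> choose(frontier)
    """
    obj_support = max(
        (luk_and(near, obj_prior.get((obj, goal), 0.0))
         for obj, near in f.near_objects.items()),
        default=0.0,
    )
    room_support = max(
        (luk_and(near, room_prior.get((room, goal), 0.0))
         for room, near in f.near_rooms.items()),
        default=0.0,
    )
    dist_penalty = min(f.distance, max_dist) / max_dist
    return w_obj * obj_support + w_room * room_support - w_dist * dist_penalty


def pick_frontier(frontiers: List[Frontier], goal: str,
                  obj_prior: Prior, room_prior: Prior) -> Frontier:
    """Return the frontier with the highest soft-constraint score."""
    return max(frontiers, key=lambda f: score_frontier(f, goal, obj_prior, room_prior))


# Hypothetical priors; in a real system these would come from a language model.
obj_prior = {("sofa", "tv"): 0.9, ("sink", "tv"): 0.1}
room_prior = {("living room", "tv"): 0.9, ("kitchen", "tv"): 0.2}

frontiers = [
    Frontier("toward_kitchen", 2.0, {"sink": 0.9}, {"kitchen": 0.8}),
    Frontier("toward_living_room", 6.0, {"sofa": 0.9}, {"living room": 0.9}),
]

best = pick_frontier(frontiers, "tv", obj_prior, room_prior)
print(best.name)
```

Because the constraints are soft, a nearby frontier with weak commonsense support can still lose to a farther one with strong support, as in this toy example where the frontier near the sofa is preferred when searching for a TV.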

arXiv information

Authors	Kaiwen Zhou, Kaizhi Zheng, Connor Pryor, Yilin Shen, Hongxia Jin, Lise Getoor, Xin Eric Wang
Published	2023-01-30 18:37:32+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
