Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding

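Summary

The main challenge in vision-and-language navigation (VLN) is understanding natural-language instructions in an unseen environment. A key limitation of conventional VLN algorithms is that a single mistaken action can cause the agent to stop following the instructions or to explore unnecessary regions, leading it onto an irrecoverable path. To address this problem, we propose Meta-Explore, a hierarchical navigation method that deploys an exploitation policy to correct misled recent actions. We show that an exploitation policy that moves the agent toward a well-chosen local goal among unvisited but observable states outperforms a method that moves the agent back to a previously visited state. We also highlight the need to imagine regretful explorations using semantically meaningful clues. The key to our approach is understanding the placement of objects around the agent in the spectral domain. Specifically, we present a novel visual representation called the scene object spectrum (SOS), which applies a category-wise 2D Fourier transform to detected objects. By combining the exploitation policy with SOS features, the agent can correct its path by choosing a promising local goal. We evaluate our method on three VLN benchmarks: R2R, SOON, and REVERIE. Meta-Explore outperforms other baselines and shows significant generalization performance. In addition, local goal search using the proposed spectral-domain SOS features improves the success rate by 17.1% and SPL by 20.6% on the SOON benchmark.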
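The category-wise 2D Fourier transform behind SOS can be illustrated with a minimal sketch. This is not the authors' implementation: the bounding-box input format, the binary per-category masks, the log-magnitude step, and the name `scene_object_spectrum` are all assumptions made for illustration. Only the idea of transforming detected objects per category comes from the abstract.

```python
import numpy as np

def scene_object_spectrum(boxes, num_categories, height, width):
    """Hypothetical SOS-style feature: one 2D spectrum per object category.

    boxes: iterable of (category_id, x0, y0, x1, y1) in pixel coordinates
    (an assumed detector output format, not taken from the paper).
    """
    # Assumption: each category is rasterized into a binary occupancy mask.
    masks = np.zeros((num_categories, height, width), dtype=np.float32)
    for cat, x0, y0, x1, y1 in boxes:
        masks[cat, y0:y1, x0:x1] = 1.0
    # 2D FFT over the spatial axes, computed independently per category.
    spectrum = np.fft.fft2(masks, axes=(-2, -1))
    # Assumption: keep the log-magnitude as the feature.
    return np.log1p(np.abs(spectrum))

# Two detections: category 0 (4x3 pixels) and category 2 (4x4 pixels).
boxes = [(0, 2, 1, 6, 4), (2, 10, 3, 14, 7)]
sos = scene_object_spectrum(boxes, num_categories=3, height=8, width=16)
print(sos.shape)
```

In this sketch the zero-frequency entry of each category's spectrum reflects the total area covered by that category, while higher frequencies reflect how the objects are laid out spatially.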

Abstract (original)

The main challenge in vision-and-language navigation (VLN) is how to understand natural-language instructions in an unseen environment. The main limitation of conventional VLN algorithms is that if an action is mistaken, the agent fails to follow the instructions or explores unnecessary regions, leading the agent to an irrecoverable path. To tackle this problem, we propose Meta-Explore, a hierarchical navigation method deploying an exploitation policy to correct misled recent actions. We show that an exploitation policy, which moves the agent toward a well-chosen local goal among unvisited but observable states, outperforms a method which moves the agent to a previously visited state. We also highlight the demand for imagining regretful explorations with semantically meaningful clues. The key to our approach is understanding the object placements around the agent in spectral-domain. Specifically, we present a novel visual representation, called scene object spectrum (SOS), which performs category-wise 2D Fourier transform of detected objects. Combining exploitation policy and SOS features, the agent can correct its path by choosing a promising local goal. We evaluate our method in three VLN benchmarks: R2R, SOON, and REVERIE. Meta-Explore outperforms other baselines and shows significant generalization performance. In addition, local goal search using the proposed spectral-domain SOS features significantly improves the success rate by 17.1% and SPL by 20.6% for the SOON benchmark.
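The abstract reports SPL without expanding it. Assuming it refers to the standard VLN metric "success weighted by path length", each episode contributes its success indicator scaled by the ratio of the shortest-path length to the longer of the shortest and actual path lengths. A minimal sketch follows; the function name and the sample numbers are made up for illustration and are not results from the paper.

```python
def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by path length, averaged over episodes.

    successes: 1 if the episode succeeded, else 0.
    shortest_lengths: shortest-path length from start to goal (assumed > 0).
    path_lengths: length of the path the agent actually took.
    """
    terms = [
        s * l / max(p, l)
        for s, l, p in zip(successes, shortest_lengths, path_lengths)
    ]
    return sum(terms) / len(terms)

# Illustrative episodes: one optimal success, one success with a detour
# twice as long as the shortest path, and one failure.
score = spl([1, 1, 0], [10.0, 10.0, 8.0], [10.0, 20.0, 8.0])
print(score)
```

Under this definition a success reached by a long detour counts for less than a direct one, so a method that recovers from mistakes efficiently can raise SPL as well as the plain success rate.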

arXiv information

Authors	Minyoung Hwang, Jaeyeon Jeong, Minsoo Kim, Yoonseon Oh, Songhwai Oh
Published	2023-03-07 17:39:53+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, DeepL
