Image-based Navigation in Real-World Environments via Multiple Mid-level Representations: Fusion Models, Benchmark and Efficient Evaluation

Summary

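Navigating complex indoor environments requires a deep understanding of the space in which the robotic agent operates, so that the agent's navigation toward the goal location is correctly informed.
In recent learning-based navigation approaches, the agent acquires scene understanding and navigation abilities simultaneously by collecting the required experience in simulation.
Unfortunately, although simulators are an efficient tool for training navigation policies, the resulting models often fail when transferred to the real world.
One possible solution is to provide the navigation model with mid-level visual representations that contain important domain-invariant properties of the scene.
But which representations best facilitate the transfer of a model to the real world?
How can they be combined?
This work addresses these questions by proposing a benchmark of deep learning architectures that combine a range of mid-level visual representations to perform a PointGoal navigation task in a reinforcement learning setup.
All the proposed navigation models were trained with the Habitat simulator on a synthetic office environment and tested in the same real-world environment using a real robotic platform.
To assess their performance in a real context efficiently, a validation tool is proposed that generates realistic navigation episodes inside the simulator.
The experiments show that navigation models can benefit from multi-modal input and that the validation tool provides a good estimate of the expected real-world navigation performance while saving time and resources.
The acquired synthetic and real 3D models of the environment, together with the code of the validation tool built on top of Habitat, are publicly available at the following link: https://iplab.dmi.unict.it/EmbodiedVN/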
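
The summary does not say how the mid-level representations are combined, so the following is only a minimal sketch of two generic fusion strategies (concatenation and a softmax-weighted average), not the authors' architectures. The representation names (`depth`, `surface_normals`, `keypoints`), the feature values, and both function names are hypothetical and chosen purely for illustration.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_concat(features: Dict[str, List[float]]) -> List[float]:
    """Early fusion: concatenate per-representation feature vectors.

    Keys are visited in sorted order so the layout of the fused
    vector does not depend on dictionary insertion order.
    """
    fused: List[float] = []
    for name in sorted(features):
        fused.extend(features[name])
    return fused


def fuse_weighted(features: Dict[str, List[float]],
                  scores: Dict[str, float]) -> List[float]:
    """Weighted fusion: softmax-weighted average of equally sized vectors.

    `scores` stands in for relevance scores that a trained model would
    produce; here they are plain numbers supplied by the caller.
    """
    names = sorted(features)
    dims = {len(features[n]) for n in names}
    if len(dims) != 1:
        raise ValueError("all feature vectors must have the same length")
    dim = dims.pop()
    weights = softmax([scores[n] for n in names])
    return [
        sum(w * features[n][i] for w, n in zip(weights, names))
        for i in range(dim)
    ]


# Hypothetical feature vectors, one per mid-level representation.
example_features = {
    "depth": [1.0, 0.0],
    "surface_normals": [0.0, 1.0],
    "keypoints": [0.5, 0.5],
}
# Equal scores reduce the weighted fusion to a plain average.
example_scores = {"depth": 0.0, "surface_normals": 0.0, "keypoints": 0.0}

concat_vector = fuse_concat(example_features)
weighted_vector = fuse_weighted(example_features, example_scores)
```

In a reinforcement learning setup, a fused vector of this kind would typically be passed to the policy network together with the PointGoal input; that wiring is omitted here because the summary gives no details about it.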

Summary (original)

Navigating complex indoor environments requires a deep understanding of the space the robotic agent is acting into to correctly inform the navigation process of the agent towards the goal location. In recent learning-based navigation approaches, the scene understanding and navigation abilities of the agent are achieved simultaneously by collecting the required experience in simulation. Unfortunately, even if simulators represent an efficient tool to train navigation policies, the resulting models often fail when transferred into the real world. One possible solution is to provide the navigation model with mid-level visual representations containing important domain-invariant properties of the scene. But, what are the best representations that facilitate the transfer of a model to the real-world? How can they be combined? In this work we address these issues by proposing a benchmark of Deep Learning architectures to combine a range of mid-level visual representations, to perform a PointGoal navigation task following a Reinforcement Learning setup. All the proposed navigation models have been trained with the Habitat simulator on a synthetic office environment and have been tested on the same real-world environment using a real robotic platform. To efficiently assess their performance in a real context, a validation tool has been proposed to generate realistic navigation episodes inside the simulator. Our experiments showed that navigation models can benefit from the multi-modal input and that our validation tool can provide good estimation of the expected navigation performance in the real world, while saving time and resources. The acquired synthetic and real 3D models of the environment, together with the code of our validation tool built on top of Habitat, are publicly available at the following link: https://iplab.dmi.unict.it/EmbodiedVN/

arXiv information

Authors	Marco Rosano, Antonino Furnari, Luigi Gulino, Corrado Santoro, Giovanni Maria Farinella
Published	2023-10-04 16:14:51+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
