Learning with a Mole: Transferable latent spatial representations for navigation without reconstruction

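Summary

Agents navigating in 3D environments need some form of memory that holds a compact, actionable representation of the observation history, useful for decision making and planning.
In most end-to-end learning approaches the representation is latent and usually has no clearly defined interpretation, whereas classical robotics addresses this through scene reconstruction, producing some form of map that is usually estimated with geometry and sensor models and/or learning.
In this work, we propose to learn an actionable representation of the scene independently of the targeted downstream task and without explicitly optimizing reconstruction.
The learned representation is optimized by a blind auxiliary agent trained to navigate with it on multiple short sub-episodes branching out from a waypoint and, most importantly, without any direct visual observation.
We argue and show that the blindness property is important and forces the (trained) latent representation to be the only means for planning.
Probing experiments show that the learned representation optimizes navigability rather than reconstruction.
On downstream tasks, we show that it is robust to distribution shifts, in particular the sim2real gap, which we evaluate with a real physical robot in a real office building, significantly improving performance.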

Summary (original)

Agents navigating in 3D environments require some form of memory, which should hold a compact and actionable representation of the history of observations useful for decision taking and planning. In most end-to-end learning approaches the representation is latent and usually does not have a clearly defined interpretation, whereas classical robotics addresses this with scene reconstruction resulting in some form of map, usually estimated with geometry and sensor models and/or learning. In this work we propose to learn an actionable representation of the scene independently of the targeted downstream task and without explicitly optimizing reconstruction. The learned representation is optimized by a blind auxiliary agent trained to navigate with it on multiple short sub episodes branching out from a waypoint and, most importantly, without any direct visual observation. We argue and show that the blindness property is important and forces the (trained) latent representation to be the only means for planning. With probing experiments we show that the learned representation optimizes navigability and not reconstruction. On downstream tasks we show that it is robust to changes in distribution, in particular the sim2real gap, which we evaluate with a real physical robot in a real office building, significantly improving performance.
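The central argument above is that a blind auxiliary agent, which never receives observations, can only succeed if the representation itself encodes what is needed to navigate. The toy sketch below illustrates that argument only; it is not the authors' method. Assumptions: the learned latent is replaced by an explicit set of cells the agent believes are free (`memory`), the learned blind policy is replaced by breadth-first search over that set, and the grid layout, `blind_plan`, `execute`, `auxiliary_score` and all parameter values are hypothetical. The score is the fraction of short sub-episodes, branching out from a waypoint, that the blind agent completes in the real map.

```python
import random
from collections import deque

# Hypothetical toy ground-truth map: '#' is a wall, '.' is free space.
GRID = [
    "#######",
    "#.....#",
    "#.###.#",
    "#.#...#",
    "#.....#",
    "#######",
]
FREE = {(c, r) for r, row in enumerate(GRID) for c, ch in enumerate(row) if ch == "."}


def blind_plan(memory, start, goal):
    """Hypothetical blind planner: BFS over `memory` only.

    The true map is never consulted, so `memory` is the agent's only means
    of planning (a stand-in for the learned latent representation).
    """
    parent = {start: None}
    frontier = deque([start])
    while frontier:
        x, y = frontier.popleft()
        if (x, y) == goal:
            break
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in memory and nxt not in parent:
                parent[nxt] = (x, y)
                frontier.append(nxt)
    if goal not in parent:
        return None  # memory does not support reaching this goal
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def execute(path, goal, max_steps):
    """Roll the plan out in the real map: fail on collisions or if too long."""
    if path is None or len(path) - 1 > max_steps:
        return False
    return path[-1] == goal and all(cell in FREE for cell in path)


def auxiliary_score(memory, waypoint, n_branches=8, radius=4, max_steps=10, seed=0):
    """Fraction of short sub-episodes branching out from `waypoint` that the
    blind agent completes using `memory` alone (hypothetical metric)."""
    rng = random.Random(seed)
    candidates = sorted(
        c for c in FREE
        if c != waypoint and abs(c[0] - waypoint[0]) + abs(c[1] - waypoint[1]) <= radius
    )
    goals = [rng.choice(candidates) for _ in range(n_branches)]
    wins = sum(execute(blind_plan(memory, waypoint, g), g, max_steps) for g in goals)
    return wins / n_branches


waypoint = (1, 1)
print(auxiliary_score(FREE, waypoint))        # memory encodes navigability -> 1.0
print(auxiliary_score({waypoint}, waypoint))  # memory holds nothing useful -> 0.0
```

Because the planner cannot fall back on observations, the score rewards only memory content that lets the agent reach goals (here, connectivity of free space), not a faithful reconstruction of everything in the scene. In the paper the representation is learned from the blind agent's training signal; here it is fixed by hand only to make that dependence visible.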

arXiv information

Authors	Guillaume Bono, Leonid Antsfeld, Assem Sadek, Gianluca Monaci, Christian Wolf
Published	2023-09-29 12:37:36+00:00
arXiv site	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google
