LMD-PGN: Cross-Modal Knowledge Distillation from First-Person-View Images to Third-Person-View BEV Maps for Universal Point Goal Navigation

Summary

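Point goal navigation (PGN) is a mapless navigation approach that trains robots to visually navigate to a goal point without relying on a pre-built map.
Although deep reinforcement learning has brought significant progress in handling complex environments, current PGN methods are designed for single-robot systems, and their generalizability to multi-robot scenarios with diverse platforms is limited.
This paper addresses that limitation by proposing a knowledge transfer framework for PGN that lets a teacher robot transfer its learned navigation model to student robots, including those on unknown or black-box platforms.
We introduce a novel knowledge distillation (KD) framework that converts first-person-view (FPV) representations (view images, turning/forward actions) into universally applicable third-person-view (TPV) representations (local maps, subgoals).
The state is redefined as a local map reconstructed using SLAM, and actions are mapped to subgoals on a predefined grid.
To improve training efficiency, we propose a sample-efficient KD approach that aligns training episodes via a noise-robust local map descriptor (LMD).
Although the method is validated on 2D wheeled robots, it can be extended to 3D action spaces, such as those of drones.
Experiments conducted in Habitat-Sim demonstrate the feasibility of the proposed framework, which requires minimal implementation effort.
This study highlights the potential of scalable, cross-platform PGN solutions and broadens the applicability of embodied AI systems in multi-robot scenarios.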

Abstract (original)

Point goal navigation (PGN) is a mapless navigation approach that trains robots to visually navigate to goal points without relying on pre-built maps. Despite significant progress in handling complex environments using deep reinforcement learning, current PGN methods are designed for single-robot systems, limiting their generalizability to multi-robot scenarios with diverse platforms. This paper addresses this limitation by proposing a knowledge transfer framework for PGN, allowing a teacher robot to transfer its learned navigation model to student robots, including those with unknown or black-box platforms. We introduce a novel knowledge distillation (KD) framework that transfers first-person-view (FPV) representations (view images, turning/forward actions) to universally applicable third-person-view (TPV) representations (local maps, subgoals). The state is redefined as reconstructed local maps using SLAM, while actions are mapped to subgoals on a predefined grid. To enhance training efficiency, we propose a sampling-efficient KD approach that aligns training episodes via a noise-robust local map descriptor (LMD). Although validated on 2D wheeled robots, this method can be extended to 3D action spaces, such as drones. Experiments conducted in Habitat-Sim demonstrate the feasibility of the proposed framework, requiring minimal implementation effort. This study highlights the potential for scalable and cross-platform PGN solutions, expanding the applicability of embodied AI systems in multi-robot scenarios.
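
As a rough illustration of the idea that FPV actions are mapped to subgoals on a predefined grid, the sketch below integrates a short sequence of turning/forward actions into a displacement and snaps it to a cell of a robot-centric grid. It is only a minimal sketch under stated assumptions, not the authors' implementation: the action names, the step length, the turn angle, the cell size, the grid size, and the function name `fpv_actions_to_subgoal` are all hypothetical and do not come from the abstract.

```python
import math

# Hypothetical discrete FPV action labels (assumed for illustration only).
TURN_LEFT, TURN_RIGHT, FORWARD = "turn_left", "turn_right", "forward"


def fpv_actions_to_subgoal(actions, step_m=0.25, turn_deg=10.0,
                           cell_m=0.5, grid_size=5):
    """Convert a sequence of FPV actions into a (row, col) subgoal cell.

    The grid is robot-centric: the center cell is the start pose, and the
    robot initially faces "up" (toward row 0). All parameter values are
    assumptions chosen for this example.
    """
    x = 0.0        # displacement to the right of the start pose, in metres
    y = 0.0        # displacement ahead of the start pose, in metres
    heading = 0.0  # radians; 0 = initial facing, positive = turned left

    for action in actions:
        if action == TURN_LEFT:
            heading += math.radians(turn_deg)
        elif action == TURN_RIGHT:
            heading -= math.radians(turn_deg)
        elif action == FORWARD:
            # Unit vector for a heading measured counter-clockwise from +y.
            x += -step_m * math.sin(heading)
            y += step_m * math.cos(heading)
        else:
            raise ValueError(f"unknown action: {action!r}")

    half = grid_size // 2
    col = half + round(x / cell_m)
    row = half - round(y / cell_m)

    # Clamp to the grid so that long motions map to a border cell.
    row = min(max(row, 0), grid_size - 1)
    col = min(max(col, 0), grid_size - 1)
    return row, col


if __name__ == "__main__":
    print(fpv_actions_to_subgoal([FORWARD, FORWARD]))
    print(fpv_actions_to_subgoal([TURN_LEFT, FORWARD, FORWARD], turn_deg=90.0))
```

A subgoal expressed this way depends only on the local map frame and not on the robot's own action set, which is one plausible reading of why a TPV representation could be shared across different platforms.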

arXiv information

Authors	Riku Uemura, Kanji Tanaka, Kenta Tsukahara, Daiki Iwata
Published	2024-12-23 05:05:44+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
