GaussNav: Gaussian Splatting for Visual Navigation

Summary

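In embodied vision, Instance ImageGoal Navigation (IIN) requires an agent to locate the specific object depicted in a goal image within an unexplored environment.
The main difficulty of IIN is that the agent must recognize the target object from varying viewpoints while rejecting potential distractors.
Existing map-based navigation methods mostly represent the scene as a Bird's Eye View (BEV) map, which does not capture detailed textures.
To address these issues, we propose Gaussian Splatting Navigation (GaussNav), a framework for the IIN task that builds a new map representation on 3D Gaussian Splatting (3DGS).
The framework lets the agent memorize the geometry and semantic information of the scene and also retain the textural features of objects.
On the challenging Habitat-Matterport 3D (HM3D) dataset, GaussNav raises Success weighted by Path Length (SPL) from 0.252 to 0.578.
Our code will be made publicly available.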
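
For readers unfamiliar with the metric, SPL as commonly defined in the embodied-navigation literature averages, over all episodes, the success indicator multiplied by the ratio of the shortest-path length to the larger of the shortest-path length and the path the agent actually took. The sketch below illustrates that definition only. It is not taken from the GaussNav code, and the function name `spl` and the episode values are hypothetical.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    Assumes every shortest-path length is strictly positive.
    """
    n = len(successes)
    if n == 0:
        raise ValueError("at least one episode is required")
    if len(shortest_lengths) != n or len(path_lengths) != n:
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if shortest <= 0:
            raise ValueError("shortest-path lengths must be positive")
        if success:
            # A successful episode scores 1.0 only if the agent's path is
            # as short as the shortest path; longer paths score less.
            total += shortest / max(taken, shortest)
        # Failed episodes contribute 0 regardless of path length.
    return total / n


# Hypothetical episodes: one optimal success, one success with a path
# twice the optimal length, and one failure.
example = spl(
    successes=[True, True, False],
    shortest_lengths=[10.0, 5.0, 8.0],
    path_lengths=[10.0, 10.0, 4.0],
)
```

Because failures score zero and detours are penalized, an SPL of 0.578 means the agent both succeeds often and takes reasonably direct routes when it does.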

Abstract (original)

In embodied vision, Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment. The primary difficulty of IIN stems from the necessity of recognizing the target object across varying viewpoints and rejecting potential distractors. Existing map-based navigation methods largely adopt the representation form of Bird’s Eye View (BEV) maps, which, however, lack the representation of detailed textures in a scene. To address the above issues, we propose a new Gaussian Splatting Navigation (abbreviated as GaussNav) framework for IIN task, which constructs a novel map representation based on 3D Gaussian Splatting (3DGS). The proposed framework enables the agent to not only memorize the geometry and semantic information of the scene, but also retain the textural features of objects. Our GaussNav framework demonstrates a significant leap in performance, evidenced by an increase in Success weighted by Path Length (SPL) from 0.252 to 0.578 on the challenging Habitat-Matterport 3D (HM3D) dataset. Our code will be made publicly available.

arXiv information

Authors	Xiaohan Lei, Min Wang, Wengang Zhou, Houqiang Li
Published	2024-03-18 09:56:48+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
