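Vision-Language Navigation (VLN) has emerged as a promising paradigm that enables mobile robots to perform zero-shot inference and carry out tasks without task-specific pre-programming.
For comprehensive evaluation, we design a functional prototype of a UGV (unmanned ground vehicle) system named "Rover Master", a customized platform for general-purpose VLN tasks.
We integrate and deploy the ClipRover pipeline on Rover Master to evaluate its throughput, obstacle avoidance capability, and trajectory performance across a variety of real-world scenarios.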
Vision-language navigation (VLN) has emerged as a promising paradigm, enabling mobile robots to perform zero-shot inference and execute tasks without specific pre-programming. However, current systems often separate map exploration from path planning, and exploration relies on inefficient algorithms because environmental information is limited (only partially observed). In this paper, we present a novel navigation pipeline named "ClipRover" for simultaneous exploration and target discovery in unknown environments, leveraging the capabilities of a vision-language model named CLIP. Our approach requires only monocular vision and operates without any prior map or prior knowledge about the target. For comprehensive evaluation, we design a functional prototype of a UGV (unmanned ground vehicle) system named "Rover Master", a customized platform for general-purpose VLN tasks. We integrate and deploy the ClipRover pipeline on Rover Master to evaluate its throughput, obstacle avoidance capability, and trajectory performance across various real-world scenarios. Experimental results demonstrate that ClipRover consistently outperforms traditional map traversal algorithms and achieves performance comparable to path-planning methods that depend on a prior map and prior target knowledge. Notably, ClipRover offers real-time active navigation without requiring pre-captured candidate images or pre-built node graphs, addressing key limitations of existing VLN pipelines.
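The abstract does not describe ClipRover's internals. As a rough illustration of the zero-shot matching that CLIP makes possible, the sketch below scores regions of a single camera view against a target description by cosine similarity, the comparison CLIP uses between image and text embeddings. It assumes the regions and the prompt have already been encoded by CLIP's image and text encoders; the short vectors are made-up stand-ins, and `pick_heading` is a hypothetical helper, not part of ClipRover or any CLIP library.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings (CLIP's image-text matching score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embedding has zero length")
    return dot / (norm_a * norm_b)


def pick_heading(tile_embeddings: Dict[str, Vector], target_embedding: Vector) -> str:
    """Hypothetical helper: return the view region whose embedding best matches the target prompt."""
    scores = {
        name: cosine_similarity(emb, target_embedding)
        for name, emb in tile_embeddings.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Made-up 3-D stand-ins; real CLIP embeddings have hundreds of dimensions.
    tiles = {
        "left": [0.9, 0.1, 0.0],
        "center": [0.2, 0.8, 0.1],
        "right": [0.1, 0.2, 0.9],
    }
    target = [0.0, 0.3, 1.0]  # stand-in for an encoded prompt such as "a red door"
    print(pick_heading(tiles, target))  # prints "right"
```

Because no map or candidate image gallery is involved, a score like this can be recomputed on every new frame, which is the kind of map-free, monocular setting the abstract targets.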
Authors | Yuxuan Zhang, Adnan Abdullah, Sanjeev J. Koppal, Md Jahidul Islam |
Published | 2025-02-12 21:07:10+00:00 |
arXiv page | arxiv_id (pdf) |
Source, service used
arxiv.jp, Google