NavigateDiff: Visual Predictors are Zero-Shot Navigation Assistants

Summary

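Navigating unfamiliar environments poses a major challenge for household robots, which must be able to recognize and reason about novel decorations and layouts.
Existing reinforcement learning methods typically rely on extensive mapping and exploration, so they cannot be transferred directly to new environments.
To address these challenges, we attempt to transfer the logical knowledge and generalization ability of pre-trained foundation models to zero-shot navigation.
By integrating a large vision-language model with a diffusion network, our approach, named NavigateDiff, builds a visual predictor that continuously predicts the agent's potential observation at the next step, helping the robot generate robust actions.
Furthermore, to accommodate the temporal nature of navigation, we introduce temporal historical information so that the predicted images stay consistent with the navigation scene.
We then design an information fusion framework that embeds the predicted future frames as guidance for a goal-reaching policy, which solves downstream image navigation tasks.
This approach improves navigation control and generalization in both simulated and real-world environments.
Through extensive experiments, we demonstrate the robustness and versatility of our method and show its potential to improve the efficiency and effectiveness of robot navigation in diverse settings.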

Summary (original)

Navigating unfamiliar environments presents significant challenges for household robots, requiring the ability to recognize and reason about novel decoration and layout. Existing reinforcement learning methods cannot be directly transferred to new environments, as they typically rely on extensive mapping and exploration, which makes them time-consuming and inefficient. To address these challenges, we try to transfer the logical knowledge and the generalization ability of pre-trained foundation models to zero-shot navigation. By integrating a large vision-language model with a diffusion network, our approach named NavigateDiff constructs a visual predictor that continuously predicts the agent's potential observations in the next step, helping robots generate robust actions. Furthermore, to adapt to the temporal nature of navigation, we introduce temporal historical information to ensure that the predicted image is aligned with the navigation scene. We then carefully design an information fusion framework that embeds the predicted future frames as guidance into a goal-reaching policy to solve downstream image navigation tasks. This approach enhances navigation control and generalization across both simulated and real-world environments. Through extensive experimentation, we demonstrate the robustness and versatility of our method, showcasing its potential to improve the efficiency and effectiveness of robotic navigation in diverse settings.
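The abstract describes the pipeline only at a high level, so the following is a purely illustrative sketch, not the authors' implementation, of the data flow it names: a short observation history feeds a visual predictor, and the predicted next frame is fused with the current observation and the goal to choose an action. Everything concrete here is a hypothetical assumption: the names `extrapolate_next`, `fuse`, and `FusionPolicySketch`, plain float lists standing in for image embeddings, linear extrapolation standing in for the diffusion-based predictor, concatenation as the fusion step, and a linear scorer standing in for the goal-reaching policy.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

# Hypothetical stand-in for an image embedding: a plain list of floats.
Vector = List[float]


def extrapolate_next(history: Sequence[Vector]) -> Vector:
    """Toy stand-in for the visual predictor (NOT a diffusion model).

    Linearly extrapolates the next embedding from the last two frames;
    with a single frame in the history it simply repeats that frame.
    """
    if len(history) < 2:
        return list(history[-1])
    prev, last = history[-2], history[-1]
    return [2 * b - a for a, b in zip(prev, last)]


def fuse(current: Vector, predicted: Vector, goal: Vector) -> Vector:
    """Hypothetical fusion step: concatenate current, predicted, and goal features."""
    return current + predicted + goal


class FusionPolicySketch:
    """Keeps a bounded observation history, predicts the next frame,
    fuses it with the current observation and the goal, and scores actions."""

    def __init__(
        self,
        predictor: Callable[[Sequence[Vector]], Vector],
        weights: List[Vector],
        history_len: int = 3,
    ) -> None:
        self.predictor = predictor
        self.weights = weights  # one weight row per discrete action
        self.history: Deque[Vector] = deque(maxlen=history_len)

    def act(self, observation: Vector, goal: Vector) -> int:
        # Temporal context: the deque keeps only the most recent frames.
        self.history.append(observation)
        predicted = self.predictor(list(self.history))
        features = fuse(observation, predicted, goal)
        # Linear scorer as a stand-in for a learned goal-reaching policy.
        scores = [sum(w * x for w, x in zip(row, features)) for row in self.weights]
        # Index of the highest-scoring action (the first one wins on ties).
        return max(range(len(scores)), key=lambda i: scores[i])


# Example: two actions whose scores read the two predicted-feature dimensions.
policy = FusionPolicySketch(
    extrapolate_next,
    weights=[[0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0]],
)
goal = [5.0, 5.0]
actions = [policy.act(obs, goal) for obs in ([0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [1.0, 3.0])]
# actions == [0, 0, 1, 1]
```

In a real system the predictor would be a learned generative model and the scorer a trained policy; the sketch only makes the history, prediction, fusion, and action steps concrete and easy to trace by hand.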

arXiv information

Authors	Yiran Qin, Ao Sun, Yuze Hong, Benyou Wang, Ruimao Zhang
Published	2025-02-19 17:27:47+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google