TDANet: Target-Directed Attention Network For Object-Goal Visual Navigation With Zero-Shot Ability

Summary

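Generalizing end-to-end deep reinforcement learning (DRL) for object-goal visual navigation is a long-standing challenge, because object classes and placements differ in new test environments.
Learning domain-independent visual representations is critical for enabling a trained DRL agent to generalize to unseen scenes and objects.
This letter proposes a target-directed attention network (TDANet) that learns an end-to-end object-goal visual navigation policy with zero-shot ability.
TDANet features a novel target attention (TA) module that learns both the spatial and the semantic relationships among objects, which helps TDANet focus on the observed objects most relevant to the target.
With its Siamese architecture (SA) design, TDANet distinguishes the current state from the target state and generates a domain-independent visual representation.
To evaluate TDANet's navigation performance, extensive experiments are conducted in the AI2-THOR embodied AI environment.
The simulation results show that TDANet generalizes strongly to unseen scenes and target objects, achieving a higher navigation success rate (SR) and success weighted by path length (SPL) than other state-of-the-art models.
Finally, TDANet is deployed on a wheeled robot in real scenes, where it generalizes satisfactorily to the real world.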
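
The idea of focusing on the observed objects most relevant to the target can be illustrated with generic scaled dot-product attention. This is only a minimal sketch, not the TA module itself, whose details are not given on this page. It assumes each observed object and the target are already embedded as plain feature vectors; the names `softmax` and `target_attention` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def target_attention(
    target: Sequence[float],
    objects: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Weight observed objects by their similarity to the target.

    Assumption: generic scaled dot-product attention, used here only
    to illustrate "attending to target-relevant objects". It is not
    the paper's TA module.
    """
    d = len(target)
    # Similarity of each observed object embedding to the target embedding.
    scores = [
        sum(t * o for t, o in zip(target, obj)) / math.sqrt(d)
        for obj in objects
    ]
    weights = softmax(scores)
    # Attention-weighted summary of the observed objects.
    context = [
        sum(w * obj[i] for w, obj in zip(weights, objects))
        for i in range(d)
    ]
    return weights, context


if __name__ == "__main__":
    target = [1.0, 0.0]
    observed = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    weights, context = target_attention(target, observed)
    print(weights, context)
```

In this sketch the object whose embedding is most aligned with the target receives the largest weight, so the resulting context vector is dominated by target-relevant observations.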

Abstract (original)

The generalization of the end-to-end deep reinforcement learning (DRL) for object-goal visual navigation is a long-standing challenge since object classes and placements vary in new test environments. Learning domain-independent visual representation is critical for enabling the trained DRL agent with the ability to generalize to unseen scenes and objects. In this letter, a target-directed attention network (TDANet) is proposed to learn the end-to-end object-goal visual navigation policy with zero-shot ability. TDANet features a novel target attention (TA) module that learns both the spatial and semantic relationships among objects to help TDANet focus on the most relevant observed objects to the target. With the Siamese architecture (SA) design, TDANet distinguishes the difference between the current and target states and generates the domain-independent visual representation. To evaluate the navigation performance of TDANet, extensive experiments are conducted in the AI2-THOR embodied AI environment. The simulation results demonstrate a strong generalization ability of TDANet to unseen scenes and target objects, with higher navigation success rate (SR) and success weighted by length (SPL) than other state-of-the-art models. TDANet is finally deployed on a wheeled robot in real scenes, demonstrating satisfactory generalization of TDANet to the real world.
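
The two reported metrics can be computed from per-episode logs. The sketch below uses the standard definitions: SR is the fraction of successful episodes, and SPL averages, over all episodes, the success indicator times the ratio of shortest path length to the larger of the agent's path length and the shortest path length. The `Episode` record and its field names are hypothetical, and the sketch assumes every shortest path length is positive.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    success: bool         # did the agent reach the target object?
    shortest_path: float  # shortest path length from start to target (> 0)
    agent_path: float     # length of the path the agent actually took


def success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes that ended in success."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(1 for e in episodes if e.success) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """Success weighted by path length, averaged over all episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for e in episodes:
        if e.success:
            # The ratio is capped at 1 because the denominator is never
            # smaller than the shortest path length.
            total += e.shortest_path / max(e.agent_path, e.shortest_path)
    return total / len(episodes)


if __name__ == "__main__":
    episodes = [
        Episode(success=True, shortest_path=2.0, agent_path=4.0),
        Episode(success=True, shortest_path=3.0, agent_path=3.0),
        Episode(success=False, shortest_path=5.0, agent_path=10.0),
    ]
    print(success_rate(episodes), spl(episodes))
```

Failed episodes contribute zero to SPL, so SPL can never exceed SR; the gap between the two indicates how much longer the agent's successful paths are than the shortest ones.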

arXiv information

Authors	Shiwei Lian, Feitian Zhang
Published	2024-08-12 07:20:43+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
