3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning

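Summary

Building compact, informative 3D scene representations is essential for effective embodied exploration and reasoning, particularly in complex environments over long time horizons.
Existing representations such as object-centric 3D scene graphs oversimplify spatial relationships by modeling a scene as isolated objects linked by a restricted set of textual relations, which makes it hard to answer queries that require nuanced spatial understanding.
These representations also lack natural mechanisms for active exploration and memory management, which hinders their use for lifelong autonomy.
This work proposes 3D-Mem, a new 3D scene memory framework for embodied agents.
3D-Mem represents a scene with informative multi-view images, called Memory Snapshots, that capture rich visual information about the explored regions.
It further integrates frontier-based exploration by introducing Frontier Snapshots, which are glimpses of unexplored areas, so that an agent can make informed decisions by weighing both known information and potential new information.
To support lifelong memory in active exploration settings, the authors present an incremental construction pipeline for 3D-Mem and a memory retrieval technique for memory management.
Experimental results on three benchmarks show that 3D-Mem significantly enhances agents' exploration and reasoning capabilities in 3D environments, highlighting its potential to advance applications in embodied AI.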

Abstract (original)

Constructing compact and informative 3D scene representations is essential for effective embodied exploration and reasoning, especially in complex environments over extended periods. Existing representations, such as object-centric 3D scene graphs, oversimplify spatial relationships by modeling scenes as isolated objects with restrictive textual relationships, making it difficult to address queries requiring nuanced spatial understanding. Moreover, these representations lack natural mechanisms for active exploration and memory management, hindering their application to lifelong autonomy. In this work, we propose 3D-Mem, a novel 3D scene memory framework for embodied agents. 3D-Mem employs informative multi-view images, termed Memory Snapshots, to represent the scene and capture rich visual information of explored regions. It further integrates frontier-based exploration by introducing Frontier Snapshots (glimpses of unexplored areas), enabling agents to make informed decisions by considering both known and potential new information. To support lifelong memory in active exploration settings, we present an incremental construction pipeline for 3D-Mem, as well as a memory retrieval technique for memory management. Experimental results on three benchmarks demonstrate that 3D-Mem significantly enhances agents’ exploration and reasoning capabilities in 3D environments, highlighting its potential for advancing applications in embodied AI.

arXiv information

Authors	Yuncong Yang, Han Yang, Jiachen Zhou, Peihao Chen, Hongxin Zhang, Yilun Du, Chuang Gan
Published	2024-12-15 06:10:41+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
