AIR-Embodied: An Efficient Active 3DGS-based Interaction and Reconstruction Framework with Embodied Large Language Model

Summary

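Recent advances in 3D reconstruction and neural rendering have improved the creation of high-quality digital assets, but existing methods struggle to generalize across varied object shapes, textures, and occlusions.
Next Best View (NBV) planning and learning-based approaches offer solutions, but they are often limited by predefined criteria and cannot handle occlusions with human-like common sense.
To address these problems, we present AIR-Embodied, a novel framework that integrates embodied AI agents with large-scale pretrained multimodal language models to improve active 3DGS reconstruction.
AIR-Embodied uses a three-stage process: understanding the current reconstruction state through multimodal prompts, planning tasks through viewpoint selection and interactive actions, and employing closed-loop reasoning to ensure accurate execution.
The agent dynamically adjusts its actions based on discrepancies between planned and actual outcomes.
Experimental evaluations in virtual and real-world environments demonstrate that AIR-Embodied significantly improves reconstruction efficiency and quality, providing a robust solution to the challenges of active 3D reconstruction.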
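
The understand, plan, execute, and compare cycle described in the summary can be illustrated with a small sketch. This is not the authors' implementation: every name here (`Plan`, `closed_loop_reconstruct`, `ToyScene`, `toy_planner`) is hypothetical, reconstruction quality is assumed to be a single number between 0 and 1, and a simple rule stands in for the multimodal language model. The sketch only shows how a gap between the expected and the observed gain can be fed back into the next planning step.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Plan:
    action: str           # hypothetical labels: "move_camera" or "interact"
    expected_gain: float  # improvement in quality the planner predicts


def closed_loop_reconstruct(
    understand: Callable[[], float],
    plan: Callable[[float, float], Plan],
    execute: Callable[[Plan], float],
    max_steps: int = 10,
    tolerance: float = 0.05,
    target: float = 0.95,
) -> Tuple[float, List[str]]:
    """Run an understand -> plan -> execute -> compare loop (illustrative only)."""
    actions: List[str] = []
    feedback = 0.0                      # last significant planned-vs-actual gap
    quality = understand()              # stage 1: assess the current state
    for _ in range(max_steps):
        if quality >= target:
            break
        step = plan(quality, feedback)  # stage 2: choose a view or an interaction
        new_quality = execute(step)     # stage 3: act, then observe the outcome
        discrepancy = step.expected_gain - (new_quality - quality)
        # Only gaps larger than the tolerance are reported back to the planner.
        feedback = discrepancy if abs(discrepancy) > tolerance else 0.0
        actions.append(step.action)
        quality = new_quality
    return quality, actions


class ToyScene:
    """Assumed toy environment: an occluder caps quality at 0.6 until removed."""

    def __init__(self) -> None:
        self.quality = 0.3
        self.occluded = True

    def observe(self) -> float:
        return self.quality

    def apply(self, step: Plan) -> float:
        if step.action == "interact":
            self.occluded = False
        elif step.action == "move_camera":
            cap = 0.6 if self.occluded else 1.0
            self.quality = min(cap, self.quality + 0.2)
        return self.quality


def toy_planner(quality: float, feedback: float) -> Plan:
    # Rule-based stand-in for the language model: if the last view gained less
    # than expected, assume an occlusion and interact with the scene.
    if feedback > 0:
        return Plan("interact", 0.0)
    return Plan("move_camera", 0.2)


scene = ToyScene()
final_quality, actions = closed_loop_reconstruct(scene.observe, toy_planner, scene.apply)
print(actions, final_quality)
```

In this toy run the planner keeps choosing new viewpoints until one yields less than predicted, switches to an interaction once, and then resumes viewpoint selection. A real system would replace the scalar quality and the rule-based planner with rendered views and model-generated plans.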

Summary (original)

Recent advancements in 3D reconstruction and neural rendering have enhanced the creation of high-quality digital assets, yet existing methods struggle to generalize across varying object shapes, textures, and occlusions. While Next Best View (NBV) planning and Learning-based approaches offer solutions, they are often limited by predefined criteria and fail to manage occlusions with human-like common sense. To address these problems, we present AIR-Embodied, a novel framework that integrates embodied AI agents with large-scale pretrained multi-modal language models to improve active 3DGS reconstruction. AIR-Embodied utilizes a three-stage process: understanding the current reconstruction state via multi-modal prompts, planning tasks with viewpoint selection and interactive actions, and employing closed-loop reasoning to ensure accurate execution. The agent dynamically refines its actions based on discrepancies between the planned and actual outcomes. Experimental evaluations across virtual and real-world environments demonstrate that AIR-Embodied significantly enhances reconstruction efficiency and quality, providing a robust solution to challenges in active 3D reconstruction.

arXiv information

Authors	Zhenghao Qi, Shenghai Yuan, Fen Liu, Haozhi Cao, Tianchen Deng, Jianfei Yang, Lihua Xie
Published	2024-09-24 12:22:38+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
