Query-based Semantic Gaussian Field for Scene Representation in Reinforcement Learning

Summary

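Latent scene representations play an important role in training reinforcement learning (RL) agents.
To obtain latent vectors that describe a scene well, recent work incorporates 3D-aware, latent-conditioned NeRF pipelines into scene representation learning.
However, these NeRF-based methods struggle to perceive 3D structural information because of the inefficient dense sampling used in volumetric rendering.
In addition, because they treat free and occupied space equally, their scene representation vectors lack fine-grained semantic information.
Both issues can degrade performance on downstream RL tasks.
To address these challenges, we propose a novel framework that is the first to adopt efficient 3D Gaussian Splatting (3DGS) for learning 3D scene representations.
In brief, we introduce Query-based Generalizable 3DGS, which bridges the 3DGS technique and scene representations with more geometric awareness than NeRF-based ones.
We also propose Hierarchical Semantics Encoding, which grounds fine-grained semantic features in the 3D Gaussians and then distills them into the scene representation vectors.
We conduct extensive experiments on 10 different tasks across two RL platforms, Maniskill2 and Robomimic.
The results show that our method outperforms five other baselines by a large margin.
It achieves the best success rate on 8 tasks and the second-best success rate on the other 2.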
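
The point about dense sampling and free space can be illustrated with the standard NeRF volume-rendering quadrature. The sketch below is not the paper's method; it is a minimal NumPy example, assuming a hypothetical toy ray with evenly spaced samples and a thin occupied slab, that shows how most samples on such a ray fall in free space and receive zero compositing weight. The function name `composite_weights` and all numeric values are illustrative assumptions.

```python
import numpy as np


def composite_weights(sigma, deltas):
    """Standard NeRF quadrature weights: w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    # Opacity contributed by each sample interval.
    alpha = 1.0 - np.exp(-sigma * deltas)
    # Transmittance T_i = prod_{j < i} (1 - alpha_j), with T_0 = 1.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    return trans * alpha


# Hypothetical toy ray: 64 evenly spaced samples on [0, 4),
# with a thin occupied slab at depths [2.0, 2.25).
n = 64
t = np.linspace(0.0, 4.0, n, endpoint=False)
deltas = np.full(n, 4.0 / n)
sigma = np.where((t >= 2.0) & (t < 2.25), 50.0, 0.0)

weights = composite_weights(sigma, deltas)
occupied = sigma > 0

print("samples on the ray:      ", n)
print("samples in occupied space:", int(occupied.sum()))
print("weight from free space:   ", float(weights[~occupied].sum()))
```

In this toy setup only a small fraction of the evenly spaced samples lie inside the slab, and the free-space samples contribute nothing to the rendered value, which is the inefficiency the summary attributes to NeRF-style dense sampling.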

Summary (original)

Latent scene representation plays a significant role in training reinforcement learning (RL) agents. To obtain good latent vectors describing the scenes, recent works incorporate the 3D-aware latent-conditioned NeRF pipeline into scene representation learning. However, these NeRF-related methods struggle to perceive 3D structural information due to the inefficient dense sampling in volumetric rendering. Moreover, they lack fine-grained semantic information included in their scene representation vectors because they evenly consider free and occupied spaces. Both of them can destroy the performance of downstream RL tasks. To address the above challenges, we propose a novel framework that adopts the efficient 3D Gaussian Splatting (3DGS) to learn 3D scene representation for the first time. In brief, we present the Query-based Generalizable 3DGS to bridge the 3DGS technique and scene representations with more geometrical awareness than those in NeRFs. Moreover, we present the Hierarchical Semantics Encoding to ground the fine-grained semantic features to 3D Gaussians and further distilled to the scene representation vectors. We conduct extensive experiments on two RL platforms including Maniskill2 and Robomimic across 10 different tasks. The results show that our method outperforms the other 5 baselines by a large margin. We achieve the best success rates on 8 tasks and the second-best on the other two tasks.

arXiv information

Authors	Jiaxu Wang, Ziyi Zhang, Qiang Zhang, Jia Li, Jingkai Sun, Mingyuan Sun, Junhao He, Renjing Xu
Published	2024-06-04 14:49:07+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
