FMLGS: Fast Multilevel Language Embedded Gaussians for Part-level Interactive Agents

Summary

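Semantically interactive radiance fields have long been a promising backbone for real-world 3D applications, such as embodied AI, where they enable scene understanding and manipulation.
However, multi-granularity interaction remains challenging: language is ambiguous, and quality degrades when queries target object components.
This work presents FMLGS, an approach that supports part-level open-vocabulary queries within 3D Gaussian Splatting (3DGS).
We propose an efficient pipeline, based on Segment Anything Model 2 (SAM2), for building and querying consistent object-level and part-level semantics.
To resolve language ambiguity among object parts, we design a semantic deviation strategy that interpolates the semantic features of fine-grained targets to enrich their information.
Once trained, the model can be queried in natural language for both objects and their describable parts.
Comparisons with other state-of-the-art methods show that our method not only locates specified part-level targets more accurately but also achieves the best performance in both speed and accuracy.
We further integrate FMLGS into a virtual agent that can interactively navigate 3D scenes, locate targets, and respond to user requests through a chat interface.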

Abstract (original)

The semantically interactive radiance field has long been a promising backbone for 3D real-world applications, such as embodied AI to achieve scene understanding and manipulation. However, multi-granularity interaction remains a challenging task due to the ambiguity of language and degraded quality when it comes to queries upon object components. In this work, we present FMLGS, an approach that supports part-level open-vocabulary query within 3D Gaussian Splatting (3DGS). We propose an efficient pipeline for building and querying consistent object- and part-level semantics based on Segment Anything Model 2 (SAM2). We designed a semantic deviation strategy to solve the problem of language ambiguity among object parts, which interpolates the semantic features of fine-grained targets for enriched information. Once trained, we can query both objects and their describable parts using natural language. Comparisons with other state-of-the-art methods prove that our method can not only better locate specified part-level targets, but also achieve first-place performance concerning both speed and accuracy, where FMLGS is 98 x faster than LERF, 4 x faster than LangSplat and 2.5 x faster than LEGaussians. Meanwhile, we further integrate FMLGS as a virtual agent that can interactively navigate through 3D scenes, locate targets, and respond to user demands through a chat interface, which demonstrates the potential of our work to be further expanded and applied in the future.
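To make the idea of an open-vocabulary query over language-embedded Gaussians more concrete, here is a minimal sketch. It is not the FMLGS implementation: it assumes each Gaussian already carries a semantic feature vector, and that a text query has been encoded into the same space by some vision-language model. The names `cosine_similarity` and `query_gaussians`, the threshold value, and the toy 3-dimensional features are all hypothetical and chosen for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_gaussians(features, text_embedding, threshold=0.8):
    """Return indices of Gaussians whose feature is similar enough to the query.

    `features` is a list of per-Gaussian feature vectors (hypothetical layout);
    `threshold` is an arbitrary illustrative cutoff, not a value from the paper.
    """
    return [
        i
        for i, feature in enumerate(features)
        if cosine_similarity(feature, text_embedding) >= threshold
    ]


# Toy data: three Gaussians with 3-dimensional features (real features are much larger).
features = [
    [1.0, 0.0, 0.0],  # aligned with the query
    [0.9, 0.1, 0.0],  # nearly aligned with the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
]
query = [1.0, 0.0, 0.0]  # stand-in for an encoded text prompt

selected = query_gaussians(features, query)
print(selected)
```

In this toy setup the first two Gaussians pass the threshold and the third does not. A real system would render or aggregate the selected Gaussians to localize the queried object or part.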

arXiv information

Authors	Xin Tan, Yuzhou Ji, He Zhu, Yuan Xie
Published	2025-04-11 14:33:27+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
