BYE: Build Your Encoder with One Sequence of Exploration Data for Long-Term Dynamic Scene Understanding

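Abstract

Dynamic scene understanding remains a persistent challenge in robotic applications. Early dynamic mapping methods focused on reducing the negative influence of short-term dynamic objects on camera motion estimation by masking or tracking specific categories, but they often fall short when adapting to long-term scene changes. Recent efforts tackle object association in long-term dynamic environments with neural networks trained on synthetic datasets, yet they still depend on predefined object shapes and categories. Other methods incorporate visual, geometric, or semantic heuristics into the association but often lack robustness. In this work, we introduce BYE, a class-agnostic, per-scene point cloud encoder that removes the need for predefined categories, shape priors, or large-scale association datasets. BYE is trained on only a single sequence of exploration data and can efficiently perform object association in dynamically changing scenes. We further propose an ensembling scheme that combines the semantic strengths of Vision Language Models (VLMs) with the scene-specific expertise of BYE, achieving a 7% improvement and a 95% success rate in object association tasks. Code and dataset are available at https://byencoder.github.io.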

Abstract (original)

Dynamic scene understanding remains a persistent challenge in robotic applications. Early dynamic mapping methods focused on mitigating the negative influence of short-term dynamic objects on camera motion estimation by masking or tracking specific categories, which often fall short in adapting to long-term scene changes. Recent efforts address object association in long-term dynamic environments using neural networks trained on synthetic datasets, but they still rely on predefined object shapes and categories. Other methods incorporate visual, geometric, or semantic heuristics for the association but often lack robustness. In this work, we introduce BYE, a class-agnostic, per-scene point cloud encoder that removes the need for predefined categories, shape priors, or extensive association datasets. Trained on only a single sequence of exploration data, BYE can efficiently perform object association in dynamically changing scenes. We further propose an ensembling scheme combining the semantic strengths of Vision Language Models (VLMs) with the scene-specific expertise of BYE, achieving a 7% improvement and a 95% success rate in object association tasks. Code and dataset are available at https://byencoder.github.io.

arXiv information

Authors	Chenguang Huang, Shengchao Yan, Wolfram Burgard
Published	2024-12-03 13:34:42+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
