Long-Term Human Trajectory Prediction using 3D Dynamic Scene Graphs

Abstract

We present a novel approach for long-term human trajectory prediction, which is essential for long-horizon robot planning in human-populated environments. State-of-the-art human trajectory prediction methods are limited by their focus on collision avoidance and short-term planning, and their inability to model complex interactions of humans with the environment. In contrast, our approach overcomes these limitations by predicting sequences of human interactions with the environment and using this information to guide trajectory predictions over a horizon of up to 60s. We leverage Large Language Models (LLMs) to predict interactions with the environment by conditioning the LLM prediction on rich contextual information about the scene. This information is given as a 3D Dynamic Scene Graph that encodes the geometry, semantics, and traversability of the environment into a hierarchical representation. We then ground these interaction sequences into multi-modal spatio-temporal distributions over human positions using a probabilistic approach based on continuous-time Markov Chains. To evaluate our approach, we introduce a new semi-synthetic dataset of long-term human trajectories in complex indoor environments, which also includes annotations of human-object interactions. We show in thorough experimental evaluations that our approach achieves a 54% lower average negative log-likelihood (NLL) and a 26.5% lower Best-of-20 displacement error compared to the best non-privileged baselines for a time horizon of 60s.
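
As a rough illustration of the continuous-time Markov chain idea mentioned in the abstract, the sketch below computes how a probability distribution over discrete states evolves over a 60 s horizon. It is not the authors' implementation: the two states, the transition rates, and the function name `transient_distribution` are hypothetical, and the computation uses standard uniformization rather than any method described in the paper.

```python
import math


def transient_distribution(Q, p0, t, tol=1e-12):
    """Return p(t) = p0 * exp(Q t) for a CTMC with generator matrix Q.

    Uses uniformization: the CTMC is viewed as a discrete-time chain
    M = I + Q / lam whose number of jumps in [0, t] is Poisson(lam * t).
    Intended for moderate lam * t (exp(-lam * t) must not underflow).
    """
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))  # largest exit rate
    if lam == 0.0 or t == 0.0:
        return list(p0)
    M = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    weight = math.exp(-lam * t)   # Poisson probability of zero jumps
    vec = list(p0)                # holds p0 * M^k
    result = [weight * v for v in vec]
    acc = weight                  # accumulated Poisson mass
    k = 0
    while 1.0 - acc > tol and k < 10_000:
        k += 1
        vec = [sum(vec[i] * M[i][j] for i in range(n)) for j in range(n)]
        weight *= lam * t / k
        acc += weight
        result = [r + weight * v for r, v in zip(result, vec)]
    return result


# Hypothetical example: state 0 = "at desk", state 1 = "at coffee machine".
# Rates are per second and chosen only for illustration.
leave_desk, leave_coffee = 0.1, 0.05
Q = [[-leave_desk, leave_desk],
     [leave_coffee, -leave_coffee]]
p_60s = transient_distribution(Q, [1.0, 0.0], 60.0)
```

Here `p_60s` is the probability of being in each state 60 s after starting at the desk. In a full pipeline, such state probabilities would still have to be mapped to positions in the scene, which this sketch does not attempt.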

arXiv information

Authors	Nicolas Gorlo, Lukas Schmid, Luca Carlone
Published	2024-05-01 14:50:58+00:00
arXiv page	arxiv_id(pdf)
