Multi-Agent Inverse Reinforcement Learning in Real World Unstructured Pedestrian Crowds

Summary

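Social robot navigation in crowded public spaces such as university campuses, restaurants, grocery stores, and hospitals is an increasingly important area of research.
One of the core strategies for achieving this goal is to understand human intent, that is, the underlying psychological factors that govern human motion, by learning humans' reward functions, typically via inverse reinforcement learning (IRL).
Despite significant progress in IRL, learning the reward functions of multiple agents simultaneously in dense, unstructured pedestrian crowds has remained intractable because of the tightly coupled social interactions that occur in these scenarios (e.g., passing, intersections, swerving, and weaving).
In this paper, we introduce a new multi-agent maximum entropy inverse reinforcement learning algorithm for real-world unstructured pedestrian crowds.
Key to our approach is a simple but effective mathematical trick, which we call the tractability-rationality trade-off trick, that achieves tractability at the cost of a slight reduction in accuracy.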
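We compare our approach against classical single-agent MaxEnt IRL and state-of-the-art trajectory prediction methods on several datasets, including ETH, UCY, SCAND, JRDB, and a new dataset called Speedway, collected at a busy intersection on a university campus and focused on dense, complex agent interactions.
Our key findings show that on the dense Speedway dataset our approach ranks 1st among the top 7 baselines, with a more than 2x improvement over single-agent IRL, and that it is competitive with state-of-the-art large transformer-based encoder-decoder models on sparser datasets such as ETH/UCY (ranking 3rd among the top 7 baselines).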

Summary (original)

Social robot navigation in crowded public spaces such as university campuses, restaurants, grocery stores, and hospitals is an increasingly important area of research. One of the core strategies for achieving this goal is to understand humans' intent (the underlying psychological factors that govern their motion) by learning their reward functions, typically via inverse reinforcement learning (IRL). Despite significant progress in IRL, learning reward functions of multiple agents simultaneously in dense unstructured pedestrian crowds has remained intractable due to the nature of the tightly coupled social interactions that occur in these scenarios, such as passing, intersections, swerving, and weaving. In this paper, we present a new multi-agent maximum entropy inverse reinforcement learning algorithm for real world unstructured pedestrian crowds. Key to our approach is a simple, but effective, mathematical trick which we name the so-called tractability-rationality trade-off trick that achieves tractability at the cost of a slight reduction in accuracy. We compare our approach to the classical single-agent MaxEnt IRL as well as state-of-the-art trajectory prediction methods on several datasets including the ETH, UCY, SCAND, JRDB, and a new dataset, called Speedway, collected at a busy intersection on a university campus focusing on dense, complex agent interactions. Our key findings show that, on the dense Speedway dataset, our approach ranks 1st among top 7 baselines with >2X improvement over single-agent IRL, and is competitive with state-of-the-art large transformer-based encoder-decoder models on sparser datasets such as ETH/UCY (ranks 3rd among top 7 baselines).
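The abstract uses classical single-agent MaxEnt IRL as a baseline. For readers unfamiliar with it, the following is a minimal sketch of that classical formulation (Ziebart et al., 2008), assuming a hypothetical toy 1-D corridor world, one-hot state features, deterministic moves, and equal-length demonstrations. It is not the authors' multi-agent algorithm and does not implement the tractability-rationality trade-off trick, whose details the abstract does not give. All names (`soft_policies`, `expected_feature_counts`, `maxent_irl`, the corridor itself) are illustrative. The backward pass computes a soft (log-sum-exp) value function, the forward pass propagates state-visitation frequencies, and the reward weights follow the likelihood gradient: empirical minus expected feature counts.

```python
import math
from typing import List

# Hypothetical toy setting (not from the paper): a 1-D corridor of N_STATES cells.
# Actions move the agent left, keep it in place, or move it right; walls clamp it.
N_STATES = 5
MOVES = (-1, 0, 1)


def step(s: int, a: int) -> int:
    """Deterministic transition function of the toy corridor."""
    return min(max(s + MOVES[a], 0), N_STATES - 1)


def features(s: int) -> List[float]:
    """One-hot state features, so theta[k] is simply the reward of cell k."""
    return [1.0 if k == s else 0.0 for k in range(N_STATES)]


def logsumexp(xs: List[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def soft_policies(theta: List[float], horizon: int) -> List[List[List[float]]]:
    """Backward pass: finite-horizon soft value iteration.

    Returns pi[t][s][a] = exp(Q_t(s, a) - V_t(s)) for t = 0 .. horizon - 1.
    """
    reward = [sum(w * f for w, f in zip(theta, features(s))) for s in range(N_STATES)]
    v_next = [0.0] * N_STATES  # soft value beyond the horizon is zero
    backwards = []
    for _ in range(horizon):
        q = [[reward[s] + v_next[step(s, a)] for a in range(len(MOVES))]
             for s in range(N_STATES)]
        v = [logsumexp(q[s]) for s in range(N_STATES)]
        backwards.append([[math.exp(q[s][a] - v[s]) for a in range(len(MOVES))]
                          for s in range(N_STATES)])
        v_next = v
    return backwards[::-1]  # reorder so that index t is time step t


def expected_feature_counts(theta, p0, horizon):
    """Forward pass: propagate state-visitation frequencies under the soft policy."""
    pis = soft_policies(theta, horizon)
    d = list(p0)
    svf = [0.0] * N_STATES
    for t in range(horizon):
        svf = [c + x for c, x in zip(svf, d)]
        if t == horizon - 1:
            break
        d_next = [0.0] * N_STATES
        for s in range(N_STATES):
            for a in range(len(MOVES)):
                d_next[step(s, a)] += d[s] * pis[t][s][a]
        d = d_next
    # Expected feature counts = sum over states of visitation frequency * features.
    return [sum(svf[s] * features(s)[k] for s in range(N_STATES)) for k in range(N_STATES)]


def empirical_feature_counts(demos):
    """Average feature counts of the demonstrated trajectories."""
    counts = [0.0] * N_STATES
    for traj in demos:
        for s in traj:
            counts = [c + f / len(demos) for c, f in zip(counts, features(s))]
    return counts


def maxent_irl(demos, lr=0.1, iters=200):
    """Gradient ascent on the demonstration log-likelihood (equal-length demos assumed)."""
    horizon = len(demos[0])
    p0 = [0.0] * N_STATES
    for traj in demos:
        p0[traj[0]] += 1.0 / len(demos)
    target = empirical_feature_counts(demos)
    theta = [0.0] * N_STATES
    for _ in range(iters):
        expected = expected_feature_counts(theta, p0, horizon)
        # Gradient = empirical feature counts - expected feature counts.
        theta = [w + lr * (e - x) for w, e, x in zip(theta, target, expected)]
    return theta


if __name__ == "__main__":
    # Toy demonstrations: every agent walks right and then waits at cell 4.
    demos = [[0, 1, 2, 3, 4, 4], [1, 2, 3, 4, 4, 4], [2, 3, 4, 4, 4, 4]]
    theta = maxent_irl(demos)
    print([round(w, 2) for w in theta])
```

On these toy demonstrations the learned weight on cell 4 should come out largest. The sketch models one agent at a time; according to the abstract, learning the reward functions of many tightly interacting agents simultaneously is where tractability becomes the obstacle.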

arXiv information

Authors	Rohan Chandra, Haresh Karnan, Negar Mehr, Peter Stone, Joydeep Biswas
Published	2024-12-15 03:48:49+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
