Improving robot navigation in crowded environments using intrinsic rewards

Summary

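Autonomous navigation in crowded environments is an open problem with many applications, and solving it is essential if robots and humans are to coexist in the smart cities of the future.
In recent years, deep reinforcement learning approaches have been shown to outperform model-based algorithms.
Nevertheless, although the reported results are promising, existing works do not take full advantage of the capabilities their models offer.
They usually get trapped in local optima during training, which prevents them from learning the optimal policy.
They fail to visit and interact appropriately with every possible state, such as states near the goal or near dynamic obstacles.
In this work, we propose using intrinsic rewards to balance exploration and exploitation, so that the agent explores according to the uncertainty of the states rather than according to how long it has been trained, and becomes more curious about unknown states.
We explain the benefits of the approach and compare it with other exploration algorithms that may be used for crowd navigation.
Many simulation experiments, performed by modifying several state-of-the-art algorithms, show that intrinsic rewards make the robot learn faster and reach higher rewards and success rates (fewer collisions) in shorter navigation times, outperforming the state of the art.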
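
To make the idea of exploring by state uncertainty rather than by training time more concrete, here is a minimal Python sketch. The abstract does not say how the intrinsic reward is computed, so this sketch swaps in a simple count-based novelty bonus as a stand-in for state uncertainty. The class `CountBasedBonus`, the function `total_reward`, and the parameters `scale` and `cell_size` are all hypothetical names chosen for illustration, not taken from the paper.

```python
from collections import Counter
import math


class CountBasedBonus:
    """Hypothetical novelty bonus: rarely visited states earn a larger reward.

    This is an illustrative stand-in for "state uncertainty"; it is not the
    mechanism used in the paper, which the abstract does not specify.
    """

    def __init__(self, scale=1.0, cell_size=0.5):
        self.scale = scale          # weight of the intrinsic term (assumed)
        self.cell_size = cell_size  # grid resolution for discretization (assumed)
        self.visits = Counter()

    def _key(self, state):
        # Discretize a continuous state so that nearby states share one count.
        return tuple(math.floor(x / self.cell_size) for x in state)

    def reward(self, state):
        # Record the visit, then return a bonus that shrinks as 1/sqrt(count).
        key = self._key(state)
        self.visits[key] += 1
        return self.scale / math.sqrt(self.visits[key])


def total_reward(extrinsic, state, bonus):
    # Training signal = task reward from the environment + novelty bonus.
    return extrinsic + bonus.reward(state)
```

Unlike an exploration schedule that decays with the number of training steps, this bonus stays high in regions the agent has rarely visited, however long training has run, and fades only where the agent already has experience.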

Summary (original)

Autonomous navigation in crowded environments is an open problem with many applications, essential for the coexistence of robots and humans in the smart cities of the future. In recent years, deep reinforcement learning approaches have proven to outperform model-based algorithms. Nevertheless, even though the results provided are promising, the works are not able to take advantage of the capabilities that their models offer. They usually get trapped in local optima in the training process, that prevent them from learning the optimal policy. They are not able to visit and interact with every possible state appropriately, such as with the states near the goal or near the dynamic obstacles. In this work, we propose using intrinsic rewards to balance between exploration and exploitation and explore depending on the uncertainty of the states instead of on the time the agent has been trained, encouraging the agent to get more curious about unknown states. We explain the benefits of the approach and compare it with other exploration algorithms that may be used for crowd navigation. Many simulation experiments are performed modifying several algorithms of the state-of-the-art, showing that the use of intrinsic rewards makes the robot learn faster and reach higher rewards and success rates (fewer collisions) in shorter navigation times, outperforming the state-of-the-art.

arXiv information

Authors	Diego Martinez-Baselga, Luis Riazuelo, Luis Montano
Published	2023-02-13 17:54:50+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
