Spatial-Aware Deep Reinforcement Learning for the Traveling Officer Problem

Abstract

The traveling officer problem (TOP) is a challenging stochastic optimization task. In this problem, a parking officer is guided through a city equipped with parking sensors to fine as many parking offenders as possible. A major challenge in TOP is the dynamic nature of parking offenses, which randomly appear and disappear after some time, regardless of whether they have been fined. Thus, solutions need to dynamically adjust to currently fineable parking offenses while also planning ahead to increase the likelihood that the officer arrives during the offense taking place. Though various solutions exist, these methods often struggle to take the implications of actions on the ability to fine future parking violations into account. This paper proposes SATOP, a novel spatial-aware deep reinforcement learning approach for TOP. Our novel state encoder creates a representation of each action, leveraging the spatial relationships between parking spots, the agent, and the action. Furthermore, we propose a novel message-passing module for learning future inter-action correlations in the given environment. Thus, the agent can estimate the potential to fine further parking violations after executing an action. We evaluate our method using an environment based on real-world data from Melbourne. Our results show that SATOP consistently outperforms state-of-the-art TOP agents and is able to fine up to 22% more parking offenses.

arXiv information

Authors	Niklas Strauß, Matthias Schubert
Published	2024-01-11 15:16:20+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
