Learning to Control Autonomous Fleets from Observation via Offline Reinforcement Learning

Summary

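Autonomous Mobility-on-Demand (AMoD) systems are an evolving mode of transportation in which a centrally coordinated fleet of self-driving vehicles dynamically serves travel requests.
Control of these systems is typically formulated as a large network optimization problem, and reinforcement learning (RL) has recently emerged as a promising approach to the open challenges in this area.
Recent centralized RL approaches focus on learning from online data and ignore the per-sample cost of interactions within real-world transportation systems.
To address these limitations, we formalize the control of AMoD systems through the lens of offline reinforcement learning and propose learning effective control strategies using only offline data, which is readily available to current mobility operators.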
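We further investigate design decisions and provide empirical evidence, based on data from real-world mobility systems, showing how offline learning can recover AMoD control policies that (i) perform on par with online methods, (ii) allow sample-efficient online fine-tuning, and (iii) eliminate the need for complex simulation environments.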
Crucially, this paper demonstrates that offline RL is a promising paradigm for applying RL-based solutions within economically critical systems such as mobility systems.
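As a rough illustration of what "learning solely from offline data" means, the following minimal Python sketch runs tabular fitted Q-iteration on a fixed log of transitions and never queries an environment or simulator. It is not the authors' method: the two-zone toy log, the action names "stay" and "rebalance", and the helpers offline_q_iteration and greedy_action are all hypothetical, and bootstrapping only over actions that appear in the log is one simple way (assumed here) to avoid evaluating out-of-distribution actions.

```python
from collections import defaultdict

# Hypothetical logged transitions (state, action, reward, next_state), e.g.
# recorded under an operator's existing dispatch policy. Learning below reads
# only this fixed list and never interacts with a simulator.
logged = [
    ("A", "stay", 1.0, "A"),
    ("A", "rebalance", 0.0, "B"),
    ("B", "stay", 3.0, "B"),
    ("B", "rebalance", 0.0, "A"),
]


def offline_q_iteration(data, gamma=0.9, iters=100):
    """Tabular fitted Q-iteration over a fixed dataset (illustrative sketch)."""
    seen = defaultdict(set)  # state -> actions that appear in the dataset
    for s, a, _, _ in data:
        seen[s].add(a)
    q = {}
    for _ in range(iters):
        totals, counts = defaultdict(float), defaultdict(int)
        for s, a, r, s_next in data:
            # Bootstrap only over in-dataset actions, so no Q-value is ever
            # queried for an action the log never tried.
            future = max((q.get((s_next, b), 0.0) for b in seen[s_next]), default=0.0)
            totals[(s, a)] += r + gamma * future
            counts[(s, a)] += 1
        # Average the targets of repeated (state, action) pairs.
        q = {k: totals[k] / counts[k] for k in totals}
    return q


def greedy_action(q, state):
    """Pick the highest-valued logged action for a given state."""
    candidates = {a: v for (s, a), v in q.items() if s == state}
    return max(candidates, key=candidates.get)


q = offline_q_iteration(logged)
print(greedy_action(q, "A"), greedy_action(q, "B"))  # prints: rebalance stay
```

With this toy log, the learned policy rebalances vehicles from zone A to zone B and keeps them in B, because B's logged reward is higher; a real AMoD controller faces far larger state and action spaces than this tabular sketch can handle.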

Summary (original)

Autonomous Mobility-on-Demand (AMoD) systems are an evolving mode of transportation in which a centrally coordinated fleet of self-driving vehicles dynamically serves travel requests. The control of these systems is typically formulated as a large network optimization problem, and reinforcement learning (RL) has recently emerged as a promising approach to solve the open challenges in this space. Recent centralized RL approaches focus on learning from online data, ignoring the per-sample-cost of interactions within real-world transportation systems. To address these limitations, we propose to formalize the control of AMoD systems through the lens of offline reinforcement learning and learn effective control strategies using solely offline data, which is readily available to current mobility operators. We further investigate design decisions and provide empirical evidence based on data from real-world mobility systems showing how offline learning allows to recover AMoD control policies that (i) exhibit performance on par with online methods, (ii) allow for sample-efficient online fine-tuning and (iii) eliminate the need for complex simulation environments. Crucially, this paper demonstrates that offline RL is a promising paradigm for the application of RL-based solutions within economically-critical systems, such as mobility systems.

arXiv information

Authors	Carolin Schmidt, Daniele Gammelli, Francisco Camara Pereira, Filipe Rodrigues
Published	2023-08-25 14:28:20+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google