Adaptive Reinforcement Learning for Unobservable Random Delays

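Summary

In the standard reinforcement learning (RL) setting, the interaction between the agent and the environment is usually modeled as a Markov decision process (MDP), which assumes that the agent observes the system state instantaneously, selects an action without delay, and executes it immediately.
In real-world dynamic environments such as cyber-physical systems, this assumption often breaks down because the interaction between the agent and the system is delayed.
These delays can vary stochastically over time and are usually unobservable, meaning they are unknown at the moment an action is chosen.
Existing methods handle this uncertainty conservatively by assuming a known, fixed upper bound on the delay, even when the actual delay is often much lower.
In this work, we introduce the interaction layer, a general framework that lets agents handle unobservable, time-varying delays adaptively and seamlessly.
Specifically, the agent generates a matrix of possible future actions to cope with both unpredictable delays and action packets lost in transmission over a network.
Building on this framework, we develop a model-based algorithm, Actor-Critic with Delay Adaptation (ACDA), which adjusts dynamically to delay patterns.
Our method significantly outperforms state-of-the-art approaches across a wide range of locomotion benchmark environments.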

Abstract (original)

In standard Reinforcement Learning (RL) settings, the interaction between the agent and the environment is typically modeled as a Markov Decision Process (MDP), which assumes that the agent observes the system state instantaneously, selects an action without delay, and executes it immediately. In real-world dynamic environments, such as cyber-physical systems, this assumption often breaks down due to delays in the interaction between the agent and the system. These delays can vary stochastically over time and are typically unobservable, meaning they are unknown when deciding on an action. Existing methods deal with this uncertainty conservatively by assuming a known fixed upper bound on the delay, even if the delay is often much lower. In this work, we introduce the interaction layer, a general framework that enables agents to adaptively and seamlessly handle unobservable and time-varying delays. Specifically, the agent generates a matrix of possible future actions to handle both unpredictable delays and lost action packets sent over networks. Building on this framework, we develop a model-based algorithm, Actor-Critic with Delay Adaptation (ACDA), which dynamically adjusts to delay patterns. Our method significantly outperforms state-of-the-art approaches across a wide range of locomotion benchmark environments.
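
The abstract does not spell out how the matrix of future actions is consumed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each row of the matrix corresponds to one possible delay (in time steps), that the system side picks the row matching the delay a packet actually experienced, and that it keeps executing that row's columns until a newer packet arrives. The class name `InteractionLayer`, its methods, and the fallback `default_action` are all hypothetical.

```python
from typing import Optional, Sequence


class InteractionLayer:
    """Hypothetical system-side buffer; names and layout are assumptions."""

    def __init__(self, default_action: float = 0.0) -> None:
        self.default_action = default_action
        self.row: Optional[Sequence[float]] = None  # active row of the latest matrix
        self.col = 0  # next column to execute within that row

    def receive(self, action_matrix: Sequence[Sequence[float]], delay: int) -> None:
        # Assumed convention: row index = delay (in steps) this packet experienced,
        # clamped to the last row when the delay exceeds the matrix height.
        idx = min(delay, len(action_matrix) - 1)
        self.row = action_matrix[idx]
        self.col = 0

    def step(self) -> float:
        # With no packet yet, or the active row exhausted (for example because
        # later packets were lost), fall back to the default action.
        if self.row is None or self.col >= len(self.row):
            return self.default_action
        action = self.row[self.col]
        self.col += 1
        return action


# A packet arrives one step late, and no further packet follows it.
layer = InteractionLayer()
matrix = [[1.0, 1.1, 1.2], [2.0, 2.1, 2.2]]
layer.receive(matrix, delay=1)
executed = [layer.step() for _ in range(4)]
```

In this sketch the agent never needs to know the delay in advance: it sends one plan per candidate delay, and the choice is made where the delay is finally known. The extra columns cover steps for which no fresh packet arrives.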

arXiv information

Authors	John Wikman, Alexandre Proutiere, David Broman
Published	2025-06-17 11:11:37+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
