Self-Adaptive Driving in Nonstationary Environments through Conjectural Online Lookahead Adaptation

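Summary

Reinforcement learning (RL) built on deep representation learning provides an end-to-end learning framework that can solve self-driving (SD) tasks without manual design.
However, in time-varying nonstationary environments, proficient but specialized RL policies fail at execution time.
For example, an RL-based SD policy trained on sunny days does not generalize well to rainy weather.
Meta learning enables an RL agent to adapt to new tasks and environments, but because it operates offline, it cannot give the agent the ability to adapt online when facing a nonstationary environment.
This work proposes an online meta reinforcement learning algorithm based on \emph{conjectural online lookahead adaptation} (COLA).
COLA determines the online adaptation at every step by maximizing the agent's conjecture of its future performance over a lookahead horizon.
Experimental results show that, under dynamically changing weather and lighting conditions, COLA-based self-adaptive driving outperforms the baseline policies in terms of online adaptability.
A demo video, source code, and appendices are available at {\tt https://github.com/Panshark/COLA}.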

Summary (original)

Powered by deep representation learning, reinforcement learning (RL) provides an end-to-end learning framework capable of solving self-driving (SD) tasks without manual designs. However, time-varying nonstationary environments cause proficient but specialized RL policies to fail at execution time. For example, an RL-based SD policy trained under sunny days does not generalize well to rainy weather. Even though meta learning enables the RL agent to adapt to new tasks/environments, its offline operation fails to equip the agent with online adaptation ability when facing nonstationary environments. This work proposes an online meta reinforcement learning algorithm based on the \emph{conjectural online lookahead adaptation} (COLA). COLA determines the online adaptation at every step by maximizing the agent’s conjecture of the future performance in a lookahead horizon. Experimental results demonstrate that under dynamically changing weather and lighting conditions, the COLA-based self-adaptive driving outperforms the baseline policies in terms of online adaptability. A demo video, source code, and appendixes are available at {\tt https://github.com/Panshark/COLA}
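
The abstract describes COLA only at a high level: at every step, the agent picks the adaptation that maximizes its conjecture of future performance over a lookahead horizon. The toy Python sketch below illustrates that general idea and nothing more. It is not the authors' implementation (see the linked repository for that). The scalar parameter `theta`, the quadratic reward, the fixed candidate set, and the names `conjectured_return`, `adapt_step`, and `run_online` are all assumptions made up for illustration, and `target` stands in for the agent's current belief about the environment mode.

```python
from typing import Callable, List, Sequence


def conjectured_return(theta: float, target: float, horizon: int,
                       gamma: float = 0.9) -> float:
    """Toy conjecture (assumption): discounted sum of the per-step reward
    -(theta - target)^2 over the lookahead horizon."""
    step_reward = -(theta - target) ** 2
    return sum(gamma ** k * step_reward for k in range(horizon))


def adapt_step(theta: float, candidates: Sequence[float],
               conjecture: Callable[[float], float]) -> float:
    """Return the adjusted parameter with the highest conjectured return."""
    return max((theta + delta for delta in candidates), key=conjecture)


def run_online(theta: float, targets: Sequence[float],
               horizon: int = 5) -> List[float]:
    """Adapt once per step while the (believed) environment mode changes."""
    candidates = (-0.5, 0.0, 0.5)  # hypothetical set of allowed adjustments
    history = []
    for target in targets:
        theta = adapt_step(
            theta, candidates,
            lambda th: conjectured_return(th, target, horizon),
        )
        history.append(theta)
    return history
```

Calling `run_online(0.0, [1.0, 1.0, 1.0, -1.0])` moves `theta` toward 1.0 while the believed mode is 1.0 and starts moving it back once the mode switches to -1.0. A realistic version would replace the candidate search with a gradient-based update of policy parameters and would estimate the conjecture from observations rather than from a known target.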

arXiv information

Authors	Tao Li, Haozhe Lei, Quanyan Zhu
Published	2023-03-08 01:48:30+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
