A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Observations

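Summary

Reinforcement learning (RL) provides a powerful framework for decision-making in complex environments.
However, implementing RL in a hardware-efficient, biologically inspired way remains a challenge.
This paper introduces a new spiking neural network (SNN) architecture for solving RL problems with real-valued observations.
Building on prior work, the proposed model combines multi-layered event-based clustering with temporal difference (TD) error modulation and eligibility traces.
An ablation study confirms that these components have a significant impact on the proposed model's performance.
A tabular actor-critic algorithm with eligibility traces and a state-of-the-art Proximal Policy Optimization (PPO) algorithm are used as benchmarks.
The network consistently outperforms the tabular approach and successfully discovers stable control policies in the classic RL environments mountain car, cart-pole, and acrobot.
The proposed model offers an appealing trade-off between computational requirements and hardware implementation requirements.
It requires neither an external memory buffer nor a global error-gradient computation; synaptic updates occur online, driven by local learning rules and a broadcast TD-error signal.
This work therefore contributes to the development of more hardware-efficient RL solutions.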
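
To make the idea of online updates driven by a broadcast TD error and eligibility traces concrete, the following is a minimal sketch of a generic TD(λ) update for a linear value estimate. It is not the SNN model described in the paper: the function name `td_lambda_step`, the linear value function, and the values of `gamma`, `lam`, and `alpha` are all assumptions chosen for illustration.

```python
def td_lambda_step(weights, traces, features, next_features, reward, done,
                   gamma=0.99, lam=0.9, alpha=0.1):
    """One online TD(lambda) step for a linear value estimate (illustrative only).

    Hypothetical helper, not taken from the paper: the value of a state is
    assumed to be the dot product of `weights` and `features`.
    """
    value = sum(w * x for w, x in zip(weights, features))
    # A terminal transition has no successor value.
    next_value = 0.0 if done else sum(
        w * x for w, x in zip(weights, next_features))

    # A single scalar TD error, shared by (broadcast to) every weight.
    td_error = reward + gamma * next_value - value

    # Accumulating eligibility trace: decays over time, grows with activity.
    traces = [gamma * lam * e + x for e, x in zip(traces, features)]

    # Each weight changes using only its own trace and the shared scalar,
    # so no replay buffer and no backpropagated gradient are involved.
    weights = [w + alpha * td_error * e for w, e in zip(weights, traces)]
    return weights, traces, td_error


# Example: one transition with three one-hot features and a reward of 1.
weights, traces, td_error = td_lambda_step(
    [0.0, 0.0, 0.0], [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
    reward=1.0, done=False)
```

In this sketch only the weight whose feature was active moves after the first transition, because every other trace is still zero.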

Abstract (original)

Reinforcement Learning (RL) provides a powerful framework for decision-making in complex environments. However, implementing RL in hardware-efficient and bio-inspired ways remains a challenge. This paper presents a novel Spiking Neural Network (SNN) architecture for solving RL problems with real-valued observations. The proposed model incorporates multi-layered event-based clustering, with the addition of Temporal Difference (TD)-error modulation and eligibility traces, building upon prior work. An ablation study confirms the significant impact of these components on the proposed model’s performance. A tabular actor-critic algorithm with eligibility traces and a state-of-the-art Proximal Policy Optimization (PPO) algorithm are used as benchmarks. Our network consistently outperforms the tabular approach and successfully discovers stable control policies on classic RL environments: mountain car, cart-pole, and acrobot. The proposed model offers an appealing trade-off in terms of computational and hardware implementation requirements. The model does not require an external memory buffer nor a global error gradient computation, and synaptic updates occur online, driven by local learning rules and a broadcasted TD-error signal. Thus, this work contributes to the development of more hardware-efficient RL solutions.

arXiv information

Authors	Sergio F. Chevtchenko, Yeshwanth Bethi, Teresa B. Ludermir, Saeed Afshar
Published	2023-08-08 10:59:45+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
