Exploring the Promise and Limits of Real-Time Recurrent Learning

Summary

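Real-time recurrent learning (RTRL) for sequence-processing recurrent neural networks (RNNs) has conceptual advantages over backpropagation through time (BPTT).
RTRL does not need to cache past activations or truncate the context, and it enables online learning.
However, the time and space complexity of RTRL makes it impractical.
To overcome this problem, most recent work on RTRL focuses on approximation theories, while experiments are often limited to diagnostic settings.
Here we explore the practical promise of RTRL in more realistic settings.
We study actor-critic methods that combine RTRL with policy gradients, and test them on several subsets of the DMLab-30, ProcGen, and Atari-2600 environments.
On DMLab memory tasks, our system trained on fewer than 1.2 B environment frames is competitive with or outperforms the well-known IMPALA and R2D2 baselines trained on 10 B frames.
To scale to such challenging tasks, we focus on certain well-known neural architectures with element-wise recurrence, which allow tractable RTRL without approximation.
Importantly, we also discuss limitations of RTRL in real-world applications that are rarely addressed, such as its complexity in the multi-layer case.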
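
To make the point about element-wise recurrence concrete, here is a minimal sketch of exact RTRL for a toy layer. Everything in it is an assumption made for illustration and is not taken from the paper: the update rule h_t = tanh(W x_t + u * h_{t-1}) with a vector u of per-unit recurrent weights, the squared-error loss, and the function name rtrl_elementwise. The paper itself studies actor-critic training with other architectures. Because each unit depends only on its own previous state, the sensitivities dh_t/dW and dh_t/du have the same shapes as W and u, so they can be carried forward in time without storing past activations.

```python
import math

def rtrl_elementwise(xs, ys, W, u):
    """Exact RTRL for the toy layer h_t = tanh(W x_t + u * h_{t-1}).

    Hypothetical example: loss = sum_t 0.5 * ||h_t - y_t||^2.
    Returns (loss, dloss/dW, dloss/du) computed in one forward pass.
    """
    n, m = len(W), len(W[0])
    h = [0.0] * n
    # Sensitivities carried forward in time:
    # S_W[i][j] = dh_t[i]/dW[i][j], S_u[i] = dh_t[i]/du[i].
    S_W = [[0.0] * m for _ in range(n)]
    S_u = [0.0] * n
    g_W = [[0.0] * m for _ in range(n)]
    g_u = [0.0] * n
    loss = 0.0
    for x, y in zip(xs, ys):
        for i in range(n):
            # Unit i only sees its own previous state h[i].
            z = sum(W[i][j] * x[j] for j in range(m)) + u[i] * h[i]
            h_new = math.tanh(z)
            d = 1.0 - h_new * h_new  # derivative of tanh at z
            # Forward sensitivity updates (chain rule through h[i] only).
            for j in range(m):
                S_W[i][j] = d * (x[j] + u[i] * S_W[i][j])
            S_u[i] = d * (h[i] + u[i] * S_u[i])
            h[i] = h_new
            # The gradient of the current step's loss is available immediately.
            err = h[i] - y[i]
            loss += 0.5 * err * err
            for j in range(m):
                g_W[i][j] += err * S_W[i][j]
            g_u[i] += err * S_u[i]
    return loss, g_W, g_u

# Tiny usage example with made-up data: 3 units, 2 inputs, 3 time steps.
xs = [[0.5, -1.0], [1.0, 0.25], [-0.3, 0.8]]
ys = [[0.1, -0.2, 0.3], [0.0, 0.4, -0.1], [0.2, 0.2, 0.2]]
W = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]]
u = [0.5, -0.3, 0.8]
loss, g_W, g_u = rtrl_elementwise(xs, ys, W, u)
```

In this sketch the extra memory is one sensitivity value per parameter. With a full recurrent weight matrix, every unit's state would depend on every recurrent weight, so the sensitivities would need one entry per (unit, parameter) pair, which is the cost that makes general RTRL impractical.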

Abstract (original)

Real-time recurrent learning (RTRL) for sequence-processing recurrent neural networks (RNNs) offers certain conceptual advantages over backpropagation through time (BPTT). RTRL requires neither caching past activations nor truncating context, and enables online learning. However, RTRL’s time and space complexity make it impractical. To overcome this problem, most recent work on RTRL focuses on approximation theories, while experiments are often limited to diagnostic settings. Here we explore the practical promise of RTRL in more realistic settings. We study actor-critic methods that combine RTRL and policy gradients, and test them in several subsets of DMLab-30, ProcGen, and Atari-2600 environments. On DMLab memory tasks, our system trained on fewer than 1.2 B environmental frames is competitive with or outperforms well-known IMPALA and R2D2 baselines trained on 10 B frames. To scale to such challenging tasks, we focus on certain well-known neural architectures with element-wise recurrence, allowing for tractable RTRL without approximation. Importantly, we also discuss rarely addressed limitations of RTRL in real-world applications, such as its complexity in the multi-layer case.

arXiv information

Authors	Kazuki Irie, Anand Gopalakrishnan, Jürgen Schmidhuber
Published	2024-02-28 16:40:38+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
