Distributed TD(0) with Almost No Communication

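Summary

This paper presents a new non-asymptotic analysis of distributed temporal difference learning with linear function approximation.
The approach relies on "one-shot averaging," in which $N$ agents run identical local copies of the TD(0) method and average their results only once, at the very end.
It shows a version of the linear time speedup phenomenon: the distributed process converges a factor of $N$ faster than TD(0).
This is the first result to prove a benefit from parallelism for temporal difference methods.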

Summary (original)

We provide a new non-asymptotic analysis of distributed temporal difference learning with linear function approximation. Our approach relies on “one-shot averaging,” where $N$ agents run identical local copies of the TD(0) method and average the outcomes only once at the very end. We demonstrate a version of the linear time speedup phenomenon, where the convergence time of the distributed process is a factor of $N$ faster than the convergence time of TD(0). This is the first result proving benefits from parallelism for temporal difference methods.
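The one-shot averaging scheme described above can be sketched as follows. This is a minimal illustration, not the authors' code: the 5-state ring Markov reward process, the two features per state, the constant step size, and the use of one independent Markovian trajectory per agent are all hypothetical choices, since the summary does not state the paper's exact sampling model or step-size schedule. The names `td0_local`, `one_shot_average`, `ring_step`, `ring_reward`, and `FEATURES` are likewise illustrative.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def td0_local(
    transition: Callable[[int, random.Random], int],
    reward: Callable[[int, int], float],
    features: Sequence[Vector],
    start: int,
    gamma: float,
    alpha: float,
    steps: int,
    rng: random.Random,
) -> Vector:
    """One agent's local TD(0) copy with linear estimate V(s) = phi(s)^T theta."""
    theta = [0.0] * len(features[start])
    s = start
    for _ in range(steps):
        s_next = transition(s, rng)
        r = reward(s, s_next)
        phi, phi_next = features[s], features[s_next]
        # TD error: r + gamma * V(s') - V(s)
        delta = r + gamma * dot(phi_next, theta) - dot(phi, theta)
        # Semi-gradient step along phi(s) with a constant (assumed) step size alpha
        theta = [t + alpha * delta * p for t, p in zip(theta, phi)]
        s = s_next
    return theta


def one_shot_average(num_agents: int, seed: int, **td_kwargs) -> Tuple[Vector, List[Vector]]:
    """Run num_agents independent TD(0) copies, then communicate once to average."""
    local_thetas = [
        td0_local(rng=random.Random(seed + i), **td_kwargs)  # independent sample streams
        for i in range(num_agents)
    ]
    d = len(local_thetas[0])
    # The single communication round: element-wise mean of the final parameters
    averaged = [sum(th[j] for th in local_thetas) / num_agents for j in range(d)]
    return averaged, local_thetas


# Hypothetical 5-state ring Markov reward process (illustration only).
NUM_STATES = 5


def ring_step(s: int, rng: random.Random) -> int:
    """Move one step clockwise or counter-clockwise with equal probability."""
    return (s + rng.choice((-1, 1))) % NUM_STATES


def ring_reward(s: int, s_next: int) -> float:
    """Reward 1 on entering state 0, otherwise 0."""
    return 1.0 if s_next == 0 else 0.0


# Hypothetical features: a bias term and a normalized state index.
FEATURES = [[1.0, s / (NUM_STATES - 1)] for s in range(NUM_STATES)]

if __name__ == "__main__":
    theta_bar, thetas = one_shot_average(
        num_agents=8, seed=0,
        transition=ring_step, reward=ring_reward, features=FEATURES,
        start=0, gamma=0.9, alpha=0.05, steps=2000,
    )
    print("averaged theta:", theta_bar)
```

Each agent runs with no communication at all until the final averaging step. Intuitively, averaging $N$ independent estimates reduces their variance by a factor of $N$, which is the kind of effect the paper's speedup result quantifies. The precise conditions under which the speedup holds are given in the paper, not in this sketch.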

arXiv information

Authors	Rui Liu, Alex Olshevsky
Published	2023-05-25 17:00:46+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
