Learning from Demonstration with Implicit Nonlinear Dynamics Models

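Summary

Learning from Demonstration (LfD) is a useful paradigm for training policies that solve tasks involving complex motions.
In practice, applying LfD successfully requires overcoming error accumulation during policy execution, that is, drift caused by errors compounding over time and the resulting out-of-distribution behaviours.
Existing work tries to address this problem by scaling data collection, correcting policy errors with a human in the loop, temporally ensembling policy predictions, or learning the parameters of a dynamical system model.
In this work, we propose and validate an alternative approach to overcoming this problem.
Inspired by reservoir computing, we develop a new neural network layer that contains a fixed nonlinear dynamical system with tunable dynamical properties.
We validate the layer on the task of reproducing human handwriting motions, using the LASA Human Handwriting Dataset.
Through empirical experiments, we show that incorporating our layer into existing neural network architectures addresses the problem of compounding errors in LfD.
We also compare it against existing approaches, including a temporal ensemble of policy predictions and an Echo State Network (ESN) implementation.
We find that our approach achieves greater policy precision and robustness on the handwriting task, while also generalising to multiple dynamics regimes and maintaining competitive latency scores.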

Summary (original)

Learning from Demonstration (LfD) is a useful paradigm for training policies that solve tasks involving complex motions. In practice, the successful application of LfD requires overcoming error accumulation during policy execution, i.e. the problem of drift due to errors compounding over time and the consequent out-of-distribution behaviours. Existing works seek to address this problem through scaling data collection, correcting policy errors with a human-in-the-loop, temporally ensembling policy predictions or through learning the parameters of a dynamical system model. In this work, we propose and validate an alternative approach to overcoming this issue. Inspired by reservoir computing, we develop a novel neural network layer that includes a fixed nonlinear dynamical system with tunable dynamical properties. We validate the efficacy of our neural network layer on the task of reproducing human handwriting motions using the LASA Human Handwriting Dataset. Through empirical experiments we demonstrate that incorporating our layer into existing neural network architectures addresses the issue of compounding errors in LfD. Furthermore, we perform a comparative evaluation against existing approaches including a temporal ensemble of policy predictions and an Echo State Networks (ESNs) implementation. We find that our approach yields greater policy precision and robustness on the handwriting task while also generalising to multiple dynamics regimes and maintaining competitive latency scores.
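To make the reservoir computing idea in the abstract more concrete, here is a minimal sketch of a fixed, untrained nonlinear dynamical system with tunable dynamical properties. It is not the layer proposed in the paper. The function names and the `gain` and `leak` parameters are hypothetical, and scaling the recurrent weights by their infinity norm is a simplification chosen so the example needs only the standard library (Echo State Network implementations more commonly scale by spectral radius). With `gain < 1`, each update is a contraction, so two copies of the reservoir started from different states and driven by the same input converge to the same trajectory.

```python
import math
import random


def make_reservoir(n_units, n_inputs, gain=0.9, seed=0):
    """Build fixed (untrained) random recurrent and input weights.

    `gain` rescales the recurrent matrix so that its infinity norm
    (largest absolute row sum) equals `gain`. This is an illustrative
    tuning knob, not the parameterisation used in the paper.
    """
    rng = random.Random(seed)
    w = [[rng.uniform(-1.0, 1.0) for _ in range(n_units)] for _ in range(n_units)]
    inf_norm = max(sum(abs(v) for v in row) for row in w)
    w = [[gain * v / inf_norm for v in row] for row in w]
    w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)] for _ in range(n_units)]
    return w, w_in


def step(state, u, w, w_in, leak=0.5):
    """One leaky update: x <- (1 - leak) * x + leak * tanh(W x + W_in u)."""
    new_state = []
    for i in range(len(state)):
        pre = sum(w[i][j] * state[j] for j in range(len(state)))
        pre += sum(w_in[i][k] * u[k] for k in range(len(u)))
        new_state.append((1.0 - leak) * state[i] + leak * math.tanh(pre))
    return new_state


def state_gap(gain, leak, n_steps=200, n_units=20, seed=0):
    """Drive two differently initialised reservoirs with the same input.

    Returns the largest per-unit difference between their states
    after `n_steps` updates.
    """
    w, w_in = make_reservoir(n_units, 2, gain=gain, seed=seed)
    rng = random.Random(seed + 1)
    a = [rng.uniform(-1.0, 1.0) for _ in range(n_units)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n_units)]
    for t in range(n_steps):
        u = [math.sin(0.1 * t), math.cos(0.1 * t)]  # shared 2-D input signal
        a = step(a, u, w, w_in, leak)
        b = step(b, u, w, w_in, leak)
    return max(abs(x - y) for x, y in zip(a, b))
```

Calling `state_gap(0.9, 0.5)` returns the remaining gap between the two reservoir states after 200 steps. Under these assumptions the gap shrinks by a factor of at most `(1 - leak) + leak * gain` per step, so the reservoir's state depends on the recent input history and not on where it started. In a reservoir computing setup only a readout on top of such states would be trained, while the weights above stay fixed.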

arXiv information

Authors	Peter David Fagan, Subramanian Ramamoorthy
Published	2024-09-27 14:12:49+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
