HyperPPO: A scalable method for finding small policies for robotic control

Abstract

Models with fewer parameters are necessary for the neural control of memory-limited, performant robots. Finding such small neural network architectures can be time-consuming. We propose HyperPPO, an on-policy reinforcement learning algorithm that uses graph hypernetworks to estimate the weights of multiple neural architectures simultaneously. Our method estimates weights for networks that are much smaller than commonly used networks yet encode highly performant policies. We obtain multiple trained policies at the same time while maintaining sample efficiency, and give the user the choice of a network architecture that satisfies their computational constraints. We show that our method scales well: more training resources produce faster convergence to higher-performing architectures. We demonstrate that the neural policies estimated by HyperPPO are capable of decentralized control of a Crazyflie2.1 quadrotor.
Website: https://sites.google.com/usc.edu/hyperppo
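To make the core idea in the abstract more concrete, here is a minimal sketch of one shared generator that emits weights for several small policy networks, from which a user then picks the widths that fit a parameter budget. It is not the authors' graph hypernetwork and it does no PPO training. The class `ToyHypernetwork`, the two-number architecture encoding, the helper names, and all sizes are assumptions made purely for illustration.

```python
import math
import random

# Illustrative sizes only; none of these numbers come from the paper.
OBS_DIM, ACT_DIM, MAX_HIDDEN = 4, 2, 16


def param_count(hidden):
    """Weights plus biases of an obs -> hidden -> action MLP."""
    return (OBS_DIM + 1) * hidden + (hidden + 1) * ACT_DIM


class ToyHypernetwork:
    """Hypothetical stand-in: a shared linear decoder from an
    architecture encoding to the flat weights of the widest policy."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        n_out = param_count(MAX_HIDDEN)
        # These decoder entries are the only parameters; every policy shares them.
        self.decoder = [[rng.gauss(0.0, 0.1) for _ in range(n_out)]
                        for _ in range(2)]

    def weights_for(self, hidden):
        # Crude architecture encoding (assumption): relative width plus a bias term.
        code = [hidden / MAX_HIDDEN, 1.0]
        flat = [sum(c * row[j] for c, row in zip(code, self.decoder))
                for j in range(len(self.decoder[0]))]
        it = iter(flat)
        w1 = [[next(it) for _ in range(MAX_HIDDEN)] for _ in range(OBS_DIM)]
        b1 = [next(it) for _ in range(MAX_HIDDEN)]
        w2 = [[next(it) for _ in range(ACT_DIM)] for _ in range(MAX_HIDDEN)]
        b2 = [next(it) for _ in range(ACT_DIM)]
        # Keep only the first `hidden` units to obtain the smaller network.
        return [row[:hidden] for row in w1], b1[:hidden], w2[:hidden], b2


def policy_action(weights, obs):
    """Forward pass of the generated two-layer tanh policy."""
    w1, b1, w2, b2 = weights
    hid = [math.tanh(sum(o * w1[i][k] for i, o in enumerate(obs)) + b1[k])
           for k in range(len(b1))]
    return [sum(h * w2[k][a] for k, h in enumerate(hid)) + b2[a]
            for a in range(len(b2))]


def widths_within_budget(max_params):
    """Hidden widths whose policy fits the given parameter budget."""
    return [h for h in range(1, MAX_HIDDEN + 1) if param_count(h) <= max_params]


hyper = ToyHypernetwork()
for h in widths_within_budget(max_params=50):
    action = policy_action(hyper.weights_for(h), [0.1, -0.2, 0.3, 0.0])
```

In this toy version, every width shares the same decoder parameters, so improving the decoder would affect all candidate policies at once. That is the property the abstract relies on when it says multiple policies are obtained simultaneously.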

arXiv information

Authors	Shashank Hegde, Zhehui Huang, Gaurav S. Sukhatme
Published	2023-09-28 17:58:26+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
