For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal

Summary

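In recent years, leveraging pre-trained vision models for motor control has attracted growing attention.
Existing work mainly emphasizes the importance of the pre-training phase, while the arguably equally important role of downstream policy learning during control-specific fine-tuning is often neglected.
It therefore remains unclear whether pre-trained vision models are consistently effective under different control policies.
To bridge this gap in understanding, we conduct a comprehensive study of 14 pre-trained vision models using 3 distinct classes of policy learning methods: reinforcement learning (RL), imitation learning through behavior cloning (BC), and imitation learning with a visual reward function (VRF).
Our study yields a series of intriguing results, including the finding that the effectiveness of pre-training depends heavily on the choice of downstream policy learning algorithm.
We show that the conventionally accepted evaluation based on RL methods is highly variable and therefore unreliable, and we advocate for more robust methods such as VRF and BC.
To facilitate more universal evaluation of pre-trained models and their policy learning methods in the future, we also release a benchmark of 21 tasks across 3 different environments alongside our work.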

Abstract (original)

In recent years, increasing attention has been directed to leveraging pre-trained vision models for motor control. While existing works mainly emphasize the importance of this pre-training phase, the arguably equally important role played by downstream policy learning during control-specific fine-tuning is often neglected. It thus remains unclear if pre-trained vision models are consistent in their effectiveness under different control policies. To bridge this gap in understanding, we conduct a comprehensive study on 14 pre-trained vision models using 3 distinct classes of policy learning methods, including reinforcement learning (RL), imitation learning through behavior cloning (BC), and imitation learning with a visual reward function (VRF). Our study yields a series of intriguing results, including the discovery that the effectiveness of pre-training is highly dependent on the choice of the downstream policy learning algorithm. We show that conventionally accepted evaluation based on RL methods is highly variable and therefore unreliable, and further advocate for using more robust methods like VRF and BC. To facilitate more universal evaluations of pre-trained models and their policy learning methods in the future, we also release a benchmark of 21 tasks across 3 different environments alongside our work.

arXiv information

Authors	Yingdong Hu, Renhao Wang, Li Erran Li, Yang Gao
Published	2023-06-20 08:23:01+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
