QMP: Q-switch Mixture of Policies for Multi-Task Behavior Sharing

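Abstract

Multi-task reinforcement learning (MTRL) aims to learn several tasks simultaneously so that sample efficiency is higher than when each task is learned separately.
Conventional methods achieve this by sharing parameters or relabeled data between tasks.
In this work, we introduce a new framework for sharing behavioral policies across tasks, which can be used on top of existing MTRL methods.
The key idea is to improve each task's off-policy data collection by employing the behaviors of other task policies.
Selectively sharing helpful behaviors acquired in one task to collect training data for another task yields higher-quality trajectories and leads to more sample-efficient MTRL.
We therefore introduce a simple and principled framework called Q-switch Mixture of Policies (QMP), which selectively shares behavior between different task policies by using the task's Q-function to evaluate and select useful shareable behaviors.
We theoretically analyze how QMP improves the sample efficiency of the underlying RL algorithm.
Our experiments show that QMP's behavioral policy sharing provides gains complementary to many popular MTRL algorithms and outperforms alternative ways of sharing behaviors in a variety of manipulation, locomotion, and navigation environments.
Videos are available at https://qmp-mtrl.github.io.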

Abstract (original)

Multi-task reinforcement learning (MTRL) aims to learn several tasks simultaneously for better sample efficiency than learning them separately. Traditional methods achieve this by sharing parameters or relabeled data between tasks. In this work, we introduce a new framework for sharing behavioral policies across tasks, which can be used in addition to existing MTRL methods. The key idea is to improve each task’s off-policy data collection by employing behaviors from other task policies. Selectively sharing helpful behaviors acquired in one task to collect training data for another task can lead to higher-quality trajectories, leading to more sample-efficient MTRL. Thus, we introduce a simple and principled framework called Q-switch mixture of policies (QMP) that selectively shares behavior between different task policies by using the task’s Q-function to evaluate and select useful shareable behaviors. We theoretically analyze how QMP improves the sample efficiency of the underlying RL algorithm. Our experiments show that QMP’s behavioral policy sharing provides complementary gains over many popular MTRL algorithms and outperforms alternative ways to share behaviors in various manipulation, locomotion, and navigation environments. Videos are available at https://qmp-mtrl.github.io.
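
The selection step described in the abstract (using the current task's Q-function to evaluate behaviors proposed by the different task policies and pick one for data collection) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `q_switch_select`, the type aliases, the scalar action type, and the greedy argmax rule are all assumptions made for the example, and the abstract itself does not specify how candidates are generated or scored.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type aliases for illustration; the abstract does not define them.
State = Sequence[float]
Action = float
Policy = Callable[[State], Action]
QFunction = Callable[[State, Action], float]


def q_switch_select(
    state: State,
    policies: List[Policy],
    q_function: QFunction,
) -> Tuple[int, Action]:
    """Pick the candidate action that the current task's Q-function rates highest.

    Each task policy proposes one action for the given state; the proposals
    are scored with the Q-function of the task being trained, and the index
    and action of the best-scoring proposal are returned.
    Assumption: greedy argmax over one proposal per policy.
    """
    if not policies:
        raise ValueError("at least one policy is required")
    candidates = [policy(state) for policy in policies]
    scores = [q_function(state, action) for action in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, candidates[best]
```

In this sketch the returned action would be executed in the current task's environment to collect off-policy training data, while the index records which task policy supplied the behavior. If every policy in the list is weaker than the task's own policy under its Q-function, the task's own proposal is selected, so the sharing is selective rather than forced.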

arXiv information

Authors	Grace Zhang, Ayush Jain, Injune Hwang, Shao-Hua Sun, Joseph J. Lim
Published	2024-10-07 10:04:28+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
