Identifying Expert Behavior in Offline Training Datasets Improves Behavioral Cloning of Robotic Manipulation Policies

Summary

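This paper presents our solution to Real Robot Challenge (RRC) III, a competition featured in the NeurIPS 2022 Competition Track. The competition aims to tackle dexterous robotic manipulation tasks by learning from pre-collected offline data.
For each task, participants were provided with two types of dataset: an expert dataset and a mixed dataset covering a range of skill levels.
Behavioral Cloning (BC), the simplest offline policy learning algorithm, performed remarkably well when trained on the expert datasets, outperforming even the most advanced offline reinforcement learning (RL) algorithms.
However, BC's performance deteriorated when applied to the mixed datasets, and the performance of offline RL algorithms was also unsatisfactory.
Upon examining the mixed datasets, we found that they contained a significant amount of expert data, although this data was unlabeled.
To address this, we proposed a semi-supervised learning-based classifier that identifies the underlying expert behavior within the mixed datasets and effectively isolates the expert data.
To further improve BC's performance, we exploited the geometric symmetry of the RRC arena to augment the training dataset through mathematical transformations.
In the end, our submission surpassed those of all other participants, including those who employed complex offline RL algorithms and intricate data processing and feature engineering techniques.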
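
One way to picture the idea of isolating expert data is a self-training loop seeded by the labelled expert dataset. The sketch below is not the authors' classifier, whose details the abstract does not give. It is a minimal nearest-centroid pseudo-labelling scheme, and the per-episode feature vectors, the function name `split_mixed`, and the toy numbers are all assumptions made for illustration.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split_mixed(expert_feats, mixed_feats, max_iters=20):
    """Pseudo-label each mixed-dataset episode as expert (True) or not (False).

    expert_feats: hypothetical per-episode features from the labelled expert set.
    mixed_feats:  the same kind of features from the unlabelled mixed set.
    """
    c_exp = mean(expert_feats)
    # Seed the non-expert centroid with the mixed episode least like the experts.
    c_non = max(mixed_feats, key=lambda x: dist2(x, c_exp))
    labels = None
    for _ in range(max_iters):
        new_labels = [dist2(x, c_exp) <= dist2(x, c_non) for x in mixed_feats]
        if new_labels == labels:  # pseudo-labels stopped changing
            break
        labels = new_labels
        pseudo_expert = [x for x, is_exp in zip(mixed_feats, labels) if is_exp]
        non_expert = [x for x, is_exp in zip(mixed_feats, labels) if not is_exp]
        # Labelled experts always stay in the expert cluster.
        c_exp = mean(expert_feats + pseudo_expert)
        if non_expert:
            c_non = mean(non_expert)
    return labels


# Toy features (made up): two labelled expert episodes, four unlabelled ones.
expert = [[1.0, 0.9], [0.9, 1.0]]
mixed = [[0.95, 0.95], [0.1, 0.0], [1.0, 1.0], [0.0, 0.2]]
is_expert = split_mixed(expert, mixed)
# Episodes kept for BC training: labelled experts plus pseudo-labelled experts.
bc_training_set = expert + [x for x, keep in zip(mixed, is_expert) if keep]
```

A real pipeline would need features that actually separate skill levels and a proper classifier in place of the two centroids. The point is only that a small labelled expert set can bootstrap labels for the unlabelled mixed data before BC is trained.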

Abstract (original)

This paper presents our solution for the Real Robot Challenge (RRC) III, a competition featured in the NeurIPS 2022 Competition Track, aimed at addressing dexterous robotic manipulation tasks through learning from pre-collected offline data. Participants were provided with two types of datasets for each task: expert and mixed datasets with varying skill levels. While the simplest offline policy learning algorithm, Behavioral Cloning (BC), performed remarkably well when trained on expert datasets, it outperformed even the most advanced offline reinforcement learning (RL) algorithms. However, BC’s performance deteriorated when applied to mixed datasets, and the performance of offline RL algorithms was also unsatisfactory. Upon examining the mixed datasets, we observed that they contained a significant amount of expert data, although this data was unlabeled. To address this issue, we proposed a semi-supervised learning-based classifier to identify the underlying expert behavior within mixed datasets, effectively isolating the expert data. To further enhance BC’s performance, we leveraged the geometric symmetry of the RRC arena to augment the training dataset through mathematical transformations. In the end, our submission surpassed that of all other participants, even those who employed complex offline RL algorithms and intricate data processing and feature engineering techniques.
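
The symmetry-based augmentation mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which symmetry or transformations were used, so everything here is assumed for illustration: a three-fold rotational symmetry about the vertical axis, a transition stored as a dict with the hypothetical keys `object_pos`, `goal_pos`, `finger_joints`, and `action`, and a finger numbering in which a 120 degree rotation moves finger i into the slot of finger i+1.

```python
import math


def rotate_xy(p, angle):
    """Rotate a 3D point about the vertical (z) axis by `angle` radians."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def shift_fingers(per_finger, k):
    """Cyclically shift a list of per-finger blocks by k slots.

    The shift direction depends on the robot's finger numbering;
    the direction used here is an assumption.
    """
    k %= len(per_finger)
    return per_finger[-k:] + per_finger[:-k]


def augment(transition, k):
    """Return a copy of `transition` rotated by k * 120 degrees."""
    angle = k * 2.0 * math.pi / 3.0
    return {
        # Cartesian quantities are rotated about the arena centre.
        "object_pos": rotate_xy(transition["object_pos"], angle),
        "goal_pos": rotate_xy(transition["goal_pos"], angle),
        # Joint-space quantities are relabelled to the matching finger.
        "finger_joints": shift_fingers(transition["finger_joints"], k),
        "action": shift_fingers(transition["action"], k),
    }


def augment_dataset(transitions):
    """Triple the dataset: the original (k=0) plus two rotated copies."""
    return [augment(t, k) for t in transitions for k in range(3)]


# Toy transition with made-up numbers.
t = {
    "object_pos": (1.0, 0.0, 0.5),
    "goal_pos": (0.0, 1.0, 0.0),
    "finger_joints": [[1, 1, 1], [2, 2, 2], [3, 3, 3]],
    "action": [[0.1] * 3, [0.2] * 3, [0.3] * 3],
}
augmented = augment_dataset([t])
```

Observations and actions must be transformed together. Rotating the object position while leaving the per-finger actions untouched would produce state-action pairs that no expert ever generated.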

arXiv information

Authors	Qiang Wang, Robert McCarthy, David Cordova Bulens, Francisco Roldan Sanchez, Kevin McGuinness, Noel E. O’Connor, Stephen J. Redmond
Published	2023-09-21 10:39:03+00:00

Source and services used

arxiv.jp, Google
