Offline Imitation Learning with Variational Counterfactual Reasoning

Summary

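In offline imitation learning (IL), an agent aims to learn an optimal expert behavior policy without any additional online interaction with the environment.
However, in many real-world scenarios such as robotic manipulation, the offline dataset is collected from suboptimal behaviors and contains no rewards.
Because expert data are scarce, agents typically end up simply memorizing poor trajectories; as a result they are vulnerable to variations in the environment and cannot generalize to new environments.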
To automatically generate high-quality expert data and improve the agent's generalization ability, we propose a framework named Offline Imitation Learning with Counterfactual data Augmentation (OILCA), which performs counterfactual inference.
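In particular, we use an identifiable variational autoencoder to generate counterfactual samples for expert data augmentation.
We theoretically analyze the influence of the generated expert data and the resulting improvement in generalization.
In addition, extensive experiments show that our approach significantly outperforms various baselines on both the DeepMind Control Suite benchmark for in-distribution performance and the CausalWorld benchmark for out-of-distribution generalization.
Our code is available at https://github.com/ZexuSun/OILCA-NeurIPS23.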

Summary (original)

In offline imitation learning (IL), an agent aims to learn an optimal expert behavior policy without additional online environment interactions. However, in many real-world scenarios, such as robotics manipulation, the offline dataset is collected from suboptimal behaviors without rewards. Due to the scarce expert data, the agents usually suffer from simply memorizing poor trajectories and are vulnerable to variations in the environments, lacking the capability of generalizing to new environments. To automatically generate high-quality expert data and improve the generalization ability of the agent, we propose a framework named Offline Imitation Learning with Counterfactual data Augmentation (OILCA) by doing counterfactual inference. In particular, we leverage an identifiable variational autoencoder to generate counterfactual samples for expert data augmentation. We theoretically analyze the influence of the generated expert data and the improvement of generalization. Moreover, we conduct extensive experiments to demonstrate that our approach significantly outperforms various baselines on both the DeepMind Control Suite benchmark for in-distribution performance and the CausalWorld benchmark for out-of-distribution generalization. Our code is available at https://github.com/ZexuSun/OILCA-NeurIPS23.
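The abstract names counterfactual inference as the core of OILCA's data augmentation. As a rough illustration only, the snippet below walks through the generic three-step counterfactual recipe (abduction, action, prediction) on a hypothetical linear structural causal model with known transition matrices. It is not the authors' implementation. OILCA learns the latent exogenous variables with an identifiable variational autoencoder instead of assuming a known model. Intervening on the action here is also an assumption made for illustration. The names `A`, `B`, `abduct`, and `counterfactual_next_state` are hypothetical.

```python
import numpy as np

# Hypothetical toy structural causal model (SCM) of one environment step:
#   s_next = A @ s + B @ a + u
# where u is exogenous noise that is not recorded in the dataset.
# A and B are assumed known here; OILCA instead learns a generative model.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])


def abduct(s, a, s_next):
    """Step 1 (abduction): recover the exogenous noise that explains the observed transition."""
    return s_next - A @ s - B @ a


def counterfactual_next_state(s, a, s_next, a_cf):
    """Steps 2-3 (action + prediction): keep the abducted noise fixed,
    replace the action with a_cf, and recompute the next state."""
    u = abduct(s, a, s_next)
    return A @ s + B @ a_cf + u


# One observed transition (made-up numbers for illustration).
s = np.array([0.0, 1.0])
a = np.array([0.5])
s_next = np.array([0.12, 1.03])

# "What would the next state have been under action -0.5, with the same noise?"
s_next_cf = counterfactual_next_state(s, a, s_next, a_cf=np.array([-0.5]))
print(s_next_cf)  # approximately [0.12, 0.93]
```

Holding the abducted noise `u` fixed is what makes the generated sample a counterfactual of the observed transition, rather than a fresh sample from the model. Setting `a_cf` equal to the observed action reproduces `s_next` exactly.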

arXiv information

Authors	Bowei He, Zexu Sun, Jinxin Liu, Shuai Zhang, Xu Chen, Chen Ma
Published	2023-12-29 09:40:55+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
