Learning Task-relevant Representations for Generalization via Characteristic Functions of Reward Sequence Distributions

Summary

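Generalizing across different environments that share the same task is essential for applying visual reinforcement learning (RL) in real-world scenarios.
However, visual distractions, which are common in real scenes, can harm the representations that visual RL learns from high-dimensional observations and thereby degrade generalization performance.
To address this problem, we propose Characteristic Reward Sequence Prediction (CRESP), a new approach that extracts task-relevant information by learning reward sequence distributions (RSDs), since reward signals in RL are task-relevant and invariant to visual distractions.
Specifically, to capture task-relevant information effectively via RSDs, CRESP introduces an auxiliary task, predicting the characteristic functions of RSDs, to learn task-relevant representations, because high-dimensional distributions can be approximated well through their corresponding characteristic functions.
Experiments show that CRESP significantly improves generalization performance in unseen environments and outperforms several state-of-the-art methods on DeepMind Control tasks with various visual distractions.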
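
As a rough illustration of the quantity the auxiliary task targets, the sketch below estimates the characteristic function of a distribution over reward sequences from samples, by averaging exp(i * <omega, r>) over the sampled sequences r. It is not the CRESP implementation: the function name `empirical_cf`, the toy Gaussian rewards, and the choice of frequency vectors are all assumptions made for this example. Gaussian rewards are used only because their characteristic function is known in closed form, which gives something to compare against.

```python
import numpy as np


def empirical_cf(reward_seqs, omegas):
    """Sample estimate of a characteristic function (hypothetical helper).

    reward_seqs: (N, T) array of N sampled reward sequences of length T.
    omegas:      (M, T) array of M frequency vectors to evaluate at.
    Returns an (M,) complex array: the mean of exp(i * <omega, r>) over samples.
    """
    phases = omegas @ reward_seqs.T           # (M, N) inner products <omega, r>
    return np.exp(1j * phases).mean(axis=1)   # average over the N samples


rng = np.random.default_rng(0)
T = 5                                          # sequence length (assumption)
reward_seqs = rng.normal(size=(20000, T))      # toy rewards: i.i.d. N(0, 1) (assumption)
omegas = rng.normal(scale=0.5, size=(8, T))    # arbitrary evaluation frequencies

cf_hat = empirical_cf(reward_seqs, omegas)
# Closed-form characteristic function of a standard normal vector in R^T.
cf_true = np.exp(-0.5 * np.sum(omegas ** 2, axis=1))
max_err = np.max(np.abs(cf_hat - cf_true))
print(f"largest deviation from the closed form: {max_err:.4f}")
```

The estimate depends only on the reward sequences, never on the pixels that accompanied them, which is the property the summary appeals to when it calls rewards invariant to visual distractions.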

Abstract (original)

Generalization across different environments with the same tasks is critical for successful applications of visual reinforcement learning (RL) in real scenarios. However, visual distractions — which are common in real scenes — from high-dimensional observations can be hurtful to the learned representations in visual RL, thus degrading the performance of generalization. To tackle this problem, we propose a novel approach, namely Characteristic Reward Sequence Prediction (CRESP), to extract the task-relevant information by learning reward sequence distributions (RSDs), as the reward signals are task-relevant in RL and invariant to visual distractions. Specifically, to effectively capture the task-relevant information via RSDs, CRESP introduces an auxiliary task — that is, predicting the characteristic functions of RSDs — to learn task-relevant representations, because we can well approximate the high-dimensional distributions by leveraging the corresponding characteristic functions. Experiments demonstrate that CRESP significantly improves the performance of generalization on unseen environments, outperforming several state-of-the-arts on DeepMind Control tasks with different visual distractions.
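
For readers unfamiliar with the term, the standard definition of a characteristic function may help make the last claim concrete. The notation here (a length-T reward sequence R and a frequency vector ω) is chosen for this note and is not taken from the paper.

```latex
% Characteristic function of a random reward sequence R = (r_1, ..., r_T)
\varphi_{R}(\omega)
  = \mathbb{E}\!\left[ e^{\,\mathrm{i}\,\langle \omega, R \rangle} \right],
  \qquad \omega \in \mathbb{R}^{T}.

% Basic properties: it always exists and is bounded,
\varphi_{R}(0) = 1, \qquad |\varphi_{R}(\omega)| \le 1,

% and it determines the distribution uniquely:
\varphi_{R} = \varphi_{R'} \;\Longrightarrow\; R \overset{d}{=} R'.
```

Because the characteristic function exists for every distribution and identifies it uniquely, matching characteristic functions is one way to match distributions without modeling a density directly.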

arXiv information

Authors	Rui Yang, Jie Wang, Zijie Geng, Mingxuan Ye, Shuiwang Ji, Bin Li, Feng Wu
Published	2022-06-30 14:08:02+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
