Quasimetric Value Functions with Dense Rewards

Abstract

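As a generalization of reinforcement learning (RL) to parametrizable goals, goal-conditioned RL (GCRL) has a broad range of applications, particularly in challenging robotics tasks.
Recent work has established that the optimal value function of GCRL, $Q^\ast(s,a,g)$, has a quasimetric structure, which has led to targeted neural architectures that respect this structure.
However, the relevant analyses assume a sparse reward setting, which is known to worsen sample complexity.
We show that the key property underpinning a quasimetric, namely the triangle inequality, is preserved under a dense reward setting as well.
Contrary to earlier findings that dense rewards are detrimental to GCRL, we identify the key condition necessary for the triangle inequality to hold.
Dense reward functions that satisfy this condition can only improve sample complexity, never worsen it.
This opens up opportunities to train efficient neural architectures with dense rewards, compounding their benefits to sample complexity.
We evaluate this proposal in 12 standard GCRL benchmark environments featuring challenging continuous control tasks.
Our empirical results confirm that training a quasimetric value function in our dense reward setting indeed outperforms training with sparse rewards.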

Abstract (original)

As a generalization of reinforcement learning (RL) to parametrizable goals, goal-conditioned RL (GCRL) has a broad range of applications, particularly in challenging tasks in robotics. Recent work has established that the optimal value function of GCRL, $Q^\ast(s,a,g)$, has a quasimetric structure, leading to targeted neural architectures that respect such structure. However, the relevant analyses assume a sparse reward setting, a known aggravating factor for sample complexity. We show that the key property underpinning a quasimetric, viz., the triangle inequality, is preserved under a dense reward setting as well. Contrary to earlier findings where dense rewards were shown to be detrimental to GCRL, we identify the key condition necessary for the triangle inequality. Dense reward functions that satisfy this condition can only improve, never worsen, sample complexity. This opens up opportunities to train efficient neural architectures with dense rewards, compounding their benefits to sample complexity. We evaluate this proposal in 12 standard benchmark environments in GCRL featuring challenging continuous control tasks. Our empirical results confirm that training a quasimetric value function in our dense reward setting indeed outperforms training with sparse rewards.
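
To make the quasimetric idea concrete, the following is a minimal Python sketch, not the authors' method. It assumes a hypothetical 4-state deterministic environment (a directed cycle) with a sparse reward of -1 per step and no discounting, so that the negated optimal goal-reaching value equals the shortest-path cost from state to goal. The state layout and all function names (`shortest_path_costs`, `satisfies_triangle_inequality`, `is_symmetric`) are illustrative assumptions. The sketch checks the two properties that characterize a quasimetric here: the triangle inequality holds, while symmetry does not.

```python
import itertools

INF = float("inf")


def shortest_path_costs(n, edges):
    """All-pairs shortest-path costs on a directed graph (Floyd-Warshall).

    Under the assumed -1-per-step, undiscounted reward, costs[s][g] plays
    the role of the negated optimal value of reaching goal g from state s.
    """
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, cost in edges:
        d[u][v] = min(d[u][v], cost)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def satisfies_triangle_inequality(d):
    """True if d[i][j] <= d[i][k] + d[k][j] for every triple (i, j, k)."""
    n = len(d)
    return all(
        d[i][j] <= d[i][k] + d[k][j]
        for i, j, k in itertools.product(range(n), repeat=3)
    )


def is_symmetric(d):
    """True if d[i][j] == d[j][i] for every pair (i, j)."""
    n = len(d)
    return all(d[i][j] == d[j][i] for i in range(n) for j in range(n))


# Hypothetical environment: a one-way cycle 0 -> 1 -> 2 -> 3 -> 0,
# each transition costing one step.
EDGES = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
costs = shortest_path_costs(4, EDGES)

print("cost 0 -> 1:", costs[0][1])
print("cost 1 -> 0:", costs[1][0])
print("triangle inequality holds:", satisfies_triangle_inequality(costs))
print("symmetric:", is_symmetric(costs))
```

Going from state 0 to state 1 takes one step, but returning takes three, so the cost is asymmetric while still obeying the triangle inequality. That asymmetry is what separates a quasimetric from an ordinary metric.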

arXiv information

Authors	Khadichabonu Valieva, Bikramjit Banerjee
Published	2024-09-13 11:26:05+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
