LORD: Large Models based Opposite Reward Design for Autonomous Driving


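Reinforcement learning (RL) based autonomous driving has emerged as a promising alternative to data-driven imitation learning approaches.
However, crafting effective reward functions for RL is challenging, because defining and quantifying good driving behavior across diverse scenarios is complex.
This work introduces LORD, a novel large models based opposite reward design that uses undesired linguistic goals so that large pretrained models can be used efficiently as zero-shot reward models.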


Reinforcement learning (RL) based autonomous driving has emerged as a promising alternative to data-driven imitation learning approaches. However, crafting effective reward functions for RL poses challenges due to the complexity of defining and quantifying good driving behaviors across diverse scenarios. Recently, large pretrained models have gained significant attention as zero-shot reward models for tasks specified with desired linguistic goals. However, the desired linguistic goals for autonomous driving, such as ‘drive safely’, are ambiguous and incomprehensible to pretrained models. On the other hand, undesired linguistic goals like ‘collision’ are more concrete and tractable. In this work, we introduce LORD, a novel large models based opposite reward design that uses undesired linguistic goals to enable the efficient use of large pretrained models as zero-shot reward models. Through extensive experiments, our proposed framework shows its efficiency in leveraging the power of large pretrained models to achieve safe and enhanced autonomous driving. Moreover, the proposed approach shows improved generalization capabilities, as it outperforms counterpart methods across diverse and challenging driving scenarios.
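The idea of rewarding an agent for staying away from an undesired linguistic goal can be illustrated with a minimal sketch. The abstract does not give the reward formula, so everything below is an assumption for illustration only: the function names `cosine_similarity` and `opposite_reward`, the choice of `1 - cosine similarity` as the reward, and the toy 2-D vectors that stand in for embeddings a large pretrained model would produce for an observation and for a text goal such as ‘collision’.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def opposite_reward(obs_embedding, undesired_goal_embedding):
    """Hypothetical opposite reward (an assumption, not the paper's formula).

    The reward grows as the observation embedding becomes less similar
    to the embedding of an undesired goal such as 'collision'.
    """
    return 1.0 - cosine_similarity(obs_embedding, undesired_goal_embedding)


# Toy stand-ins for embeddings from a pretrained model (assumed values).
undesired = np.array([1.0, 0.0])       # embedding of the undesired goal
near_collision = np.array([0.9, 0.1])  # observation close to the undesired goal
safe = np.array([0.0, 1.0])            # observation unrelated to the undesired goal

r_near = opposite_reward(near_collision, undesired)
r_safe = opposite_reward(safe, undesired)
print(r_near < r_safe)
```

In this sketch the observation that resembles the undesired goal receives the lower reward, which is the behavior an RL agent would be trained to avoid. In practice the embeddings would come from the image and text encoders of a pretrained model rather than from hand-written vectors.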


Authors: Xin Ye, Feng Tao, Abhirup Mallik, Burhaneddin Yaman, Liu Ren
Published: 2024-03-27 19:30:06+00:00

Categories: cs.AI, cs.LG, cs.RO