Maneuver Decision-Making Through Automatic Curriculum Reinforcement Learning Without Handcrafted Reward functions

Summary

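Maneuver decision-making is the core of autonomous air combat for unmanned combat aerial vehicles.
To address this problem, we propose an automatic curriculum reinforcement learning method that enables agents to learn effective air-combat decisions from scratch.
The range of initial states is used to distinguish curricula of different difficulty levels, which divides maneuver decision-making into a series of sub-tasks ordered from easy to difficult; test results are then used to switch between sub-tasks.
As the sub-tasks change, the agent gradually learns to complete the series from easy to difficult and becomes able to make effective maneuvering decisions across a variety of states, without any effort spent designing reward functions.
Ablation studies show that the automatic curriculum learning proposed in this article is an essential component of training through reinforcement learning: without curriculum learning, the agent cannot learn to make effective decisions.
Simulation experiments show that, after training, the agent can make effective decisions such as tracking, attacking, and escaping in response to different states, and that these decisions are both rational and interpretable.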

Summary (original)

Maneuver decision-making is the core of autonomous air combat for unmanned combat aerial vehicles. To solve this problem, we propose an automatic curriculum reinforcement learning method, which enables agents to learn effective decisions in air combat from scratch. The range of initial states is used to distinguish curricula of different difficulty levels, thereby dividing maneuver decision-making into a series of sub-tasks from easy to difficult, and test results are used to change sub-tasks. As sub-tasks change, agents gradually learn to complete a series of sub-tasks from easy to difficult, enabling them to make effective maneuvering decisions to cope with various states without the need to spend effort designing reward functions. The ablation studies show that the automatic curriculum learning proposed in this article is an essential component for training through reinforcement learning; namely, agents cannot learn to make effective decisions without curriculum learning. Simulation experiments show that, after training, agents are able to make effective decisions given different states, including tracking, attacking and escaping, which are both rational and interpretable.
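The abstract describes the curriculum only at a high level: the range of initial states sets the difficulty of each sub-task, and test results decide when to move on to a harder one. The Python sketch below illustrates that idea under stated assumptions and is not the paper's implementation. The names `CurriculumScheduler` and `evaluate`, the two initial-state bounds (initial distance and heading offset), and the 0.8 promotion threshold are all hypothetical. The reinforcement learning update itself is omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class CurriculumScheduler:
    """Hypothetical automatic curriculum: each level widens the range of initial states."""

    # Each entry is an assumed (max_initial_distance_m, max_heading_offset_deg) bound;
    # a wider range of initial states is treated as a harder sub-task.
    levels: list
    promote_threshold: float = 0.8  # assumed test success rate needed to advance
    level: int = 0

    def sample_initial_state(self, rng: random.Random) -> dict:
        """Draw an initial engagement state from the current sub-task's range."""
        max_dist, max_heading = self.levels[self.level]
        return {
            "distance_m": rng.uniform(0.0, max_dist),
            "heading_offset_deg": rng.uniform(-max_heading, max_heading),
        }

    def update(self, test_success_rate: float) -> bool:
        """Use a test result to decide whether to switch to the next, harder sub-task."""
        if test_success_rate >= self.promote_threshold and self.level < len(self.levels) - 1:
            self.level += 1
            return True
        return False


def evaluate(run_episode, scheduler: CurriculumScheduler, episodes: int, rng: random.Random) -> float:
    """Success rate of a policy on initial states drawn from the current sub-task.

    `run_episode` is a hypothetical callable that plays one engagement from the
    given initial state and returns True on success.
    """
    wins = sum(bool(run_episode(scheduler.sample_initial_state(rng))) for _ in range(episodes))
    return wins / episodes


if __name__ == "__main__":
    sched = CurriculumScheduler(levels=[(1000.0, 30.0), (3000.0, 90.0), (6000.0, 180.0)])
    # Feed a sequence of test results; the scheduler advances once per passing test
    # and stays at the hardest level once it is reached.
    for rate in [0.5, 0.85, 0.9, 0.95]:
        sched.update(rate)
    print(sched.level)  # → 2
```

In a full training loop, each iteration would train the agent on initial states drawn by `sample_initial_state`, periodically call `evaluate`, and pass the resulting success rate to `update`. Tying promotion to a test threshold, rather than to a fixed schedule, is one simple way to let the test results drive the change of sub-tasks.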

arXiv information

Author	Zhang Hong-Peng
Published	2023-07-12 13:20:18+00:00
arXiv page	arxiv_id (pdf)

Provider, services used

arxiv.jp, Google
