Reward Shaping for Happier Autonomous Cyber Security Agents

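Summary

As machine learning models grow more capable, so does their potential for solving complex tasks.
One of the most promising directions uses deep reinforcement learning to train autonomous agents for computer network defense tasks.
This work studies the impact of the reward signal given to agents while they train for this task.
Owing to the nature of cybersecurity tasks, the reward signal typically 1) takes the form of penalties (for example, when a compromise occurs) and 2) is sparsely distributed across each defense episode.
Such reward characteristics are atypical of classic reinforcement learning tasks, in which the agent is regularly rewarded for progress rather than occasionally penalized for failures.
We investigate reward shaping techniques that could bridge this gap, so that agents train more sample-efficiently and potentially converge to better performance.
First, we show that deep reinforcement learning algorithms are sensitive to the magnitude of the penalties and to their relative size.
Next, we combine penalties with positive external rewards and study the effect compared with penalty-only training.
Finally, we evaluate intrinsic curiosity as an internal positive reward mechanism and discuss why it might not be as advantageous for high-level network monitoring tasks.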

Abstract (original)

As machine learning models become more capable, they have exhibited increased potential in solving complex tasks. One of the most promising directions uses deep reinforcement learning to train autonomous agents in computer network defense tasks. This work studies the impact of the reward signal that is provided to the agents when training for this task. Due to the nature of cybersecurity tasks, the reward signal is typically 1) in the form of penalties (e.g., when a compromise occurs), and 2) distributed sparsely across each defense episode. Such reward characteristics are atypical of classic reinforcement learning tasks where the agent is regularly rewarded for progress (cf. to getting occasionally penalized for failures). We investigate reward shaping techniques that could bridge this gap so as to enable agents to train more sample-efficiently and potentially converge to a better performance. We first show that deep reinforcement learning algorithms are sensitive to the magnitude of the penalties and their relative size. Then, we combine penalties with positive external rewards and study their effect compared to penalty-only training. Finally, we evaluate intrinsic curiosity as an internal positive reward mechanism and discuss why it might not be as advantageous for high-level network monitoring tasks.
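The two shaping ideas named in the abstract, rescaling penalties and adding a positive external reward, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `ShapingConfig`, `shaped_reward`, `penalty_scale` and `healthy_host_bonus`, and all numeric values, are hypothetical and are not taken from the paper or from any particular environment.

```python
from dataclasses import dataclass


@dataclass
class ShapingConfig:
    """Hypothetical shaping parameters; none of these values come from the paper."""
    penalty_scale: float = 1.0       # multiplies every (negative) environment penalty
    healthy_host_bonus: float = 0.0  # positive reward per uncompromised host per step


def shaped_reward(env_penalty: float, healthy_hosts: int, cfg: ShapingConfig) -> float:
    """Combine a rescaled penalty with an optional positive bonus."""
    if env_penalty > 0:
        raise ValueError("env_penalty is expected to be zero or negative")
    return cfg.penalty_scale * env_penalty + cfg.healthy_host_bonus * healthy_hosts


if __name__ == "__main__":
    penalty_only = ShapingConfig(penalty_scale=0.5)
    mixed = ShapingConfig(penalty_scale=0.5, healthy_host_bonus=0.25)
    # One step in which a compromise costs -10 while 4 hosts remain clean.
    print(shaped_reward(-10.0, 4, penalty_only))
    print(shaped_reward(-10.0, 4, mixed))
```

With `healthy_host_bonus` left at zero the agent only ever sees penalties, which is the sparse, penalty-only setting the abstract describes; a non-zero bonus gives a signal on every step, including steps where no compromise occurs.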

arXiv information

Authors	Elizabeth Bates, Vasilios Mavroudis, Chris Hicks
Published	2023-10-20 15:04:42+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
