PolicyCleanse: Backdoor Detection and Mitigation in Reinforcement Learning

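Summary

As real-world applications of reinforcement learning (RL) become common, the security and robustness of RL systems deserve more attention and study.
In particular, recent work has shown that, in a multi-agent RL environment, backdoor trigger actions can be injected into a victim agent (a.k.a. Trojan agent), which can fail catastrophically as soon as it observes the backdoor trigger action.
To secure RL agents against malicious backdoors, this work poses the problem of backdoor detection in a multi-agent competitive reinforcement learning system, with the objective of detecting Trojan agents and their corresponding potential trigger actions, and of further mitigating their Trojan behavior.
To solve this problem, the authors propose PolicyCleanse, which relies on the property that an activated Trojan agent's accumulated rewards degrade noticeably after several timesteps.
Along with PolicyCleanse, they also design a machine unlearning-based approach that can effectively mitigate the detected backdoor.
Extensive experiments demonstrate that the proposed methods can accurately detect Trojan agents and outperform existing backdoor mitigation baseline approaches by at least 3% in winning rate across various types of agents and environments.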
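
The reward-degradation property mentioned in the summary can be illustrated with a small sketch. This is not the PolicyCleanse algorithm, whose details the abstract does not give; it is a toy check, under stated assumptions, of whether one episode's accumulated reward falls well below that of a reference episode. The function names, the `threshold` value of 0.5, and the reward traces are all hypothetical.

```python
from itertools import accumulate


def accumulated_rewards(rewards):
    """Running total of per-timestep rewards for one episode."""
    return list(accumulate(rewards))


def reward_drop(baseline_rewards, candidate_rewards):
    """Relative drop in final accumulated reward versus the baseline episode."""
    base_total = accumulated_rewards(baseline_rewards)[-1]
    cand_total = accumulated_rewards(candidate_rewards)[-1]
    if base_total == 0:
        raise ValueError("baseline total reward must be non-zero")
    return (base_total - cand_total) / abs(base_total)


def looks_triggered(baseline_rewards, candidate_rewards, threshold=0.5):
    """Flag the candidate episode if the relative drop reaches `threshold`.

    The threshold is a hypothetical value chosen for illustration only.
    """
    return reward_drop(baseline_rewards, candidate_rewards) >= threshold


# Made-up reward traces: the suspect episode degrades after a few timesteps.
normal = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
suspect = [1.0, 1.0, 0.0, -1.0, -1.0, -1.0]

print(accumulated_rewards(suspect))
print(looks_triggered(normal, suspect))
```

In practice the comparison would be made over many episodes and candidate opponent action sequences rather than a single pair of traces; the single-pair form here only shows the shape of the signal.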

Abstract (original)

While real-world applications of reinforcement learning are becoming popular, the security and robustness of RL systems are worthy of more attention and exploration. In particular, recent works have revealed that, in a multi-agent RL environment, backdoor trigger actions can be injected into a victim agent (a.k.a. Trojan agent), which can result in a catastrophic failure as soon as it sees the backdoor trigger action. To ensure the security of RL agents against malicious backdoors, in this work, we propose the problem of Backdoor Detection in a multi-agent competitive reinforcement learning system, with the objective of detecting Trojan agents as well as the corresponding potential trigger actions, and further trying to mitigate their Trojan behavior. In order to solve this problem, we propose PolicyCleanse that is based on the property that the activated Trojan agents accumulated rewards degrade noticeably after several timesteps. Along with PolicyCleanse, we also design a machine unlearning-based approach that can effectively mitigate the detected backdoor. Extensive experiments demonstrate that the proposed methods can accurately detect Trojan agents, and outperform existing backdoor mitigation baseline approaches by at least 3% in winning rate across various types of agents and environments.

arXiv information

Authors	Junfeng Guo, Ang Li, Cong Liu
Published	2023-09-14 08:15:55+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
