Implicit Poisoning Attacks in Two-Agent Reinforcement Learning: Adversarial Policies for Training-Time Attacks

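Summary

In a targeted poisoning attack, an attacker manipulates the interaction between an agent and its environment so that the agent adopts a policy of the attacker's choosing, called the target policy.
Prior work has focused mainly on attacks that modify standard MDP primitives, such as rewards and transitions.
In this paper, we study targeted poisoning attacks in a two-agent setting, where the attacker implicitly poisons the effective environment of one agent by modifying the policy of its peer.
We develop an optimization framework for designing optimal attacks, in which the cost of an attack measures how far the solution deviates from the assumed default policy of the peer agent.
We further study the computational properties of this optimization framework.
Focusing on the tabular setting, we show that determining the feasibility of an implicit poisoning attack is NP-hard, in contrast to poisoning attacks based on MDP primitives (transitions and (unbounded) rewards), which are always feasible.
We provide characterization results that establish sufficient conditions for the feasibility of the attack problem, together with upper and lower bounds on the optimal cost of the attack.
We propose two algorithmic approaches for finding an optimal adversarial policy: a model-based approach with tabular policies and a model-free approach with parametric/neural policies.
We showcase the efficacy of the proposed algorithms through experiments.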

Abstract (original)

In targeted poisoning attacks, an attacker manipulates an agent-environment interaction to force the agent into adopting a policy of interest, called target policy. Prior work has primarily focused on attacks that modify standard MDP primitives, such as rewards or transitions. In this paper, we study targeted poisoning attacks in a two-agent setting where an attacker implicitly poisons the effective environment of one of the agents by modifying the policy of its peer. We develop an optimization framework for designing optimal attacks, where the cost of the attack measures how much the solution deviates from the assumed default policy of the peer agent. We further study the computational properties of this optimization framework. Focusing on a tabular setting, we show that in contrast to poisoning attacks based on MDP primitives (transitions and (unbounded) rewards), which are always feasible, it is NP-hard to determine the feasibility of implicit poisoning attacks. We provide characterization results that establish sufficient conditions for the feasibility of the attack problem, as well as an upper and a lower bound on the optimal cost of the attack. We propose two algorithmic approaches for finding an optimal adversarial policy: a model-based approach with tabular policies and a model-free approach with parametric/neural policies. We showcase the efficacy of the proposed algorithms through experiments.
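
To make the setting concrete, here is a minimal sketch of the quantities the abstract names: the victim's effective environment once the peer's policy is fixed, a check of whether the target policy is optimal in that environment, and an attack cost measured as deviation from the peer's default policy. Everything specific here is an assumption for illustration and is not taken from the paper: the function names, the discounted-return criterion with `gamma`, the L1 distance used as the cost, and the one-state matching game used as the example.

```python
def induced_mdp(P, R, peer_policy):
    """Marginalize out the peer's action to get the victim's effective MDP.

    P[s][a][b][t]: probability of moving to state t when the victim plays a
                   and the peer plays b in state s.
    R[s][a][b]:    victim's reward for that joint action.
    peer_policy[s][b]: probability that the peer plays b in state s.
    """
    n_s, n_a, n_b = len(P), len(P[0]), len(P[0][0])
    P_eff = [[[sum(peer_policy[s][b] * P[s][a][b][t] for b in range(n_b))
               for t in range(n_s)] for a in range(n_a)] for s in range(n_s)]
    R_eff = [[sum(peer_policy[s][b] * R[s][a][b] for b in range(n_b))
              for a in range(n_a)] for s in range(n_s)]
    return P_eff, R_eff


def q_values(P_eff, R_eff, policy, gamma=0.9, iters=500):
    """Q-values of a deterministic victim policy via iterative evaluation."""
    n_s, n_a = len(P_eff), len(P_eff[0])
    V = [0.0] * n_s
    for _ in range(iters):
        V = [R_eff[s][policy[s]]
             + gamma * sum(P_eff[s][policy[s]][t] * V[t] for t in range(n_s))
             for s in range(n_s)]
    return [[R_eff[s][a]
             + gamma * sum(P_eff[s][a][t] * V[t] for t in range(n_s))
             for a in range(n_a)] for s in range(n_s)]


def target_is_optimal(Q, target, eps=0.0):
    """True if the target action is at least eps better than every other action."""
    return all(Q[s][target[s]] >= Q[s][a] + eps
               for s in range(len(Q))
               for a in range(len(Q[s])) if a != target[s])


def attack_cost(peer_policy, default_policy):
    """Assumed cost: L1 distance between the modified and default peer policies."""
    return sum(abs(p - q)
               for ps, qs in zip(peer_policy, default_policy)
               for p, q in zip(ps, qs))


# Hypothetical one-state game: the victim earns 1 when its action matches the peer's.
P = [[[[1.0], [1.0]], [[1.0], [1.0]]]]
R = [[[1.0, 0.0], [0.0, 1.0]]]
default_peer = [[1.0, 0.0]]   # peer always plays action 0
poisoned_peer = [[0.4, 0.6]]  # attacker shifts the peer toward action 1
target = [1]                  # attacker wants the victim to play action 1

feasible_before = target_is_optimal(q_values(*induced_mdp(P, R, default_peer), target), target)
feasible_after = target_is_optimal(q_values(*induced_mdp(P, R, poisoned_peer), target), target)
cost = attack_cost(poisoned_peer, default_peer)
```

In this toy instance the target policy is not optimal under the default peer policy but becomes optimal once the peer's policy is shifted, at a nonzero cost. The sketch only evaluates one candidate peer policy; searching for the cheapest one is the optimization problem the abstract describes, and it is not attempted here.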

arXiv information

Authors	Mohammad Mohammadi, Jonathan Nöther, Debmalya Mandal, Adish Singla, Goran Radanovic
Published	2023-02-27 14:52:15+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
