Robust Transformer with Locality Inductive Bias and Feature Normalization

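Summary

Vision transformers have been shown to achieve state-of-the-art results on a variety of computer vision tasks using attention-based networks.
However, most transformer research does not investigate the robustness/accuracy trade-off, and these models still struggle to handle adversarial perturbations.
In this paper, we explore the robustness of vision transformers against adversarial perturbations and try to improve their robustness/accuracy trade-off in white-box attack settings.
To this end, we propose the Locality iN Locality (LNL) transformer model.
We prove that introducing locality into LNL contributes to robustness, because it aggregates local information such as lines, edges, shapes, and even objects.
In addition, to further improve robustness, we encourage LNL to extract training signal from the moments (i.e., the mean and standard deviation) and from the normalized features.
We validate the effectiveness and generality of LNL by achieving state-of-the-art accuracy and robustness on the German Traffic Sign Recognition Benchmark (GTSRB) and the Canadian Institute for Advanced Research (CIFAR-10) dataset.
More specifically, for traffic sign classification, the proposed LNL yields gains of 1.1% in clean accuracy and ~35% in robust accuracy over state-of-the-art studies.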
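
The summary mentions two kinds of signal: the moments (mean and standard deviation) and the normalized features. The abstract does not say how LNL computes or combines them, so the following is only a generic illustration of what those quantities are for a single feature vector, not the authors' implementation. The function name `moments_and_normalized`, the `eps` stabilizer, and the sample values are assumptions made for this sketch.

```python
import math


def moments_and_normalized(features, eps=1e-5):
    """Split one feature vector into its moments and its normalized form.

    Hypothetical helper for illustration only; the paper's abstract does
    not specify over which axes the moments are taken or how they are used.
    """
    n = len(features)
    mean = sum(features) / n
    # Population variance; eps keeps the division stable for constant inputs.
    var = sum((x - mean) ** 2 for x in features) / n
    std = math.sqrt(var + eps)
    normalized = [(x - mean) / std for x in features]
    return mean, std, normalized


# Made-up feature values, chosen only so the arithmetic is easy to follow.
features = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, std, normalized = moments_and_normalized(features)
print(mean, std)
print(normalized)
```

The normalized vector has zero mean and roughly unit variance, while `mean` and `std` carry the scale and offset that normalization removes; a model that uses both keeps information that plain normalization would discard.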

Abstract (original)

Vision transformers have been demonstrated to yield state-of-the-art results on a variety of computer vision tasks using attention-based networks. However, research works in transformers mostly do not investigate robustness/accuracy trade-off, and they still struggle to handle adversarial perturbations. In this paper, we explore the robustness of vision transformers against adversarial perturbations and try to enhance their robustness/accuracy trade-off in white box attack settings. To this end, we propose Locality iN Locality (LNL) transformer model. We prove that the locality introduction to LNL contributes to the robustness performance since it aggregates local information such as lines, edges, shapes, and even objects. In addition, to further improve the robustness performance, we encourage LNL to extract training signal from the moments (a.k.a., mean and standard deviation) and the normalized features. We validate the effectiveness and generality of LNL by achieving state-of-the-art results in terms of accuracy and robustness metrics on German Traffic Sign Recognition Benchmark (GTSRB) and Canadian Institute for Advanced Research (CIFAR-10). More specifically, for traffic sign classification, the proposed LNL yields gains of 1.1% and ~35% in terms of clean and robustness accuracy compared to the state-of-the-art studies.

arXiv information

Authors	Omid Nejati Manzari, Hossein Kashiani, Hojat Asgarian Dehkordi, Shahriar Baradaran Shokouhi
Published	2023-01-27 06:39:16+00:00

Source and services used

arxiv.jp, Google
