Towards White Box Deep Learning

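Summary

Deep neural networks learn fragile "shortcut" features, which makes them difficult to interpret (black box) and vulnerable to adversarial attacks.
This paper proposes semantic features as a general architectural solution to this problem.
The main idea is to make features locality-sensitive in the appropriate semantic topology of the domain, which introduces strong regularization.
The proof-of-concept network is lightweight, inherently interpretable, and achieves almost human-level adversarial test metrics without any adversarial training.
These results, together with the general nature of the approach, warrant further research on semantic features.
The code is available at https://github.com/314-Foundation/white-box-nn.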

Summary (original)

Deep neural networks learn fragile ‘shortcut’ features, rendering them difficult to interpret (black box) and vulnerable to adversarial attacks. This paper proposes semantic features as a general architectural solution to this problem. The main idea is to make features locality-sensitive in the adequate semantic topology of the domain, thus introducing a strong regularization. The proof of concept network is lightweight, inherently interpretable and achieves almost human-level adversarial test metrics – with no adversarial training! These results and the general nature of the approach warrant further research on semantic features. The code is available at https://github.com/314-Foundation/white-box-nn

arXiv information

Author	Maciej Satkiewicz
Published	2024-04-17 17:58:52+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
