Neural Networks Optimizations Against Concept and Data Drift in Malware Detection

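Summary

Machine learning models show promising results in malware detection, but they suffer from concept drift caused by the constant evolution of malware.
Because the data distribution of new files differs from that of the training files, performance declines over time and regular model updates become necessary.
In this work, we propose a model-agnostic protocol for improving a baseline neural network so that it copes with the drift problem.
We show the importance of feature reduction and of training with the most recent validation set possible, and we propose a loss function named Drift-Resilient Binary Cross-Entropy, an improvement on the classical Binary Cross-Entropy that is more effective against drift.
We train our model on the EMBER dataset (2018) and evaluate it on a dataset of recent malicious files collected between 2020 and 2023. Our improved model shows promising results, detecting 15.2% more malware than a baseline model.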

Summary (original)

Despite the promising results of machine learning models in malware detection, they face the problem of concept drift due to malware constant evolution. This leads to a decline in performance over time, as the data distribution of the new files differs from the training one, requiring regular model update. In this work, we propose a model-agnostic protocol to improve a baseline neural network to handle with the drift problem. We show the importance of feature reduction and training with the most recent validation set possible, and propose a loss function named Drift-Resilient Binary Cross-Entropy, an improvement to the classical Binary Cross-Entropy more effective against drift. We train our model on the EMBER dataset (2018) and evaluate it on a dataset of recent malicious files, collected between 2020 and 2023. Our improved model shows promising results, detecting 15.2% more malware than a baseline model.

arXiv information

Authors	William Maillet, Benjamin Marais
Published	2023-08-21 16:13:23+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
