MalMixer: Few-Shot Malware Classification with Retrieval-Augmented Semi-Supervised Learning

Abstract

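The recent growth and proliferation of malware have tested practitioners' ability to promptly classify new samples by malware family.
Compared with labor-intensive reverse engineering, machine learning approaches have demonstrated greater speed and accuracy.
However, most existing deep-learning malware family classifiers must be calibrated on a large number of samples that are manually analyzed before training.
Furthermore, as novel malware samples emerge beyond the scope of the training set, additional reverse engineering effort is needed to update the training set.
The sheer volume of new samples found in the wild puts substantial pressure on practitioners' capacity to reverse engineer enough malware to adequately train modern classifiers.
In this paper, we present MalMixer, a malware family classifier that uses semi-supervised learning to achieve high accuracy with sparse training data.
We present a domain-knowledge-aware data augmentation technique for malware feature representations, which improves the few-shot performance of semi-supervised malware family classification.
We show that MalMixer achieves state-of-the-art performance in few-shot malware family classification settings.
Our research confirms the feasibility and effectiveness of lightweight, domain-knowledge-aware data augmentation methods for malware features, and shows the capabilities of similar semi-supervised classifiers in addressing malware classification problems.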

Abstract (original)

Recent growth and proliferation of malware have tested practitioners' ability to promptly classify new samples according to malware families. In contrast to labor-intensive reverse engineering efforts, machine learning approaches have demonstrated increased speed and accuracy. However, most existing deep-learning malware family classifiers must be calibrated using a large number of samples that are painstakingly manually analyzed before training. Furthermore, as novel malware samples arise that are beyond the scope of the training set, additional reverse engineering effort must be employed to update the training set. The sheer volume of new samples found in the wild creates substantial pressure on practitioners' ability to reverse engineer enough malware to adequately train modern classifiers. In this paper, we present MalMixer, a malware family classifier using semi-supervised learning that achieves high accuracy with sparse training data. We present a domain-knowledge-aware data augmentation technique for malware feature representations, enhancing few-shot performance of semi-supervised malware family classification. We show that MalMixer achieves state-of-the-art performance in few-shot malware family classification settings. Our research confirms the feasibility and effectiveness of lightweight, domain-knowledge-aware data augmentation methods for malware features and shows the capabilities of similar semi-supervised classifiers in addressing malware classification issues.

arXiv information

Authors	Jiliang Li, Yifan Zhang, Yu Huang, Kevin Leach
Published	2025-04-17 17:51:35+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
