LLMs for Domain Generation Algorithm Detection

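Summary

This study analyzes the use of large language models (LLMs) for detecting domain generation algorithms (DGAs).
We evaluate in detail two key techniques, In-Context Learning (ICL) and Supervised Fine-Tuning (SFT), and show how each can improve detection.
SFT improves performance by training on domain-specific data, whereas ICL helps the detection model adapt quickly to new threats without requiring much retraining.
We use Meta's Llama3 8B model on a custom dataset containing 68 malware families and normal domains, covering several hard-to-detect schemes, including recent word-based DGAs.
The results show that LLM-based methods can achieve competitive results in DGA detection.
In particular, the SFT-based LLM DGA detector outperforms state-of-the-art models that use attention layers, achieving 94% accuracy at a 4% false positive rate (FPR) and excelling at detecting word-based DGA domains.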

Abstract (original)

This work analyzes the use of large language models (LLMs) for detecting domain generation algorithms (DGAs). We perform a detailed evaluation of two important techniques: In-Context Learning (ICL) and Supervised Fine-Tuning (SFT), showing how they can improve detection. SFT increases performance by using domain-specific data, whereas ICL helps the detection model to quickly adapt to new threats without requiring much retraining. We use Meta’s Llama3 8B model, on a custom dataset with 68 malware families and normal domains, covering several hard-to-detect schemes, including recent word-based DGAs. Results proved that LLM-based methods can achieve competitive results in DGA detection. In particular, the SFT-based LLM DGA detector outperforms state-of-the-art models using attention layers, achieving 94% accuracy with a 4% false positive rate (FPR) and excelling at detecting word-based DGA domains.
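To make the ICL idea concrete, the following is a minimal sketch of how a few-shot prompt for DGA detection could be assembled and how a model's reply could be mapped back to a label. It is not the authors' prompt or pipeline: the example domains, the label names `benign` and `dga`, and the helper names `build_icl_prompt` and `parse_label` are all assumptions made for illustration, and the call to the model itself is left out.

```python
from typing import List, Tuple

# Hypothetical labeled examples. These domains are made up for illustration
# and are NOT taken from the paper's dataset.
FEW_SHOT_EXAMPLES: List[Tuple[str, str]] = [
    ("wikipedia.org", "benign"),
    ("xkqzjvbnwpty.com", "dga"),
    ("github.com", "benign"),
    ("paperriverstonecloud.net", "dga"),  # word-based style (assumed example)
]


def build_icl_prompt(
    domain: str,
    examples: List[Tuple[str, str]] = FEW_SHOT_EXAMPLES,
) -> str:
    """Build a few-shot prompt: instruction, labeled examples, then the query."""
    lines = ["Classify each domain as 'benign' or 'dga'.", ""]
    for example_domain, label in examples:
        lines.append(f"Domain: {example_domain}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The query domain is left unlabeled so the model completes the label.
    lines.append(f"Domain: {domain}")
    lines.append("Label:")
    return "\n".join(lines)


def parse_label(completion: str) -> str:
    """Map a raw model completion to 'benign', 'dga', or 'unknown'."""
    words = completion.strip().lower().split()
    first = words[0].strip(".,:;'\"") if words else ""
    return first if first in ("benign", "dga") else "unknown"


if __name__ == "__main__":
    print(build_icl_prompt("example.com"))
```

Adapting this kind of detector to a new DGA family only requires swapping the entries in the example list, with no change to model weights. SFT, by contrast, would update the weights on labeled domains.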

arXiv information

Authors	Reynier Leyva La O, Carlos A. Catania, Tatiana Parlanti
Published	2024-11-05 18:01:12+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
