Towards Characterizing Cyber Networks with Large Language Models

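Summary

Threat hunting analyzes large, noisy, high-dimensional data to find sparse adversarial behavior.
We believe that adversarial activity, even when disguised, is extremely difficult to hide completely in high-dimensional space.
In this paper, we exploit these latent features of cyber data to find anomalies via a prototype tool called the Cyber Log Embeddings Model (CLEM).
CLEM was trained on Zeek network traffic logs from both a real-world production network and an Internet of Things (IoT) cybersecurity testbed.
The model is deliberately overtrained on a sliding window of data so that it characterizes each window closely.
We use the Adjusted Rand Index (ARI) to compare the k-means clustering of CLEM output with expert labeling of the embeddings.
Our approach shows that natural language modeling is a promising way to understand cyber data.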
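
The ARI comparison mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the k-means cluster assignments and the expert labels are already available as two equal-length lists, and the function name `adjusted_rand_index` and the sample labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Contingency counts: how many items share each (label_a, label_b) pair.
    cells = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical data: cluster ids from k-means and labels from an analyst.
kmeans_clusters = [0, 0, 1, 1, 2, 2]
expert_labels = ["scan", "scan", "dns", "dns", "web", "web"]
score = adjusted_rand_index(kmeans_clusters, expert_labels)
```

A score of 1 means the two partitions agree exactly up to renaming of the labels, values near 0 mean agreement no better than chance, and negative values mean agreement worse than chance.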

Abstract (original)

Threat hunting analyzes large, noisy, high-dimensional data to find sparse adversarial behavior. We believe adversarial activities, however they are disguised, are extremely difficult to completely obscure in high-dimensional space. In this paper, we employ these latent features of cyber data to find anomalies via a prototype tool called Cyber Log Embeddings Model (CLEM). CLEM was trained on Zeek network traffic logs from both a real-world production network and an Internet of Things (IoT) cybersecurity testbed. The model is deliberately overtrained on a sliding window of data to characterize each window closely. We use the Adjusted Rand Index (ARI) to compare the k-means clustering of CLEM output to expert labeling of the embeddings. Our approach demonstrates that there is promise in using natural language modeling to understand cyber data.
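
The sliding-window idea in the abstract can be illustrated with a short sketch. The abstract does not state the window size, the step, or how log records are represented, so `size`, `step`, the helper name `sliding_windows`, and the sample records below are all assumptions made for illustration. The per-window training step is not shown.

```python
def sliding_windows(records, size, step):
    """Yield overlapping slices of `records`, each `size` items long."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    # Stop once a full window no longer fits in the remaining records.
    for start in range(0, len(records) - size + 1, step):
        yield records[start:start + size]


# Hypothetical stand-ins for parsed log lines, in time order.
log_records = ["rec0", "rec1", "rec2", "rec3", "rec4"]
windows = list(sliding_windows(log_records, size=3, step=1))
```

Each window would then be used to fit the model closely to that slice of traffic, which is the deliberate overtraining the abstract describes.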

arXiv information

Authors	Alaric Hartsock, Luiz Manella Pereira, Glenn Fink
Published	2024-11-11 16:09:13+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
