Scalable and Ethical Insider Threat Detection through Data Synthesis and Analysis by LLMs

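Summary

Insider threats wield an outsized influence on organizations, disproportionate to their small numbers.
This is due to the internal access insiders have to systems, information, and infrastructure.
Signals of such risk may be found in anonymous reviews submitted to public web-based job search sites.
This research studies the potential of large language models (LLMs) to analyze and detect insider threat sentiment within job site reviews.
To address ethical concerns around data collection, the research uses synthetic data generated by LLMs alongside existing job review datasets.
Sentiment scores generated by LLMs are benchmarked against expert human scoring in a comparative analysis.
The findings show that LLMs align with human evaluations in most cases and thus effectively identify nuanced indicators of threat sentiment.
Performance is lower on human-generated data than on synthetic data, suggesting room for improvement in evaluating real-world data.
Text diversity analysis found differences between the human-generated and LLM-generated datasets, with synthetic data exhibiting somewhat lower diversity.
Overall, the results demonstrate the applicability of LLMs to insider threat detection and offer a scalable solution for insider sentiment testing that overcomes the ethical and logistical barriers tied to data acquisition.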

Summary (original)

Insider threats wield an outsized influence on organizations, disproportionate to their small numbers. This is due to the internal access insiders have to systems, information, and infrastructure. Signals for such risks may be found in anonymous submissions to public web-based job search site reviews. This research studies the potential for large language models (LLMs) to analyze and detect insider threat sentiment within job site reviews. Addressing ethical data collection concerns, this research utilizes synthetic data generation using LLMs alongside existing job review datasets. A comparative analysis of sentiment scores generated by LLMs is benchmarked against expert human scoring. Findings reveal that LLMs demonstrate alignment with human evaluations in most cases, thus effectively identifying nuanced indicators of threat sentiment. The performance is lower on human-generated data than synthetic data, suggesting areas for improvement in evaluating real-world data. Text diversity analysis found differences between human-generated and LLM-generated datasets, with synthetic data exhibiting somewhat lower diversity. Overall, the results demonstrate the applicability of LLMs to insider threat detection, and a scalable solution for insider sentiment testing by overcoming ethical and logistical barriers tied to data acquisition.
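
The abstract mentions two measurements without naming the metrics behind them: how often LLM sentiment scores align with expert human scores, and how diverse the human-generated and LLM-generated texts are. The Python sketch below shows one plausible way to compute each. It is an illustration under assumptions, not the authors' method: the function names `agreement_rate` and `distinct_n`, the integer score scale, the `tolerance` parameter, and the use of a distinct-n ratio as the diversity measure are all hypothetical choices made here.

```python
def agreement_rate(llm_scores, human_scores, tolerance=0):
    """Share of items where the LLM score is within `tolerance` of the human score.

    Assumption: both raters use the same numeric scale. The abstract does not
    say which scale or agreement statistic the study used.
    """
    if len(llm_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    if not llm_scores:
        return 0.0
    matches = sum(
        1 for llm, human in zip(llm_scores, human_scores)
        if abs(llm - human) <= tolerance
    )
    return matches / len(llm_scores)


def distinct_n(texts, n=2):
    """Ratio of unique n-grams to total n-grams across a list of texts.

    Assumption: whitespace tokenization and lowercasing are good enough for a
    rough comparison. Higher values indicate more varied wording.
    """
    unique_ngrams = set()
    total = 0
    for text in texts:
        tokens = text.lower().split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        unique_ngrams.update(ngrams)
        total += len(ngrams)
    return len(unique_ngrams) / total if total else 0.0


if __name__ == "__main__":
    # Toy inputs invented for this example; they are not data from the study.
    llm = [1, 2, 3, 4]
    human = [1, 2, 2, 4]
    print("exact agreement:", agreement_rate(llm, human))
    print("agreement within 1 point:", agreement_rate(llm, human, tolerance=1))

    reviews = ["management ignores every complaint", "management ignores every request"]
    print("distinct-2:", distinct_n(reviews, n=2))
```

Running the same two functions separately on a human-written review set and a synthetic review set would give a rough, like-for-like view of the comparison the abstract describes, although the study itself may rely on different statistics.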

arXiv information

Authors	Haywood Gelman, John D. Hastings
Published	2025-04-07 16:01:47+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
