On Benchmarking Code LLMs for Android Malware Analysis

Abstract

Large Language Models (LLMs) have demonstrated strong capabilities in various code intelligence tasks. However, their effectiveness for Android malware analysis remains underexplored. Decompiled Android malware code presents unique challenges for analysis, due to the malicious logic being buried within a large number of functions and the frequent lack of meaningful function names. This paper presents CAMA, a benchmarking framework designed to systematically evaluate the effectiveness of Code LLMs in Android malware analysis. CAMA specifies structured model outputs to support key malware analysis tasks, including malicious function identification and malware purpose summarization. Built on these, it integrates three domain-specific evaluation metrics (consistency, fidelity, and semantic relevance), enabling rigorous stability and effectiveness assessment and cross-model comparison. We construct a benchmark dataset of 118 Android malware samples from 13 families collected in recent years, encompassing over 7.5 million distinct functions, and use CAMA to evaluate four popular open-source Code LLMs. Our experiments provide insights into how Code LLMs interpret decompiled code and quantify the sensitivity to function renaming, highlighting both their potential and current limitations in malware analysis.
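
The abstract names consistency as one of the three metrics but does not define it here. As a rough illustration only, the sketch below assumes that a model is queried several times on the same sample, that each run returns a set of function names flagged as malicious, and that stability is scored as the mean pairwise Jaccard similarity of those sets. The function names `jaccard` and `run_consistency`, the sample data, and the choice of Jaccard similarity are all assumptions made for this example, not CAMA's actual definition.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def run_consistency(runs: list[set[str]]) -> float:
    """Mean pairwise Jaccard similarity across repeated runs on one sample.

    This is an assumed stand-in for a stability score, not the paper's metric.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to measure consistency")
    pairs = list(combinations(runs, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical outputs: function names flagged as malicious in three runs.
runs = [
    {"sendSms", "readContacts", "uploadData"},
    {"sendSms", "readContacts"},
    {"sendSms", "readContacts", "uploadData"},
]
score = run_consistency(runs)
print(f"consistency = {score:.3f}")
```

A score of 1.0 would mean every run flagged exactly the same functions; lower values indicate that the flagged set shifts between runs. The same comparison could in principle be applied before and after function renaming to gauge the sensitivity the abstract mentions, though that use is likewise an assumption.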

arXiv information

Authors	Yiling He, Hongyu She, Xingzhi Qian, Xinran Zheng, Zhuo Chen, Zhan Qin, Lorenzo Cavallaro
Published	2025-04-23 16:07:20+00:00