Zero-Shot Detection of LLM-Generated Code via Approximated Task Conditioning

Summary

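Detecting code generated by large language models (LLMs) is a growing challenge with implications for security, intellectual property, and academic integrity.
We investigate the role of conditional probability distributions in improving zero-shot detection of LLM-generated code when both the code and the task prompt that generated it are taken into account.
Our key insight is that when an LLM is used to evaluate the probability distribution of code tokens, there is little difference between LLM-generated and human-written code.
Conditioning on the task, however, reveals notable differences.
This contrasts with natural-language text, where differences exist even in the unconditional distributions.
Building on this, we propose a new zero-shot detection approach that approximates the original task used to generate a given code snippet and then evaluates token-level entropy under approximated task conditioning (ATC).
We also provide a mathematical intuition and position the method relative to previous approaches.
ATC requires neither access to the generator LLM nor the original task prompt, which makes it practical for real-world applications.
To the best of our knowledge, it achieves state-of-the-art results across benchmarks and generalizes across programming languages, including Python, CPP, and Java.
Our findings highlight the importance of task-level conditioning for detecting LLM-generated code.
Supplementary materials and code, including the dataset-gathering implementation, are available at https://github.com/maorash/atc to foster further research in this area.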
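The scored quantity can be written as an equation. The formalization below is plausible but assumed, and the notation does not come from the paper. Here $x = (x_1, \dots, x_N)$ are the tokens of the code snippet, $\hat{t}$ is the approximated task, $p_\theta$ is the scoring LLM, and $\mathcal{V}$ is its vocabulary. Averaging over positions and thresholding with $\tau$ are also assumptions, and the paper's exact normalization and decision rule may differ.

```latex
H_i(x \mid \hat{t}) = -\sum_{v \in \mathcal{V}} p_\theta(v \mid \hat{t}, x_{<i}) \,\log p_\theta(v \mid \hat{t}, x_{<i}),
\qquad
S(x) = \frac{1}{N} \sum_{i=1}^{N} H_i(x \mid \hat{t})
```

Dropping $\hat{t}$ from the conditioning context gives the unconditional setting, where the summary reports little difference between LLM-generated and human-written code. A detector would then compare $S(x)$ against $\tau$. Treating $S(x) < \tau$ as the LLM-generated side is an assumption made here for illustration.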

Summary (original)

Detecting Large Language Model (LLM)-generated code is a growing challenge with implications for security, intellectual property, and academic integrity. We investigate the role of conditional probability distributions in improving zero-shot LLM-generated code detection, when considering both the code and the corresponding task prompt that generated it. Our key insight is that when evaluating the probability distribution of code tokens using an LLM, there is little difference between LLM-generated and human-written code. However, conditioning on the task reveals notable differences. This contrasts with natural language text, where differences exist even in the unconditional distributions. Leveraging this, we propose a novel zero-shot detection approach that approximates the original task used to generate a given code snippet and then evaluates token-level entropy under the approximated task conditioning (ATC). We further provide a mathematical intuition, contextualizing our method relative to previous approaches. ATC requires neither access to the generator LLM nor the original task prompts, making it practical for real-world applications. To the best of our knowledge, it achieves state-of-the-art results across benchmarks and generalizes across programming languages, including Python, CPP, and Java. Our findings highlight the importance of task-level conditioning for LLM-generated code detection. The supplementary materials and code are available at https://github.com/maorash/ATC, including the dataset gathering implementation, to foster further research in this area.
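The detection step described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions and is not the authors' implementation; their code is in the repository linked above. The sketch makes four assumptions:

* The scoring model is abstracted as a hypothetical callable, `NextTokenDist`, that maps a token prefix to a next-token distribution.
* The task-approximation step, which produces a task description from the code, is assumed to have already run.
* The `toy_model` is a made-up stand-in for a real LLM.
* `is_llm_generated` applies an assumed rule: lower mean entropy means LLM-generated.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical interface (an assumption, not the paper's API): given the token
# prefix seen so far, return the scoring model's next-token distribution.
NextTokenDist = Callable[[Sequence[str]], Dict[str, float]]


def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)


def conditioned_entropy_score(
    model: NextTokenDist,
    task_tokens: Sequence[str],
    code_tokens: Sequence[str],
) -> float:
    """Mean token-level entropy over the code tokens only.

    The (approximated) task description is prepended as context; its own
    tokens are not scored. Pass task_tokens=[] for the unconditional score.
    """
    if not code_tokens:
        raise ValueError("code_tokens must be non-empty")
    context: List[str] = list(task_tokens)
    total = 0.0
    for token in code_tokens:
        total += entropy(model(context))  # entropy of p(. | task, code[:i])
        context.append(token)             # feed the actual code token (teacher forcing)
    return total / len(code_tokens)


def is_llm_generated(score: float, threshold: float) -> bool:
    """Assumed decision rule: low conditional entropy -> flag as LLM-generated."""
    return score < threshold


def toy_model(context: Sequence[str]) -> Dict[str, float]:
    """Hypothetical stand-in for an LLM: confident once the task word 'sort' is in context."""
    if "sort" in context:
        return {"a": 0.97, "b": 0.01, "c": 0.01, "d": 0.01}
    return {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}


if __name__ == "__main__":
    code = ["a", "b", "a"]
    cond = conditioned_entropy_score(toy_model, ["sort"], code)
    uncond = conditioned_entropy_score(toy_model, [], code)
    print(f"conditioned={cond:.3f} unconditioned={uncond:.3f}")
    print("flagged:", is_llm_generated(cond, threshold=0.5))
```

The toy model is deliberately trivial. It only shows the mechanics: prepending a task changes the per-token distributions and therefore the score. Whether real LLM-generated and human-written code separate under task conditioning is the paper's empirical claim, and a toy cannot demonstrate it. With a real LLM, `model` would wrap a forward pass that returns softmax probabilities at the last position.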

arXiv information

Authors	Maor Ashkenazi, Ofir Brenner, Tal Furman Shohet, Eran Treister
Published	2025-06-06 13:23:37+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
