Sense and Sensitivity: Examining the Influence of Semantic Recall on Long Context Code Reasoning

Abstract

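Although modern large language models (LLMs) support extremely large contexts, how effectively they use long context for code reasoning remains unclear.
This paper investigates the ability of LLMs to reason over code snippets within large repositories and how that ability relates to their recall ability.
Specifically, it distinguishes lexical code recall (verbatim retrieval) from semantic code recall (remembering what the code does).
To measure semantic recall, the authors propose SemTrace, a code reasoning technique in which the impact of specific statements on the output is attributable and unpredictable.
They also present a method to quantify semantic recall sensitivity in existing benchmarks.
An evaluation of state-of-the-art LLMs reveals a significant drop in code reasoning accuracy as a code snippet approaches the middle of the input context, particularly with techniques that require high semantic recall, such as SemTrace.
Moreover, lexical recall varies by granularity: models excel at retrieving functions but struggle with line-by-line recall.
Notably, there is a disconnect between lexical and semantic recall, which suggests different underlying mechanisms.
Finally, the findings indicate that current code reasoning benchmarks may exhibit low semantic recall sensitivity and may therefore underestimate the challenges LLMs face in leveraging in-context information.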
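
To make the distinction between lexical and semantic recall concrete, the following is a minimal sketch and not the paper's SemTrace technique or its evaluation harness. Every name in it (`build_context`, `lexical_recall`, `semantic_recall`, the toy `scale` function, the distractor helpers) is a hypothetical illustration. It assumes a setup in which a target snippet is placed at a chosen relative depth among distractor functions, lexical recall is scored by whitespace-normalised exact match, and semantic recall is scored by comparing a predicted output with the result of actually executing the snippet.

```python
def build_context(target: str, distractors: list[str], depth: float) -> str:
    """Place `target` among `distractors` at a relative depth in [0, 1].

    depth=0.0 puts the target first, depth=1.0 puts it last, and
    depth=0.5 puts it in the middle of the context.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    index = round(depth * len(distractors))
    parts = distractors[:index] + [target] + distractors[index:]
    return "\n\n".join(parts)


def _normalise(code: str) -> list[str]:
    """Strip indentation and blank lines so only the tokens on each line matter."""
    return [line.strip() for line in code.strip().splitlines() if line.strip()]


def lexical_recall(expected: str, answer: str) -> bool:
    """Verbatim retrieval: the answer must reproduce the snippet line by line."""
    return _normalise(expected) == _normalise(answer)


def semantic_recall(source: str, func_name: str, args: tuple, predicted: object) -> bool:
    """Remembering what the code does: the predicted output must match execution.

    Assumption: `source` is trusted toy code, because it is run with exec().
    """
    namespace: dict = {}
    exec(source, namespace)
    return namespace[func_name](*args) == predicted


# Hypothetical toy data: one target function and four distractor functions.
TARGET = "def scale(x):\n    y = x * 3\n    return y + 1"
DISTRACTORS = [f"def helper_{i}(x):\n    return x + {i}" for i in range(4)]

# A paraphrase that behaves like TARGET but is not a verbatim copy of it.
PARAPHRASE = "def scale(x):\n    return x * 3 + 1"
```

In this sketch, a model answer equal to `PARAPHRASE` would fail `lexical_recall` against `TARGET`, while a predicted output of `7` for `scale(2)` would pass `semantic_recall`. The two scores can therefore diverge, which is the kind of disconnect the abstract describes. Sweeping `depth` from 0.0 to 1.0 in `build_context` is one assumed way to probe how accuracy changes as the snippet moves toward the middle of the context.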

Abstract (original)

Although modern Large Language Models (LLMs) support extremely large contexts, their effectiveness in utilizing long context for code reasoning remains unclear. This paper investigates LLM reasoning ability over code snippets within large repositories and how it relates to their recall ability. Specifically, we differentiate between lexical code recall (verbatim retrieval) and semantic code recall (remembering what the code does). To measure semantic recall, we propose SemTrace, a code reasoning technique where the impact of specific statements on output is attributable and unpredictable. We also present a method to quantify semantic recall sensitivity in existing benchmarks. Our evaluation of state-of-the-art LLMs reveals a significant drop in code reasoning accuracy as a code snippet approaches the middle of the input context, particularly with techniques requiring high semantic recall like SemTrace. Moreover, we find that lexical recall varies by granularity, with models excelling at function retrieval but struggling with line-by-line recall. Notably, a disconnect exists between lexical and semantic recall, suggesting different underlying mechanisms. Finally, our findings indicate that current code reasoning benchmarks may exhibit low semantic recall sensitivity, potentially underestimating LLM challenges in leveraging in-context information.

arXiv information

Authors	Adam Štorek, Mukur Gupta, Samira Hajizadeh, Prashast Srivastava, Suman Jana
Published	2025-05-20 05:45:55+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
