Exploiting Code Symmetries for Learning Program Semantics

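Summary

This paper addresses the challenge of teaching code semantics to large language models (LLMs) for program analysis by incorporating code symmetries into the model architecture.
We introduce a group-theoretic framework that defines code symmetries as semantics-preserving transformations. Forming a code symmetry group enables precise and efficient reasoning about code semantics.
Our solution, SymC, develops a novel variant of self-attention that is provably equivariant to code symmetries from the permutation group defined over the program dependence graph.
SymC achieves superior performance on five program analysis tasks, outperforming state-of-the-art code models without any pre-training.
Our results suggest that code LLMs that encode the structural prior of code via the code symmetry group generalize better and faster.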

Summary (original)

This paper tackles the challenge of teaching code semantics to Large Language Models (LLMs) for program analysis by incorporating code symmetries into the model architecture. We introduce a group-theoretic framework that defines code symmetries as semantics-preserving transformations, where forming a code symmetry group enables precise and efficient reasoning of code semantics. Our solution, SymC, develops a novel variant of self-attention that is provably equivariant to code symmetries from the permutation group defined over the program dependence graph. SymC obtains superior performance on five program analysis tasks, outperforming state-of-the-art code models without any pre-training. Our results suggest that code LLMs that encode the code structural prior via the code symmetry group generalize better and faster.
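The equivariance property mentioned in the abstract can be illustrated with a minimal sketch. This is not the SymC architecture: it is plain single-head self-attention without positional encoding, with the adjacency matrix of a toy dependence graph added to the attention scores as a bias. The bias scheme, the four-statement graph, and all function names (`graph_attention`, `permute_graph`, and so on) are assumptions made for illustration. The sketch checks that when a permutation of statements leaves the graph unchanged (a graph automorphism), permuting the input rows permutes the output rows in the same way.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def graph_attention(x, adj, wq, wk, wv):
    """Single-head self-attention with no positional encoding.

    The adjacency matrix is added to the scores as a bias
    (an illustrative assumption, not the SymC design).
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(wq[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    n = len(x)
    attn = [
        softmax([scores[i][j] / scale + adj[i][j] for j in range(n)])
        for i in range(n)
    ]
    return matmul(attn, v)


def permute_rows(m, perm):
    """Reorder rows so that new row i is old row perm[i]."""
    return [m[p] for p in perm]


def permute_graph(adj, perm):
    """Relabel the graph nodes with the same permutation."""
    return [[adj[pi][pj] for pj in perm] for pi in perm]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def random_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
dim = 3

# Toy dependence graph (hypothetical): statements 0 and 1 are independent,
# both feed statement 2, which feeds statement 3.
adj = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
perm = [1, 0, 2, 3]  # swap the two independent statements
assert permute_graph(adj, perm) == adj  # the swap is a graph automorphism

x = random_matrix(4, dim)  # one feature row per statement
wq, wk, wv = (random_matrix(dim, dim) for _ in range(3))

out = graph_attention(x, adj, wq, wk, wv)
out_after_swap = graph_attention(permute_rows(x, perm), adj, wq, wk, wv)

# Equivariance: permuting the input permutes the output the same way.
equivariance_gap = max_abs_diff(out_after_swap, permute_rows(out, perm))
assert equivariance_gap < 1e-9
```

The check relies on two things: the layer has no positional encoding, so it sees statements only through their features and the graph, and the chosen permutation maps the graph onto itself. A permutation that changes the adjacency matrix would generally break the equality unless the graph bias is permuted along with the input.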

arXiv information

Authors	Kexin Pei, Weichen Li, Qirui Jin, Shuyang Liu, Scott Geng, Lorenzo Cavallaro, Junfeng Yang, Suman Jana
Published	2024-06-06 16:35:20+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
