CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation

Summary

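Using large language models (LLMs) to generate code shows great promise for transforming software development.
Despite the general intelligence these models exhibit, their code generation ability still has room for improvement because of the syntactic gap and vocabulary mismatch between natural language and the various programming languages.
This paper proposes CodeGRAG, a graphical retrieval-augmented code generation framework for improving LLM performance.
CodeGRAG builds a graphical view of each code block from its control flow and data flow to bridge the gap between programming languages and natural language. This view helps natural-language-based LLMs understand code syntax better and serves as a bridge between different programming languages.
To bring the extracted structural knowledge into foundation models, the authors propose 1) a hard meta-graph prompt template that converts the challenging graphical representation into informative knowledge for tuning-free models, and 2) a soft prompting technique that injects programming-language domain knowledge into the model parameters by fine-tuning the model with the help of a pretrained GNN expert model.
Experiments and ablations on four datasets covering both C++ and Python validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the pretraining objectives for the GNN expert.
CodeGRAG improves the code generation ability of LLMs and even yields performance gains in cross-lingual code generation.
Code is available at https://anonymous.4open.science/r/Code-5970/.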
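
To make the idea of a graphical view built from control flow and data flow more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not specify how graphs are extracted or what the meta-graph prompt template looks like, so the functions `build_code_graph` and `to_prompt`, the edge labels, and the text layout are all hypothetical. The sketch only handles top-level statements, links consecutive statements with control edges, and links each variable read to its most recent assignment with a data edge. It assumes Python 3.9 or later for `ast.unparse`.

```python
import ast


def build_code_graph(source: str):
    """Return (nodes, edges) for the top-level statements of `source`.

    Hypothetical simplification, not the CodeGRAG extractor:
    - control edges connect consecutive top-level statements;
    - data edges connect the latest assignment of a variable to each
      later statement that reads it.
    """
    tree = ast.parse(source)
    nodes = []     # node i holds the first source line of statement i
    edges = []     # tuples of (src, dst, kind, label)
    last_def = {}  # variable name -> index of its latest defining statement

    for i, stmt in enumerate(tree.body):
        nodes.append(ast.unparse(stmt).splitlines()[0])
        if i > 0:
            edges.append((i - 1, i, "control", ""))

        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        # Handle reads before writes so `x = x + 1` points at the earlier x.
        for n in names:
            if isinstance(n.ctx, ast.Load) and n.id in last_def:
                edge = (last_def[n.id], i, "data", n.id)
                if edge not in edges:
                    edges.append(edge)
        for n in names:
            if isinstance(n.ctx, ast.Store):
                last_def[n.id] = i
    return nodes, edges


def to_prompt(nodes, edges) -> str:
    """Serialize the graph as plain text (an assumed, illustrative layout)."""
    lines = ["Nodes:"]
    lines += [f"  [{i}] {text}" for i, text in enumerate(nodes)]
    lines.append("Edges:")
    for src, dst, kind, label in edges:
        suffix = f" ({label})" if label else ""
        lines.append(f"  [{src}] -{kind}-> [{dst}]{suffix}")
    return "\n".join(lines)


SAMPLE = """\
items = [1, 2, 3]
total = 0
for x in items:
    total = total + x
print(total)
"""

nodes, edges = build_code_graph(SAMPLE)
print(to_prompt(nodes, edges))
```

A text block like this could be prepended to a natural-language task description before it is sent to a tuning-free model. A real extractor would also need branch and loop edges inside compound statements, which this sketch deliberately leaves out.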

Abstract (original)

Utilizing large language models to generate codes has shown promising meaning in software development revolution. Despite the intelligence shown by the general large language models, their specificity in code generation can still be improved due to the syntactic gap and mismatched vocabulary existing among natural language and different programming languages. In this paper, we propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs. CodeGRAG builds the graphical view of code blocks based on the control flow and data flow of them to fill the gap between programming languages and natural language, which can facilitate natural language based LLMs for better understanding of code syntax and serve as a bridge among different programming languages. To take the extracted structural knowledge into the foundation models, we propose 1) a hard meta-graph prompt template to transform the challenging graphical representation into informative knowledge for tuning-free models and 2) a soft prompting technique that injects the domain knowledge of programming languages into the model parameters via finetuning the models with the help of a pretrained GNN expert model. Various experiments and ablations are done on four datasets including both the C++ and python languages to validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for pretrained GNN expert. CodeGRAG improves the code generation ability of LLMs and can even offer performance gain for cross-lingual code generation. Code is available at https://anonymous.4open.science/r/Code-5970/.

arXiv information

Authors	Kounianhua Du, Jizheng Chen, Renting Rui, Huacan Chai, Lingyue Fu, Wei Xia, Yasheng Wang, Ruiming Tang, Yong Yu, Weinan Zhang
Published	2024-11-08 14:17:05+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
