Implant Global and Local Hierarchy Information to Sequence based Code Representation Models

Summary

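Source code representation with deep learning techniques is an important research field.
Many studies learn sequential or structural information for code representation.
However, both sequence-based and non-sequence models have limitations.
Researchers have tried to incorporate structural information into sequence-based models, but they mine only part of the token-level hierarchical structure information.
In this paper, we analyze how the complete hierarchical structure affects the tokens in a code sequence and abstract this effect as a property of code tokens called hierarchical embedding.
The hierarchical embedding is further divided into a statement-level global hierarchy and a token-level local hierarchy.
In addition, we propose the Hierarchy Transformer (HiT), a simple but effective sequence model that incorporates the complete hierarchical embeddings of source code into a Transformer model.
An experiment on a variable scope detection task demonstrates the effectiveness of hierarchical embeddings for learning code structure.
Further evaluation shows that HiT outperforms SOTA baseline models and exhibits stable training efficiency on three source-code-related tasks, covering both classification and generation, across 8 different datasets.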
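The summary above splits a token's hierarchical embedding into a statement-level global hierarchy and a token-level local hierarchy. As a rough illustration only, the sketch below extracts two such features from Python source with the standard `ast` module: the number of compound statements enclosing a token's statement (a stand-in for the global hierarchy) and the path of AST node types from that statement down to the token (a stand-in for the local hierarchy). The function name `hierarchy_features` and both feature definitions are assumptions made for this example; the paper's exact definitions, parser, and target languages may differ.

```python
import ast


def hierarchy_features(source):
    """Return (token, global_depth, local_path) for each identifier or literal.

    global_depth: number of compound statements enclosing the token's statement
                  (an illustrative stand-in for a statement-level global hierarchy)
    local_path:   AST node types from that statement down to the token
                  (an illustrative stand-in for a token-level local hierarchy)
    """
    features = []

    def walk(node, depth, path):
        # Record leaves that correspond to source tokens: names and literals.
        if isinstance(node, ast.Name):
            features.append((node.id, depth, path))
        elif isinstance(node, ast.Constant):
            features.append((repr(node.value), depth, path))
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.stmt):
                # A nested statement sits one level deeper; its local path restarts.
                walk(child, depth + 1, [type(child).__name__])
            else:
                walk(child, depth, path + [type(child).__name__])

    for stmt in ast.parse(source).body:
        walk(stmt, 0, [type(stmt).__name__])
    return features


if __name__ == "__main__":
    code = "x = 1\nif x:\n    y = x + 2\n"
    for token, depth, path in hierarchy_features(code):
        print(token, depth, path)
```

In the usage example, the `x` inside `y = x + 2` gets depth 1 and local path `['Assign', 'BinOp', 'Name']`, while the `x` tested by `if x:` gets depth 0 and path `['If', 'Name']`: the same token receives different hierarchy features depending on where it sits in the code structure.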

Summary (original)

Source code representation with deep learning techniques is an important research field. There have been many studies that learn sequential or structural information for code representation. But sequence-based models and non-sequence models both have their limitations. Researchers attempt to incorporate structural information into sequence-based models, but they only mine part of the token-level hierarchical structure information. In this paper, we analyze how the complete hierarchical structure influences the tokens in code sequences and abstract this influence as a property of code tokens called hierarchical embedding. The hierarchical embedding is further divided into statement-level global hierarchy and token-level local hierarchy. Furthermore, we propose the Hierarchy Transformer (HiT), a simple but effective sequence model to incorporate the complete hierarchical embeddings of source code into a Transformer model. We demonstrate the effectiveness of hierarchical embedding on learning code structure with an experiment on variable scope detection task. Further evaluation shows that HiT outperforms SOTA baseline models and shows stable training efficiency on three source code-related tasks involving classification and generation tasks across 8 different datasets.
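The abstract says HiT incorporates hierarchical embeddings into a Transformer but does not spell out how. The sketch below shows one common pattern, assumed here rather than taken from the paper: each token's input vector is the sum of its token embedding, an embedding of its global depth, and a local embedding formed by averaging the embeddings of the AST node types on its path, much as positional embeddings are usually added to token embeddings. `EmbeddingTable` is a hypothetical stand-in for a learned embedding layer (random vectors, no training), and `input_embedding` with its mean-over-path choice is likewise illustrative, not HiT's actual design.

```python
import random


class EmbeddingTable:
    """Hypothetical stand-in for a learned embedding layer: assigns each new key
    a fixed random vector (no training happens here)."""

    def __init__(self, dim, seed):
        self.dim = dim
        self.rng = random.Random(seed)
        self.vectors = {}

    def __getitem__(self, key):
        if key not in self.vectors:
            self.vectors[key] = [self.rng.uniform(-1.0, 1.0) for _ in range(self.dim)]
        return self.vectors[key]


def add(*vectors):
    """Element-wise sum of equal-length vectors."""
    return [sum(xs) for xs in zip(*vectors)]


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(xs) / len(vectors) for xs in zip(*vectors)]


def input_embedding(token, depth, path, tok_emb, depth_emb, node_emb):
    """Token embedding + global-hierarchy (depth) embedding + local-hierarchy
    embedding, where the local part is the mean of the path's node-type vectors.
    Positional encodings, which a Transformer would also add, are omitted."""
    local = mean([node_emb[node_type] for node_type in path])
    return add(tok_emb[token], depth_emb[depth], local)


if __name__ == "__main__":
    tok_emb, depth_emb, node_emb = EmbeddingTable(4, 0), EmbeddingTable(4, 1), EmbeddingTable(4, 2)
    # The same identifier at two nesting depths gets two different input vectors.
    print(input_embedding("x", 0, ["If", "Name"], tok_emb, depth_emb, node_emb))
    print(input_embedding("x", 1, ["Assign", "BinOp", "Name"], tok_emb, depth_emb, node_emb))
```

Summing keeps the input dimension unchanged, so an unmodified Transformer encoder could consume these vectors; concatenating the three parts would be an alternative at the cost of a wider input.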

arXiv information

Authors	Kechi Zhang, Zhuo Li, Zhi Jin, Ge Li
Published	2023-03-14 12:01:39+00:00
arXiv site	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google
