Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization

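Summary

The goal of source code summarization is to automatically generate human-readable text that describes what a program does.
Neural language models already perform strongly in this field, and an emerging trend is to combine them with external knowledge.
Most previous approaches rely on a sentence-level retrieve-and-combine paradigm on the encoder side: they retrieve similar code snippets and use the corresponding code and summary pairs.
However, this paradigm is coarse-grained and cannot directly exploit the high-quality retrieved summary tokens on the decoder side.
This paper explores a fine-grained, token-level retrieval-augmented mechanism on the decoder side that helps a vanilla neural model generate better code summaries.
In addition, to mitigate the limited ability of token-level retrieval to capture contextual code semantics, the authors propose integrating code semantics into the summary tokens.
Extensive experiments and human evaluation show that the token-level retrieval-augmented approach significantly improves performance and is more interpretable.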
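
The abstract does not say how retrieved tokens are combined with the decoder's own prediction. As a rough illustration only, the sketch below assumes a kNN-LM-style scheme: a datastore maps decoder-state vectors to the summary token that followed them, the nearest entries are turned into a token distribution, and that distribution is interpolated with the model's distribution. The function names, the distance measure, the parameters `k`, `temperature`, and `lam`, and all data values are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def retrieve_token_distribution(query, datastore, k=2, temperature=1.0):
    """Turn the k nearest datastore entries into a distribution over tokens.

    `datastore` is a list of (key_vector, next_token) pairs (assumed layout).
    """
    # Squared L2 distance from the query to every stored key, nearest first.
    scored = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, key)), token)
        for key, token in datastore
    )[:k]
    # Softmax over negative distances: closer neighbours get more weight.
    weights = [math.exp(-dist / temperature) for dist, _ in scored]
    total = sum(weights)
    probs = defaultdict(float)
    for weight, (_, token) in zip(weights, scored):
        probs[token] += weight / total
    return dict(probs)


def interpolate(model_probs, retrieved_probs, lam=0.5):
    """Mix the model's and the retrieved distributions with weight `lam`."""
    vocab = set(model_probs) | set(retrieved_probs)
    return {
        token: (1 - lam) * model_probs.get(token, 0.0)
        + lam * retrieved_probs.get(token, 0.0)
        for token in vocab
    }


# Toy data (made up): two stored states followed by "returns", one by "sets".
datastore = [
    ((0.0, 1.0), "returns"),
    ((0.1, 0.9), "returns"),
    ((1.0, 0.0), "sets"),
]
query = (0.0, 1.0)                           # current decoder state (assumed)
model_probs = {"returns": 0.4, "sets": 0.6}  # vanilla model prediction (assumed)

retrieved = retrieve_token_distribution(query, datastore, k=2)
final = interpolate(model_probs, retrieved, lam=0.5)
best_token = max(final, key=final.get)
```

In this toy setup the model alone prefers `sets`, while both nearest neighbours vote for `returns`, so the interpolated prediction becomes `returns`. Because each retrieved neighbour can be inspected directly, a scheme of this kind also makes it easy to see why a token was chosen, which is in the spirit of the interpretability claim above.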

Summary (original)

Automatically generating human-readable text describing the functionality of a program is the intent of source code summarization. Although Neural Language Models achieve significant performance in this field, an emerging trend is combining neural models with external knowledge. Most previous approaches rely on the sentence-level retrieval and combination paradigm (retrieval of similar code snippets and use of the corresponding code and summary pairs) on the encoder side. However, this paradigm is coarse-grained and cannot directly take advantage of the high-quality retrieved summary tokens on the decoder side. In this paper, we explore a fine-grained token-level retrieval-augmented mechanism on the decoder side to help the vanilla neural model generate a better code summary. Furthermore, to mitigate the limitation of token-level retrieval on capturing contextual code semantics, we propose to integrate code semantics into summary tokens. Extensive experiments and human evaluation reveal that our token-level retrieval-augmented approach significantly improves performance and is more interpretive.

arXiv information

Authors	Tong Ye, Lingfei Wu, Tengfei Ma, Xuhong Zhang, Yangkai Du, Peiyu Liu, Wenhai Wang, Shouling Ji
Published	2023-05-18 16:02:04+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
