KADEL: Knowledge-Aware Denoising Learning for Commit Message Generation

Summary

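Commit messages are natural-language descriptions of code changes and matter for software evolution tasks such as code understanding and maintenance.
However, previous methods train on the entire dataset without accounting for the fact that only some commit messages follow good practice (good-practice commits), while the rest do not.
Our empirical study shows that training on good-practice commits contributes significantly to commit message generation.
Motivated by this finding, we propose KADEL, a novel knowledge-aware denoising learning method.
Because good-practice commits make up only a small proportion of the dataset, we align the remaining training samples with them.
To do so, we propose a model that learns commit knowledge by training on good-practice commits.
This knowledge model supplements training samples that do not conform to good practice with additional information.
Since the supplementary information may contain noise or prediction errors, we also propose a dynamic denoising training method.
This method combines a distribution-aware confidence function with a dynamic distribution list, which makes training more effective.
Experimental results on the whole MCMD dataset show that our method overall achieves state-of-the-art performance compared with previous methods.
Our source code and data are available at https://github.com/DeepSoftwareAnalytics/KADEL.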

Abstract (original)

Commit messages are natural language descriptions of code changes, which are important for software evolution such as code understanding and maintenance. However, previous methods are trained on the entire dataset without considering the fact that a portion of commit messages adhere to good practice (i.e., good-practice commits), while the rest do not. On the basis of our empirical study, we discover that training on good-practice commits significantly contributes to the commit message generation. Motivated by this finding, we propose a novel knowledge-aware denoising learning method called KADEL. Considering that good-practice commits constitute only a small proportion of the dataset, we align the remaining training samples with these good-practice commits. To achieve this, we propose a model that learns the commit knowledge by training on good-practice commits. This knowledge model enables supplementing more information for training samples that do not conform to good practice. However, since the supplementary information may contain noise or prediction errors, we propose a dynamic denoising training method. This method composes a distribution-aware confidence function and a dynamic distribution list, which enhances the effectiveness of the training process. Experimental results on the whole MCMD dataset demonstrate that our method overall achieves state-of-the-art performance compared with previous methods. Our source code and data are available at https://github.com/DeepSoftwareAnalytics/KADEL
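
The abstract names a distribution-aware confidence function and a dynamic distribution list but does not define either. The following is only a minimal sketch of one way such a pair could interact, not the authors' formulation: a sliding window of recent losses on knowledge-supplemented targets serves as the distribution list, and a logistic function of the z-score serves as the confidence. The class name, the function names, the window size, and the logistic form are all assumptions made for illustration.

```python
import math
from collections import deque
from statistics import mean, pstdev


class DynamicDistributionList:
    """Sliding window of recent auxiliary losses (hypothetical structure)."""

    def __init__(self, max_size=1000):
        # Old values drop out automatically once the window is full.
        self.values = deque(maxlen=max_size)

    def update(self, loss):
        self.values.append(loss)


def confidence(loss, dist):
    """Map a loss to (0, 1); losses far above the recent mean score low.

    Assumed form: logistic function of the z-score within the window.
    """
    if len(dist.values) < 2:
        return 0.5  # neutral weight until the window has enough data
    mu = mean(dist.values)
    sigma = max(pstdev(dist.values), 1e-8)  # avoid division by zero
    z = (loss - mu) / sigma
    z = max(-50.0, min(50.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(z))


def denoised_loss(main_loss, aux_loss, dist):
    """Add the auxiliary loss, down-weighted when it looks like noise."""
    weight = confidence(aux_loss, dist)
    dist.update(aux_loss)  # the distribution tracks training as it proceeds
    return main_loss + weight * aux_loss
```

In this sketch, a supplementary target whose loss is typical for the current window gets a weight near 0.5, while an outlier with an unusually high loss contributes almost nothing, which is one plausible reading of "denoising" here.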

arXiv information

Authors	Wei Tao, Yucheng Zhou, Yanlin Wang, Hongyu Zhang, Haofen Wang, Wenqiang Zhang
Published	2024-01-16 14:07:48+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
