A Comparative Study of Text Embedding Models for Semantic Text Similarity in Bug Reports

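Summary

Bug reports are an essential part of software development, and identifying and resolving bugs quickly is crucial to keeping software systems functioning consistently.
Retrieving similar bug reports from an existing database can reduce the time and effort needed to resolve a bug.
In this paper, we compared the effectiveness of semantic textual similarity methods for retrieving similar bug reports based on a similarity score.
We investigated several embedding models, including TF-IDF (baseline), FastText, Gensim, BERT, and ADA.
To evaluate these models, we used the Software Defects Data, which contains bug reports from a variety of software projects.
In our experiments, BERT generally outperformed the other models in terms of recall, followed by ADA, Gensim, FastText, and TF-IDF.
Our study offers insight into how effective different embedding methods are at retrieving similar bug reports, and highlights the impact of choosing an appropriate embedding method for this task.
Our code is available on GitHub.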

Abstract (original)

Bug reports are an essential aspect of software development, and it is crucial to identify and resolve them quickly to ensure the consistent functioning of software systems. Retrieving similar bug reports from an existing database can help reduce the time and effort required to resolve bugs. In this paper, we compared the effectiveness of semantic textual similarity methods for retrieving similar bug reports based on a similarity score. We explored several embedding models such as TF-IDF (Baseline), FastText, Gensim, BERT, and ADA. We used the Software Defects Data containing bug reports for various software projects to evaluate the performance of these models. Our experimental results showed that BERT generally outperformed the rest of the models regarding recall, followed by ADA, Gensim, FastText, and TFIDF. Our study provides insights into the effectiveness of different embedding methods for retrieving similar bug reports and highlights the impact of selecting the appropriate one for this task. Our code is available on GitHub.
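
The retrieval setup the abstract describes (embed each bug report, rank stored reports by a similarity score against a query, and measure recall) can be sketched roughly as follows. This is not the authors' code: it is a minimal illustration using only the TF-IDF baseline, written with the Python standard library. The whitespace tokenizer, the smoothed IDF formula, the cosine similarity score, the function names, and the four toy bug reports are all assumptions made for the example. Another embedding model would, under the same assumptions, replace the `vectorize` step while the ranking and recall computation stay the same.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; a real pipeline would clean more.
    return text.lower().split()


def vectorize(tokens, idf):
    # Sparse TF-IDF vector as a dict; terms unseen in the corpus are ignored.
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def build_tfidf(docs):
    # Returns one sparse vector per document plus the IDF table.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    # Assumption: smoothed IDF, so very common terms keep a non-zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [vectorize(tokens, idf) for tokens in tokenized], idf


def cosine(a, b):
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query, docs, k=2):
    # Rank stored reports by similarity to the query and return the top-k indices.
    vectors, idf = build_tfidf(docs)
    q = vectorize(tokenize(query), idf)
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by index
    return [i for _, i in scored[:k]]


def recall_at_k(retrieved, relevant):
    # Fraction of the known-similar reports that appear in the retrieved list.
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)


# Toy data, invented for illustration only.
reports = [
    "app crashes when opening settings page",
    "login button does not respond on mobile",
    "crash on opening the settings screen",
    "typo in the user manual",
]
query = "settings page crashes the app"
hits = top_k_similar(query, reports, k=2)
print(hits, recall_at_k(hits, relevant=[0, 2]))  # prints: [0, 2] 1.0
```

Note that the baseline only matches exact tokens: "crash" and "crashes" count as different terms here, which is the kind of gap that learned embeddings are meant to narrow.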

arXiv information

Authors	Avinash Patil, Kihwan Han, Sabyasachi Mukhopadhyay
Published	2023-08-17 21:36:56+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
