Combining Language and App UI Analysis for the Automated Assessment of Bug Reproduction Steps

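Summary

Bug reports are essential for developers to confirm software problems, investigate their causes, and validate fixes.
Unfortunately, reports often omit important information or are written unclearly.
One of the most commonly problematic components of a report is the steps to reproduce the bug (S2Rs), which are essential for replicating the described program failure and reasoning about fixes.
Because reported S2Rs are so often deficient, prior work has proposed techniques that help reporters write S2Rs or assess their quality.
However, automated understanding of S2Rs is challenging, because it requires linking nuanced natural language phrases to specific, semantically related program information.
Prior techniques often struggle to form these language-to-program connections, owing to variability in language and to the limits of the information that program analyses provide.
To tackle the problem of S2R quality annotation more effectively, we propose a new technique called AstroBR. It leverages the language understanding capabilities of LLMs to identify and extract S2Rs from bug reports and map them to GUI interactions in a program state model derived via dynamic analysis.
We compared AstroBR to a related state-of-the-art approach and found that AstroBR annotates S2Rs 25.2% better than the baseline in terms of F1 score.
Additionally, AstroBR suggests missing S2Rs more accurately than the baseline (by 71.4% in terms of F1 score).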

Summary (original)

Bug reports are essential for developers to confirm software problems, investigate their causes, and validate fixes. Unfortunately, reports often miss important information or are written unclearly, which can cause delays, increased issue resolution effort, or even the inability to solve issues. One of the most common components of reports that are problematic is the steps to reproduce the bug(s) (S2Rs), which are essential to replicate the described program failures and reason about fixes. Given the proclivity for deficiencies in reported S2Rs, prior work has proposed techniques that assist reporters in writing or assessing the quality of S2Rs. However, automated understanding of S2Rs is challenging, and requires linking nuanced natural language phrases with specific, semantically related program information. Prior techniques often struggle to form such language to program connections – due to issues in language variability and limitations of information gleaned from program analyses. To more effectively tackle the problem of S2R quality annotation, we propose a new technique called AstroBR, which leverages the language understanding capabilities of LLMs to identify and extract the S2Rs from bug reports and map them to GUI interactions in a program state model derived via dynamic analysis. We compared AstroBR to a related state-of-the-art approach and we found that AstroBR annotates S2Rs 25.2% better (in terms of F1 score) than the baseline. Additionally, AstroBR suggests more accurate missing S2Rs than the baseline (by 71.4% in terms of F1 score).
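
Both results above are reported as F1 scores. As a reminder of what that metric measures, here is a minimal sketch, assuming annotations or suggested steps are compared as sets of labels. The function name `f1_score` and the sample steps are hypothetical illustrations and are not taken from the paper or its evaluation.

```python
def f1_score(predicted: set, expected: set) -> float:
    """Harmonic mean of precision and recall over two label sets."""
    if not predicted or not expected:
        return 0.0
    true_positives = len(predicted & expected)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # share of predictions that are correct
    recall = true_positives / len(expected)      # share of expected items that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical data: missing steps suggested by a tool vs. a ground-truth list.
suggested = {"open settings", "tap save", "rotate screen"}
ground_truth = {"open settings", "tap save", "enter name", "tap add"}

score = f1_score(suggested, ground_truth)
print(f"F1 = {score:.3f}")
```

Here two of the three suggestions are correct (precision 2/3) and two of the four expected steps are found (recall 1/2), so F1 balances both kinds of error rather than rewarding a tool that only suggests many steps or only a few safe ones.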

arXiv information

Authors	Junayed Mahmud, Antu Saha, Oscar Chaparro, Kevin Moran, Andrian Marcus
Published	2025-02-06 17:40:53+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
