Automated Non-Functional Requirements Generation in Software Engineering with Large Language Models: A Comparative Study

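Summary

Neglecting non-functional requirements (NFRs) early in software development can lead to critical challenges.
Despite their importance, NFRs are often overlooked or hard to identify, which degrades software quality.
To support requirements engineers in eliciting NFRs, we developed a framework that uses large language models (LLMs) to derive quality-driven NFRs from functional requirements (FRs).
Using a custom prompting technique within a Deno-based pipeline, the system identifies the quality attributes relevant to each functional requirement and generates corresponding NFRs, aiding systematic integration.
A crucial aspect is evaluating the quality and suitability of these generated requirements.
Can LLMs produce high-quality NFR suggestions?
Using 34 functional requirements, selected as a representative subset of 3,964 FRs, the LLMs inferred applicable attributes based on the ISO/IEC 25010:2023 standard and generated 1,593 NFRs.
A horizontal evaluation covered three dimensions: NFR validity, applicability of quality attributes, and classification precision.
Ten industry software quality evaluators, averaging 13 years of experience, assessed a subset for relevance and quality.
The evaluation showed strong alignment between LLM-generated NFRs and expert assessments, with median validity and applicability scores of 5.0 (means: 4.63 and 4.59, respectively) on a 1-5 scale.
In the classification task, 80.4% of LLM-assigned attributes matched expert choices, with 8.3% near misses and 11.3% mismatches.
A comparative analysis of eight LLMs highlighted variations in performance: gemini-1.5-pro showed the highest attribute accuracy, while llama-3.3-70B achieved higher validity and applicability scores.
These findings provide insights into the feasibility of using LLMs for automated NFR generation and lay the foundation for further exploration of AI-assisted requirements engineering.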
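
The two-step flow in the summary (identify the applicable quality attributes for each functional requirement, then generate one NFR per attribute) might be sketched roughly as follows in TypeScript, which runs under Deno. This is an illustrative sketch, not the authors' implementation: the `Complete` callback, the prompt wording, and every function name are assumptions, and the attribute list is the set of top-level product quality characteristics of ISO/IEC 25010:2023, which may differ from the granularity the paper actually used.

```typescript
// Top-level product quality characteristics of ISO/IEC 25010:2023.
// Assumption: the paper may work with a different or finer-grained list.
const QUALITY_ATTRIBUTES = [
  "Functional suitability",
  "Performance efficiency",
  "Compatibility",
  "Interaction capability",
  "Reliability",
  "Security",
  "Maintainability",
  "Flexibility",
  "Safety",
] as const;

type QualityAttribute = (typeof QUALITY_ATTRIBUTES)[number];

interface GeneratedNfr {
  functionalRequirement: string;
  attribute: QualityAttribute;
  text: string;
}

// Hypothetical abstraction over any LLM client: prompt in, completion text out.
type Complete = (prompt: string) => Promise<string>;

// Step 1 prompt: ask which attributes apply to one functional requirement.
function buildAttributePrompt(fr: string): string {
  return [
    "Functional requirement: " + fr,
    "Which of these ISO/IEC 25010:2023 quality attributes apply?",
    QUALITY_ATTRIBUTES.join(", "),
    "Answer with a comma-separated list of attribute names only.",
  ].join("\n");
}

// Keep only answers that name a known attribute, ignoring case,
// surrounding whitespace, duplicates, and anything unrecognized.
function parseAttributes(answer: string): QualityAttribute[] {
  const named = new Set(
    answer.split(/[,\n]/).map((part) => part.trim().toLowerCase()),
  );
  return QUALITY_ATTRIBUTES.filter((a) => named.has(a.toLowerCase()));
}

// Step 2 prompt: ask for one NFR targeting a single attribute.
function buildNfrPrompt(fr: string, attribute: QualityAttribute): string {
  return [
    "Functional requirement: " + fr,
    "Write one measurable non-functional requirement addressing " +
      attribute + ".",
  ].join("\n");
}

// Run both steps for one functional requirement.
async function deriveNfrs(
  fr: string,
  complete: Complete,
): Promise<GeneratedNfr[]> {
  const attributes = parseAttributes(await complete(buildAttributePrompt(fr)));
  const nfrs: GeneratedNfr[] = [];
  for (const attribute of attributes) {
    const text = (await complete(buildNfrPrompt(fr, attribute))).trim();
    nfrs.push({ functionalRequirement: fr, attribute, text });
  }
  return nfrs;
}
```

Passing the model call in as a callback keeps the sketch independent of any particular provider, so the same pipeline could be pointed at each of the compared models in turn, or at a stub that returns canned text during testing.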

Summary (original)

Neglecting non-functional requirements (NFRs) early in software development can lead to critical challenges. Despite their importance, NFRs are often overlooked or difficult to identify, impacting software quality. To support requirements engineers in eliciting NFRs, we developed a framework that leverages Large Language Models (LLMs) to derive quality-driven NFRs from functional requirements (FRs). Using a custom prompting technique within a Deno-based pipeline, the system identifies relevant quality attributes for each functional requirement and generates corresponding NFRs, aiding systematic integration. A crucial aspect is evaluating the quality and suitability of these generated requirements. Can LLMs produce high-quality NFR suggestions? Using 34 functional requirements, selected as a representative subset of 3,964 FRs, the LLMs inferred applicable attributes based on the ISO/IEC 25010:2023 standard, generating 1,593 NFRs. A horizontal evaluation covered three dimensions: NFR validity, applicability of quality attributes, and classification precision. Ten industry software quality evaluators, averaging 13 years of experience, assessed a subset for relevance and quality. The evaluation showed strong alignment between LLM-generated NFRs and expert assessments, with median validity and applicability scores of 5.0 (means: 4.63 and 4.59, respectively) on a 1-5 scale. In the classification task, 80.4% of LLM-assigned attributes matched expert choices, with 8.3% near misses and 11.3% mismatches. A comparative analysis of eight LLMs highlighted variations in performance, with gemini-1.5-pro exhibiting the highest attribute accuracy, while llama-3.3-70B achieved higher validity and applicability scores. These findings provide insights into the feasibility of using LLMs for automated NFR generation and lay the foundation for further exploration of AI-assisted requirements engineering.

arXiv information

Authors	Jomar Thomas Almonte, Santhosh Anitha Boominathan, Nathalia Nascimento
Published	2025-03-19 14:23:22+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
