Specification and Evaluation of Multi-Agent LLM Systems — Prototype and Cybersecurity Applications

Abstract

Recent advances in LLMs indicate potential for novel applications, for example through the reasoning capabilities of the latest OpenAI and DeepSeek models. To apply these models in specific domains beyond text generation, LLM-based multi-agent approaches can solve complex tasks by combining reasoning techniques, code generation, and software execution. Applications might utilize these capabilities together with the knowledge of specialized LLM agents. However, while many evaluations are performed on LLMs, reasoning techniques, and applications individually, their joint specification and combined application are not well explored. Defined specifications for multi-agent LLM systems are required to explore their potential and their suitability for specific applications, allowing for systematic evaluations of LLMs, reasoning techniques, and related aspects. This paper reports the results of exploratory research to specify and evaluate these aspects through a multi-agent system. The system architecture and prototype are extended from previous research, and a specification is introduced for multi-agent systems. Test cases involving cybersecurity tasks indicate the feasibility of the architecture and evaluation approach. In particular, the results show the evaluation of question answering, server security, and network security tasks that were completed correctly by agents with LLMs from OpenAI and DeepSeek.

arXiv information

Author	Felix Härer
Published	2025-06-13 17:32:42+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
