Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering

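Summary

As the field of large language models (LLMs) evolves at an accelerating pace, there is a critical need to assess and monitor their performance.
We introduce a benchmarking framework focused on knowledge graph engineering (KGE), accompanied by three challenges that address syntax and error correction, fact extraction, and dataset generation.
We show that LLMs, while useful tools, are not yet fit to assist in knowledge graph generation with zero-shot prompting.
Our LLM-KG-Bench framework therefore provides automatic evaluation and storage of LLM responses, as well as statistical data and visualization tools to support tracking of prompt engineering and model performance.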

Summary (original)

As the field of Large Language Models (LLMs) evolves at an accelerated pace, the critical need to assess and monitor their performance emerges. We introduce a benchmarking framework focused on knowledge graph engineering (KGE) accompanied by three challenges addressing syntax and error correction, facts extraction and dataset generation. We show that while being a useful tool, LLMs are yet unfit to assist in knowledge graph generation with zero-shot prompting. Consequently, our LLM-KG-Bench framework provides automatic evaluation and storage of LLM responses as well as statistical data and visualization tools to support tracking of prompt engineering and model performance.

arXiv information

Authors	Lars-Peter Meyer, Johannes Frey, Kurt Junghanns, Felix Brei, Kirill Bulert, Sabine Gründer-Fahrer, Michael Martin
Published	2023-08-31 10:31:19+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
