Beyond Reference-Based Metrics: Analyzing Behaviors of Open LLMs on Data-to-Text Generation

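Summary

We investigate how well open large language models (LLMs) can generate coherent and relevant text from structured data.
To avoid bias from benchmarks that have leaked into LLM training data, we collect Quintd-1, an ad-hoc benchmark for five data-to-text (D2T) generation tasks that consists of structured data records in standard formats gathered from public APIs.
We rely on reference-free evaluation metrics and the in-context learning capabilities of LLMs, which lets us test the models without human-written references.
Our evaluation focuses on annotating semantic accuracy errors at the token level, combining human annotators with a GPT-4-based metric.
A systematic examination of model behavior across domains and tasks suggests that state-of-the-art open LLMs with 7B parameters can generate fluent and coherent text from various standard data formats in zero-shot settings.
However, we also show that the semantic accuracy of the outputs remains a major issue: on our benchmark, 80% of open LLM outputs contain a semantic error according to human annotators (91% according to GPT-4).
The code, data, and model outputs are available at https://d2t-llm.github.io.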
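
The headline figure (the share of outputs that contain a semantic error) is an output-level rate derived from token-level annotations. A minimal sketch of that aggregation follows, assuming each output comes with a list of annotated error spans. The span format, the category labels, the function name `outputs_with_error_rate`, and the toy data are all hypothetical and are not taken from the Quintd-1 release.

```python
from typing import List, Tuple

# Hypothetical span format: (start_token, end_token, category).
# The category labels below are illustrative placeholders only.
ErrorSpan = Tuple[int, int, str]


def outputs_with_error_rate(annotations: List[List[ErrorSpan]]) -> float:
    """Return the fraction of outputs with at least one annotated error span."""
    if not annotations:
        raise ValueError("no outputs to score")
    flagged = sum(1 for spans in annotations if spans)
    return flagged / len(annotations)


# Made-up toy annotations for five generated outputs (not real benchmark data).
toy_annotations: List[List[ErrorSpan]] = [
    [(3, 5, "incorrect")],
    [],
    [(0, 1, "other"), (7, 9, "incorrect")],
    [(2, 2, "other")],
    [],
]

rate = outputs_with_error_rate(toy_annotations)
print(f"{rate:.0%} of toy outputs contain at least one annotated error")
```

Under this definition a single wrong token is enough to flag an entire output, so an output-level rate can be high even when most tokens in each text are correct.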

Abstract (original)

We investigate to which extent open large language models (LLMs) can generate coherent and relevant text from structured data. To prevent bias from benchmarks leaked into LLM training data, we collect Quintd-1: an ad-hoc benchmark for five data-to-text (D2T) generation tasks, consisting of structured data records in standard formats gathered from public APIs. We leverage reference-free evaluation metrics and LLMs’ in-context learning capabilities, allowing us to test the models with no human-written references. Our evaluation focuses on annotating semantic accuracy errors on token-level, combining human annotators and a metric based on GPT-4. Our systematic examination of the models’ behavior across domains and tasks suggests that state-of-the-art open LLMs with 7B parameters can generate fluent and coherent text from various standard data formats in zero-shot settings. However, we also show that semantic accuracy of the outputs remains a major issue: on our benchmark, 80% of outputs of open LLMs contain a semantic error according to human annotators (91% according to GPT-4). Our code, data, and model outputs are available at https://d2t-llm.github.io.

arXiv information

Authors	Zdeněk Kasner, Ondřej Dušek
Published	2024-01-18 18:15:46+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
