RelationalFactQA: A Benchmark for Evaluating Tabular Fact Retrieval from Large Language Models

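Summary

Factuality in large language models (LLMs) is a persistent challenge.
Current benchmarks often assess short factual answers, overlooking the critical ability to generate structured, multi-record tabular outputs from parametric knowledge.
We demonstrate that this relational fact retrieval is substantially harder than isolated point-wise queries, even when the individual facts are known to the model, and that it exposes distinct failure modes sensitive to output dimensionality (e.g., the number of attributes or records).
To systematically evaluate this under-explored capability, we introduce RelationalFactQA, a new benchmark featuring diverse natural language questions (paired with SQL) and gold-standard tabular answers.
RelationalFactQA enables analysis across varying query complexities, output sizes, and data characteristics.
Our experiments reveal that even state-of-the-art LLMs struggle significantly, not exceeding 25% factual accuracy when generating relational outputs, with performance degrading notably as output dimensionality increases.
These findings underscore critical limitations in current LLMs' ability to synthesize structured factual knowledge and establish RelationalFactQA as a crucial resource for measuring future progress in LLM factuality.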

Summary (original)

Factuality in Large Language Models (LLMs) is a persistent challenge. Current benchmarks often assess short factual answers, overlooking the critical ability to generate structured, multi-record tabular outputs from parametric knowledge. We demonstrate that this relational fact retrieval is substantially more difficult than isolated point-wise queries, even when individual facts are known to the model, exposing distinct failure modes sensitive to output dimensionality (e.g., number of attributes or records). To systematically evaluate this under-explored capability, we introduce RelationalFactQA, a new benchmark featuring diverse natural language questions (paired with SQL) and gold-standard tabular answers, specifically designed to assess knowledge retrieval in a structured format. RelationalFactQA enables analysis across varying query complexities, output sizes, and data characteristics. Our experiments reveal that even state-of-the-art LLMs struggle significantly, not exceeding 25% factual accuracy in generating relational outputs, with performance notably degrading as output dimensionality increases. These findings underscore critical limitations in current LLMs’ ability to synthesize structured factual knowledge and establish RelationalFactQA as a crucial resource for measuring future progress in LLM factuality.

arXiv information

Authors	Dario Satriani, Enzo Veltri, Donatello Santoro, Paolo Papotti
Published	2025-05-27 16:33:38+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
