Evaluating book summaries from internal knowledge in Large Language Models: a cross-model and semantic consistency approach

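Summary

We study the ability of large language models (LLMs) to generate comprehensive and accurate book summaries from internal knowledge alone, without relying on the original text.
Using a diverse set of books and multiple LLM architectures, we examine whether these models can synthesize meaningful narratives that agree with established human interpretations.
Evaluation follows the LLM-as-a-judge paradigm: each AI-generated summary is compared against a high-quality, human-written summary through cross-model evaluation.
This methodology makes it possible to identify potential biases, such as a model's tendency to favor its own summarization style over that of other models.
In addition, ROUGE and BERTScore metrics quantify the alignment between the human-written and LLM-generated summaries, assessing the depth of grammatical and semantic correspondence.
The results reveal subtle variations in content representation and stylistic preference across models, highlighting both the strengths and the limitations inherent in relying on internal knowledge for summarization tasks.
These findings contribute to a deeper understanding of how LLMs internally encode factual information and of the dynamics of cross-model evaluation, with implications for the development of more robust natural language generation systems.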

Abstract (original)

We study the ability of large language models (LLMs) to generate comprehensive and accurate book summaries solely from their internal knowledge, without recourse to the original text. Employing a diverse set of books and multiple LLM architectures, we examine whether these models can synthesize meaningful narratives that align with established human interpretations. Evaluation is performed with an LLM-as-a-judge paradigm: each AI-generated summary is compared against a high-quality, human-written summary via a cross-model assessment, where all participating LLMs evaluate not only their own outputs but also those produced by others. This methodology enables the identification of potential biases, such as the proclivity of models to favor their own summarization style over others. In addition, alignment between the human-crafted and LLM-generated summaries is quantified using ROUGE and BERTScore metrics, assessing the depth of grammatical and semantic correspondence. The results reveal nuanced variations in content representation and stylistic preferences among the models, highlighting both strengths and limitations inherent in relying on internal knowledge for summarization tasks. These findings contribute to a deeper understanding of LLM internal encodings of factual information and the dynamics of cross-model evaluation, with implications for the development of more robust natural language generative systems.
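
As a rough illustration of the two measurements the abstract describes, the sketch below computes a simplified ROUGE-1 F1 score and a self-preference gap from a cross-model score matrix. It is not the paper's implementation: the function names, the model names, the score values, and the definition of the gap (a judge's score for its own summary minus the mean score other judges give that same summary) are all assumptions made for illustration. The ROUGE-1 here uses plain whitespace tokenization, whereas standard ROUGE tooling typically adds its own tokenization and optional stemming, and BERTScore is not covered.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercased whitespace tokens."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def self_preference_gap(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """scores[judge][author] is the score a judge gives an author's summary.

    Hypothetical bias measure: for each model, its score for its own summary
    minus the mean score the other judges give that same summary.
    """
    gaps = {}
    for model in scores:
        others = [row[model] for judge, row in scores.items()
                  if judge != model and model in row]
        if model in scores[model] and others:
            gaps[model] = scores[model][model] - sum(others) / len(others)
    return gaps


# Made-up scores for two hypothetical models, purely for illustration.
example_scores = {
    "model_a": {"model_a": 9.0, "model_b": 7.0},
    "model_b": {"model_a": 8.0, "model_b": 7.5},
}
example_gaps = self_preference_gap(example_scores)
example_rouge = rouge1_f1("the cat sat on the mat", "the cat sat")
```

A positive gap under this definition would suggest that a model rates its own summary higher than its peers do, which is the kind of bias the cross-model setup is meant to expose.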

arXiv information

Author	Javier Coronado-Blázquez
Published	2025-03-27 15:36:24+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
