On Learning to Summarize with Large Language Models as References

Summary

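Recent studies have found that human annotators prefer summaries generated by large language models (LLMs) over the original reference summaries in commonly used summarization datasets.
We therefore study an LLM-as-reference learning setting for smaller text summarization models and investigate whether it can substantially improve their performance.
To this end, we use LLMs both as oracle summary generators for standard supervised fine-tuning and as oracle summary evaluators for efficient contrastive learning that leverages the LLMs' supervision signals.
We conduct comprehensive experiments on source news articles and find that (1) summarization models trained in the LLM-as-reference setting achieve significant performance improvements in both LLM and human evaluations;
(2) contrastive learning outperforms standard supervised fine-tuning in both low-resource and high-resource settings.
Our experimental results also enable a meta-analysis of LLMs' summary evaluation capabilities in a challenging setting, showing that LLMs are not well aligned with human evaluators.
In particular, our expert human evaluation reveals remaining nuanced performance gaps between LLMs and our fine-tuned models that the LLMs fail to capture.
Further research is therefore needed into both the potential and the challenges of using LLMs in summarization model development.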
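The contrastive learning setup described in this summary, where an LLM evaluator ranks candidate summaries and the smaller summarizer is trained to reproduce that ranking, can be illustrated with a pairwise margin ranking loss. This is a minimal sketch, assuming a BRIO-style loss in which each candidate is scored by the summarizer's length-normalized log-likelihood; the function name `llm_ranked_contrastive_loss`, the `margin` default, and the input format are illustrative assumptions, not the authors' implementation.

```python
from typing import Sequence


def llm_ranked_contrastive_loss(
    model_scores: Sequence[float],
    llm_scores: Sequence[float],
    margin: float = 0.001,
) -> float:
    """Pairwise margin ranking loss over one article's candidate summaries.

    model_scores: the summarizer's length-normalized log-likelihood for each
        candidate (assumed to be computed elsewhere).
    llm_scores: quality scores for the same candidates from an LLM evaluator
        (hypothetical input; higher means better).
    margin: base margin per rank step (illustrative default, an assumption).
    """
    if len(model_scores) != len(llm_scores):
        raise ValueError("model_scores and llm_scores must have the same length")

    # Order candidate indices from best to worst according to the LLM evaluator.
    order = sorted(range(len(llm_scores)), key=lambda i: llm_scores[i], reverse=True)
    ranked = [model_scores[i] for i in order]

    loss = 0.0
    for i in range(len(ranked)):
        for j in range(i + 1, len(ranked)):
            # A better-ranked candidate (i) should outscore a worse one (j)
            # by a margin that grows with their rank distance (j - i).
            loss += max(0.0, ranked[j] - ranked[i] + (j - i) * margin)
    return loss


# Usage: the summarizer already agrees with the LLM ranking, so the loss is zero.
print(llm_ranked_contrastive_loss([-0.5, -1.0, -2.0], [9.0, 7.0, 3.0]))  # 0.0
```

In practice a loss of this kind is usually combined with the standard cross-entropy loss on a reference summary; how the two terms are weighted here would be a further assumption.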

Summary (original)

Recent studies have found that summaries generated by large language models (LLMs) are favored by human annotators over the original reference summaries in commonly used summarization datasets. Therefore, we study an LLM-as-reference learning setting for smaller text summarization models to investigate whether their performance can be substantially improved. To this end, we use LLMs as both oracle summary generators for standard supervised fine-tuning and oracle summary evaluators for efficient contrastive learning that leverages the LLMs’ supervision signals. We conduct comprehensive experiments with source news articles and find that (1) summarization models trained under the LLM-as-reference setting achieve significant performance improvement in both LLM and human evaluations; (2) contrastive learning outperforms standard supervised fine-tuning under both low and high resource settings. Our experimental results also enable a meta-analysis of LLMs’ summary evaluation capacities under a challenging setting, showing that LLMs are not well-aligned with human evaluators. Particularly, our expert human evaluation reveals remaining nuanced performance gaps between LLMs and our fine-tuned models, which LLMs fail to capture. Thus, we call for further studies into both the potential and challenges of using LLMs in summarization model development.
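A meta-analysis of whether LLM evaluators are "well-aligned with human evaluators" typically rests on a rank correlation between the two sets of scores. The sketch below computes Kendall's tau-a between paired score lists; the choice of tau-a (rather than a tie-corrected variant) and the function name `kendall_tau_a` are assumptions for illustration, and the paper's exact correlation protocol may differ.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Kendall's tau-a between two paired score lists (tied pairs count as neither)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both raters order this pair the same way
        elif sign < 0:
            discordant += 1  # the raters disagree on this pair
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Usage with toy scores (not data from the paper): an LLM evaluator and a human
# annotator rate the same four summaries and disagree on one pair.
llm_scores = [1, 2, 3, 4]
human_scores = [1, 3, 2, 4]
print(kendall_tau_a(llm_scores, human_scores))  # ≈ 0.667
```

A value near 1 means the LLM orders summaries the way humans do; values near 0 indicate little agreement.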

arXiv information

Authors	Yixin Liu, Kejian Shi, Katherine S He, Longtian Ye, Alexander R. Fabbri, Pengfei Liu, Dragomir Radev, Arman Cohan
Published	2024-07-18 17:23:59+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
