Model-based Preference Optimization in Abstractive Summarization without Human Feedback

Abstract

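In abstractive summarization, the difficulty lies in producing a concise and accurate summary from the large amount of information contained in the source document.
Large Language Models (LLMs) can generate fluent text, but they often introduce inaccuracies by hallucinating content that is not found in the source.
Supervised fine-tuning methods that maximize likelihood contribute to this problem and do not consistently improve the faithfulness of summaries.
Preference-based optimization methods such as Direct Preference Optimization (DPO) can further refine a model to align with human preferences.
However, these methods still depend heavily on costly human feedback.
In this work, we introduce Model-based Preference Optimization (MPO), a novel and straightforward approach that fine-tunes LLMs to improve their summarization ability without any human feedback.
By leveraging the model's inherent summarization capabilities, we create a preference dataset that is generated entirely by the model using different decoding strategies.
Our experiments on standard summarization datasets and a variety of metrics show that the proposed MPO significantly improves the quality of generated summaries without relying on human feedback.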
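
The idea of a model-generated preference dataset combined with a DPO-style objective can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which decoding strategy produces the preferred summary, so the two decoders are passed in as hypothetical callables (`decode_chosen`, `decode_rejected`), and the function names and the default `beta` are assumptions made for illustration. The loss is the standard per-pair DPO loss, computed here from summed sequence log-probabilities.

```python
import math
from typing import Callable, Dict, List


def build_preference_pairs(
    documents: List[str],
    decode_chosen: Callable[[str], str],
    decode_rejected: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Pair two summaries of each document, produced by two decoding strategies.

    Both decoders are hypothetical stand-ins for the same model run with
    different decoding settings. Which strategy counts as "chosen" is an
    assumption left to the caller.
    """
    pairs = []
    for doc in documents:
        chosen = decode_chosen(doc)
        rejected = decode_rejected(doc)
        if chosen != rejected:  # identical outputs carry no preference signal
            pairs.append({"prompt": doc, "chosen": chosen, "rejected": rejected})
    return pairs


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed default, not taken from the source
) -> float:
    """Standard DPO loss for one pair, given summed sequence log-probabilities."""
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    # -log(sigmoid(margin)), written in a numerically stable softplus form
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

In this sketch no human label enters at any point: the preference signal comes only from which decoder produced each summary, and the loss decreases as the policy assigns relatively more probability to the chosen summary than the reference model does.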

Abstract (original)

In abstractive summarization, the challenge of producing concise and accurate summaries arises from the vast amount of information contained in the source document. Consequently, although Large Language Models (LLMs) can generate fluent text, they often introduce inaccuracies by hallucinating content not found in the original source. While supervised fine-tuning methods that maximize likelihood contribute to this issue, they do not consistently enhance the faithfulness of the summaries. Preference-based optimization methods, such as Direct Preference Optimization (DPO), can further refine the model to align with human preferences. However, these methods still heavily depend on costly human feedback. In this work, we introduce a novel and straightforward approach called Model-based Preference Optimization (MPO) to fine-tune LLMs for improved summarization abilities without any human feedback. By leveraging the model’s inherent summarization capabilities, we create a preference dataset that is fully generated by the model using different decoding strategies. Our experiments on standard summarization datasets and various metrics demonstrate that our proposed MPO significantly enhances the quality of generated summaries without relying on human feedback.

arXiv information

Authors	Jaepill Choi, Kyubyung Chae, Jiwoo Song, Yohan Jo, Taesup Kim
Published	2024-09-27 10:35:45+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
