APPLS: A Meta-evaluation Testbed for Plain Language Summarization

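Summary

Models for Plain Language Summarization (PLS) have advanced considerably, but evaluating them remains a challenge.
One reason is that PLS involves multiple interrelated language transformations, such as adding background explanations and removing specialized terminology.
No metric has been explicitly designed for PLS, and the suitability of other text generation evaluation metrics remains unclear.
To address these concerns, our study presents APPLS, a granular meta-evaluation testbed designed to evaluate existing metrics for PLS.
Drawing on insights from previous research, we define controlled perturbations for the testbed along four criteria that a plain language metric should capture: informativeness, simplification, coherence, and faithfulness.
Our analysis of metrics using this testbed reveals that current metrics fail to capture simplification, a crucial gap.
In response, we introduce POMME, a new metric designed to assess text simplification in PLS.
We demonstrate its correlation with simplification perturbations and validate it across a variety of datasets.
Our research contributes the first meta-evaluation testbed for PLS and a comprehensive evaluation of existing metrics, offering insights relevant to other text generation tasks.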

Summary (original)

While there has been significant development of models for Plain Language Summarization (PLS), evaluation remains a challenge. This is in part because PLS involves multiple, interrelated language transformations (e.g., adding background explanations, removing specialized terminology). No metrics are explicitly engineered for PLS, and the suitability of other text generation evaluation metrics remains unclear. To address these concerns, our study presents a granular meta-evaluation testbed, APPLS, designed to evaluate existing metrics for PLS. Drawing on insights from previous research, we define controlled perturbations for our testbed along four criteria that a metric of plain language should capture: informativeness, simplification, coherence, and faithfulness. Our analysis of metrics using this testbed reveals that current metrics fail to capture simplification, signaling a crucial gap. In response, we introduce POMME, a novel metric designed to assess text simplification in PLS. We demonstrate its correlation with simplification perturbations and validate across a variety of datasets. Our research contributes the first meta-evaluation testbed for PLS and a comprehensive evaluation of existing metrics, offering insights with relevance to other text generation tasks.

arXiv information

Authors	Yue Guo, Tal August, Gondy Leroy, Trevor Cohen, Lucy Lu Wang
Published	2023-05-23 17:59:19+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
