RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question

Summary

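Existing metrics for evaluating the quality of automatically generated questions, such as BLEU, ROUGE, BERTScore, and BLEURT, compare the reference and predicted questions and give a high score when there is considerable lexical overlap or semantic similarity between the candidate and reference questions.
This approach has two major drawbacks.
First, it requires expensive human-provided reference questions.
Second, it penalises valid questions that may not have high lexical or semantic similarity to the reference questions.
In this paper, we propose RQUGE, a new metric based on the answerability of the candidate question given the context.
The metric consists of a question-answering module and a span-scorer module built on pre-trained models from existing literature, so it can be used without any additional training.
We show that RQUGE correlates more strongly with human judgment without relying on a reference question.
In addition, RQUGE is shown to be more robust to several adversarial corruptions.
Furthermore, we show that fine-tuning on synthetic data generated by a question generation model and re-ranked by RQUGE can significantly improve the performance of QA models on out-of-domain datasets.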
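The summary describes RQUGE as a question-answering module followed by a span scorer, with no reference question involved. The Python below is a minimal sketch of that two-stage flow, not the authors' implementation. It assumes, as in answer-aware question generation, that each candidate question was generated for a known target answer span. The `QAModel` callable is a hypothetical interface for a pre-trained QA model, and SQuAD-style token F1 is swapped in as a crude stand-in for the learned span scorer used by the actual metric.

```python
import re
from collections import Counter
from typing import Callable

# Hypothetical interface for a pre-trained QA model:
# takes (question, context) and returns a predicted answer string.
QAModel = Callable[[str, str], str]


def _tokens(text: str) -> list[str]:
    """Lower-case word tokens (punctuation dropped)."""
    return re.findall(r"\w+", text.lower())


def token_f1(predicted: str, target: str) -> float:
    """SQuAD-style token F1; only a crude stand-in for the learned span scorer."""
    pred, gold = _tokens(predicted), _tokens(target)
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def rquge_like_score(question: str, context: str, target_answer: str, qa_model: QAModel) -> float:
    """Reference-free score for one candidate question (sketch).

    1. The QA module answers the candidate question from the context.
    2. The span scorer compares that answer with the target answer span.
    No reference question is used at any point.
    """
    predicted_answer = qa_model(question, context)
    return token_f1(predicted_answer, target_answer)


if __name__ == "__main__":
    context = "Marie Curie won the Nobel Prize in Physics in 1903."

    def toy_qa(question: str, context: str) -> str:
        # Rule-based toy "reader" for illustration only; a real setup would
        # call a pre-trained QA model here.
        q = question.lower()
        if q.startswith("who"):
            return "Marie Curie"
        if q.startswith("when"):
            return "1903"
        return ""

    # Question whose answer is the target span -> high score.
    print(rquge_like_score("Who won the Nobel Prize in Physics in 1903?", context, "Marie Curie", toy_qa))  # 1.0
    # Fluent question that is answerable, but not by the target span -> low score.
    print(rquge_like_score("When did Marie Curie win the Nobel Prize?", context, "Marie Curie", toy_qa))  # 0.0
```

The point of the design is visible in the second call: the question is well formed, yet it scores low because answering it does not recover the intended answer, and this judgement needs no reference question.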

Summary (original)

Existing metrics for evaluating the quality of automatically generated questions such as BLEU, ROUGE, BERTScore, and BLEURT compare the reference and predicted questions, providing a high score when there is a considerable lexical overlap or semantic similarity between the candidate and the reference questions. This approach has two major shortcomings. First, we need expensive human-provided reference questions. Second, it penalises valid questions that may not have high lexical or semantic similarity to the reference questions. In this paper, we propose a new metric, RQUGE, based on the answerability of the candidate question given the context. The metric consists of a question-answering module and a span-scorer module, using pre-trained models from existing literature, thus it can be used without any further training. We demonstrate that RQUGE has a higher correlation with human judgment without relying on the reference question. Additionally, RQUGE is shown to be more robust to several adversarial corruptions. Furthermore, we illustrate that we can significantly improve the performance of QA models on out-of-domain datasets by fine-tuning on synthetic data generated by a question generation model and re-ranked by RQUGE.
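The abstract mentions re-ranking synthetic questions with RQUGE before fine-tuning a QA model, but it does not state the selection rule (how many candidates per answer, top-k or a score threshold). The Python below is one plausible sketch under the assumption of keeping the top-k candidates per (context, answer) pair. `SyntheticExample`, `Scorer`, and `rerank_and_filter` are hypothetical names, and the scorer is left abstract so that any RQUGE implementation could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class SyntheticExample:
    context: str
    question: str  # produced by a question generation model
    answer: str    # target answer span the question was generated for


# Hypothetical scorer interface: (question, context, answer) -> score, higher is better.
Scorer = Callable[[str, str, str], float]


def rerank_and_filter(
    candidates: Iterable[SyntheticExample], scorer: Scorer, top_k: int = 1
) -> list[SyntheticExample]:
    """Group candidates by (context, answer), rank each group by score, keep the top_k."""
    groups: dict[tuple[str, str], list[tuple[float, SyntheticExample]]] = {}
    for ex in candidates:
        score = scorer(ex.question, ex.context, ex.answer)
        groups.setdefault((ex.context, ex.answer), []).append((score, ex))

    kept: list[SyntheticExample] = []
    for scored in groups.values():
        # Sort on the score only; ties keep generation order because Python's sort is stable.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        kept.extend(ex for _, ex in scored[:top_k])
    return kept


if __name__ == "__main__":
    ctx = "Marie Curie won the Nobel Prize in Physics in 1903."
    cands = [
        SyntheticExample(ctx, "When did Marie Curie win the Nobel Prize?", "1903"),
        SyntheticExample(ctx, "Who won the prize?", "1903"),
    ]
    # Made-up scores for illustration; a real pipeline would compute them with RQUGE.
    fake_scores = {cands[0].question: 0.9, cands[1].question: 0.1}
    best = rerank_and_filter(cands, lambda q, c, a: fake_scores[q], top_k=1)
    print([ex.question for ex in best])  # ['When did Marie Curie win the Nobel Prize?']
```

The filtered list would then serve as the synthetic training set for the QA model; a threshold on the score is an equally simple alternative to `top_k`.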

arXiv information

Authors	Alireza Mohammadshahi, Thomas Scialom, Majid Yazdani, Pouya Yanki, Angela Fan, James Henderson, Marzieh Saeidi
Published	2023-05-26 14:28:20+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
