From Model-centered to Human-Centered: Revision Distance as a Metric for Text Evaluation in LLMs-based Applications

Summary

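Evaluating large language models (LLMs) is fundamental, particularly in the context of real-world applications.
Conventional evaluation methods, designed mainly for LLM development, typically produce numerical scores that ignore the user experience.
Our study therefore shifts the focus from model-centered to human-centered evaluation in the context of AI-powered writing assistance applications.
Our proposed metric, called "Revision Distance," uses LLMs to suggest revision edits that mimic the human writing process.
The metric is computed by counting the revision edits that the LLM generates.
By drawing on the details of the generated revision edits, the metric goes beyond a context-independent score and provides a self-explanatory text evaluation result in a form humans can understand.
Our results show that on the easy-writing task, "Revision Distance" is consistent with established metrics (ROUGE, Bert-score, and GPT-score), while offering more insightful, detailed feedback and distinguishing between texts more clearly.
Moreover, on the challenging task of academic writing, our metric still delivers reliable evaluations where other metrics tend to struggle.
Finally, our metric also holds significant potential for scenarios in which reference texts are lacking.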

Abstract (original)

Evaluating large language models (LLMs) is fundamental, particularly in the context of practical applications. Conventional evaluation methods, typically designed primarily for LLM development, yield numerical scores that ignore the user experience. Therefore, our study shifts the focus from model-centered to human-centered evaluation in the context of AI-powered writing assistance applications. Our proposed metric, termed “Revision Distance,” utilizes LLMs to suggest revision edits that mimic the human writing process. It is determined by counting the revision edits generated by LLMs. Benefiting from the generated revision edit details, our metric can provide a self-explained text evaluation result in a human-understandable manner beyond the context-independent score. Our results show that for the easy-writing task, “Revision Distance” is consistent with established metrics (ROUGE, Bert-score, and GPT-score), but offers more insightful, detailed feedback and better distinguishes between texts. Moreover, in the context of challenging academic writing tasks, our metric still delivers reliable evaluations where other metrics tend to struggle. Furthermore, our metric also holds significant potential for scenarios lacking reference texts.
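The counting step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: in the paper the edits are proposed by an LLM, whereas here a word-level diff from Python's `difflib` stands in for the LLM so the example runs offline. The names `Edit`, `diff_edits`, `revision_distance`, and the `propose_edits` parameter are hypothetical.

```python
import difflib
from typing import Callable, List, Tuple

# Hypothetical edit record: (operation, text removed from draft, text added).
Edit = Tuple[str, str, str]


def diff_edits(draft: str, reference: str) -> List[Edit]:
    """Stand-in edit proposer (assumption): a word-level diff, NOT an LLM."""
    a, b = draft.split(), reference.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    edits: List[Edit] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only replace / delete / insert operations
            edits.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return edits


def revision_distance(
    draft: str,
    reference: str,
    propose_edits: Callable[[str, str], List[Edit]] = diff_edits,
) -> int:
    """Count the edits needed to revise the draft toward the reference."""
    return len(propose_edits(draft, reference))


if __name__ == "__main__":
    for edit in diff_edits("the cat sat", "the dog sat"):
        print(edit)
    print(revision_distance("the cat sat", "the dog sat"))
```

Passing a different `propose_edits` callable, for example one that wraps an LLM call and parses its suggested edits, would leave the counting logic unchanged. The returned edit list is what makes the score explainable, since each counted edit can be shown to the user.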

arXiv information

Authors	Yongqiang Ma, Lizhi Qing, Jiawei Liu, Yangyang Kang, Yue Zhang, Wei Lu, Xiaozhong Liu, Qikai Cheng
Published	2024-04-11 02:36:27+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
