Evaluating ChatGPT’s Information Extraction Capabilities: An Assessment of Performance, Explainability, Calibration, and Faithfulness

Summary

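Title: Evaluating ChatGPT’s Information Extraction Capabilities: An Assessment of Performance, Explainability, Calibration, and Faithfulness

Summary: This paper assesses ChatGPT’s overall ability using 7 fine-grained information extraction (IE) tasks. Specifically, it measures ChatGPT’s performance, explainability, calibration, and faithfulness, and reports results on 15 keys obtained from either ChatGPT or domain experts. The study finds that ChatGPT performs poorly in the Standard-IE setting but surprisingly well in the OpenIE setting. ChatGPT also provides high-quality, trustworthy explanations for its decisions. However, its predictions tend to be overconfident, which leads to low calibration. In addition, ChatGPT stays faithful to the original text in the majority of cases. For this study, the authors manually annotated test sets for the 7 fine-grained IE tasks, covering 14 datasets. The datasets and code are available at https://github.com/pkuserc/ChatGPT_for_IE.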

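Key points:
– LLMs such as ChatGPT understand user intent well and have recently become extremely popular.
– The paper analyzes ChatGPT’s performance, explainability, calibration, and faithfulness, yielding 15 keys.
– ChatGPT performs poorly in the Standard-IE setting but performs very well in the OpenIE setting.
– ChatGPT provides high-quality, trustworthy explanations, but its predictions can be overconfident, which lowers calibration.
– ChatGPT is faithful to the original text in the majority of cases.
– The authors manually annotate and release test sets for 7 fine-grained IE tasks, covering 14 datasets, to promote further research.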
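
To make the calibration point concrete: a model is well calibrated when its stated confidence matches how often it is right, and overconfidence means confidence runs higher than accuracy. The sketch below computes expected calibration error (ECE), one common way to quantify that gap. This page does not say which metric the paper uses, so the choice of ECE, the function name `expected_calibration_error`, the 10-bin default, and the sample numbers are all assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE: weighted average of |accuracy - mean confidence| per bin.

    confidences: predicted probabilities in [0, 1] (hypothetical model outputs)
    correct:     booleans, True where the prediction matched the gold label
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    # Group predictions into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Each bin contributes its confidence/accuracy gap, weighted by its size.
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


# Made-up example: four predictions at 95% confidence, only one of them correct.
overconfident = expected_calibration_error([0.95] * 4, [True, False, False, False])
print(round(overconfident, 2))
```

A value near 0 means confidence and accuracy agree. The made-up example above has a large gap because the stated confidence (0.95) far exceeds the accuracy (0.25).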

Abstract (original)

The capability of Large Language Models (LLMs) like ChatGPT to comprehend user intent and provide reasonable responses has made them extremely popular lately. In this paper, we focus on assessing the overall ability of ChatGPT using 7 fine-grained information extraction (IE) tasks. Specifically, we present a systematic analysis by measuring ChatGPT’s performance, explainability, calibration, and faithfulness, resulting in 15 keys from either ChatGPT or domain experts. Our findings reveal that ChatGPT’s performance in the Standard-IE setting is poor, but it surprisingly exhibits excellent performance in the OpenIE setting, as evidenced by human evaluation. In addition, our research indicates that ChatGPT provides high-quality and trustworthy explanations for its decisions. However, ChatGPT tends to be overconfident in its predictions, which results in low calibration. Furthermore, ChatGPT demonstrates a high level of faithfulness to the original text in the majority of cases. We manually annotate and release the test sets of 7 fine-grained IE tasks, comprising 14 datasets, to further promote research. The datasets and code are available at https://github.com/pkuserc/ChatGPT_for_IE.

arXiv information

Authors	Bo Li, Gexiang Fang, Yang Yang, Quansen Wang, Wei Ye, Wen Zhao, Shikun Zhang
Published	2023-04-23 12:33:18+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
