How Robust is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks

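Abstract

GPT-3.5 models have shown impressive performance on a wide range of natural language processing (NLP) tasks, demonstrating strong understanding and reasoning abilities.
However, their robustness and their ability to handle the varied complexities of the open world have not yet been investigated. This matters in particular for assessing model stability, which is an important aspect of trustworthy AI.
This study carries out a comprehensive experimental analysis of GPT-3.5 and examines its robustness using 21 datasets (about 116K test samples) with 66 text transformations from TextFlint, covering 9 popular natural language understanding (NLU) tasks.
The findings show that although GPT-3.5 outperforms existing fine-tuned models on some tasks, it still suffers significant robustness degradation: its average performance drops by up to 35.74% on natural language inference tasks and up to 43.59% on sentiment analysis tasks.
The study also shows that GPT-3.5 faces specific robustness challenges, including robustness instability, prompt sensitivity, and number sensitivity.
These insights help in understanding the model's limitations and in guiding future research that addresses these challenges to improve GPT-3.5's overall performance and generalization abilities.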
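
The abstract reports robustness as a drop in average performance after text transformations are applied. As a rough illustration only, the sketch below assumes the drop is measured relative to the score on the untransformed data; this definition, the function names, and the sample scores are all assumptions made here and are not taken from the paper.

```python
from statistics import mean


def relative_drop(original: float, transformed: float) -> float:
    """Relative drop in percent from the original score to the transformed score.

    Assumed definition: (original - transformed) / original * 100.
    """
    if original <= 0:
        raise ValueError("original score must be positive")
    return (original - transformed) / original * 100.0


def average_drop(pairs) -> float:
    """Mean relative drop over (original, transformed) score pairs."""
    return mean(relative_drop(o, t) for o, t in pairs)


# Hypothetical accuracy pairs (original, transformed); not results from the paper.
example_pairs = [(80.0, 60.0), (90.0, 72.0), (50.0, 45.0)]
print(round(average_drop(example_pairs), 2))
```

Under this assumed definition, a model that scores the same on original and transformed data has a drop of zero, and larger values indicate weaker robustness.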

Abstract (original)

The GPT-3.5 models have demonstrated impressive performance in various Natural Language Processing (NLP) tasks, showcasing their strong understanding and reasoning capabilities. However, their robustness and abilities to handle various complexities of the open world have yet to be explored, which is especially crucial in assessing the stability of models and is a key aspect of trustworthy AI. In this study, we perform a comprehensive experimental analysis of GPT-3.5, exploring its robustness using 21 datasets (about 116K test samples) with 66 text transformations from TextFlint that cover 9 popular Natural Language Understanding (NLU) tasks. Our findings indicate that while GPT-3.5 outperforms existing fine-tuned models on some tasks, it still encounters significant robustness degradation, such as its average performance dropping by up to 35.74% and 43.59% in natural language inference and sentiment analysis tasks, respectively. We also show that GPT-3.5 faces some specific robustness challenges, including robustness instability, prompt sensitivity, and number sensitivity. These insights are valuable for understanding its limitations and guiding future research in addressing these challenges to enhance GPT-3.5’s overall performance and generalization abilities.

arXiv information

Authors	Xuanting Chen, Junjie Ye, Can Zu, Nuo Xu, Rui Zheng, Minlong Peng, Jie Zhou, Tao Gui, Qi Zhang, Xuanjing Huang
Published	2023-03-01 07:39:01+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
