Measuring the Robustness of NLP Models to Domain Shifts

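Summary

Existing research on Domain Robustness (DR) suffers from disparate setups, a lack of task variety, and scarce study of recent models and capabilities such as few-shot learning.
Furthermore, we argue that the common practice of measuring DR may further obscure the picture.
Current research focuses on challenge sets and relies solely on the Source Drop (SD), that is, it uses in-domain performance on the source domain as the reference point for degradation.
However, the Target Drop (TD) should be used as a complementary point of view.
To understand the DR challenge in modern NLP models, we developed a benchmark comprising seven NLP tasks, including classification, QA, and generation.
Our benchmark focuses on natural topical domain shifts and enables measuring both the SD and the TD.
Our comprehensive study, covering over 14,000 domain shifts across 18 fine-tuned and few-shot models, shows that both types of model suffer drops under domain shift.
While fine-tuned models excel in-domain, few-shot LLMs often surpass them cross-domain, showing better robustness.
In addition, we found that a large SD can be explained by a shift to a harder domain rather than by a genuine DR challenge.
Thus, the TD is a more reliable metric.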

Summary (original)

Existing research on Domain Robustness (DR) suffers from disparate setups, lack of task variety, and scarce research on recent models and capabilities such as few-shot learning. Furthermore, we claim that the common practice of measuring DR might further obscure the picture. Current research focuses on challenge sets and relies solely on the Source Drop (SD): Using the source in-domain performance as a reference point for degradation. However, the Target Drop (TD) should be used as a complementary point of view. To understand the DR challenge in modern NLP models, we developed a benchmark comprised of seven NLP tasks, including classification, QA, and generation. Our benchmark focuses on natural topical domain shifts and enables measuring both the SD and the TD. Our comprehensive study, involving over 14,000 domain shifts across 18 fine-tuned and few-shot models, shows that both models suffer from drops upon domain shifts. While fine-tuned models excel in-domain, few-shot LLMs often surpass them cross-domain, showing better robustness. In addition, we found that a large SD can be explained by shifting to a harder domain rather than a genuine DR challenge. Thus, the TD is a more reliable metric.
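The abstract names the two metrics but does not give their formulas. The sketch below is one plausible reading, assuming each drop is the gap between an in-domain reference score and the cross-domain score, normalized by that reference: SD uses the source model's score on its own domain, TD uses the score of a model trained (or, for few-shot LLMs, prompted with demonstrations) on the target domain. The names `ShiftScores`, `source_drop`, and `target_drop` are hypothetical, and the scores in the usage lines are made-up numbers chosen only to illustrate the "harder target domain" effect, not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShiftScores:
    """Scores (higher is better) for one hypothetical source -> target shift."""
    source_source: float  # trained/prompted on source, evaluated on source (in-domain)
    source_target: float  # trained/prompted on source, evaluated on target (cross-domain)
    target_target: float  # trained/prompted on target, evaluated on target (in-domain)


def _relative_drop(reference: float, cross_domain: float) -> float:
    # Assumed normalization: drop expressed as a fraction of the reference score.
    if reference <= 0:
        raise ValueError("reference score must be positive")
    return (reference - cross_domain) / reference


def source_drop(s: ShiftScores) -> float:
    """SD: degradation measured against the source in-domain score."""
    return _relative_drop(s.source_source, s.source_target)


def target_drop(s: ShiftScores) -> float:
    """TD: degradation measured against the target in-domain score."""
    return _relative_drop(s.target_target, s.source_target)


# Illustrative (made-up) scores: an easy source domain and a much harder target domain.
shift = ShiftScores(source_source=0.90, source_target=0.60, target_target=0.62)
print(f"SD = {source_drop(shift):.3f}")  # SD = 0.333
print(f"TD = {target_drop(shift):.3f}")  # TD = 0.032
```

Here the SD looks alarming, yet the source model comes within two points of a model built for the target domain, so most of the SD reflects the target domain being harder rather than a genuine robustness failure. This is the situation in which the abstract argues the TD is the more reliable metric.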

arXiv information

Authors	Nitay Calderon, Naveh Porat, Eyal Ben-David, Alexander Chapanin, Zorik Gekhman, Nadav Oved, Vitaly Shalumov, Roi Reichart
Published	2024-01-19 13:05:04+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
