Bridging Topic, Domain, and Language Shifts: An Evaluation of Comprehensive Out-of-Distribution Scenarios

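Summary

Language models (LMs) excel in in-distribution (ID) scenarios, where training and test data are independent and identically distributed.
However, their performance often degrades in real-world applications such as argument mining.
This degradation occurs when new topics emerge or when other text domains and languages become relevant.
To assess how well LMs generalize in such out-of-distribution (OOD) scenarios, we simulate distribution shifts by deliberately withholding specific instances for testing, for example everything from the social media domain or everything on the topic Solar Energy.
Unlike prior studies, which examined specific shifts and metrics in isolation, we analyze OOD generalization comprehensively.
We define three metrics to pinpoint generalization flaws and propose eleven classification tasks covering topic, domain, and language shifts.
Overall, prompt-based fine-tuning performs best, notably when the training and test splits differ primarily in semantics.
At the same time, in-context learning is more effective than prompt-based or vanilla fine-tuning on tasks where the label distribution of the training data differs heavily from that of the test data.
This reveals a crucial drawback of gradient-based learning: it biases LMs with respect to such structural obstacles.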

Abstract (original)

Language models (LMs) excel in in-distribution (ID) scenarios where train and test data are independent and identically distributed. However, their performance often degrades in real-world applications like argument mining. Such degradation happens when new topics emerge, or other text domains and languages become relevant. To assess LMs’ generalization abilities in such out-of-distribution (OOD) scenarios, we simulate such distribution shifts by deliberately withholding specific instances for testing, as from the social media domain or the topic Solar Energy. Unlike prior studies focusing on specific shifts and metrics in isolation, we comprehensively analyze OOD generalization. We define three metrics to pinpoint generalization flaws and propose eleven classification tasks covering topic, domain, and language shifts. Overall, we find superior performance of prompt-based fine-tuning, notably when train and test splits primarily differ semantically. Simultaneously, in-context learning is more effective than prompt-based or vanilla fine-tuning for tasks when training data embodies heavy discrepancies in label distribution compared to testing data. This reveals a crucial drawback of gradient-based learning: it biases LMs regarding such structural obstacles.
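The abstract describes simulating a distribution shift by withholding all instances of one topic, domain, or language for testing. The following is a minimal sketch of that idea, not the authors' implementation: the toy samples, the field names (`text`, `topic`, `label`), and the helper names `held_out_split` and `leave_one_out_folds` are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: the data and helper names are assumptions,
# not taken from the paper.

# Hypothetical toy dataset; each sample carries the attribute we shift on.
SAMPLES = [
    {"text": "Solar panels cut emissions.", "topic": "solar energy", "label": "pro"},
    {"text": "Solar farms use too much land.", "topic": "solar energy", "label": "con"},
    {"text": "Nuclear plants run around the clock.", "topic": "nuclear energy", "label": "pro"},
    {"text": "Nuclear waste is hard to store.", "topic": "nuclear energy", "label": "con"},
    {"text": "Uniforms reduce bullying.", "topic": "school uniforms", "label": "pro"},
    {"text": "Uniforms limit self-expression.", "topic": "school uniforms", "label": "con"},
]


def held_out_split(samples, key, held_out_value):
    """Put every sample whose `key` equals `held_out_value` in the test set.

    All remaining samples form the training set, so the held-out value
    never appears during training (an OOD test split).
    """
    train = [s for s in samples if s[key] != held_out_value]
    test = [s for s in samples if s[key] == held_out_value]
    return train, test


def leave_one_out_folds(samples, key):
    """Yield (held_out_value, train, test) once per distinct value of `key`."""
    for value in sorted({s[key] for s in samples}):
        train, test = held_out_split(samples, key, value)
        yield value, train, test


if __name__ == "__main__":
    for value, train, test in leave_one_out_folds(SAMPLES, "topic"):
        print(f"held out: {value!r}  train={len(train)}  test={len(test)}")
```

The same helper would apply to a domain or language shift by passing a different `key`, provided the samples carry such a field. For real experiments, scikit-learn's `LeaveOneGroupOut` provides a comparable grouped split.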

arXiv information

Authors	Andreas Waldis, Iryna Gurevych
Published	2023-09-15 11:15:47+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
