Multi-Scales Data Augmentation Approach In Natural Language Inference For Artifacts Mitigation And Pre-Trained Model Optimization

Summary

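Machine learning models can achieve high performance on benchmark natural language processing (NLP) datasets but fail in more challenging settings.
We investigate this problem for the case in which a pre-trained model learns dataset artifacts in natural language inference (NLI), the task of determining the logical relationship between a pair of text sequences.
We provide a range of techniques for analyzing and locating dataset artifacts in the crowdsourced Stanford Natural Language Inference (SNLI) corpus.
We study the stylistic patterns of dataset artifacts in SNLI.
To mitigate these artifacts, we adopt our own multi-scale data augmentation method with two distinct frameworks: a behavioral testing checklist at the sentence level and lexical synonym criteria at the word level.
Specifically, the combined method strengthens the model's resistance to perturbation tests, allowing it to consistently outperform the pre-trained baseline.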
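
The summary says the paper analyzes and locates dataset artifacts but does not say how. As a hedged illustration only, the Python sketch below implements one widely used technique, the token-level z-statistic described by Gardner et al. (2021): for each hypothesis token it compares the empirical label distribution p̂(y | token) with a uniform prior and flags tokens that predict a label far more often than chance. This is not claimed to be the author's method; the function name `token_label_zscores`, the whitespace tokenization, the `min_count` threshold and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

LABELS = ("entailment", "neutral", "contradiction")


def token_label_zscores(examples, min_count=2):
    """Score how strongly each hypothesis token is tied to each label.

    examples: iterable of (hypothesis, label) pairs. Tokens are lowercased
    whitespace splits (a simplifying assumption), counted once per hypothesis.
    Returns {(token, label): z} with
        z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n),
    where p_hat = count(token, label) / n, n is the number of hypotheses
    containing the token, and p0 is a uniform prior over LABELS.
    """
    p0 = 1.0 / len(LABELS)
    token_counts = Counter()
    joint_counts = defaultdict(Counter)
    for hypothesis, label in examples:
        for tok in set(hypothesis.lower().split()):
            token_counts[tok] += 1
            joint_counts[tok][label] += 1
    scores = {}
    for tok, n in token_counts.items():
        if n < min_count:  # skip rare tokens whose estimates are too noisy
            continue
        for label in LABELS:
            p_hat = joint_counts[tok][label] / n
            scores[(tok, label)] = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return scores


if __name__ == "__main__":
    # Toy hypotheses invented for illustration; this is not SNLI data.
    toy = [
        ("a man is sleeping", "contradiction"),
        ("nobody is outside", "contradiction"),
        ("nobody is eating", "contradiction"),
        ("a man is outside", "entailment"),
        ("a tall man is outside", "neutral"),
    ]
    ranked = sorted(token_label_zscores(toy).items(), key=lambda kv: kv[1], reverse=True)
    for (tok, label), z in ranked[:3]:
        print(f"{tok!r:>10} -> {label:<13} z = {z:.2f}")
```

On the toy data, the planted negation cue "nobody" ranks first, with z ≈ 2.0 for contradiction. On a real corpus one would use a proper tokenizer and apply a significance threshold before calling a token an artifact.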

Summary (original)

Machine learning models can reach high performance on benchmark natural language processing (NLP) datasets but fail in more challenging settings. We study this issue when a pre-trained model learns dataset artifacts in natural language inference (NLI), the topic of studying the logical relationship between a pair of text sequences. We provide a variety of techniques for analyzing and locating dataset artifacts inside the crowdsourced Stanford Natural Language Inference (SNLI) corpus. We study the stylistic pattern of dataset artifacts in the SNLI. To mitigate dataset artifacts, we employ a unique multi-scale data augmentation technique with two distinct frameworks: a behavioral testing checklist at the sentence level and lexical synonym criteria at the word level. Specifically, our combination method enhances our model’s resistance to perturbation testing, enabling it to continuously outperform the pre-trained baseline.
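
The abstract names the two augmentation scales but not their implementation. The following Python sketch shows one way such a combination could look, assuming both perturbations preserve the gold label: a word-level synonym swap driven by a small hypothetical `SYNONYMS` table, and a sentence-level, CheckList-style invariance perturbation that renames a person consistently in premise and hypothesis. All names here (`SYNONYMS`, `NAMES`, `synonym_swap`, `name_swap`, `multi_scale_augment`, the `rate` parameter) are illustrative assumptions, not the paper's code and not the API of the CheckList library.

```python
import random

# Hypothetical word-level synonym table (illustrative only); a real pipeline
# might derive candidates from WordNet or embedding neighbours instead.
SYNONYMS = {
    "man": ["guy", "gentleman"],
    "big": ["large", "huge"],
    "running": ["jogging"],
}

# Hypothetical name list for a CheckList-style invariance perturbation.
NAMES = ["John", "Maria", "Wei", "Aisha"]


def replace_token(sentence, old, new):
    """Replace whole whitespace-separated tokens only (never substrings)."""
    return " ".join(new if tok == old else tok for tok in sentence.split())


def synonym_swap(sentence, rate, rng):
    """Word level: swap each token that has synonyms with probability `rate`.

    Lookup is case-insensitive; replacements are inserted in lowercase.
    """
    out = []
    for tok in sentence.split():
        options = SYNONYMS.get(tok.lower())
        if options and rng.random() < rate:
            out.append(rng.choice(options))
        else:
            out.append(tok)
    return " ".join(out)


def name_swap(premise, hypothesis, rng):
    """Sentence level: rename one person consistently in both sentences.

    Returns None when the premise contains no name from NAMES.
    """
    for name in NAMES:
        if name in premise.split():
            new = rng.choice([n for n in NAMES if n != name])
            return replace_token(premise, name, new), replace_token(hypothesis, name, new)
    return None


def multi_scale_augment(examples, rate=0.3, seed=0):
    """Return the originals plus perturbed copies that keep the gold label.

    examples: list of (premise, hypothesis, label) triples. Both perturbations
    are assumed to be label-preserving, which should be spot-checked in practice.
    """
    rng = random.Random(seed)
    augmented = list(examples)
    for premise, hypothesis, label in examples:
        word_level = (synonym_swap(premise, rate, rng), synonym_swap(hypothesis, rate, rng), label)
        if word_level[:2] != (premise, hypothesis):  # keep only real changes
            augmented.append(word_level)
        swapped = name_swap(premise, hypothesis, rng)
        if swapped is not None:
            augmented.append((*swapped, label))
    return augmented


if __name__ == "__main__":
    data = [("John is running in a big park", "A man is running", "entailment")]
    for row in multi_scale_augment(data, rate=1.0):
        print(row)  # original, synonym-swapped copy, renamed copy
```

Keeping the originals and appending perturbed copies is the usual augmentation recipe; the design choice that matters most is verifying that each perturbation really leaves the NLI label unchanged.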

arXiv information

Author	Zhenyuan Lu
Published	2023-03-16 21:31:25+00:00
arXiv site	arxiv_id(pdf)

Provided by, services used

arxiv.jp, Google
