BrightCookies at SemEval-2025 Task 9: Exploring Data Augmentation for Food Hazard Classification

Summary

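This paper presents the system we developed for SemEval-2025 Task 9: The Food Hazard Detection Challenge.
The objective of the shared task is to evaluate explainable classification systems that classify hazards and products at two levels of granularity from food recall incident reports.
In this work, we propose text augmentation techniques as a way to improve poor performance on minority classes, and we compare the effect of each technique per category across various transformer and machine learning models.
We examine three word-level data augmentation techniques: synonym replacement, random word swapping, and contextual word insertion.
The results show that transformer models tend to achieve better overall performance.
None of the three augmentation techniques consistently improved overall performance for classifying hazards and products.
Comparing the baseline with each augmented model using BERT, we observed a statistically significant improvement (P < 0.05) in the fine-grained categories. Compared with the baseline, contextual word insertion improved the accuracy of predictions for the minority hazard classes by 6%. This suggests that targeted augmentation of minority classes can improve the performance of transformer models.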
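The three word-level augmentation techniques named in the summary can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the BrightCookies implementation: the synonym table, the `toy_predict` stand-in for a masked language model, and all function names are hypothetical. In practice, contextual word insertion would ask a pretrained masked language model such as BERT to fill the inserted slot, and synonym replacement would draw on a lexical resource such as WordNet.

```python
import random

# Hypothetical toy synonym table; a real system would draw synonyms from a
# lexical resource such as WordNet. Entries here are illustrative only.
SYNONYMS = {
    "recall": ["withdrawal"],
    "contaminated": ["tainted", "polluted"],
    "product": ["item", "goods"],
}


def synonym_replacement(tokens, n, rng):
    """Replace up to n tokens that appear in SYNONYMS with a random synonym."""
    out = list(tokens)
    candidates = [i for i, tok in enumerate(out) if tok.lower() in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i].lower()])
    return out


def random_swap(tokens, n, rng):
    """Swap the positions of two randomly chosen tokens, n times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def contextual_insertion(tokens, n, predict_fn, rng):
    """Insert n words, each chosen by predict_fn for a [MASK] slot at a random position.

    predict_fn(masked_tokens, mask_index) -> str is a placeholder for a masked
    language model (for example a BERT fill-mask model).
    """
    out = list(tokens)
    for _ in range(n):
        pos = rng.randint(0, len(out))  # inclusive, so the word may also be appended
        masked = out[:pos] + ["[MASK]"] + out[pos:]
        out.insert(pos, predict_fn(masked, pos))
    return out


def toy_predict(masked_tokens, mask_index):
    """Hypothetical stand-in for a masked LM: always proposes the same word."""
    return "reportedly"


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "recall of contaminated product due to salmonella".split()
    print(" ".join(synonym_replacement(sentence, 2, rng)))
    print(" ".join(random_swap(sentence, 1, rng)))
    print(" ".join(contextual_insertion(sentence, 1, toy_predict, rng)))
```

Passing an explicit `random.Random` instance, rather than using the module-level functions, keeps each augmentation run reproducible.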

Summary (original)

This paper presents our system developed for the SemEval-2025 Task 9: The Food Hazard Detection Challenge. The shared task’s objective is to evaluate explainable classification systems for classifying hazards and products in two levels of granularity from food recall incident reports. In this work, we propose text augmentation techniques as a way to improve poor performance on minority classes and compare their effect for each category on various transformer and machine learning models. We explore three word-level data augmentation techniques, namely synonym replacement, random word swapping, and contextual word insertion. The results show that transformer models tend to have a better overall performance. None of the three augmentation techniques consistently improved overall performance for classifying hazards and products. We observed a statistically significant improvement (P < 0.05) in the fine-grained categories when using the BERT model to compare the baseline with each augmented model. Compared to the baseline, the contextual word insertion augmentation improved the accuracy of predictions for the minority hazard classes by 6%. This suggests that targeted augmentation of minority classes can improve the performance of transformer models.
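The closing point of the abstract, that augmentation aimed at minority classes can help, can be illustrated with a sketch that tops up under-represented classes with augmented copies of their own examples. The threshold `min_count`, the helper names, and the toy reports and labels are assumptions for illustration only; the abstract does not say how the authors selected minority classes or how many augmented examples they generated. Random word swapping is used here as the augmentation function simply because it needs no external resources.

```python
import random
from collections import Counter


def swap_two_words(text, rng):
    """Simple augmentation: swap two randomly chosen words (random word swapping)."""
    words = text.split()
    if len(words) >= 2:
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)


def augment_minority(texts, labels, augment_fn, min_count, rng):
    """Top up every class with fewer than min_count examples by adding
    augmented copies of its own examples; larger classes are left unchanged."""
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    new_texts, new_labels = list(texts), list(labels)
    for label, count in Counter(labels).items():
        # max(0, ...) makes this a no-op for classes already at or above min_count
        for _ in range(max(0, min_count - count)):
            source = rng.choice(by_label[label])
            new_texts.append(augment_fn(source, rng))
            new_labels.append(label)
    return new_texts, new_labels


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative toy reports; not data from the shared task.
    texts = [
        "salmonella detected in chicken",
        "listeria found in cheese",
        "e coli in ground beef",
        "undeclared peanuts in cookies",
        "metal fragments in soup",
    ]
    labels = ["biological", "biological", "biological", "allergens", "foreign bodies"]
    aug_texts, aug_labels = augment_minority(texts, labels, swap_two_words, 3, rng)
    print(Counter(aug_labels))
    for text, label in zip(aug_texts[len(texts):], aug_labels[len(labels):]):
        print(label, "|", text)
```

Applying this only to the training split keeps the evaluation data limited to real reports, so any gain on minority classes is measured on unaugmented text.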

arXiv information

Authors	Foteini Papadopoulou, Osman Mutlu, Neris Özen, Bas H. M. van der Velden, Iris Hendrickx, Ali Hürriyetoğlu
Published	2025-04-29 12:34:28+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
