Leveraging Variation Theory in Counterfactual Data Augmentation for Optimized Active Learning

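Summary

Active learning (AL) lets a model learn interactively from user feedback.
This paper introduces a counterfactual data augmentation approach to AL that addresses how to select datapoints for user queries, a choice that is pivotal for data efficiency.
The approach is inspired by Variation Theory, a theory of human concept learning that brings out the essential features of a concept by focusing on what stays the same and what changes.
Rather than querying only with existing datapoints, the approach synthesizes artificial datapoints that highlight potential key similarities and differences among labels, using a neuro-symbolic pipeline that combines large language models (LLMs) with rule-based models.
In an experiment on text classification as an example domain, the approach achieves significantly higher performance when little annotated data is available.
As the annotated training data grows, the impact of the generated data begins to diminish, which shows its capability to address the cold start problem in AL.
This research sheds light on integrating theories of human learning into the optimization of AL.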

Summary (original)

Active Learning (AL) allows models to learn interactively from user feedback. This paper introduces a counterfactual data augmentation approach to AL, particularly addressing the selection of datapoints for user querying, a pivotal concern in enhancing data efficiency. Our approach is inspired by Variation Theory, a theory of human concept learning that emphasizes the essential features of a concept by focusing on what stays the same and what changes. Instead of just querying with existing datapoints, our approach synthesizes artificial datapoints that highlight potential key similarities and differences among labels using a neuro-symbolic pipeline combining large language models (LLMs) and rule-based models. Through an experiment in the example domain of text classification, we show that our approach achieves significantly higher performance when there are fewer annotated data. As the annotated training data gets larger, the impact of the generated data starts to diminish, showing its capability to address the cold start problem in AL. This research sheds light on integrating theories of human learning into the optimization of AL.
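
To make the idea of a counterfactual datapoint concrete, here is a minimal Python sketch. It is not the authors' pipeline: the regex patterns, the replacement table, and the function names (`rewrite_stub`, `passes_filter`, `counterfactuals`) are all hypothetical, and `rewrite_stub` only stands in for the LLM rewriting step. The sketch assumes that each label has a simple symbolic pattern, varies only the span that matches the source label's pattern, keeps the rest of the sentence unchanged, and then applies a rule-based check to the result.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical symbolic patterns, one regex per label (illustrative only,
# not taken from the paper).
PATTERNS = {
    "positive": re.compile(r"\b(great|excellent|love)\b", re.IGNORECASE),
    "negative": re.compile(r"\b(terrible|awful|hate)\b", re.IGNORECASE),
}

# Hypothetical replacement word per target label; a stand-in for an LLM rewrite.
REPLACEMENTS = {"positive": "great", "negative": "terrible"}


@dataclass
class Example:
    text: str
    label: str


def rewrite_stub(text: str, source: str, target: str) -> str:
    """Stand-in for an LLM call: change only the first span matching the source pattern."""
    return PATTERNS[source].sub(REPLACEMENTS[target], text, count=1)


def passes_filter(candidate: str, source: str, target: str) -> bool:
    """Rule-based check: candidate must match the target pattern and not the source pattern."""
    matches_target = PATTERNS[target].search(candidate) is not None
    matches_source = PATTERNS[source].search(candidate) is not None
    return matches_target and not matches_source


def counterfactuals(example: Example) -> List[Example]:
    """Produce at most one filtered counterfactual for each label other than the example's own."""
    results = []
    for target in PATTERNS:
        if target == example.label:
            continue
        candidate = rewrite_stub(example.text, example.label, target)
        if passes_filter(candidate, example.label, target):
            results.append(Example(candidate, target))
    return results


seed = Example("The battery life is great on this phone.", "positive")
augmented = counterfactuals(seed)
for item in augmented:
    print(item.label, "|", item.text)
```

In this toy setup the seed and its counterfactual differ in a single word, so the pair isolates the feature that separates the two labels while everything else stays the same. A text with no matching span yields no counterfactual, because the rule-based check rejects the unchanged candidate.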

arXiv information

Authors	Simret Araya Gebreegziabher,Kuangshi Ai,Zheng Zhang,Elena L. Glassman,Toby Jia-Jun Li
Published	2024-08-07 14:55:04+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
