A Confidence-based Acquisition Model for Self-supervised Active Learning and Label Correction

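Summary

Supervised neural approaches are held back by their reliance on large, carefully annotated datasets, a requirement that is especially burdensome for sequential tasks.
Annotation quality also tends to drop when labelling moves from experts to crowd-sourcing.
To address these challenges, the authors present CAMEL (Confidence-based Acquisition Model for Efficient self-supervised active Learning), a pool-based active learning framework tailored to sequential multi-output problems.
CAMEL has two core features: (1) expert annotators only need to label a fraction of each selected sequence, and (2) the rest of the sequence is labelled through self-supervision.
With an added label correction mechanism, CAMEL can also be used for data cleaning.
CAMEL is evaluated on two sequential tasks, with particular emphasis on dialogue belief tracking, a task constrained by limited and noisy datasets.
The experiments show that CAMEL significantly outperforms the baselines in terms of efficiency.
In addition, the data corrections suggested by the method improve the overall quality of the resulting datasets.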

Abstract (original)

Supervised neural approaches are hindered by their dependence on large, meticulously annotated datasets, a requirement that is particularly cumbersome for sequential tasks. The quality of annotations tends to deteriorate with the transition from expert-based to crowd-sourced labelling. To address these challenges, we present CAMEL (Confidence-based Acquisition Model for Efficient self-supervised active Learning), a pool-based active learning framework tailored to sequential multi-output problems. CAMEL possesses two core features: (1) it requires expert annotators to label only a fraction of a chosen sequence, and (2) it facilitates self-supervision for the remainder of the sequence. By deploying a label correction mechanism, CAMEL can also be utilised for data cleaning. We evaluate CAMEL on two sequential tasks, with a special emphasis on dialogue belief tracking, a task plagued by the constraints of limited and noisy datasets. Our experiments demonstrate that CAMEL significantly outperforms the baselines in terms of efficiency. Furthermore, the data corrections suggested by our method contribute to an overall improvement in the quality of the resulting datasets.
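
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes, not the authors' method. It assumes that a model yields one prediction and one confidence score per step of a sequence, and that a fixed threshold decides which steps go to an expert. The names `label_sequence`, `suggest_corrections`, `ask_expert` and the threshold value are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class LabelledStep:
    label: str
    source: str  # "expert" if a human supplied the label, "self" if the model did


def label_sequence(
    predictions: Sequence[str],
    confidences: Sequence[float],
    ask_expert: Callable[[int], str],
    threshold: float = 0.9,  # assumed cut-off, not taken from the paper
) -> List[LabelledStep]:
    """Query the expert only for low-confidence steps; self-label the rest."""
    if len(predictions) != len(confidences):
        raise ValueError("predictions and confidences must have equal length")
    labelled = []
    for i, (pred, conf) in enumerate(zip(predictions, confidences)):
        if conf < threshold:
            # Low confidence: spend annotation budget on this step.
            labelled.append(LabelledStep(ask_expert(i), "expert"))
        else:
            # High confidence: keep the model's own prediction.
            labelled.append(LabelledStep(pred, "self"))
    return labelled


def suggest_corrections(
    existing_labels: Sequence[str],
    predictions: Sequence[str],
    confidences: Sequence[float],
    threshold: float = 0.9,  # assumed cut-off, not taken from the paper
) -> List[Tuple[int, str, str]]:
    """Flag steps where a confident prediction disagrees with the stored label.

    Returns (index, existing_label, suggested_label) tuples.
    """
    return [
        (i, old, pred)
        for i, (old, pred, conf) in enumerate(
            zip(existing_labels, predictions, confidences)
        )
        if conf >= threshold and pred != old
    ]
```

In this sketch the same confidence signal serves both purposes mentioned in the abstract: low confidence triggers an expert query during active learning, while high confidence combined with disagreement marks an existing label as a candidate for correction. How CAMEL actually estimates confidence is not stated in the abstract.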

arXiv information

Authors	Carel van Niekerk, Christian Geishauser, Michael Heck, Shutong Feng, Hsien-chin Lin, Nurul Lubis, Benjamin Ruppik, Renato Vukovic, Milica Gašić
Published	2024-11-21 08:50:56+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
