Learning from Ambiguous Data with Hard Labels

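Summary

Real-world data often contains inherent ambiguity that the common single-hard-label annotation paradigm ignores. Standard training on ambiguous data with such hard labels can produce overconfident models that generalize poorly. To mitigate this problem, this paper proposes a new framework called Quantized Label Learning (QLL). First, we formulate QLL as learning from (very) ambiguous data with hard labels: ideally, each ambiguous instance would be associated with a ground-truth soft-label distribution describing its probabilistic weight in each class, but this distribution is usually inaccessible; in practice, only a quantized label can be observed for each instance, that is, a hard label sampled from that distribution, which is a biased approximation of the ground-truth soft label. Second, we propose a Class-wise Positive-Unlabeled (CPU) risk estimator that makes it possible to train accurate classifiers from only ambiguous data with quantized labels. Third, to simulate real-world ambiguous datasets with quantized labels, we design a mixing-based ambiguous data generation procedure and conduct an empirical evaluation. Experiments show that our CPU method substantially improves model generalization and outperforms the baselines.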

Summary (original)

Real-world data often contains intrinsic ambiguity that the common single-hard-label annotation paradigm ignores. Standard training using ambiguous data with these hard labels may produce overly confident models and thus lead to poor generalization. In this paper, we propose a novel framework called Quantized Label Learning (QLL) to alleviate this issue. First, we formulate QLL as learning from (very) ambiguous data with hard labels: ideally, each ambiguous instance should be associated with a ground-truth soft-label distribution describing its corresponding probabilistic weight in each class; however, this is usually not accessible. In practice, we can only observe a quantized label of each instance, i.e., a hard label sampled (quantized) from the corresponding ground-truth soft-label distribution, which can be seen as a biased approximation of the ground-truth soft label. Second, we propose a Class-wise Positive-Unlabeled (CPU) risk estimator that allows us to train accurate classifiers from only ambiguous data with quantized labels. Third, to simulate ambiguous datasets with quantized labels in the real world, we design a mixing-based ambiguous data generation procedure for empirical evaluation. Experiments demonstrate that our CPU method can significantly improve model generalization performance and outperform the baselines.
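To make the notion of a quantized label concrete, here is a minimal Python sketch of how a hard label can be sampled from a soft-label distribution of a mixed instance. The abstract does not spell out the paper's mixing procedure, so the two-instance convex combination below (mixup-style), the mixing weight `lam`, and the helper names `mix_instances`, `soft_label`, and `quantize` are all assumptions made for illustration, not the authors' implementation.

```python
import random

def mix_instances(x_a, x_b, lam):
    """Convex combination of two feature vectors (assumed mixup-style mixing)."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]

def soft_label(class_a, class_b, lam, num_classes):
    """Soft-label distribution of the mixed instance: weight lam on class_a,
    weight 1 - lam on class_b (hypothetical construction)."""
    probs = [0.0] * num_classes
    probs[class_a] += lam
    probs[class_b] += 1.0 - lam
    return probs

def quantize(probs, rng):
    """Quantized label: a single hard label sampled from the soft label."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Mix an instance of class 0 with an instance of class 2.
x = mix_instances([1.0, 0.0], [0.0, 1.0], lam=0.7)
p = soft_label(0, 2, lam=0.7, num_classes=3)

# The learner only ever sees one hard label per instance; sampling many
# times here just shows that the labels follow the hidden soft label.
labels = [quantize(p, rng) for _ in range(10000)]
freq = [labels.count(c) / len(labels) for c in range(3)]
print("soft label:", p)
print("empirical hard-label frequencies:", freq)
```

A single draw from `quantize` discards the class weights entirely, which is why training directly on such labels can yield the overconfident models the abstract describes.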

arXiv information

Authors	Zeke Xie,Zheng He,Nan Lu,Lichen Bai,Bao Li,Shuo Yang,Mingming Sun,Ping Li
Published	2025-01-03 14:54:49+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, DeepL
