Learning with Positive and Imperfect Unlabeled Data

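Summary

We study the problem of learning binary classifiers from positive and unlabeled data when the unlabeled data distribution is shifted, a setting we call Positive and Imperfect Unlabeled (PIU) learning.
In the absence of covariate shift, that is, with perfect unlabeled data, Denis (1998) reduced this problem to learning under Massart noise.
However, that reduction fails under even slight shifts.
Our main results on PIU learning are a characterization of its sample complexity and a computationally and sample-efficient algorithm that achieves misclassification error $\varepsilon$.
We further show that these results lead to new algorithms for several related problems.
1. Learning from smooth distributions: We give algorithms that learn interesting concept classes from positive samples only under smooth feature distributions, bypassing known impossibility results and contributing to recent advances in smoothed learning (Haghtalab et al., J.ACM’24) (Chandrasekaran et al., COLT’24).
2. Learning with a list of unlabeled distributions: We design new algorithms that apply to a broad class of concept classes, assuming we are given a list of unlabeled distributions, one of which (unknown to the learner) is $O(1)$-close to the true feature distribution.
3. Estimation in the presence of unknown truncation: We give the first polynomial sample and time algorithm for estimating the parameters of an exponential family distribution from samples truncated to an unknown set that is approximable by polynomials in $L_1$-norm. This improves on the algorithm of Lee et al. (FOCS’24), which requires approximation in $L_2$-norm.
4. Detecting truncation: We present new algorithms for detecting whether given samples have been truncated (or not) for a broad class of non-product distributions, improving on the algorithm of De et al. (STOC’24).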

Abstract (original)

We study the problem of learning binary classifiers from positive and unlabeled data when the unlabeled data distribution is shifted, which we call Positive and Imperfect Unlabeled (PIU) Learning. In the absence of covariate shifts, i.e., with perfect unlabeled data, Denis (1998) reduced this problem to learning under Massart noise; however, that reduction fails under even slight shifts. Our main results on PIU learning are the characterizations of the sample complexity of PIU learning and a computationally and sample-efficient algorithm achieving a misclassification error $\varepsilon$. We further show that our results lead to new algorithms for several related problems. 1. Learning from smooth distributions: We give algorithms that learn interesting concept classes from only positive samples under smooth feature distributions, bypassing known existing impossibility results and contributing to recent advances in smoothened learning (Haghtalab et al, J.ACM’24) (Chandrasekaran et al., COLT’24). 2. Learning with a list of unlabeled distributions: We design new algorithms that apply to a broad class of concept classes under the assumption that we are given a list of unlabeled distributions, one of which–unknown to the learner–is $O(1)$-close to the true feature distribution. 3. Estimation in the presence of unknown truncation: We give the first polynomial sample and time algorithm for estimating the parameters of an exponential family distribution from samples truncated to an unknown set approximable by polynomials in $L_1$-norm. This improves the algorithm by Lee et al. (FOCS’24) that requires approximation in $L_2$-norm. 4. Detecting truncation: We present new algorithms for detecting whether given samples have been truncated (or not) for a broad class of non-product distributions, including non-product distributions, improving the algorithm by De et al. (STOC’24).
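To make the classical setting concrete, the following is a minimal sketch of the perfect-unlabeled-data case that the abstract attributes to Denis (1998): positives are labeled 1, unlabeled points are labeled 0, and a classifier is fit to these noisy labels. It is not the paper's PIU algorithm. The one-dimensional threshold concept class, the uniform feature distribution, the sample sizes, and all function names are assumptions chosen purely for illustration.

```python
import random


def make_pu_sample(threshold, n_pos, n_unl, rng):
    """Hypothetical data generator (an assumption, not from the paper).

    Features are uniform on [0, 1] and the target concept is {x >= threshold}.
    Positives come from the feature distribution restricted to the concept;
    unlabeled points come from the full, unshifted feature distribution.
    """
    positives = [rng.uniform(threshold, 1.0) for _ in range(n_pos)]
    unlabeled = [rng.uniform(0.0, 1.0) for _ in range(n_unl)]
    return positives, unlabeled


def pu_to_noisy_labels(positives, unlabeled):
    """Label positives 1 and unlabeled points 0.

    True negatives always receive the correct label 0. True positives are
    mislabeled only when they arrive through the unlabeled sample, so the
    label noise is one-sided and instance-dependent.
    """
    return [(x, 1) for x in positives] + [(x, 0) for x in unlabeled]


def erm_threshold(noisy, grid_size=200):
    """Pick the grid threshold with the fewest disagreements with noisy labels."""
    best_t, best_err = 0.0, float("inf")
    for i in range(grid_size + 1):
        t = i / grid_size
        err = sum((x >= t) != bool(y) for x, y in noisy)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


rng = random.Random(0)
positives, unlabeled = make_pu_sample(0.6, 2000, 2000, rng)
learned = erm_threshold(pu_to_noisy_labels(positives, unlabeled))
print(f"learned threshold: {learned:.3f} (true threshold: 0.6)")
```

With these particular numbers, a true positive is mislabeled with probability roughly 2/7, which is below 1/2, so minimizing disagreement with the noisy labels still favors the true threshold. The sketch relies on the unlabeled sample coming from the same feature distribution as the positives; the abstract notes that the reduction fails once that distribution is shifted, which is the case PIU learning addresses and which this sketch does not cover.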

arXiv information

Authors	Jane H. Lee, Anay Mehrotra, Manolis Zampetakis
Published	2025-04-14 17:19:29+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
