Learning Mixtures of Gaussians Using Diffusion Models


We give a new algorithm for learning mixtures of $k$ Gaussians (with identity covariance in $\mathbb{R}^n$) to total variation (TV) error $\varepsilon$, with quasi-polynomial ($O(n^{\text{poly\,log}\left(\frac{n+k}{\varepsilon}\right)})$) time and sample complexity, under a minimum weight assumption. Our results extend to continuous mixtures of Gaussians where the mixing distribution is supported on a union of $k$ balls of constant radius. In particular, this applies to the case of Gaussian convolutions of distributions on low-dimensional manifolds, or more generally sets with small covering number, for which no sub-exponential algorithm was previously known. Unlike previous approaches, most of which are algebraic in nature, our approach is analytic and relies on the framework of diffusion models. Diffusion models are a modern paradigm for generative modeling, which typically rely on learning the score function (the gradient of the log-pdf) along a process transforming a pure noise distribution, in our case a Gaussian, to the data distribution. Despite their dazzling performance in tasks such as image generation, there are few end-to-end theoretical guarantees that they can efficiently learn nontrivial families of distributions; we give some of the first such guarantees. We proceed by deriving higher-order Gaussian noise sensitivity bounds for the score functions of a Gaussian mixture to show that they can be inductively learned using piecewise polynomial regression (up to poly-logarithmic degree), and we combine this with known convergence results for diffusion models.
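
To make the central object concrete, the following is a minimal sketch of the exact score function of an identity-covariance Gaussian mixture. It is an illustration only, not the paper's learning algorithm (which estimates the score from samples via piecewise polynomial regression). The function name `mixture_score`, its arguments, and the choice of an Ornstein-Uhlenbeck noising process $X_t = e^{-t} X_0 + \sqrt{1 - e^{-2t}}\, Z$ are assumptions made for this example; the paper may use a different parametrization. Under that assumed process, a mixture with means $\mu_i$ and identity covariance stays an identity-covariance mixture with means $e^{-t}\mu_i$, so the score is $\sum_i r_i(x)\,(e^{-t}\mu_i - x)$, where $r_i(x)$ are the posterior component weights.

```python
import numpy as np


def mixture_score(x, weights, means, t=0.0):
    """Score (gradient of the log-density) at point x of an identity-covariance
    Gaussian mixture after noising for time t.

    Assumption for this sketch: the forward process is Ornstein-Uhlenbeck,
    X_t = exp(-t) * X_0 + sqrt(1 - exp(-2t)) * Z, which shrinks each mean by
    exp(-t) and leaves the covariance equal to the identity.
    """
    x = np.asarray(x, dtype=float)              # shape (n,)
    weights = np.asarray(weights, dtype=float)  # shape (k,)
    means_t = np.exp(-t) * np.asarray(means, dtype=float)  # shape (k, n)

    diffs = means_t - x                         # row i is mu_i(t) - x
    # Log of (weight * Gaussian density), up to a constant shared by all components.
    logits = np.log(weights) - 0.5 * np.sum(diffs ** 2, axis=1)
    logits -= logits.max()                      # stabilize the softmax
    resp = np.exp(logits)
    resp /= resp.sum()                          # posterior component weights r_i(x)

    # Score = sum_i r_i(x) * (mu_i(t) - x).
    return resp @ diffs
```

With a single component the score reduces to $\mu - x$, and for large $t$ it approaches $-x$, the score of the standard Gaussian that the noising process converges to.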


Authors: Khashayar Gatmiry, Jonathan Kelner, Holden Lee
Published: 2025-03-04 15:36:34+00:00

Categories: cs.DS, cs.LG, math.PR, math.ST, stat.ML, stat.TH