Adaptive Sampling for Deep Learning via Efficient Nonparametric Proxies

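Summary

Data sampling is an effective way to speed up neural network training, and recent results show that it can even break neural scaling laws.
These results depend critically on high-quality scores that estimate how important each input is to the network.
There are two dominant strategies: static sampling, where scores are fixed before training, and dynamic sampling, where scores can depend on the model weights.
Static algorithms are computationally cheap but less effective than dynamic ones, while dynamic algorithms must compute losses explicitly and can therefore slow down end-to-end training.
To address this problem, the authors propose a new sampling distribution based on nonparametric kernel regression that learns an effective importance score while the neural network trains.
However, nonparametric regression models are too computationally expensive to accelerate end-to-end training.
The authors therefore develop an efficient sketch-based approximation to the Nadaraya-Watson estimator.
Using recent techniques from high-dimensional statistics and randomized algorithms, they prove that the Nadaraya-Watson sketch approximates the estimator with exponential convergence guarantees.
The sampling algorithm outperforms the baseline in both wall-clock time and accuracy on four datasets.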

Abstract (original)

Data sampling is an effective method to improve the training speed of neural networks, with recent results demonstrating that it can even break the neural scaling laws. These results critically rely on high-quality scores to estimate the importance of an input to the network. We observe that there are two dominant strategies: static sampling, where the scores are determined before training, and dynamic sampling, where the scores can depend on the model weights. Static algorithms are computationally inexpensive but less effective than their dynamic counterparts, which can cause end-to-end slowdown due to their need to explicitly compute losses. To address this problem, we propose a novel sampling distribution based on nonparametric kernel regression that learns an effective importance score as the neural network trains. However, nonparametric regression models are too computationally expensive to accelerate end-to-end training. Therefore, we develop an efficient sketch-based approximation to the Nadaraya-Watson estimator. Using recent techniques from high-dimensional statistics and randomized algorithms, we prove that our Nadaraya-Watson sketch approximates the estimator with exponential convergence guarantees. Our sampling algorithm outperforms the baseline in terms of wall-clock time and accuracy on four datasets.
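To make the idea of a kernel-regression importance score concrete, here is a minimal sketch of the exact Nadaraya-Watson estimator, which predicts a value at a query point as a kernel-weighted average of previously observed values: f(q) = sum_i K(q, x_i) y_i / sum_i K(q, x_i). It is an illustration only and not the authors' implementation. The Gaussian kernel, the bandwidth, the function names, and the toy data are all assumptions, and the code computes the estimator exactly rather than through the sketch-based approximation the abstract describes.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def nadaraya_watson(query, points, values, bandwidth=1.0):
    """Kernel-weighted average of `values` observed at `points`."""
    weights = [gaussian_kernel(query, p, bandwidth) for p in points]
    total = sum(weights)
    if total == 0.0:
        # No stored point carries any weight; fall back to the plain mean.
        return sum(values) / len(values)
    return sum(w * v for w, v in zip(weights, values)) / total


def sampling_probabilities(candidates, points, losses, bandwidth=1.0):
    """Turn predicted (non-negative) losses into a sampling distribution."""
    scores = [nadaraya_watson(c, points, losses, bandwidth) for c in candidates]
    total = sum(scores)
    if total <= 0.0:
        # All predicted losses are zero; sample uniformly.
        return [1.0 / len(candidates)] * len(candidates)
    return [s / total for s in scores]


# Hypothetical toy data: inputs already seen during training and their losses.
seen_points = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
seen_losses = [0.1, 0.2, 2.0]

# Candidate inputs whose importance is estimated without a forward pass.
candidates = [[0.1, 0.0], [4.9, 5.1]]
probs = sampling_probabilities(candidates, seen_points, seen_losses)

# Draw a small batch of candidate indices in proportion to predicted loss.
rng = random.Random(0)
batch = rng.choices(range(len(candidates)), weights=probs, k=4)
```

In this toy setup the second candidate lies next to a high-loss point and so receives most of the sampling mass. Each exact query scans every stored point, which is the cost that motivates replacing the estimator with a cheaper approximation.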

arXiv information

Authors	Shabnam Daghaghi, Benjamin Coleman, Benito Geordie, Anshumali Shrivastava
Published	2023-11-22 18:40:18+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
