A Negative Result on Gradient Matching for Selective Backprop

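Summary

As models and datasets grow, training deep neural networks becomes a massive computational burden.
One approach to speeding up training is Selective Backprop.
In this approach, a forward pass is performed to obtain a loss value for each data point in a minibatch.
The backward pass is then restricted to a subset of that minibatch, prioritizing high-loss examples.
We build on this approach but aim to improve the subset selection mechanism by choosing the (weighted) subset that best matches the mean gradient over the entire minibatch.
We use the gradients with respect to the model's last layer as a cheap proxy, so there is virtually no overhead beyond the forward pass.
In addition, our experiments include a simple random selection baseline, which was absent from prior work.
Surprisingly, we find that neither the loss-based strategy nor the gradient-matching strategy consistently outperforms the random baseline.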

Summary (original)

With increasing scale in model and dataset size, the training of deep neural networks becomes a massive computational burden. One approach to speed up the training process is Selective Backprop. For this approach, we perform a forward pass to obtain a loss value for each data point in a minibatch. The backward pass is then restricted to a subset of that minibatch, prioritizing high-loss examples. We build on this approach, but seek to improve the subset selection mechanism by choosing the (weighted) subset which best matches the mean gradient over the entire minibatch. We use the gradients w.r.t. the model’s last layer as a cheap proxy, resulting in virtually no overhead in addition to the forward pass. At the same time, for our experiments we add a simple random selection baseline which has been absent from prior work. Surprisingly, we find that both the loss-based as well as the gradient-matching strategy fail to consistently outperform the random baseline.
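
The three selection strategies compared in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a softmax cross-entropy classifier, uses the per-example gradient with respect to the logits as a stand-in for the last-layer gradient, and uses a greedy least-squares procedure for gradient matching, which is a technique swapped in for illustration because the abstract does not specify the solver. All function names are hypothetical.

```python
import numpy as np


def per_example_loss(logits, labels):
    """Softmax cross-entropy loss for each example in the minibatch."""
    z = logits - logits.max(axis=1, keepdims=True)  # stabilize the exponentials
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(labels)), labels]


def logit_grads(logits, labels):
    """Per-example gradient of the loss w.r.t. the logits: softmax(z) - onehot(y).

    Assumption: this serves as a cheap proxy for the last-layer gradient.
    """
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(labels)), labels] -= 1.0
    return p


def select_high_loss(losses, k):
    """Loss-based selection: keep the k highest-loss examples, equal weights."""
    idx = np.argsort(-losses)[:k]
    return idx, np.full(k, 1.0 / k)


def select_random(n, k, rng):
    """Random baseline: k examples drawn uniformly without replacement."""
    idx = rng.choice(n, size=k, replace=False)
    return idx, np.full(k, 1.0 / k)


def select_gradient_matching(grads, k):
    """Greedy weighted subset whose combined gradient approximates the batch mean.

    Illustrative solver only: weights come from an unconstrained least-squares
    refit, so they may be negative.
    """
    target = grads.mean(axis=0)
    residual = target.copy()
    chosen = []
    weights = np.zeros(0)
    for _ in range(k):
        scores = grads @ residual  # alignment with the unexplained part
        if chosen:
            scores[chosen] = -np.inf  # never pick the same example twice
        chosen.append(int(np.argmax(scores)))
        basis = grads[chosen].T  # shape (dim, len(chosen))
        weights, *_ = np.linalg.lstsq(basis, target, rcond=None)
        residual = target - basis @ weights
    return np.array(chosen), weights


def matching_error(grads, idx, weights):
    """Distance between the weighted subset gradient and the full-batch mean."""
    return float(np.linalg.norm(grads[idx].T @ weights - grads.mean(axis=0)))
```

Each selector returns indices and weights, so the backward pass would be run only on the selected examples with their losses scaled by those weights. `matching_error` makes it possible to compare how closely each subset reproduces the full-minibatch mean gradient under the proxy.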

arXiv information

Authors	Lukas Balles, Cedric Archambeau, Giovanni Zappella
Published	2023-12-08 13:03:10+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
