Dynamic Sparsity Is Channel-Level Sparsity Learner

Summary

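Sparse training has attracted growing interest in machine learning because it can reduce cost across the entire training process, not only at inference.
Dynamic sparse training (DST), a leading sparse training approach, trains deep neural networks from scratch at high sparsity and can match the performance of their dense counterparts.
However, most if not all prior DST work demonstrates its effectiveness on unstructured sparsity, whose highly irregular sparse patterns receive limited support on common hardware.
This limitation hinders the use of DST in practice.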
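This paper proposes Channel-aware dynamic sparse (Chase), which for the first time seamlessly translates the promise of unstructured dynamic sparsity into GPU-friendly channel-level sparsity (not fine-grained N:M or group sparsity) within a single end-to-end training process, without any ad hoc operations.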
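The resulting small sparse networks can be accelerated directly on commodity hardware, without any specialized sparsity-aware hardware accelerators.
This outcome is partly motivated by a hidden phenomenon of dynamic sparsity: off-the-shelf unstructured DST implicitly reallocates parameters across channels in a biased way, so that a large fraction of channels (up to 60%) are sparser than the others.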
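The channel-level imbalance described here can be illustrated with a small sketch. It is not taken from the Chase code base: the mask layout, the helper names `channel_density` and `fraction_sparser_than_layer`, and the reading of "sparser than others" as "below the layer-wide density" are all assumptions made for illustration.

```python
from typing import List

# Hypothetical layout: mask[c][i] is 1 if weight i of output channel c is kept.
Mask = List[List[int]]


def channel_density(mask: Mask) -> List[float]:
    """Fraction of kept (nonzero) weights in each output channel."""
    return [sum(row) / len(row) for row in mask]


def fraction_sparser_than_layer(mask: Mask) -> float:
    """Share of channels whose density is below the layer-wide density.

    Assumption: this is one possible way to quantify "sparser than others";
    the source does not specify the exact criterion.
    """
    total_weights = sum(len(row) for row in mask)
    layer_density = sum(sum(row) for row in mask) / total_weights
    densities = channel_density(mask)
    return sum(d < layer_density for d in densities) / len(densities)


# Toy unstructured mask with four channels of four weights each.
toy_mask = [
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
densities = channel_density(toy_mask)
sparser_share = fraction_sparser_than_layer(toy_mask)
```

On a real network, the same statistic would be computed per layer from the binary masks that the DST method maintains. The toy values here carry no empirical meaning.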
By progressively identifying and removing these channels during training, the approach translates unstructured sparsity into channel-wise sparsity.
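A minimal sketch of what progressive channel removal could look like follows, under stated assumptions. The scoring rule (number of kept weights per channel), the linear removal schedule, and the names `channels_to_remove` and `progressive_channel_pruning` are hypothetical, not the criterion or schedule used by Chase. The prune-and-grow mask updates that DST performs between rounds are omitted.

```python
from typing import List, Set, Tuple

Mask = List[List[int]]  # mask[c][i] is 1 if weight i of channel c is kept


def channels_to_remove(mask: Mask, alive: Set[int], n_remove: int) -> List[int]:
    """Pick the n_remove alive channels with the fewest kept weights."""
    ranked = sorted(alive, key=lambda c: (sum(mask[c]), c))
    return ranked[:n_remove]


def progressive_channel_pruning(
    mask: Mask, target_channels: int, n_rounds: int
) -> Tuple[List[List[int]], Mask]:
    """Remove whole channels over n_rounds until target_channels remain.

    Returns the sorted list of alive channels after each round and the
    final mask. The input mask is copied, not modified.
    """
    mask = [row[:] for row in mask]
    n_total = len(mask)
    alive = set(range(n_total))
    history = []
    for r in range(1, n_rounds + 1):
        # Assumed linear schedule: channels that should remain after round r.
        keep = n_total - (n_total - target_channels) * r // n_rounds
        for c in channels_to_remove(mask, alive, len(alive) - keep):
            alive.discard(c)
            mask[c] = [0] * len(mask[c])  # zero out the entire channel
        # A real DST loop would train and update the mask here before the next round.
        history.append(sorted(alive))
    return history, mask


# Toy mask: six channels with 4, 1, 3, 0, 2 and 4 kept weights.
toy_mask = [
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
]
history, pruned = progressive_channel_pruning(toy_mask, target_channels=3, n_rounds=3)
```

Because entire channels end up zeroed, the corresponding rows can be dropped from the weight tensor, which is what lets a dense kernel on commodity hardware benefit without sparse-specific support.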
Experimental results show that Chase achieves a 1.7x inference throughput speedup on common GPU devices without compromising accuracy with ResNet-50 on ImageNet.
The code is available at https://github.com/luuyin/chase.

Summary (original)

Sparse training has received an upsurging interest in machine learning due to its tantalizing saving potential for the entire training process as well as inference. Dynamic sparse training (DST), as a leading sparse training approach, can train deep neural networks at high sparsity from scratch to match the performance of their dense counterparts. However, most if not all DST prior arts demonstrate their effectiveness on unstructured sparsity with highly irregular sparse patterns, which receives limited support in common hardware. This limitation hinders the usage of DST in practice. In this paper, we propose Channel-aware dynamic sparse (Chase), which for the first time seamlessly translates the promise of unstructured dynamic sparsity to GPU-friendly channel-level sparsity (not fine-grained N:M or group sparsity) during one end-to-end training process, without any ad-hoc operations. The resulting small sparse networks can be directly accelerated by commodity hardware, without using any particularly sparsity-aware hardware accelerators. This appealing outcome is partially motivated by a hidden phenomenon of dynamic sparsity: off-the-shelf unstructured DST implicitly involves biased parameter reallocation across channels, with a large fraction of channels (up to 60%) being sparser than others. By progressively identifying and removing these channels during training, our approach translates unstructured sparsity to channel-wise sparsity. Our experimental results demonstrate that Chase achieves 1.7 X inference throughput speedup on common GPU devices without compromising accuracy with ResNet-50 on ImageNet. We release our codes in https://github.com/luuyin/chase.

arXiv information

Authors	Lu Yin, Gen Li, Meng Fang, Li Shen, Tianjin Huang, Zhangyang Wang, Vlado Menkovski, Xiaolong Ma, Mykola Pechenizkiy, Shiwei Liu
Published	2023-11-10 16:42:39+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
