TinyTrain: Deep Neural Network Training at the Extreme Edge

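Summary

On-device training is essential for user personalisation and privacy.
With the spread of IoT devices and microcontroller units (MCUs), the task becomes harder: memory and compute resources are constrained, and labelled user data is limited.
Nonetheless, prior works neglect the data scarcity issue, require excessively long training time (e.g. a few hours), or incur a substantial accuracy loss ($\geq$10\%).
We propose TinyTrain, an on-device training approach that drastically reduces training time by selectively updating parts of the model and explicitly coping with data scarcity.
TinyTrain introduces a task-adaptive sparse-update method that dynamically selects layers/channels using a multi-objective criterion that jointly captures the user data and the memory and compute capabilities of the target device, yielding high accuracy on unseen tasks with reduced computation and memory footprint.
TinyTrain outperforms vanilla fine-tuning of the entire network by 3.6-5.0\% in accuracy, while reducing backward-pass memory and computation cost by up to 2,286$\times$ and 7.68$\times$, respectively.
Targeting widely used real-world edge devices, TinyTrain trains 9.5$\times$ faster and 3.5$\times$ more energy-efficiently than status-quo approaches, with a 2.8$\times$ smaller memory footprint than SOTA approaches, while remaining within the 1 MB memory envelope of MCU-grade platforms.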

Abstract (original)

On-device training is essential for user personalisation and privacy. With the pervasiveness of IoT devices and microcontroller units (MCU), this task becomes more challenging due to the constrained memory and compute resources, and the limited availability of labelled user data. Nonetheless, prior works neglect the data scarcity issue, require excessively long training time (e.g. a few hours), or induce substantial accuracy loss ($\geq$10\%). We propose TinyTrain, an on-device training approach that drastically reduces training time by selectively updating parts of the model and explicitly coping with data scarcity. TinyTrain introduces a task-adaptive sparse-update method that dynamically selects the layer/channel based on a multi-objective criterion that jointly captures user data, the memory, and the compute capabilities of the target device, leading to high accuracy on unseen tasks with reduced computation and memory footprint. TinyTrain outperforms vanilla fine-tuning of the entire network by 3.6-5.0\% in accuracy, while reducing the backward-pass memory and computation cost by up to 2,286$\times$ and 7.68$\times$, respectively. Targeting broadly used real-world edge devices, TinyTrain achieves 9.5$\times$ faster and 3.5$\times$ more energy-efficient training over status-quo approaches, and 2.8$\times$ smaller memory footprint than SOTA approaches, while remaining within the 1 MB memory envelope of MCU-grade platforms.
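
The abstract describes selecting which layers or channels to update under a criterion that weighs user data against the device's memory and compute limits. The sketch below illustrates that general idea only and is not the authors' algorithm. It assumes each layer already has a data-derived importance score and known backward-pass costs, and it uses a simple greedy rule (importance per normalised cost, subject to two budgets). The names `LayerStats` and `select_layers`, the scoring formula, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LayerStats:
    name: str
    importance: float  # hypothetical score computed from the user's data
    memory_kb: float   # assumed extra backward-pass memory if this layer is updated
    macs_m: float      # assumed extra backward-pass compute, in millions of MACs


def select_layers(layers, memory_budget_kb, compute_budget_m):
    """Greedily pick layers to update without exceeding either budget.

    Assumes `layers` is non-empty and every layer has positive costs.
    This is an illustrative heuristic, not the criterion from the paper.
    """
    max_mem = max(layer.memory_kb for layer in layers)
    max_macs = max(layer.macs_m for layer in layers)

    def score(layer):
        # Normalise both costs to [0, 1] so neither dominates the ratio.
        cost = layer.memory_kb / max_mem + layer.macs_m / max_macs
        return layer.importance / cost

    chosen, mem_used, macs_used = [], 0.0, 0.0
    for layer in sorted(layers, key=score, reverse=True):
        fits_memory = mem_used + layer.memory_kb <= memory_budget_kb
        fits_compute = macs_used + layer.macs_m <= compute_budget_m
        if fits_memory and fits_compute:
            chosen.append(layer.name)
            mem_used += layer.memory_kb
            macs_used += layer.macs_m
    return chosen, mem_used, macs_used


if __name__ == "__main__":
    # Made-up per-layer statistics, for illustration only.
    example = [
        LayerStats("conv1", importance=0.2, memory_kb=400, macs_m=30),
        LayerStats("block5", importance=0.6, memory_kb=300, macs_m=20),
        LayerStats("block6", importance=0.9, memory_kb=500, macs_m=40),
        LayerStats("head", importance=1.0, memory_kb=50, macs_m=2),
    ]
    print(select_layers(example, memory_budget_kb=600, compute_budget_m=50))
```

With these made-up numbers the cheap, high-importance layers are picked first and the expensive ones are skipped once the memory budget would be exceeded. A greedy rule like this is not guaranteed to be optimal; it is used here only because it is short and easy to follow.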

arXiv information

Authors	Young D. Kwon, Rui Li, Stylianos I. Venieris, Jagmohan Chauhan, Nicholas D. Lane, Cecilia Mascolo
Published	2023-07-19 13:49:12+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
