HydraViT: Stacking Heads for a Scalable ViT

Summary

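The architecture of Vision Transformers (ViTs), particularly the Multi-head Attention (MHA) mechanism, imposes substantial hardware demands.
Deploying ViTs on devices with varying constraints, such as mobile phones, requires multiple models of different sizes.
However, this approach has limitations: each required model must be trained and stored separately.
This paper introduces HydraViT, a novel approach that addresses these limitations by stacking attention heads to achieve a scalable ViT.
During training, HydraViT repeatedly changes the embedding dimension used throughout each layer, together with the corresponding number of attention heads in MHA, which induces multiple subnetworks.
As a result, HydraViT adapts to a wide spectrum of hardware environments while maintaining performance.
The experimental results demonstrate that HydraViT achieves a scalable ViT with up to 10 subnetworks, covering a wide range of resource constraints.
On ImageNet-1K, HydraViT achieves up to 5 p.p. higher accuracy at the same GMACs and up to 7 p.p. higher accuracy at the same throughput compared to the baselines, making it an effective solution for scenarios where hardware availability is diverse or varies over time.
Source code is available at https://github.com/ds-kiel/HydraViT.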
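
The idea of inducing subnetworks by varying the embedding dimension and the head count can be illustrated with a small sketch. It is not the authors' implementation: the per-head width of 64, the head range 3 to 12 (which gives 10 head counts), the uniform sampling, and all function names are assumptions chosen for illustration. The sketch only shows how a narrower subnetwork could reuse a prefix of a wider layer's weights, so that one stored weight set serves every size.

```python
import random

HEAD_DIM = 64                  # assumption: per-head width, as in common ViT-Ti/S/B configs
MIN_HEADS, MAX_HEADS = 3, 12   # assumption: 10 possible head counts -> 10 subnetworks


def embed_dim(num_heads, head_dim=HEAD_DIM):
    """Embedding width of the subnetwork that uses the first `num_heads` heads."""
    return num_heads * head_dim


def slice_linear(weight, bias, in_dim, out_dim):
    """Keep the first `out_dim` rows and first `in_dim` columns of a linear layer."""
    sub_weight = [row[:in_dim] for row in weight[:out_dim]]
    sub_bias = bias[:out_dim]
    return sub_weight, sub_bias


def linear(x, weight, bias):
    """Plain y = W x + b on Python lists."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def sample_num_heads(rng):
    """Pick which subnetwork to train in a step (uniform sampling is an assumption)."""
    return rng.randint(MIN_HEADS, MAX_HEADS)


if __name__ == "__main__":
    rng = random.Random(0)
    for step in range(3):
        heads = sample_num_heads(rng)
        print(f"step {step}: train with {heads} heads, embedding dim {embed_dim(heads)}")
```

Under these assumptions the smallest subnetwork has an embedding dimension of 192 and the largest 768. At inference time a device would pick the largest head count its resource budget allows and slice the shared weights accordingly.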

Summary (original)

The architecture of Vision Transformers (ViTs), particularly the Multi-head Attention (MHA) mechanism, imposes substantial hardware demands. Deploying ViTs on devices with varying constraints, such as mobile phones, requires multiple models of different sizes. However, this approach has limitations, such as training and storing each required model separately. This paper introduces HydraViT, a novel approach that addresses these limitations by stacking attention heads to achieve a scalable ViT. By repeatedly changing the size of the embedded dimensions throughout each layer and their corresponding number of attention heads in MHA during training, HydraViT induces multiple subnetworks. Thereby, HydraViT achieves adaptability across a wide spectrum of hardware environments while maintaining performance. Our experimental results demonstrate the efficacy of HydraViT in achieving a scalable ViT with up to 10 subnetworks, covering a wide range of resource constraints. HydraViT achieves up to 5 p.p. more accuracy with the same GMACs and up to 7 p.p. more accuracy with the same throughput on ImageNet-1K compared to the baselines, making it an effective solution for scenarios where hardware availability is diverse or varies over time. Source code available at https://github.com/ds-kiel/HydraViT.

arXiv information

Authors	Janek Haberer, Ali Hojjat, Olaf Landsiedel
Published	2024-09-26 15:52:36+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
