Exploring the Boundaries of On-Device Inference: When Tiny Falls Short, Go Hierarchical

Summary

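On-device inference holds great potential for improving the energy efficiency, responsiveness, and privacy of edge ML systems.
However, because only less capable ML models can be embedded in resource-constrained devices, use cases are limited to simple inference tasks such as visual keyword spotting, gesture recognition, and predictive analytics.
In this context, Hierarchical Inference (HI) systems have emerged as a promising solution that augments local ML capabilities by offloading selected samples to an edge server or the cloud for remote ML inference.
Existing works have demonstrated through simulation that HI improves accuracy.
However, they neither account for latency and energy consumption on the device nor consider the three key heterogeneous dimensions that characterize ML systems: hardware, network connectivity, and models.
In contrast, this paper systematically compares the performance of HI with on-device inference, based on measured accuracy, latency, and energy for embedded ML models running on five devices of differing capability across three image classification datasets.
For a given accuracy requirement, the HI systems the authors designed achieved up to 73% lower latency and up to 77% lower device energy consumption than an on-device inference system.
The key to building an efficient HI system is the availability of small, reasonably accurate on-device models whose outputs can effectively distinguish the samples that require remote inference.
Despite these performance gains, HI still requires on-device inference for every sample, which adds a fixed overhead to its latency and energy consumption.
The authors therefore design a hybrid system, Early Exit with HI (EE-HI), and show that, compared with HI, EE-HI reduces latency by up to 59.7% and device energy consumption by up to 60.4%.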
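
The core HI step, deciding per sample whether to trust the on-device model or offload, can be sketched as below. This is a minimal sketch under assumptions the summary does not state: the on-device model outputs class probabilities, and a sample is offloaded when its top-class probability falls below a threshold (a common confidence-based rule; the paper's actual decision rule may differ). The names `hierarchical_inference`, `remote_infer`, and `threshold` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class HIResult:
    predictions: List[int]  # final label for each sample
    offloaded: List[bool]   # True where the sample was sent to the remote model


def hierarchical_inference(
    local_probs: Sequence[Sequence[float]],
    remote_infer: Callable[[int], int],
    threshold: float,
) -> HIResult:
    """Hypothetical HI rule: accept the on-device prediction when its top-class
    probability is at least `threshold`; otherwise offload the sample.

    `remote_infer(i)` stands in for sending sample i to an edge server or cloud
    and returning the remote model's label; it is called only for offloaded samples.
    """
    predictions: List[int] = []
    offloaded: List[bool] = []
    for i, probs in enumerate(local_probs):
        top = max(range(len(probs)), key=lambda k: probs[k])  # on-device argmax
        if probs[top] >= threshold:
            predictions.append(top)              # confident: keep the local result
            offloaded.append(False)
        else:
            predictions.append(remote_infer(i))  # uncertain: ask the remote model
            offloaded.append(True)
    return HIResult(predictions, offloaded)


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Toy usage: three samples and a remote model that happens to be always correct.
labels = [0, 1, 1]
local_probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]
result = hierarchical_inference(local_probs, lambda i: labels[i], threshold=0.7)
print(result.predictions, result.offloaded, accuracy(result.predictions, labels))
# [0, 1, 1] [False, True, False] 1.0
```

On its own, the on-device model would mislabel the second sample (argmax 0, true label 1); HI fixes it by offloading only that uncertain sample. Raising the threshold offloads more samples, trading device-side savings for accuracy, which is why the summary stresses on-device models whose outputs separate easy samples from hard ones.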

Summary (original)

On-device inference holds great potential for increased energy efficiency, responsiveness, and privacy in edge ML systems. However, due to less capable ML models that can be embedded in resource-limited devices, use cases are limited to simple inference tasks such as visual keyword spotting, gesture recognition, and predictive analytics. In this context, the Hierarchical Inference (HI) system has emerged as a promising solution that augments the capabilities of the local ML by offloading selected samples to an edge server or cloud for remote ML inference. Existing works demonstrate through simulation that HI improves accuracy. However, they do not account for the latency and energy consumption on the device, nor do they consider three key heterogeneous dimensions that characterize ML systems: hardware, network connectivity, and models. In contrast, this paper systematically compares the performance of HI with on-device inference based on measurements of accuracy, latency, and energy for running embedded ML models on five devices with different capabilities and three image classification datasets. For a given accuracy requirement, the HI systems we designed achieved up to 73% lower latency and up to 77% lower device energy consumption than an on-device inference system. The key to building an efficient HI system is the availability of small-size, reasonably accurate on-device models whose outputs can be effectively differentiated for samples that require remote inference. Despite the performance gains, HI requires on-device inference for all samples, which adds a fixed overhead to its latency and energy consumption. Therefore, we design a hybrid system, Early Exit with HI (EE-HI), and demonstrate that compared to HI, EE-HI reduces the latency by up to 59.7% and lowers the device’s energy consumption by up to 60.4%.
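
The fixed-overhead argument behind EE-HI can be illustrated with a toy latency model. The sketch assumes, hypothetically, that an early-exit head sits partway through the on-device model (so running the full model costs `t_exit + t_rest`), that offloading adds a constant `t_offload`, and that both the exit and offload decisions use top-class-probability thresholds. All names, thresholds, and the millisecond figures are illustrative, not the paper's design or measurements.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Sample:
    exit_probs: List[float]  # output of a hypothetical early-exit head
    full_probs: List[float]  # output of the complete on-device model
    remote_label: int        # label the remote model would return if asked


def top1(probs: Sequence[float]) -> Tuple[int, float]:
    """Return (argmax class, its probability)."""
    k = max(range(len(probs)), key=lambda i: probs[i])
    return k, probs[k]


def run_hi(samples, hi_threshold, t_full, t_offload):
    """Plain HI: every sample pays the full on-device cost t_full first."""
    out = []
    for s in samples:
        label, conf = top1(s.full_probs)
        if conf >= hi_threshold:
            out.append((label, t_full, "on_device"))
        else:
            out.append((s.remote_label, t_full + t_offload, "offloaded"))
    return out


def run_ee_hi(samples, exit_threshold, hi_threshold, t_exit, t_rest, t_offload):
    """EE-HI: stop at the early-exit head when confident, else finish the
    on-device model, else offload. Assumes full on-device cost = t_exit + t_rest."""
    out = []
    for s in samples:
        label, conf = top1(s.exit_probs)
        if conf >= exit_threshold:
            out.append((label, t_exit, "early_exit"))
            continue
        label, conf = top1(s.full_probs)
        if conf >= hi_threshold:
            out.append((label, t_exit + t_rest, "on_device"))
        else:
            out.append((s.remote_label, t_exit + t_rest + t_offload, "offloaded"))
    return out


def mean_latency(results):
    return sum(latency for _, latency, _ in results) / len(results)


samples = [
    Sample([0.95, 0.05], [0.97, 0.03], 0),  # easy
    Sample([0.60, 0.40], [0.85, 0.15], 0),  # medium
    Sample([0.50, 0.50], [0.55, 0.45], 1),  # hard
]
# Illustrative costs in ms (not measurements from the paper).
t_exit, t_rest, t_offload = 4.0, 6.0, 30.0
hi = run_hi(samples, hi_threshold=0.8, t_full=t_exit + t_rest, t_offload=t_offload)
ee = run_ee_hi(samples, 0.9, 0.8, t_exit, t_rest, t_offload)
print([stage for *_, stage in ee])  # ['early_exit', 'on_device', 'offloaded']
print(mean_latency(hi), mean_latency(ee))  # 20.0 18.0
```

With these made-up numbers, EE-HI averages 18 ms per sample versus 20 ms for HI, because the easy first sample stops after only `t_exit`, while HI pays the full on-device cost for every sample. That per-sample cost is the fixed overhead the abstract describes; the same accounting applies to device energy.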

arXiv information

Authors	Adarsh Prasad Behera, Paulius Daubaris, Iñaki Bravo, José Gallego, Roberto Morabito, Joerg Widmer, Jaya Prakash Varma Champati
Published	2025-04-17 14:53:45+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google