FusedInf: Efficient Swapping of DNN Models for On-Demand Serverless Inference Services on the Edge

Abstract

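Edge AI computing boxes are a new class of computing devices intended to revolutionize the AI industry.
These compact, robust hardware units bring AI processing directly to the data source at the edge of the network.
Meanwhile, on-demand serverless inference services are growing in popularity because they minimize the infrastructure cost of hosting and running DNN models for small and medium-sized businesses.
However, these computing devices remain constrained in resource availability.
Service providers therefore need to load and unload models efficiently to meet growing demand.
This paper introduces FusedInf, which efficiently swaps DNN models for on-demand serverless inference services on the edge.
FusedInf combines multiple models into a single directed acyclic graph (DAG) so that the models load into GPU memory efficiently and execute faster.
An evaluation of popular DNN models showed that creating a single DAG makes model execution up to 14% faster and reduces memory requirements by up to 17%.
A prototype implementation is available at https://github.com/SifatTaj/FusedInf.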
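
The idea of combining several models into one DAG can be illustrated with a small sketch. This is not the FusedInf prototype's code: it assumes each model is given as a plain dictionary mapping a node to the set of nodes it depends on, and the model names, layer names, and the helpers `fuse_graphs` and `execution_order` are hypothetical. It only shows the graph-merging step (namespacing nodes and producing one valid execution order) and does not touch GPU memory.

```python
from graphlib import TopologicalSorter


def fuse_graphs(models):
    """Merge several per-model DAGs into one DAG (illustrative only).

    `models` maps a model name to its graph; each graph maps a node to the
    set of nodes it depends on. Node names are prefixed with the model name
    so identically named layers in different models do not collide.
    """
    fused = {}
    for model_name, graph in models.items():
        for node, deps in graph.items():
            fused[f"{model_name}/{node}"] = {f"{model_name}/{d}" for d in deps}
    return fused


def execution_order(fused):
    # static_order() raises graphlib.CycleError if the graph contains a cycle,
    # so a successful call also confirms that the fused graph is acyclic.
    return list(TopologicalSorter(fused).static_order())


# Toy graphs standing in for two DNN models (hypothetical layer names).
models = {
    "model_a": {"conv": set(), "pool": {"conv"}, "fc": {"pool"}},
    "model_b": {"conv": set(), "fc": {"conv"}},
}

fused = fuse_graphs(models)
order = execution_order(fused)
print(order)
```

The printed order contains every node of both toy models exactly once, with each node appearing after the nodes it depends on. A single schedule of this kind is what lets a runtime treat several models as one unit when loading and executing them.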

Abstract (original)

Edge AI computing boxes are a new class of computing devices that aim to revolutionize the AI industry. These compact and robust hardware units bring the power of AI processing directly to the source of data, on the edge of the network. On the other hand, on-demand serverless inference services are becoming more and more popular as they minimize the infrastructural cost associated with hosting and running DNN models for small to medium-sized businesses. However, these computing devices are still constrained in terms of resource availability. As such, the service providers need to load and unload models efficiently in order to meet the growing demand. In this paper, we introduce FusedInf to efficiently swap DNN models for on-demand serverless inference services on the edge. FusedInf combines multiple models into a single Directed Acyclic Graph (DAG) to efficiently load the models into the GPU memory and make execution faster. Our evaluation of popular DNN models showed that creating a single DAG can make the execution of the models up to 14% faster while reducing the memory requirement by up to 17%. The prototype implementation is available at https://github.com/SifatTaj/FusedInf.

arXiv information

Authors	Sifat Ut Taki, Arthi Padmanabhan, Spyridon Mastorakis
Published	2024-10-28 15:21:23+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
