ProactivePIM: Accelerating Weight-Sharing Embedding Layer with PIM for Scalable Recommendation System

Summary

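The continued growth in the size of personalized recommendation systems creates new challenges for model inference.
Weight-sharing algorithms have been proposed to reduce the capacity of embedding tables, but they increase memory accesses.
Recent advances in processing-in-memory (PIM) have improved recommendation-system throughput by exploiting memory parallelism, but our analysis shows that these algorithms introduce CPU-PIM communication overhead into prior PIM systems, degrading PIM throughput.
We propose ProactivePIM, a specialized memory architecture integrated with PIM technology and tailored to accelerate weight-sharing algorithms.
ProactivePIM integrates an SRAM cache inside the PIM with an efficient prefetching scheme to exploit the unique locality of these algorithms and eliminate CPU-PIM communication.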
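
The capacity-versus-access trade-off described above can be illustrated with a minimal Python sketch. The abstract does not name a specific weight-sharing algorithm, so this assumes the quotient-remainder (QR) trick as a representative example; the names `QREmbedding` and `LRUCache`, the table sizes, and the cache policy are hypothetical and are not ProactivePIM's actual design. Each lookup reads one row from a small quotient table and one from a larger remainder table, so storage shrinks while row reads per lookup double. Because every lookup touches the tiny quotient table, a small cache in front of it sees mostly hits, which is the kind of locality an in-PIM SRAM cache could plausibly exploit.

```python
import math
import random
from collections import OrderedDict


class QREmbedding:
    """Hypothetical quotient-remainder (QR) weight-sharing embedding.

    Instead of one num_rows x dim table, it stores a quotient table with
    ceil(num_rows / m) rows and a remainder table with m rows. Row i is
    the element-wise product of q_table[i // m] and r_table[i % m].
    """

    def __init__(self, num_rows, dim, m, seed=0):
        rng = random.Random(seed)
        self.m = m
        q_rows = math.ceil(num_rows / m)
        self.q_table = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(q_rows)]
        self.r_table = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(m)]
        self.row_reads = 0  # counts rows fetched from the two tables

    def stored_rows(self):
        return len(self.q_table) + len(self.r_table)

    def lookup(self, idx):
        q = self.q_table[idx // self.m]
        r = self.r_table[idx % self.m]
        self.row_reads += 2  # two row reads per lookup instead of one
        return [a * b for a, b in zip(q, r)]


class LRUCache:
    """Hypothetical fixed-capacity LRU cache that only tracks hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.entries[key] = True
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used


NUM_ROWS, DIM, M = 1_000_000, 4, 50_000  # assumed sizes for illustration
emb = QREmbedding(NUM_ROWS, DIM, M)
cache = LRUCache(capacity=32)  # small cache in front of the 20-row quotient table

rng = random.Random(1)
for _ in range(10_000):
    idx = rng.randrange(NUM_ROWS)
    cache.access(idx // emb.m)  # quotient-table row touched by this lookup
    vec = emb.lookup(idx)

print("stored rows:", emb.stored_rows(), "vs full table:", NUM_ROWS)
print("row reads:", emb.row_reads, "for 10000 lookups")
print("quotient-table cache hits/misses:", cache.hits, cache.misses)
```

With these assumed sizes, the two tables store 50,020 rows instead of 1,000,000, at the cost of 20,000 row reads for 10,000 lookups; the quotient-table cache misses at most once per quotient row.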

Summary (original)

The personalized recommendation system’s continuous size growth poses new challenges for model inference. Although weight-sharing algorithms have been proposed to reduce embedding table capacity, they increase memory access. Recent advancements in processing-in-memory (PIM) successfully enhance the recommendation system’s throughput by exploiting memory parallelism, but our analysis shows that those algorithms introduce CPU-PIM communication overhead into prior PIM systems, compromising the PIM throughput. We propose ProactivePIM, a specialized memory architecture integrated with PIM technology tailored to accelerate the weight-sharing algorithms. ProactivePIM integrates an SRAM cache within the PIM with an efficient prefetching scheme to leverage a unique locality of the algorithm and eliminate CPU-PIM communication.

arXiv information

Authors	Youngsuk Kim, Junghwan Lim, Hyuk-Jae Lee, Chae Eun Rhee
Published	2024-11-13 16:33:33+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
