MP-Rec: Hardware-Software Co-Design to Enable Multi-Path Recommendation

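Summary

Deep learning recommendation systems serve personalized content under a wide range of tail-latency targets and input query loads.
To do so, state-of-the-art recommendation models rely on terabyte-scale embedding tables to learn user preferences over large volumes of content.
Relying on a fixed embedding representation for these tables not only imposes substantial memory capacity and bandwidth requirements, but also narrows the range of compatible system solutions.
This paper challenges the assumption of a fixed embedding representation by showing that synergies between embedding representations and hardware platforms can improve both algorithmic and system performance.
Based on a characterization of various embedding representations, we propose a hybrid embedding representation that achieves higher-quality embeddings at the cost of increased memory and compute requirements.
To address the system performance challenges of the hybrid representation, we propose MP-Rec, a co-design technique that exploits heterogeneity and the dynamic selection of embedding representations and underlying hardware platforms.
On real system hardware, we show that matching custom accelerators (GPUs, TPUs, and IPUs) with compatible embedding representations can yield a 16.65x performance speedup.
In addition, in query-serving scenarios on a CPU-GPU system, MP-Rec achieves 2.49x and 3.76x higher correct-prediction throughput and 0.19% and 0.22% better model quality on the Kaggle and Terabyte datasets, respectively.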

Summary (original)

Deep learning recommendation systems serve personalized content under diverse tail-latency targets and input-query loads. In order to do so, state-of-the-art recommendation models rely on terabyte-scale embedding tables to learn user preferences over large bodies of contents. The reliance on a fixed embedding representation of embedding tables not only imposes significant memory capacity and bandwidth requirements but also limits the scope of compatible system solutions. This paper challenges the assumption of fixed embedding representations by showing how synergies between embedding representations and hardware platforms can lead to improvements in both algorithmic- and system performance. Based on our characterization of various embedding representations, we propose a hybrid embedding representation that achieves higher quality embeddings at the cost of increased memory and compute requirements. To address the system performance challenges of the hybrid representation, we propose MP-Rec — a co-design technique that exploits heterogeneity and dynamic selection of embedding representations and underlying hardware platforms. On real system hardware, we demonstrate how matching custom accelerators, i.e., GPUs, TPUs, and IPUs, with compatible embedding representations can lead to 16.65x performance speedup. Additionally, in query-serving scenarios, MP-Rec achieves 2.49x and 3.76x higher correct prediction throughput and 0.19% and 0.22% better model quality on a CPU-GPU system for the Kaggle and Terabyte datasets, respectively.
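
The abstract does not spell out how a path is chosen, so the following is only a minimal sketch of what "dynamic selection of embedding representations and underlying hardware platforms" could look like per query. Everything in it is an assumption made for illustration: the `Path` record, the `select_path` policy, the ordinal `quality_rank`, and the latency figures are hypothetical placeholders, not values or an algorithm taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Path:
    """One (embedding representation, hardware platform) execution path.

    All fields are hypothetical placeholders for illustration only.
    """
    representation: str   # e.g. "table" or "hybrid"
    platform: str         # e.g. "CPU" or "GPU"
    quality_rank: int     # higher means better model quality (ordinal only)
    latency_ms: float     # assumed per-query latency under the current load


def select_path(paths: Sequence[Path], latency_target_ms: float) -> Path:
    """Pick the highest-quality path that meets the latency target.

    If no path meets the target, fall back to the fastest path.
    """
    feasible = [p for p in paths if p.latency_ms <= latency_target_ms]
    if not feasible:
        return min(paths, key=lambda p: p.latency_ms)
    # Prefer quality first, then lower latency as a tie-breaker.
    return max(feasible, key=lambda p: (p.quality_rank, -p.latency_ms))


# Made-up example paths; the numbers carry no empirical meaning.
PATHS = [
    Path("table", "CPU", quality_rank=1, latency_ms=5.0),
    Path("hybrid", "GPU", quality_rank=2, latency_ms=20.0),
]

if __name__ == "__main__":
    for target in (50.0, 10.0, 1.0):
        chosen = select_path(PATHS, target)
        print(f"target={target} ms -> {chosen.representation} on {chosen.platform}")
```

With a loose latency target the sketch picks the higher-quality hybrid path, and with a tight one it falls back to the cheaper table path, which is the kind of trade-off between model quality and serving performance that the abstract describes.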

arXiv information

Authors	Samuel Hsia, Udit Gupta, Bilge Acun, Newsha Ardalani, Pan Zhong, Gu-Yeon Wei, David Brooks, Carole-Jean Wu
Published	2023-02-21 18:38:45+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
