LLMs Can Understand Encrypted Prompt: Towards Privacy-Computing Friendly Transformers

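Abstract

The community has explored building private inference frameworks for transformer-based large language models (LLMs) in a server-client setting, where the server holds the model parameters and the client supplies its private data (or prompt) for inference.
However, these frameworks impose significant overhead when the private inputs are forward propagated through the original LLMs.
This paper shows that replacing the computation- and communication-heavy operators in the transformer architecture with privacy-computing friendly approximations can greatly reduce the cost of private inference while having only a minor impact on model performance.
Compared with the state-of-the-art Iron (NeurIPS 2022), our privacy-computing friendly model inference pipeline achieves a 5x speedup in computation and an 80% reduction in communication overhead, while retaining nearly identical accuracy.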

Abstract (original)

The community explored to build private inference frameworks for transformer-based large language models (LLMs) in a server-client setting, where the server holds the model parameters and the client inputs its private data (or prompt) for inference. However, these frameworks impose significant overhead when the private inputs are forward propagated through the original LLMs. In this paper, we show that substituting the computation- and communication-heavy operators in the transformer architecture with privacy-computing friendly approximations can greatly reduce the private inference costs while incurring very minor impact on model performance. Compared to state-of-the-art Iron (NeurIPS 2022), our privacy-computing friendly model inference pipeline achieves a $5\times$ acceleration in computation and an 80% reduction in communication overhead, while retaining nearly identical accuracy.
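
To make the idea of a "privacy-computing friendly approximation" concrete, here is a minimal sketch. The abstract does not name the operators that are replaced or the approximations used, so everything below is an illustrative assumption and not the authors' method: it assumes that softmax and GELU are among the costly operators, and it swaps them for hypothetical stand-ins (`relu_normalize` and `relu`) that need only comparisons, additions, and a division, which are typically cheaper than exponentials or error functions under secure computation.

```python
import math


def softmax(scores):
    """Standard softmax: one exponential per element."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relu_normalize(scores, eps=1e-6):
    """Hypothetical stand-in for softmax (an assumption for illustration):
    clip negatives to zero, then divide by the sum. No exponential needed."""
    clipped = [max(s, 0.0) for s in scores]
    total = sum(clipped) + eps  # eps avoids division by zero
    return [c / total for c in clipped]


def gelu(x):
    """Exact GELU, which requires the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def relu(x):
    """Hypothetical stand-in for GELU (an assumption): a single comparison."""
    return max(x, 0.0)


if __name__ == "__main__":
    scores = [-1.0, 1.0, 3.0]
    print("softmax:       ", softmax(scores))
    print("relu_normalize:", relu_normalize(scores))
    for x in (-3.0, 0.0, 3.0):
        print(f"x={x:+.1f}  gelu={gelu(x):.4f}  relu={relu(x):.4f}")
```

The two normalizations produce different weights, so a model using such stand-ins would normally need fine-tuning or retraining to keep its accuracy; the sketch only shows why the replacement operators are cheaper to evaluate, not how accuracy is preserved.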

arXiv information

Authors	Xuanqi Liu, Zhuotao Liu
Published	2023-12-15 02:03:10+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
