EdgeRAG: Online-Indexed RAG for Edge Devices

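Summary

Deploying Retrieval-Augmented Generation (RAG) on resource-constrained edge devices is difficult because memory and processing power are limited.
This work proposes EdgeRAG, which addresses the memory constraint by pruning the embeddings within clusters and generating them on demand during retrieval.
To avoid the latency of generating embeddings for large tail clusters, EdgeRAG pre-computes and stores the embeddings of those clusters, and adaptively caches the remaining embeddings to minimize redundant computation and further reduce latency.
Results on the BEIR suite show that EdgeRAG substantially reduces latency compared with a baseline IVF index while delivering similar generation quality and letting every evaluated dataset fit in memory.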

Abstract (original)

Deploying Retrieval Augmented Generation (RAG) on resource-constrained edge devices is challenging due to limited memory and processing power. In this work, we propose EdgeRAG which addresses the memory constraint by pruning embeddings within clusters and generating embeddings on-demand during retrieval. To avoid the latency of generating embeddings for large tail clusters, EdgeRAG pre-computes and stores embeddings for these clusters, while adaptively caching remaining embeddings to minimize redundant computations and further optimize latency. The result from BEIR suite shows that EdgeRAG offers significant latency reduction over the baseline IVF index, but with similar generation quality while allowing all of our evaluated datasets to fit into the memory.
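
The retrieval scheme described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: `toy_embed`, `OnlineIndex`, `large_cluster_size`, and `cache_size` are hypothetical names chosen for illustration, the hashed character-count embedding stands in for a real embedding model, a chunk-count threshold stands in for the paper's notion of a "large" cluster, and a plain LRU cache replaces the adaptive caching policy the abstract mentions.

```python
import math
from collections import OrderedDict


def toy_embed(text, dim=8):
    """Hypothetical stand-in for an embedding model: hashed character counts, L2-normalised."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class OnlineIndex:
    """IVF-style index that keeps every centroid but prunes most chunk embeddings."""

    def __init__(self, clusters, embed=toy_embed, large_cluster_size=3, cache_size=2):
        self.embed = embed
        self.clusters = clusters      # cluster id -> non-empty list of text chunks
        self.centroids = {}           # always kept in memory
        self.stored = {}              # precomputed embeddings of large clusters
        self.cache = OrderedDict()    # LRU cache (assumption, not the paper's policy)
        self.cache_size = cache_size
        self.generated = 0            # number of on-demand cluster regenerations
        for cid, chunks in clusters.items():
            embs = [embed(c) for c in chunks]
            self.centroids[cid] = [sum(col) / len(embs) for col in zip(*embs)]
            if len(chunks) >= large_cluster_size:
                self.stored[cid] = embs   # too costly to regenerate per query
            # otherwise the embeddings are dropped once the centroid is known

    def _cluster_embeddings(self, cid):
        if cid in self.stored:
            return self.stored[cid]
        if cid in self.cache:
            self.cache.move_to_end(cid)   # mark as most recently used
            return self.cache[cid]
        embs = [self.embed(c) for c in self.clusters[cid]]  # generate on demand
        self.generated += 1
        self.cache[cid] = embs
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used cluster
        return embs

    def search(self, query, top_k=1, nprobe=1):
        q = self.embed(query)
        # First level: pick the nprobe clusters whose centroids best match the query.
        probe = sorted(self.centroids, key=lambda c: dot(q, self.centroids[c]),
                       reverse=True)[:nprobe]
        # Second level: score the chunks inside the probed clusters.
        scored = []
        for cid in probe:
            for chunk, emb in zip(self.clusters[cid], self._cluster_embeddings(cid)):
                scored.append((dot(q, emb), chunk))
        scored.sort(key=lambda s: s[0], reverse=True)
        return [chunk for _, chunk in scored[:top_k]]
```

In this sketch, memory holds only the centroids, the embeddings of the large clusters, and at most `cache_size` recently used small clusters. The `generated` counter shows how often a query had to pay the on-demand embedding cost.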

arXiv information

Authors	Korakit Seemakhupt, Sihang Liu, Samira Khan
Published	2024-12-31 20:40:43+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
