LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content Moderation of Large Language Models

Summary

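Guardrails have emerged as an alternative to safety alignment for moderating the content of large language models (LLMs).
Existing model-based guardrails were not designed for resource-constrained portable devices such as mobile phones, a growing number of which run LLM-based applications locally.
We introduce LoRA-Guard, a parameter-efficient guardrail adaptation method that relies on knowledge sharing between the LLM and the guardrail model.
LoRA-Guard extracts language features from the LLM and adapts them to the content moderation task using low-rank adapters, while a dual-path design prevents any performance degradation on the generative task.
We show that LoRA-Guard outperforms existing approaches with 100 to 1000 times lower parameter overhead while maintaining accuracy, enabling on-device content moderation.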

Summary (original)

Guardrails have emerged as an alternative to safety alignment for content moderation of large language models (LLMs). Existing model-based guardrails have not been designed for resource-constrained computational portable devices, such as mobile phones, more and more of which are running LLM-based applications locally. We introduce LoRA-Guard, a parameter-efficient guardrail adaptation method that relies on knowledge sharing between LLMs and guardrail models. LoRA-Guard extracts language features from the LLMs and adapts them for the content moderation task using low-rank adapters, while a dual-path design prevents any performance degradation on the generative task. We show that LoRA-Guard outperforms existing approaches with 100-1000x lower parameter overhead while maintaining accuracy, enabling on-device content moderation.
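
The dual-path idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `DualPathLayer`, the single linear layer, the toy dimensions, and the zero initialisation of `B` are all assumptions made for illustration, and the guard classification head is omitted. The sketch only shows that a low-rank update applied on a separate path leaves the generative output unchanged while adding few parameters.

```python
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Matrix of small Gaussian entries (illustrative initialisation)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class DualPathLayer:
    """Hypothetical linear layer shared by a generative path and a guard path."""

    def __init__(self, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        self.W = rand_matrix(d_out, d_in, rng)          # frozen base weights, shared by both paths
        self.A = rand_matrix(rank, d_in, rng)           # low-rank factor (rank x d_in)
        self.B = [[0.0] * rank for _ in range(d_out)]   # low-rank factor (d_out x rank), assumed zero init

    def generate_path(self, x):
        # Adapters are bypassed, so the base model output is never altered.
        return matvec(self.W, x)

    def guard_path(self, x):
        # Base features plus the low-rank update B(Ax).
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + d for b, d in zip(base, delta)]

    def base_params(self):
        return len(self.W) * len(self.W[0])

    def adapter_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


layer = DualPathLayer(d_in=64, d_out=64, rank=2)
x = [1.0] * 64
before = layer.generate_path(x)
layer.B[0][0] = 0.5                        # stand-in for a training update to the adapter
assert layer.generate_path(x) == before    # generative path is unaffected
print(layer.base_params(), layer.adapter_params())  # 4096 256
```

The toy ratio here (4096 base weights to 256 adapter weights) only reflects the chosen dimensions; it is not meant to reproduce the overhead figures reported in the abstract.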

arXiv information

Authors	Hayder Elesedy, Pedro M. Esperança, Silviu Vlad Oprea, Mete Ozay
Published	2024-12-18 16:07:28+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
