Shh, don’t say that! Domain Certification in LLMs

Summary

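Large language models (LLMs) are often deployed to perform constrained tasks within a narrow domain.
For example, customer support bots can be built on top of LLMs, relying on their broad language understanding and capabilities to improve performance.
However, these LLMs are susceptible to adversarial attacks and may generate outputs outside the intended domain.
To formalize, assess, and mitigate this risk, the authors introduce domain certification, a guarantee that accurately characterizes the out-of-domain behavior of language models.
They then propose a simple yet effective approach, called VALID, that provides adversarial bounds as a certificate.
Finally, they evaluate the method across a diverse set of datasets and demonstrate that it yields meaningful certificates, which tightly bound the probability of out-of-domain samples with minimal penalty to refusal behavior.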

Abstract (original)

Large language models (LLMs) are often deployed to perform constrained tasks, with narrow domains. For example, customer support bots can be built on top of LLMs, relying on their broad language understanding and capabilities to enhance performance. However, these LLMs are adversarially susceptible, potentially generating outputs outside the intended domain. To formalize, assess, and mitigate this risk, we introduce domain certification; a guarantee that accurately characterizes the out-of-domain behavior of language models. We then propose a simple yet effective approach, which we call VALID that provides adversarial bounds as a certificate. Finally, we evaluate our method across a diverse set of datasets, demonstrating that it yields meaningful certificates, which bound the probability of out-of-domain samples tightly with minimum penalty to refusal behavior.

arXiv information

Authors	Cornelius Emde, Alasdair Paren, Preetham Arvind, Maxime Kayser, Tom Rainforth, Thomas Lukasiewicz, Bernard Ghanem, Philip H. S. Torr, Adel Bibi
Published	2025-02-26 17:13:19+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
