A Watermark for Large Language Models

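Summary

The potential harms of large language models can be mitigated by watermarking model output: embedding a signal in generated text that is invisible to humans but can be detected algorithmically from a short span of tokens.
We propose a watermarking framework for proprietary language models.
The watermark can be embedded with negligible effect on text quality and can be detected with an efficient open-source algorithm, without access to the language model's API or parameters.
The watermark works by selecting a randomized set of "green" tokens before each word is generated and then softly promoting the use of green tokens during sampling.
We propose a statistical test that detects the watermark with interpretable p-values, and we derive an information-theoretic framework for analyzing the watermark's sensitivity.
We test the watermark on a multi-billion-parameter model from the Open Pretrained Transformer (OPT) family and discuss robustness and security.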
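
The embedding step described above (pick a randomized green set, then softly favor it during sampling) can be illustrated with a minimal Python sketch. The summary does not say how the green set is seeded or how strongly it is favored, so everything specific here is an assumption made for illustration: seeding from the previous token and a secret `key`, the green fraction `gamma`, the logit bonus `delta`, and the function names `green_set` and `soft_watermark_probs` are all hypothetical.

```python
import math
import random


def green_set(prev_token, vocab_size, gamma=0.5, key=42):
    """Pseudo-randomly mark a fraction `gamma` of the vocabulary as green.

    Assumption: the random choice is seeded by the previous token and a
    secret key. The summary only says the set is randomized and chosen
    before the next word is generated.
    """
    rng = random.Random(key * 1_000_003 + prev_token)
    size = int(gamma * vocab_size)
    return set(rng.sample(range(vocab_size), size))


def soft_watermark_probs(logits, prev_token, gamma=0.5, delta=2.0, key=42):
    """Add a bonus `delta` to green-token logits, then apply softmax.

    This is a "soft" promotion: red tokens stay possible, they are just
    less likely than they would be without the bonus.
    """
    green = green_set(prev_token, len(logits), gamma, key)
    biased = [x + delta if i in green else x for i, x in enumerate(logits)]
    top = max(biased)  # subtract the max for numerical stability
    exps = [math.exp(b - top) for b in biased]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Ten-token toy vocabulary with a flat (uniform) model distribution.
    probs = soft_watermark_probs([0.0] * 10, prev_token=3)
    green = green_set(3, 10)
    print("green probability mass:", sum(probs[i] for i in green))
```

With a flat distribution the bonus shifts most of the probability mass onto green tokens; when the model strongly prefers one token, the same bonus changes little, which is one way a soft rule can limit the effect on text quality.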

Summary (original)

Potential harms of large language models can be mitigated by watermarking model output, i.e., embedding signals into generated text that are invisible to humans but algorithmically detectable from a short span of tokens. We propose a watermarking framework for proprietary language models. The watermark can be embedded with negligible impact on text quality, and can be detected using an efficient open-source algorithm without access to the language model API or parameters. The watermark works by selecting a randomized set of ‘green’ tokens before a word is generated, and then softly promoting use of green tokens during sampling. We propose a statistical test for detecting the watermark with interpretable p-values, and derive an information-theoretic framework for analyzing the sensitivity of the watermark. We test the watermark using a multi-billion parameter model from the Open Pretrained Transformer (OPT) family, and discuss robustness and security.
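
The statistical test mentioned in the abstract can be sketched as a one-sided z-test on the number of green tokens in a text. The abstract gives no formula, so this is an illustrative assumption rather than the authors' exact procedure: under the null hypothesis (text written without knowledge of the green sets), each token is treated as green with probability `gamma`, and the green count is compared with its binomial mean and variance. The helper names `green_set` and `detect`, the seeding scheme, and the `key` parameter are hypothetical.

```python
import math
import random


def green_set(prev_token, vocab_size, gamma=0.5, key=42):
    """Recompute the green set for a position (hypothetical seeding scheme).

    Only the key and the tokens themselves are needed, not the language model.
    """
    rng = random.Random(key * 1_000_003 + prev_token)
    return set(rng.sample(range(vocab_size), int(gamma * vocab_size)))


def detect(tokens, vocab_size, gamma=0.5, key=42):
    """Return (z, p) for a one-sided test of 'more green tokens than chance'."""
    pairs = list(zip(tokens, tokens[1:]))
    n = len(pairs)
    hits = sum(1 for prev, cur in pairs
               if cur in green_set(prev, vocab_size, gamma, key))
    # Null hypothesis: hits ~ Binomial(n, gamma), approximated as normal.
    z = (hits - gamma * n) / math.sqrt(n * gamma * (1.0 - gamma))
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return z, p


if __name__ == "__main__":
    # Build a toy "watermarked" sequence that always picks a green token.
    vocab_size = 100
    tokens = [0]
    for _ in range(50):
        tokens.append(min(green_set(tokens[-1], vocab_size)))
    z, p = detect(tokens, vocab_size)
    print(f"z = {z:.2f}, p = {p:.2e}")
```

A small p-value says the green count is unlikely to have arisen by chance, which is what makes the result interpretable. The detector recomputes green sets from the tokens and the key alone, in line with the abstract's claim that detection needs no access to the model's API or parameters.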

arXiv information

Authors	John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, Tom Goldstein
Published	2023-06-06 17:50:01+00:00
arXiv site	arxiv_id(pdf)

Provided by, services used

arxiv.jp, Google
