Mark My Words: Analyzing and Evaluating Language Model Watermarks

Abstract

The capabilities of large language models have grown significantly in recent years, and so too have concerns about their misuse. In this context, the ability to distinguish machine-generated text from human-authored content becomes important. Prior works have proposed numerous schemes to watermark text, which would benefit from a systematic evaluation framework. This work focuses on text watermarking techniques – as opposed to image watermarks – and proposes MARKMYWORDS, a comprehensive benchmark for them under different tasks as well as practical attacks. We focus on three main metrics: quality, size (e.g. the number of tokens needed to detect a watermark), and tamper-resistance. Current watermarking techniques are good enough to be deployed: Kirchenbauer et al. [1] can watermark Llama2-7B-chat with no perceivable loss in quality, the watermark can be detected with fewer than 100 tokens, and the scheme offers good tamper-resistance to simple attacks. We argue that watermark indistinguishability, a criterion emphasized in some prior works, is too strong a requirement: schemes that slightly modify logit distributions outperform their indistinguishable counterparts with no noticeable loss in generation quality. We publicly release our benchmark (https://github.com/wagner-group/MarkMyWords).

arXiv information

Authors	Julien Piet, Chawin Sitawarin, Vivian Fang, Norman Mu, David Wagner
Published	2023-12-07 04:37:47+00:00

