DE-COP: Detecting Copyrighted Content in Language Models Training Data

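Abstract

How can we detect whether copyrighted content was used in the training process of a language model, given that training data is typically undisclosed?
We are motivated by the premise that a language model is likely to identify verbatim excerpts from its training text.
We propose DE-COP, a method for determining whether a piece of copyrighted content was included in training.
DE-COP's core approach is to probe an LLM with multiple-choice questions whose options include both the verbatim text and paraphrases of it.
We construct BookTection, a benchmark containing excerpts from 165 books published before and after a model's training cutoff, together with their paraphrases.
Our experiments show that DE-COP outperforms the previous best method by 9.6% in detection performance (AUC) on models whose logits are available.
Moreover, DE-COP achieves an average accuracy of 72% in detecting suspect books on fully black-box models, where prior methods reach only about 4% accuracy.
The code and datasets are available at https://github.com/avduarte333/DE-COP_Method.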

Abstract (original)

How can we detect if copyrighted content was used in the training process of a language model, considering that the training data is typically undisclosed? We are motivated by the premise that a language model is likely to identify verbatim excerpts from its training text. We propose DE-COP, a method to determine whether a piece of copyrighted content was included in training. DE-COP’s core approach is to probe an LLM with multiple-choice questions, whose options include both verbatim text and their paraphrases. We construct BookTection, a benchmark with excerpts from 165 books published prior and subsequent to a model’s training cutoff, along with their paraphrases. Our experiments show that DE-COP surpasses the prior best method by 9.6% in detection performance (AUC) on models with logits available. Moreover, DE-COP also achieves an average accuracy of 72% for detecting suspect books on fully black-box models where prior methods give $\approx$ 4% accuracy. Our code and datasets are available at https://github.com/avduarte333/DE-COP_Method
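
The multiple-choice probing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (that is in the repository linked above). The names `build_question`, `verbatim_pick_rate`, and `ask_model`, the prompt wording, and the choice of four options per question are all assumptions made for illustration. The idea shown is only the general one: shuffle a verbatim excerpt among its paraphrases, ask the model to pick the verbatim one, and measure how often it succeeds.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence

LABELS = "ABCD"  # assumption: four options per question


@dataclass
class Question:
    prompt: str  # text shown to the model
    answer: str  # label of the option holding the verbatim excerpt


def build_question(verbatim: str, paraphrases: Sequence[str],
                   rng: random.Random) -> Question:
    """Shuffle one verbatim excerpt among its paraphrases."""
    options = [verbatim, *paraphrases]
    if len(options) != len(LABELS):
        raise ValueError(f"expected {len(LABELS) - 1} paraphrases")
    order = list(range(len(options)))
    rng.shuffle(order)  # randomize positions to avoid position bias
    lines = ["Which of the following passages is the verbatim excerpt?"]
    for label, idx in zip(LABELS, order):
        lines.append(f"{label}. {options[idx]}")
    lines.append("Answer:")
    # Index 0 is the verbatim excerpt; find where it landed.
    answer = LABELS[order.index(0)]
    return Question("\n".join(lines), answer)


def verbatim_pick_rate(questions: List[Question],
                       ask_model: Callable[[str], str]) -> float:
    """Fraction of questions where the model's reply starts with the right label."""
    if not questions:
        raise ValueError("no questions given")
    correct = sum(
        ask_model(q.prompt).strip().upper().startswith(q.answer)
        for q in questions
    )
    return correct / len(questions)
```

Here `ask_model` is a hypothetical callable that wraps whatever model is being probed and returns its reply as a string. Under the premise stated in the abstract, a pick rate well above chance (0.25 with four options) on excerpts from a given book would be a signal, not proof, that the book appeared in the training data.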

arXiv information

Authors	André V. Duarte, Xuandong Zhao, Arlindo L. Oliveira, Lei Li
Published	2024-02-15 12:17:15+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
