Automating Governing Knowledge Commons and Contextual Integrity (GKC-CI) Privacy Policy Annotations with Large Language Models

Abstract

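Identifying contextual integrity (CI) and governing knowledge commons (GKC) parameters in privacy policy texts facilitates normative privacy analysis.
However, GKC-CI annotation has until now required manual or crowdsourced effort.
This paper demonstrates that large language models can automatically perform high-accuracy GKC-CI parameter annotation of privacy policies.
We fine-tune 50 open-source and proprietary models on 21,588 GKC-CI annotations from 16 ground-truth privacy policies.
Our best-performing model achieves an accuracy of 90.65%, comparable to the accuracy of experts on the same task.
We apply this model to 456 privacy policies from a variety of online services, demonstrating the value of scaling GKC-CI annotation for privacy policy exploration and analysis.
We publicly release our model training code, training and testing data, an annotation visualizer, and all annotated policies for future GKC-CI research.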

Abstract (original)

Identifying contextual integrity (CI) and governing knowledge commons (GKC) parameters in privacy policy texts can facilitate normative privacy analysis. However, GKC-CI annotation has heretofore required manual or crowdsourced effort. This paper demonstrates that high-accuracy GKC-CI parameter annotation of privacy policies can be performed automatically using large language models. We fine-tune 50 open-source and proprietary models on 21,588 GKC-CI annotations from 16 ground truth privacy policies. Our best performing model has an accuracy of 90.65%, which is comparable to the accuracy of experts on the same task. We apply our best performing model to 456 privacy policies from a variety of online services, demonstrating the effectiveness of scaling GKC-CI annotation for privacy policy exploration and analysis. We publicly release our model training code, training and testing data, an annotation visualizer, and all annotated policies for future GKC-CI research.

arXiv information

Authors	Jake Chanenson, Madison Pickering, Noah Apthorpe
Published	2024-08-23 03:28:27+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
