Silencing the Risk, Not the Whistle: A Semi-automated Text Sanitization Tool for Mitigating the Risk of Whistleblower Re-Identification


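Legal measures, such as the EU Whistleblowing Directive (EU WBD), are limited in their scope and effectiveness.
Current text sanitization tools aim to mitigate identification risk by replacing typical high-risk words (such as person names and other named entity (NE) labels) and combinations thereof with placeholders.
We then use an LLM that we fine-tuned for paraphrasing to render the sanitized text coherent and style-neutral.
We evaluate our tool's effectiveness using court cases from the European Court of Human Rights (ECHR) and excerpts from a real-world whistleblower testimony, and we statistically measure protection against authorship attribution (AA) attacks and utility loss using the popular IMDb62 movie reviews dataset.
Our method can significantly reduce AA accuracy from 98.81% to 31.22%, while preserving up to 73.1% of the original content's semantics.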


Whistleblowing is essential for ensuring transparency and accountability in both the public and private sectors. However, (potential) whistleblowers often fear or face retaliation, even when reporting anonymously. The specific content of their disclosures and their distinct writing style may re-identify them as the source. Legal measures, such as the EU Whistleblowing Directive (EU WBD), are limited in their scope and effectiveness. Therefore, computational methods to prevent re-identification are important complementary tools for encouraging whistleblowers to come forward. However, current text sanitization tools follow a one-size-fits-all approach and take an overly limited view of anonymity. They aim to mitigate identification risk by replacing typical high-risk words (such as person names and other named entity (NE) labels) and combinations thereof with placeholders. Such an approach is inadequate for the whistleblowing scenario because it neglects further re-identification potential in textual features, including writing style. Therefore, we propose, implement, and evaluate a novel classification and mitigation strategy for rewriting texts that involves the whistleblower in the assessment of risk and utility. Our prototypical tool semi-automatically evaluates risk at the word/term level and applies risk-adapted anonymization techniques to produce a grammatically disjointed yet appropriately sanitized text. We then use a large language model (LLM) that we fine-tuned for paraphrasing to render this text coherent and style-neutral. We evaluate our tool's effectiveness using court cases from the European Court of Human Rights (ECHR) and excerpts from a real-world whistleblower testimony, and we statistically measure the protection against authorship attribution (AA) attacks and the utility loss using the popular IMDb62 movie reviews dataset. Our method can significantly reduce AA accuracy from 98.81% to 31.22%, while preserving up to 73.1% of the original content's semantics.
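The word/term-level, risk-adapted step described in the abstract can be illustrated with a minimal sketch. This is not the authors' tool: the risk levels, the `RISK` and `GENERALIZE` tables, and the `sanitize` function are all hypothetical, and the specific actions (redaction for high risk, generalization for medium risk) are assumptions chosen only to show the idea. The later LLM paraphrasing step is not shown.

```python
import re

# Hypothetical risk assignments. In the described tool, these would come from
# a semi-automated assessment that involves the whistleblower.
RISK = {
    "alice": "high",
    "berlin": "medium",
    "tuesday": "medium",
}

# Hypothetical generalizations for medium-risk terms (an assumption,
# not a technique confirmed by the abstract).
GENERALIZE = {
    "berlin": "a city",
    "tuesday": "a weekday",
}


def sanitize(text: str) -> str:
    """Rewrite each word according to its assumed risk level."""

    def replace_word(match: re.Match) -> str:
        word = match.group(0)
        key = word.lower()
        level = RISK.get(key, "low")
        if level == "high":
            # High-risk terms are replaced by a placeholder.
            return "[REDACTED]"
        if level == "medium":
            # Medium-risk terms are generalized; fall back to a placeholder.
            return GENERALIZE.get(key, "[REDACTED]")
        # Low-risk terms are kept unchanged.
        return word

    return re.sub(r"[A-Za-z]+", replace_word, text)


print(sanitize("Alice met the auditor in Berlin on Tuesday."))
# prints: [REDACTED] met the auditor in a city on a weekday.
```

The output of such a step can read awkwardly, which is consistent with the abstract's description of a "grammatically disjointed yet appropriately sanitized text" that a paraphrasing model then smooths.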


Authors: Dimitri Staufer, Frank Pallas, Bettina Berendt
Published: 2024-05-02 08:52:29+00:00


Categories: cs.CL, cs.CY, cs.HC, cs.IR, cs.SE, H.3