Robust AI-Generated Text Detection by Restricted Embeddings

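Summary

The growing volume and quality of AI-generated text make such content harder to detect.
In most real-world scenarios, neither the domain (style and topic) of the generated data nor the generator model is known in advance.
This work focuses on the robustness of classifier-based detectors of AI-generated text, that is, their ability to transfer to unseen generators or semantic domains.
We investigate the geometry of the embedding space of Transformer-based text encoders and show that removing harmful linear subspaces helps train a robust classifier that ignores domain-specific spurious features.
We investigate several subspace decomposition and feature selection strategies and achieve significant improvements over state-of-the-art methods in cross-domain and cross-generator transfer.
Our best approaches for head-wise and coordinate-based subspace removal increase the mean out-of-distribution (OOD) classification score by up to 9% and 14% in particular setups for RoBERTa and BERT embeddings, respectively.
We release our code and data: https://github.com/SilverSolver/RobustATD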

Summary (original)

Growing amount and quality of AI-generated texts makes detecting such content more difficult. In most real-world scenarios, the domain (style and topic) of generated data and the generator model are not known in advance. In this work, we focus on the robustness of classifier-based detectors of AI-generated text, namely their ability to transfer to unseen generators or semantic domains. We investigate the geometry of the embedding space of Transformer-based text encoders and show that clearing out harmful linear subspaces helps to train a robust classifier, ignoring domain-specific spurious features. We investigate several subspace decomposition and feature selection strategies and achieve significant improvements over state of the art methods in cross-domain and cross-generator transfer. Our best approaches for head-wise and coordinate-based subspace removal increase the mean out-of-distribution (OOD) classification score by up to 9% and 14% in particular setups for RoBERTa and BERT embeddings respectively. We release our code and data: https://github.com/SilverSolver/RobustATD
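
The abstract mentions two ways of restricting embeddings: dropping individual coordinates and removing a linear subspace. The following is only a minimal sketch of those two operations on a single embedding vector, not the authors' implementation. The function names (`drop_coordinates`, `orthonormalize`, `remove_subspace`) are hypothetical, and the sketch assumes the coordinates or directions to remove have already been chosen by some selection procedure, which the abstract does not specify.

```python
import math


def drop_coordinates(embedding, coords_to_drop):
    """Coordinate-based removal: delete the listed embedding dimensions."""
    drop = set(coords_to_drop)
    return [x for i, x in enumerate(embedding) if i not in drop]


def orthonormalize(directions, eps=1e-12):
    """Gram-Schmidt: turn spanning vectors into an orthonormal basis."""
    basis = []
    for v in directions:
        w = list(v)
        for b in basis:
            dot = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - dot * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > eps:  # skip vectors already inside the current span
            basis.append([wi / norm for wi in w])
    return basis


def remove_subspace(embedding, directions):
    """Project the embedding onto the orthogonal complement of span(directions)."""
    out = list(embedding)
    for b in orthonormalize(directions):
        dot = sum(oi * bi for oi, bi in zip(out, b))
        out = [oi - dot * bi for oi, bi in zip(out, b)]
    return out


if __name__ == "__main__":
    emb = [1.0, 2.0, 3.0, 4.0]  # toy 4-dimensional "embedding"
    print(drop_coordinates(emb, [1, 3]))
    print(remove_subspace(emb, [[0, 1, 0, 0], [0, 1, 1, 0]]))
```

In this sketch a classifier would then be trained on the restricted vectors. Coordinate removal shrinks the vector, while subspace projection keeps its length and zeroes out the component lying in the removed directions.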

arXiv information

Authors	Kristian Kuznetsov, Eduard Tulchinskii, Laida Kushnareva, German Magai, Serguei Barannikov, Sergey Nikolenko, Irina Piontkovskaya
Published	2024-10-10 16:58:42+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
