When Detection Fails: The Power of Fine-Tuned Models to Generate Human-Like Social Media Text

Summary

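Detecting AI-generated text is a difficult problem to begin with.
Detecting AI-generated text on social media is even harder because internet posts are short and written in informal, idiosyncratic language.
It is nonetheless important to tackle this problem, because social media is a significant attack vector for online influence campaigns, which may be strengthened by mass-produced AI-generated posts that support (or oppose) particular policies, decisions, or events.
We approach the problem with the mindset and resources of a reasonably sophisticated threat actor and create a dataset of 505,159 AI-generated social media posts from a combination of open-source, closed-source, and fine-tuned LLMs, covering 11 different controversial topics.
We show that the posts can be detected under typical research assumptions about knowledge of and access to the generating models, but that detectability drops dramatically under the more realistic assumption that an attacker will not release their fine-tuned model to the public.
This result is confirmed by a human study.
Ablation experiments highlight the vulnerability of various detection algorithms to fine-tuned LLMs.
Because fine-tuning is a generally applicable and realistic LLM use case, this result has implications for all detection domains.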

Summary (original)

Detecting AI-generated text is a difficult problem to begin with; detecting AI-generated text on social media is made even more difficult due to the short text length and informal, idiosyncratic language of the internet. It is nonetheless important to tackle this problem, as social media represents a significant attack vector in online influence campaigns, which may be bolstered through the use of mass-produced AI-generated posts supporting (or opposing) particular policies, decisions, or events. We approach this problem with the mindset and resources of a reasonably sophisticated threat actor, and create a dataset of 505,159 AI-generated social media posts from a combination of open-source, closed-source, and fine-tuned LLMs, covering 11 different controversial topics. We show that while the posts can be detected under typical research assumptions about knowledge of and access to the generating models, under the more realistic assumption that an attacker will not release their fine-tuned model to the public, detectability drops dramatically. This result is confirmed with a human study. Ablation experiments highlight the vulnerability of various detection algorithms to fine-tuned LLMs. This result has implications across all detection domains, since fine-tuning is a generally applicable and realistic LLM use case.
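The abstract contrasts detection when the defender can score text with the generating model against the realistic case where the attacker's fine-tuned model stays private. As a minimal, hypothetical sketch (not the paper's detectors, models, or data), a likelihood-threshold detector flags a post whose average per-token log-probability under a scoring model exceeds a cutoff. The toy unigram tables `GENERATOR` and `PROXY`, the helper names, and the threshold of -1.6 are all illustrative assumptions; a real detector would use token log-probabilities from an actual LLM.

```python
import math

# Hypothetical toy "language models": unigram probabilities over a tiny vocabulary.
# GENERATOR stands in for the model that wrote the post; PROXY for a mismatched
# model the defender might use when the fine-tuned generator is not public.
GENERATOR = {"the": 0.30, "policy": 0.25, "is": 0.20, "great": 0.20, "lol": 0.05}
PROXY = {"the": 0.20, "policy": 0.05, "is": 0.20, "great": 0.05, "lol": 0.50}


def avg_log_likelihood(tokens, model, floor=1e-6):
    """Mean per-token natural-log probability of `tokens` under a unigram `model`.

    Tokens missing from the model get a small floor probability.
    """
    return sum(math.log(model.get(t, floor)) for t in tokens) / len(tokens)


def is_machine_generated(tokens, model, threshold):
    """Flag text whose average log-likelihood exceeds a (hypothetical) cutoff."""
    return avg_log_likelihood(tokens, model) > threshold


post = "the policy is great".split()
print(is_machine_generated(post, GENERATOR, threshold=-1.6))  # → True
print(is_machine_generated(post, PROXY, threshold=-1.6))      # → False
```

With these toy numbers the post averages about -1.45 nats per token under the model that produced it, so it is flagged, but about -2.30 under the mismatched proxy, so it is not. This illustrates in miniature one way that losing access to a private fine-tuned model can weaken likelihood-based detection.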

arXiv information

Authors	Hillary Dawkins, Kathleen C. Fraser, Svetlana Kiritchenko
Published	2025-06-11 17:51:28+00:00
arXiv site	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google
