LLM Echo Chamber: personalized and automated disinformation

Summary

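Recent advances have demonstrated the capabilities of large language models (LLMs) such as GPT4 and Llama2 on tasks including summarization, translation, and content review.
Their widespread use, however, raises concerns, particularly that LLMs could spread persuasive, human-like misinformation at scale and significantly influence public opinion.
This study examines these risks, focusing on the ability of LLMs to propagate misinformation as fact.
To investigate this, we built the LLM Echo Chamber, a controlled digital environment that simulates social media chatrooms, where misinformation often spreads.
In echo chambers, where individuals interact only with like-minded people, beliefs become further entrenched.
Studying malicious bots that spread misinformation in this environment helps us better understand this phenomenon.
We reviewed current LLMs, explored misinformation risks, and applied state-of-the-art (SOTA) fine-tuning techniques.
Using the Microsoft phi2 model, fine-tuned on our custom dataset, we generated harmful content to create the Echo Chamber.
This setup, whose output GPT4 evaluated for persuasiveness and harmfulness, sheds light on the ethical concerns surrounding LLMs and underscores the need for stronger safeguards against misinformation.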

Summary (original)

Recent advancements have showcased the capabilities of Large Language Models like GPT4 and Llama2 in tasks such as summarization, translation, and content review. However, their widespread use raises concerns, particularly around the potential for LLMs to spread persuasive, humanlike misinformation at scale, which could significantly influence public opinion. This study examines these risks, focusing on LLMs' ability to propagate misinformation as factual. To investigate this, we built the LLM Echo Chamber, a controlled digital environment simulating social media chatrooms, where misinformation often spreads. Echo chambers, where individuals only interact with like-minded people, further entrench beliefs. By studying malicious bots spreading misinformation in this environment, we can better understand this phenomenon. We reviewed current LLMs, explored misinformation risks, and applied SOTA fine-tuning techniques. Using the Microsoft phi2 model, fine-tuned with our custom dataset, we generated harmful content to create the Echo Chamber. This setup, evaluated by GPT4 for persuasiveness and harmfulness, sheds light on the ethical concerns surrounding LLMs and emphasizes the need for stronger safeguards against misinformation.

arXiv information

Author	Tony Ma
Published	2024-09-24 17:04:12+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
