Fake News Detection: Comparative Evaluation of BERT-like Models and Large Language Models with Generative AI-Annotated Data

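Summary

Fake news poses a serious threat to public opinion and social stability in modern society.
This study compares BERT-like encoder-only models with autoregressive decoder-only large language models (LLMs) for fake news detection.
It introduces a dataset of news articles labeled with GPT-4 assistance (an AI-labeling method) and verified by human experts to ensure reliability.
Both the BERT-like encoder-only models and the LLMs were fine-tuned on this dataset.
The authors also developed an instruction-tuned LLM approach that applies majority voting at inference time to generate labels.
The analysis shows that BERT-like models generally outperform LLMs on classification tasks, while LLMs are more robust to text perturbations.
The results also show that AI labels with human supervision yield better classification results than weakly labeled (distant supervision) data.
Overall, the study highlights the effectiveness of combining AI-based annotation with human oversight and reports how different families of machine learning models perform on fake news detection.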
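
The summary mentions majority voting over LLM outputs at inference time but gives no details. The following is a minimal sketch of that general idea, not the authors' implementation: the function name `majority_vote`, the label strings "fake" and "real", the number of sampled outputs, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label among several sampled model outputs.

    Assumption: ties go to the label that appeared first in `labels`.
    The source does not say how ties are handled.
    """
    if not labels:
        raise ValueError("need at least one sampled label")
    counts = Counter(labels)
    # Counter preserves first-seen order, and max() returns the first
    # maximal key, so ties resolve to the earliest label.
    return max(counts, key=counts.get)


# Hypothetical labels parsed from three sampled generations for one article.
sampled = ["fake", "real", "fake"]
final_label = majority_vote(sampled)
```

Using an odd number of samples for a two-class task avoids ties altogether, which makes the tie-breaking assumption irrelevant in practice.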

Abstract (original)

Fake news poses a significant threat to public opinion and social stability in modern society. This study presents a comparative evaluation of BERT-like encoder-only models and autoregressive decoder-only large language models (LLMs) for fake news detection. We introduce a dataset of news articles labeled with GPT-4 assistance (an AI-labeling method) and verified by human experts to ensure reliability. Both BERT-like encoder-only models and LLMs were fine-tuned on this dataset. Additionally, we developed an instruction-tuned LLM approach with majority voting during inference for label generation. Our analysis reveals that BERT-like models generally outperform LLMs in classification tasks, while LLMs demonstrate superior robustness against text perturbations. Compared to weak labels (distant supervision) data, the results show that AI labels with human supervision achieve better classification results. This study highlights the effectiveness of combining AI-based annotation with human oversight and demonstrates the performance of different families of machine learning models for fake news detection.

arXiv information

Authors	Shaina Raza, Drai Paulen-Patterson, Chen Ding
Published	2024-12-20 12:45:58+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
