A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy

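Summary

Large language models (LLMs) have great potential to support numerous real-world applications and deliver positive social impact, but they still face major challenges from inherent risks such as privacy leakage, hallucinated outputs, and value misalignment, and once jailbroken they can be maliciously used to generate toxic content or serve unethical purposes.
This survey therefore presents a comprehensive review of recent advances aimed at mitigating these issues, organized across the four phases of LLM development and usage: data collection and pre-training, fine-tuning and alignment, prompting and reasoning, and post-processing and auditing.
It details recent advances that improve LLMs in terms of privacy protection, hallucination reduction, value alignment, toxicity elimination, and jailbreak defenses.
In contrast to previous surveys that focus on a single dimension of responsible LLMs, this survey presents a unified framework that encompasses these diverse dimensions, giving a comprehensive view of how to enhance LLMs so that they better serve real-world applications.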

Abstract (original)

While large language models (LLMs) present significant potential for supporting numerous real-world applications and delivering positive social impacts, they still face significant challenges in terms of the inherent risk of privacy leakage, hallucinated outputs, and value misalignment, and can be maliciously used for generating toxic content and unethical purposes after being jailbroken. Therefore, in this survey, we present a comprehensive review of recent advancements aimed at mitigating these issues, organized across the four phases of LLM development and usage: data collection and pre-training, fine-tuning and alignment, prompting and reasoning, and post-processing and auditing. We elaborate on the recent advances for enhancing the performance of LLMs in terms of privacy protection, hallucination reduction, value alignment, toxicity elimination, and jailbreak defenses. In contrast to previous surveys that focus on a single dimension of responsible LLMs, this survey presents a unified framework that encompasses these diverse dimensions, providing a comprehensive view of enhancing LLMs to better serve real-world applications.

arXiv information

Authors	Huandong Wang, Wenjie Fu, Yingzhou Tang, Zhilong Chen, Yuxi Huang, Jinghua Piao, Chen Gao, Fengli Xu, Tao Jiang, Yong Li
Published	2025-01-16 09:59:45+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
