Large Language Model Safety: A Holistic Survey

Summary (translated)

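The rapid development and deployment of large language models (LLMs) has opened a new frontier in artificial intelligence, characterized by unprecedented capabilities in natural language understanding and generation.
However, as these models are increasingly integrated into critical applications, serious safety concerns arise, requiring a thorough examination of their potential risks and the corresponding mitigation strategies.
This survey provides a comprehensive overview of the current state of LLM safety, covering four major categories: value misalignment, robustness to adversarial attacks, misuse, and autonomous AI risks.
In addition to a comprehensive review of mitigation methods and evaluation resources for these four aspects, we further examine four topics related to LLM safety: the safety implications of LLM agents; the role of interpretability in enhancing LLM safety; the technology roadmaps for LLM safety proposed and followed by a number of AI companies and research institutes; and AI governance aimed at LLM safety, including discussions of international cooperation, policy proposals, and future regulatory directions.
Our findings underscore the need for a proactive, multifaceted approach to LLM safety that integrates technical solutions, ethical considerations, and robust governance frameworks.
This survey is intended as a foundational resource for academic researchers, industry practitioners, and policymakers, offering insights into the challenges and opportunities associated with safely integrating LLMs into society.
Ultimately, it aims to contribute to the safe and beneficial development of LLMs, in line with the overarching goal of harnessing AI for societal progress and well-being.
A curated list of related papers is publicly available at https://github.com/tjunlp-lab/Awesome-LLM-Safety-Papers.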

Abstract (original)

The rapid development and deployment of large language models (LLMs) have introduced a new frontier in artificial intelligence, marked by unprecedented capabilities in natural language understanding and generation. However, the increasing integration of these models into critical applications raises substantial safety concerns, necessitating a thorough examination of their potential risks and associated mitigation strategies. This survey provides a comprehensive overview of the current landscape of LLM safety, covering four major categories: value misalignment, robustness to adversarial attacks, misuse, and autonomous AI risks. In addition to the comprehensive review of the mitigation methodologies and evaluation resources on these four aspects, we further explore four topics related to LLM safety: the safety implications of LLM agents, the role of interpretability in enhancing LLM safety, the technology roadmaps for LLM safety proposed and followed by a number of AI companies and institutes, and AI governance aimed at LLM safety with discussions on international cooperation, policy proposals, and prospective regulatory directions. Our findings underscore the necessity for a proactive, multifaceted approach to LLM safety, emphasizing the integration of technical solutions, ethical considerations, and robust governance frameworks. This survey is intended to serve as a foundational resource for academic researchers, industry practitioners, and policymakers, offering insights into the challenges and opportunities associated with the safe integration of LLMs into society. Ultimately, it seeks to contribute to the safe and beneficial development of LLMs, aligning with the overarching goal of harnessing AI for societal advancement and well-being. A curated list of related papers is publicly available at https://github.com/tjunlp-lab/Awesome-LLM-Safety-Papers.

arXiv information

Authors	Dan Shi, Tianhao Shen, Yufei Huang, Zhigen Li, Yongqi Leng, Renren Jin, Chuang Liu, Xinwei Wu, Zishan Guo, Linhao Yu, Ling Shi, Bojian Jiang, Deyi Xiong
Published	2024-12-23 16:11:27+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google