LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps

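Abstract

Building large language models (LLMs) that are safe across multiple languages is essential to ensuring both safe access and linguistic diversity.
To this end, we introduce M-ALERT, a multilingual benchmark that evaluates LLM safety in five languages: English, French, German, Italian, and Spanish.
M-ALERT contains 15,000 high-quality prompts per language, 75,000 in total, following the detailed ALERT taxonomy.
Our extensive experiments on 10 state-of-the-art LLMs highlight the importance of language-specific safety analysis and reveal that models often show significant safety inconsistencies across languages and categories.
For example, Llama3.2 shows high unsafety in the crime_tax category for Italian but remains safe in the other languages.
Similar differences are observed across all models.
In contrast, certain categories, such as substance_cannabis and crime_propaganda, consistently trigger unsafe responses across models and languages.
These findings underscore the need for robust multilingual safety practices in LLMs to ensure safe and responsible use across diverse user communities.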

Abstract (original)

Building safe Large Language Models (LLMs) across multiple languages is essential in ensuring both safe access and linguistic diversity. To this end, we introduce M-ALERT, a multilingual benchmark that evaluates the safety of LLMs in five languages: English, French, German, Italian, and Spanish. M-ALERT includes 15k high-quality prompts per language, totaling 75k, following the detailed ALERT taxonomy. Our extensive experiments on 10 state-of-the-art LLMs highlight the importance of language-specific safety analysis, revealing that models often exhibit significant inconsistencies in safety across languages and categories. For instance, Llama3.2 shows high unsafety in the category crime_tax for Italian but remains safe in other languages. Similar differences can be observed across all models. In contrast, certain categories, such as substance_cannabis and crime_propaganda, consistently trigger unsafe responses across models and languages. These findings underscore the need for robust multilingual safety practices in LLMs to ensure safe and responsible usage across diverse user communities.

arXiv information

Authors	Felix Friedrich, Simone Tedeschi, Patrick Schramowski, Manuel Brack, Roberto Navigli, Huu Nguyen, Bo Li, Kristian Kersting
Published	2024-12-19 16:46:54+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
