NormAd: A Framework for Measuring the Cultural Adaptability of Large Language Models

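Summary

To be deployed effectively and safely to global user populations, large language models (LLMs) may need to adapt their outputs to users' values and cultures, not just know about them.
We introduce NormAd, an evaluation framework that assesses the cultural adaptability of LLMs by measuring their ability to judge social acceptability across varying levels of cultural norm specificity, from abstract values to explicit social norms.
As an instantiation of the framework, we create NormAd-Eti, a benchmark of 2.6k situational descriptions representing cultural norms related to social etiquette from 75 countries.
Through comprehensive experiments on NormAd-Eti, we find that LLMs struggle to judge social acceptability accurately across these varying degrees of cultural context, and that they adapt better to English-centric cultures than to cultures from the Global South.
Even in the simplest setting, where the relevant social norms are provided, the best LLMs' performance (< 82%) lags behind that of humans (> 95%).
In settings with abstract values and country information, model performance drops substantially (< 60%), while human accuracy remains high (> 90%).
Furthermore, we find that models are better at recognizing socially acceptable situations than unacceptable ones.
Our findings showcase current pitfalls in the socio-cultural reasoning of LLMs, which hinder their adaptability for global audiences.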
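The reported numbers are accuracies broken down by how much cultural context the model is given, and by whether the situation is acceptable or not. As a rough illustration of that kind of breakdown, here is a minimal Python sketch. It is not the authors' evaluation code: the record keys (`context`, `gold`, `pred`), the level names, the label strings, and the toy records are all hypothetical and exist only to make the example runnable.

```python
from collections import defaultdict


def accuracy_by(records, key):
    """Group judgment records by `key` and return {group: accuracy}.

    Each record is a dict with hypothetical fields:
      "context": how much cultural context was shown (e.g. "value", "norm")
      "gold":    the reference acceptability label
      "pred":    the label the model produced
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        group = record[key]
        total[group] += 1
        if record["pred"] == record["gold"]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Toy records, invented for illustration only (not benchmark data).
toy_records = [
    {"context": "norm", "gold": "acceptable", "pred": "acceptable"},
    {"context": "norm", "gold": "unacceptable", "pred": "unacceptable"},
    {"context": "value", "gold": "acceptable", "pred": "acceptable"},
    {"context": "value", "gold": "unacceptable", "pred": "acceptable"},
]

per_context = accuracy_by(toy_records, "context")
per_label = accuracy_by(toy_records, "gold")
```

Grouping by `context` gives the kind of comparison made between the explicit-norm setting and the abstract-value setting, while grouping by `gold` separates acceptable from unacceptable situations.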

Abstract (original)

To be effectively and safely deployed to global user populations, large language models (LLMs) may need to adapt outputs to user values and cultures, not just know about them. We introduce NormAd, an evaluation framework to assess LLMs’ cultural adaptability, specifically measuring their ability to judge social acceptability across varying levels of cultural norm specificity, from abstract values to explicit social norms. As an instantiation of our framework, we create NormAd-Eti, a benchmark of 2.6k situational descriptions representing social-etiquette related cultural norms from 75 countries. Through comprehensive experiments on NormAd-Eti, we find that LLMs struggle to accurately judge social acceptability across these varying degrees of cultural contexts and show stronger adaptability to English-centric cultures over those from the Global South. Even in the simplest setting where the relevant social norms are provided, the best LLMs’ performance (< 82%) lags behind humans (> 95%). In settings with abstract values and country information, model performance drops substantially (< 60%), while human accuracy remains high (> 90%). Furthermore, we find that models are better at recognizing socially acceptable versus unacceptable situations. Our findings showcase the current pitfalls in socio-cultural reasoning of LLMs which hinder their adaptability for global audiences.

arXiv information

Authors	Abhinav Rao, Akhila Yerukola, Vishwa Shah, Katharina Reinecke, Maarten Sap
Published	2025-02-24 15:50:39+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
