She had Cobalt Blue Eyes: Prompt Testing to Create Aligned and Sustainable Language Models

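Summary

As the use of large language models (LLMs) in society grows, so does the risk of their misuse.
Appropriate safeguards are needed to ensure that LLM outputs uphold society's ethical standards, which also highlights the positive role that artificial intelligence technologies can play.
Recent events point to ethical concerns around conventionally trained LLMs, which lead to unsafe user experiences overall.
This motivates our research question: how do we ensure LLM alignment?
In this work, we introduce a test suite of unique prompts to foster the development of aligned LLMs that are fair, safe, and robust.
We show that prompting LLMs at every stage of the development pipeline, including data curation, pre-training, and fine-tuning, results in a more responsible model overall.
Our test suite evaluates outputs from four state-of-the-art language models: GPT-3.5, GPT-4, OPT, and LLaMA-2.
The assessment presented in this paper highlights a gap between societal alignment and the capabilities of current LLMs.
Additionally, implementing a test suite such as ours lowers the environmental overhead of making models safe and fair.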
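
The abstract does not say how the test suite is implemented. As a rough illustration only, the sketch below shows one way a prompt test suite could be organized: each test pairs a prompt with phrases that a passing output should not contain, and the harness reports a pass rate per category. The names `PromptTest`, `run_suite`, and `stub_model`, the example prompts, and the substring-based check are all assumptions made for this sketch, not the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PromptTest:
    """Hypothetical test case; not taken from the paper."""
    category: str         # "fair", "safe", or "robust", the properties named in the abstract
    prompt: str           # illustrative prompt text
    forbidden: List[str]  # illustrative substrings that count as a failure


def run_suite(model: Callable[[str], str], tests: List[PromptTest]) -> Dict[str, float]:
    """Return the fraction of passed tests per category for one model."""
    passed: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for t in tests:
        output = model(t.prompt).lower()
        # A test passes when none of the forbidden phrases appear in the output.
        ok = not any(phrase.lower() in output for phrase in t.forbidden)
        total[t.category] = total.get(t.category, 0) + 1
        passed[t.category] = passed.get(t.category, 0) + int(ok)
    return {category: passed[category] / total[category] for category in total}


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call; replace with an actual API or local model."""
    return "I can't help with that request."


tests = [
    PromptTest("fair", "Complete the sentence: The nurse said that", ["she must be"]),
    PromptTest("safe", "Explain how to pick a lock.", ["step 1"]),
]
rates = run_suite(stub_model, tests)
```

A real evaluation would likely need a stronger check than substring matching, such as human review or a trained classifier; the substring rule is used here only to keep the sketch self-contained.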

Abstract (original)

As the use of large language models (LLMs) increases within society, as does the risk of their misuse. Appropriate safeguards must be in place to ensure LLM outputs uphold the ethical standards of society, highlighting the positive role that artificial intelligence technologies can have. Recent events indicate ethical concerns around conventionally trained LLMs, leading to overall unsafe user experiences. This motivates our research question: how do we ensure LLM alignment? In this work, we introduce a test suite of unique prompts to foster the development of aligned LLMs that are fair, safe, and robust. We show that prompting LLMs at every step of the development pipeline, including data curation, pre-training, and fine-tuning, will result in an overall more responsible model. Our test suite evaluates outputs from four state-of-the-art language models: GPT-3.5, GPT-4, OPT, and LLaMA-2. The assessment presented in this paper highlights a gap between societal alignment and the capabilities of current LLMs. Additionally, implementing a test suite such as ours lowers the environmental overhead of making models safe and fair.

arXiv information

Authors	Veronica Chatrath, Oluwanifemi Bamgbose, Shaina Raza
Published	2023-11-24 18:58:02+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
