Evaluating the Prompt Steerability of Large Language Models

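Summary

Building pluralistic AI requires designing models that can represent a wide range of value systems and cultures.
Achieving this first requires a way to evaluate how well a given model can reflect various personas.
To this end, we propose a benchmark for evaluating the steerability of model personas as a function of prompting.
Our design rests on a formal definition of prompt steerability, which analyzes how far a model's joint behavioral distribution can be shifted from its baseline behavior.
By defining steerability indices and examining how these indices change as a function of steering effort, we can estimate a model's steerability across various persona dimensions and directions.
Our benchmark shows that the steerability of many current models is limited, owing both to a skew in their baseline behavior and to an asymmetry in their steerability across many persona dimensions.
An implementation of the benchmark is released at https://github.com/IBM/prompt-steering.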

Summary (original)

Building pluralistic AI requires designing models that are able to be shaped to represent a wide range of value systems and cultures. Achieving this requires first being able to evaluate the degree to which a given model is capable of reflecting various personas. To this end, we propose a benchmark for evaluating the steerability of model personas as a function of prompting. Our design is based on a formal definition of prompt steerability, which analyzes the degree to which a model’s joint behavioral distribution can be shifted from its baseline behavior. By defining steerability indices and inspecting how these indices change as a function of steering effort, we can estimate the steerability of a model across various persona dimensions and directions. Our benchmark reveals that the steerability of many current models is limited — due to both a skew in their baseline behavior and an asymmetry in their steerability across many persona dimensions. We release an implementation of our benchmark at https://github.com/IBM/prompt-steering.

arXiv information

Authors	Erik Miehling, Michael Desmond, Karthikeyan Natesan Ramamurthy, Elizabeth M. Daly, Pierre Dognin, Jesus Rios, Djallel Bouneffouf, Miao Liu
Published	2024-11-19 10:41:54+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
