Aligning Black-box Language Models with Human Judgments

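Summary

Large language models (LLMs) are increasingly used as automated judges for evaluating recommendation systems, search engines, and other subjective tasks. Relying on human evaluators for this work is costly, slow, and does not scale.
LLMs offer an efficient way to run continuous, automated evaluation.
However, the systems built and improved with these judgments are ultimately meant for human use, so LLM judgments must align closely with those of human evaluators if such systems are to remain human-centered.
Achieving that alignment is difficult because human judgments vary between individuals and carry their own biases.
The authors propose a simple yet effective framework that aligns LLM judgments with individual human evaluators, or with their aggregated judgments, without retraining or fine-tuning the LLM.
The approach learns a linear mapping between the LLM's outputs and human judgments. Using only a small number of calibration examples for training, it achieves an average improvement in agreement of over 142% across 29 tasks.
Notably, the method works in zero-shot and few-shot settings, exceeds inter-human agreement on four out of six tasks, and lets smaller LLMs reach performance comparable to that of larger models.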

Summary (original)

Large language models (LLMs) are increasingly used as automated judges to evaluate recommendation systems, search engines, and other subjective tasks, where relying on human evaluators can be costly, time-consuming, and unscalable. LLMs offer an efficient solution for continuous, automated evaluation. However, since the systems that are built and improved with these judgments are ultimately designed for human use, it is crucial that LLM judgments align closely with human evaluators to ensure such systems remain human-centered. On the other hand, aligning LLM judgments with human evaluators is challenging due to individual variability and biases in human judgments. We propose a simple yet effective framework to align LLM judgments with individual human evaluators or their aggregated judgments, without retraining or fine-tuning the LLM. Our approach learns a linear mapping between the LLM’s outputs and human judgments, achieving over 142% average improvement in agreement across 29 tasks with only a small number of calibration examples used for training. Notably, our method works in zero-shot and few-shot settings, exceeds inter-human agreement on four out of six tasks, and enables smaller LLMs to achieve performance comparable to that of larger models.
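
The abstract describes learning a linear mapping between LLM outputs and human judgments from a few calibration examples. As a rough illustration only, the sketch below assumes the LLM emits one scalar score per item and fits a slope and intercept by ordinary least squares. The function names, the 1 to 5 rating scale, and the calibration data are hypothetical and are not taken from the paper, whose actual formulation may differ.

```python
def fit_linear_map(llm_scores, human_scores):
    """Fit human ~= slope * llm + intercept by ordinary least squares."""
    n = len(llm_scores)
    if n != len(human_scores) or n < 2:
        raise ValueError("need at least two paired calibration examples")
    mean_x = sum(llm_scores) / n
    mean_y = sum(human_scores) / n
    # Sum of squared deviations of the LLM scores, and their
    # co-deviation with the human scores.
    sxx = sum((x - mean_x) ** 2 for x in llm_scores)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(llm_scores, human_scores))
    if sxx == 0:
        raise ValueError("LLM scores are constant; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_linear_map(llm_scores, slope, intercept, lo=1.0, hi=5.0):
    """Map raw LLM scores onto the human scale, clipped to [lo, hi]."""
    return [min(hi, max(lo, slope * x + intercept)) for x in llm_scores]


# Hypothetical calibration set: the LLM rates on a 0-10 scale,
# the human evaluator on an assumed 1-5 scale.
calib_llm = [2.0, 4.0, 6.0, 8.0]
calib_human = [1.0, 2.0, 3.0, 4.0]

slope, intercept = fit_linear_map(calib_llm, calib_human)
aligned = apply_linear_map([5.0, 12.0], slope, intercept)
```

Because only the scores are transformed after the fact, a mapping of this kind treats the LLM as a black box: nothing about the model itself is retrained or fine-tuned. Clipping to the rating range is a design choice of this sketch, not something stated in the abstract.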

arXiv information

Authors	Gerrit J. J. van den Burg, Gen Suzuki, Wei Liu, Murat Sensoy
Published	2025-02-07 15:19:40+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
