HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages

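Summary

Preference datasets are essential for training general-domain, instruction-following language models with Reinforcement Learning from Human Feedback (RLHF).
Each new data release raises expectations for future data collection, so the quality and diversity of openly available preference data must keep advancing.
To address this need, we introduce HelpSteer3-Preference, a high-quality, human-annotated preference dataset (CC-BY-4.0) comprising over 40,000 samples.
These samples span diverse real-world applications of large language models (LLMs), including tasks relating to STEM, coding, and multilingual scenarios.
Using HelpSteer3-Preference, we train Reward Models (RMs) that achieve top performance on RM-Bench (82.4%) and JudgeBench (73.7%).
This represents a substantial improvement (~10% absolute) over the previously best-reported results from existing RMs.
We also show that HelpSteer3-Preference can be used to train Generative RMs, and how policy models can be aligned with RLHF using our RMs.
Dataset (CC-BY-4.0): https://huggingface.co/datasets/nvidia/HelpSteer3#preference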

Summary (original)

Preference datasets are essential for training general-domain, instruction-following language models with Reinforcement Learning from Human Feedback (RLHF). Each subsequent data release raises expectations for future data collection, meaning there is a constant need to advance the quality and diversity of openly available preference data. To address this need, we introduce HelpSteer3-Preference, a permissively licensed (CC-BY-4.0), high-quality, human-annotated preference dataset comprising over 40,000 samples. These samples span diverse real-world applications of large language models (LLMs), including tasks relating to STEM, coding and multilingual scenarios. Using HelpSteer3-Preference, we train Reward Models (RMs) that achieve top performance on RM-Bench (82.4%) and JudgeBench (73.7%). This represents a substantial improvement (~10% absolute) over the previously best-reported results from existing RMs. We demonstrate HelpSteer3-Preference can also be applied to train Generative RMs and how policy models can be aligned with RLHF using our RMs. Dataset (CC-BY-4.0): https://huggingface.co/datasets/nvidia/HelpSteer3#preference
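
The abstract does not say which objective is used to train the reward models. As an illustration only, a common way to learn from pairwise preference data is a Bradley-Terry style loss, which pushes the reward of the preferred response above the reward of the rejected one. The sketch below assumes each response has already been scored with a scalar reward; the function names `bradley_terry_loss` and `pairwise_accuracy` are hypothetical and do not come from the paper.

```python
import math


def bradley_terry_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the chosen response beats the rejected one.

    Assumption: a Bradley-Terry preference model, i.e.
    P(chosen > rejected) = sigmoid(reward_chosen - reward_rejected).
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)) written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def pairwise_accuracy(pairs: list[tuple[float, float]]) -> float:
    """Fraction of (chosen_reward, rejected_reward) pairs ranked correctly."""
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return correct / len(pairs)
```

Pairwise accuracy of this kind is one simple way to check whether a reward model agrees with human preference labels; the benchmarks named in the abstract may compute their scores differently.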

arXiv information

Authors	Zhilin Wang, Jiaqi Zeng, Olivier Delalleau, Hoo-Chang Shin, Felipe Soares, Alexander Bukharin, Ellie Evans, Yi Dong, Oleksii Kuchaiev
Published	2025-05-16 17:31:19+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
