R3: Robust Rubric-Agnostic Reward Models

Summary

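Reward models are essential for aligning language model outputs with human preferences, but existing approaches often lack both controllability and interpretability.
These models are typically optimized for narrow objectives, which limits their generalizability to broader downstream tasks.
Moreover, their scalar outputs are difficult to interpret without contextual reasoning.
To address these limitations, we introduce R3, a novel reward modeling framework that is rubric-agnostic, generalizes across evaluation dimensions, and provides interpretable, reasoned score assignments.
R3 enables more transparent and flexible evaluation of language models, supporting robust alignment with diverse human values and use cases.
Our models, data, and code are available as open source at https://github.com/rubricreward/r3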

Summary (original)

Reward models are essential for aligning language model outputs with human preferences, yet existing approaches often lack both controllability and interpretability. These models are typically optimized for narrow objectives, limiting their generalizability to broader downstream tasks. Moreover, their scalar outputs are difficult to interpret without contextual reasoning. To address these limitations, we introduce R3, a novel reward modeling framework that is rubric-agnostic, generalizable across evaluation dimensions, and provides interpretable, reasoned score assignments. R3 enables more transparent and flexible evaluation of language models, supporting robust alignment with diverse human values and use cases. Our models, data, and code are available as open source at https://github.com/rubricreward/r3

arXiv information

Authors	David Anugraha, Zilu Tang, Lester James V. Miranda, Hanyang Zhao, Mohammad Rifqi Farhansyah, Garry Kuwanto, Derry Wijaya, Genta Indra Winata
Published	2025-05-19 17:29:03+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
