Crowd-SFT: Crowdsourcing for LLM Alignment

Summary

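Large language models (LLMs) increasingly rely on supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align model responses with human preferences.
RLHF takes a reinforcement learning approach that uses a separate reward model, whereas SFT trains on human-curated datasets with supervised learning.
Both approaches have traditionally depended on small, vetted groups of annotators, which makes them costly, prone to bias, and limited in scalability.
We propose an open, crowd-sourced fine-tuning framework that addresses these limitations by enabling broader feedback collection for SFT without extensive annotator training.
The framework promotes incentive fairness through a point-based reward system correlated with Shapley values, and it guides model convergence through iterative model updates.
The multi-model selection framework reduces target distance by up to 55% compared with single-model selection. This enables subsequent experiments, which validate that the point-based reward mechanism aligns closely with Shapley values (a well-established method for attributing individual contributions) and thereby supports fair and scalable participation.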
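
For readers unfamiliar with Shapley values, the following is a minimal sketch of the standard definition: each participant's value is their marginal contribution averaged over every possible joining order. It is not the paper's point-based reward system, which is only described here as correlating with Shapley values. The function names, the three contributors `a`, `b`, `c`, and the toy scoring rule are all hypothetical and chosen purely for illustration.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal
    contribution over all orderings of the players."""
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal gain from adding p to the current coalition.
            totals[p] += Fraction(value(with_p)) - Fraction(value(coalition))
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}


def toy_value(coalition):
    """Hypothetical coalition value: individual scores plus a
    bonus of 3 when contributors 'a' and 'b' are both present."""
    scores = {"a": 1, "b": 2, "c": 3}
    base = sum(scores[p] for p in coalition)
    bonus = 3 if {"a", "b"} <= coalition else 0
    return base + bonus


phi = shapley_values(["a", "b", "c"], toy_value)
for player, share in phi.items():
    print(player, float(share))
```

In this toy game the bonus is split evenly between `a` and `b`, and the shares sum to the value of the full group. Exact enumeration grows factorially with the number of participants, which is one reason a cheaper scheme that tracks Shapley values is attractive for crowdsourcing at scale.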

Summary (original)

Large Language Models (LLMs) increasingly rely on Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) to align model responses with human preferences. While RLHF employs a reinforcement learning approach with a separate reward model, SFT uses human-curated datasets for supervised learning. Both approaches traditionally depend on small, vetted groups of annotators, making them costly, prone to bias, and limited in scalability. We propose an open, crowd-sourced fine-tuning framework that addresses these limitations by enabling broader feedback collection for SFT without extensive annotator training. Our framework promotes incentive fairness via a point-based reward system correlated with Shapley values and guides model convergence through iterative model updates. Our multi-model selection framework demonstrates up to a 55% reduction in target distance over single-model selection, enabling subsequent experiments that validate our point-based reward mechanism’s close alignment with Shapley values (a well-established method for attributing individual contributions) thereby supporting fair and scalable participation.

arXiv information

Authors	Alex Sotiropoulos, Sulyab Thottungal Valapu, Linus Lei, Jared Coleman, Bhaskar Krishnamachari
Published	2025-06-04 15:26:38+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
