AI Alignment and Social Choice: Fundamental Limitations and Policy Implications

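Summary

Aligning AI agents to human intentions and values is a key bottleneck in building safe and deployable AI applications.
But whose values should AI agents be aligned with?
Reinforcement learning with human feedback (RLHF) has emerged as the key framework for AI alignment.
RLHF uses feedback from human reinforcers to fine-tune outputs.
All widely deployed large language models (LLMs) use RLHF to align their outputs to human values.
It is critical to understand the limitations of RLHF and to consider the policy challenges that arise from these limitations.
In this paper, we investigate a specific challenge in building RLHF systems that respect democratic norms.
Building on impossibility results in social choice theory, we show that, under fairly broad assumptions, there is no unique voting protocol to universally align AI systems using RLHF through democratic processes.
Further, we show that aligning AI agents with the values of all individuals will always violate certain private ethical preferences of an individual user; that is, universal AI alignment using RLHF is impossible.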
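The impossibility claims rest on social choice theory. As a toy illustration only (not the paper's proof), the sketch below reproduces the classic Condorcet cycle: three hypothetical reinforcers rank three hypothetical candidate model outputs A, B and C, and pairwise majority voting over their rankings yields A over B, B over C, and C over A. The aggregated preference is therefore intransitive, so no consistent "majority-preferred" ranking exists. The rankings and the helper names (`majority_relation`, `is_transitive`) are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical: three human reinforcers each rank three candidate model
# outputs A, B, C (first element = most preferred). Classic Condorcet profile.
rankings = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]


def prefers(ranking, x, y):
    """True if this ranking places x above y."""
    return ranking.index(x) < ranking.index(y)


def majority_relation(rankings):
    """Return the set of pairs (x, y) such that a strict majority ranks x above y.

    Rankings are assumed to be strict orders over the same candidates; a tie
    (possible with an even number of reinforcers) adds neither pair.
    """
    candidates = sorted(rankings[0])
    wins = set()
    for x, y in combinations(candidates, 2):
        x_votes = sum(prefers(r, x, y) for r in rankings)
        y_votes = len(rankings) - x_votes
        if x_votes > y_votes:
            wins.add((x, y))
        elif y_votes > x_votes:
            wins.add((y, x))
    return wins


def is_transitive(relation):
    """Check that x > y and y > z always imply x > z."""
    for x, y in relation:
        for y2, z in relation:
            if y == y2 and x != z and (x, z) not in relation:
                return False
    return True


relation = majority_relation(rankings)
print(sorted(relation))
print("transitive:", is_transitive(relation))
```

Running it prints the three majority pairs (A over B, B over C, C over A) followed by `transitive: False`; with unanimous rankings the same check returns `True`.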
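We discuss policy implications for the governance of AI systems built using RLHF. First, we describe the need to mandate transparent voting rules so that model builders can be held accountable.
Second, model builders need to focus on developing AI agents that are narrowly aligned to specific user groups.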
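Why the voting rule itself matters can be seen in a small example. The sketch below assumes a hypothetical profile of seven reinforcers ranking three outputs and applies two standard rules from social choice theory, plurality and the Borda count. These rules are chosen for illustration; they are not claimed to be the ones used by the paper or by any RLHF pipeline, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical preference profile over three candidate outputs A, B, C:
# (number of reinforcers, ranking from most to least preferred).
profile = [
    (3, ["A", "B", "C"]),
    (2, ["B", "C", "A"]),
    (2, ["C", "B", "A"]),
]


def plurality(profile):
    """Each reinforcer votes only for their top choice."""
    scores = Counter()
    for count, ranking in profile:
        scores[ranking[0]] += count
    return scores


def borda(profile):
    """A candidate at position i of an m-item ranking earns m - 1 - i points."""
    scores = Counter()
    for count, ranking in profile:
        m = len(ranking)
        for i, candidate in enumerate(ranking):
            scores[candidate] += count * (m - 1 - i)
    return scores


def winner(scores):
    """Highest score wins; ties are broken alphabetically for determinism."""
    return min(scores, key=lambda c: (-scores[c], c))


for rule in (plurality, borda):
    scores = rule(profile)
    print(rule.__name__, dict(scores), "->", winner(scores))
```

On this profile plurality selects A (3 of 7 first-place votes), while Borda selects B (9 points versus 6 each for A and C). The same feedback yields a different preferred output depending on the aggregation rule, which is why an undisclosed rule makes the outcome hard to audit.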

Summary (original)

Aligning AI agents to human intentions and values is a key bottleneck in building safe and deployable AI applications. But whose values should AI agents be aligned with? Reinforcement learning with human feedback (RLHF) has emerged as the key framework for AI alignment. RLHF uses feedback from human reinforcers to fine-tune outputs; all widely deployed large language models (LLMs) use RLHF to align their outputs to human values. It is critical to understand the limitations of RLHF and consider policy challenges arising from these limitations. In this paper, we investigate a specific challenge in building RLHF systems that respect democratic norms. Building on impossibility results in social choice theory, we show that, under fairly broad assumptions, there is no unique voting protocol to universally align AI systems using RLHF through democratic processes. Further, we show that aligning AI agents with the values of all individuals will always violate certain private ethical preferences of an individual user i.e., universal AI alignment using RLHF is impossible. We discuss policy implications for the governance of AI systems built using RLHF: first, the need for mandating transparent voting rules to hold model builders accountable. Second, the need for model builders to focus on developing AI agents that are narrowly aligned to specific user groups.

arXiv information

Author	Abhilash Mishra
Published	2023-10-24 17:59:04+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
