Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback

Summary

Learning from human feedback plays an important role in aligning generative models such as large language models (LLMs).
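However, the effectiveness of this approach can be undermined by adversaries who intentionally provide misleading preferences to steer the output in an undesirable or harmful direction.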
To tackle this challenge, we study a specific model within this problem domain, contextual dueling bandits with adversarial feedback, in which the true preference label can be flipped by an adversary.
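To make the feedback model concrete, here is a minimal simulation sketch. The summary does not specify the preference link or the adversary's strategy, so both are assumptions: preferences follow a linear Bradley-Terry (logistic) model over feature differences, and a hypothetical greedy adversary flips every label until its corruption budget (playing the role of $C$) is exhausted. The class `AdversarialDuelEnv` and its parameters are illustrative names, not from the paper.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic link; assumed as P(first item is preferred)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class AdversarialDuelEnv:
    """Hypothetical simulator: linear Bradley-Terry preferences whose labels
    an adversary may flip in at most `budget` rounds (the role of C)."""

    def __init__(self, theta_star, budget, seed=0):
        self.theta_star = list(theta_star)
        self.budget = budget  # maximum number of corrupted rounds
        self.flips = 0        # corrupted rounds so far
        self.rng = random.Random(seed)

    def duel(self, phi_x, phi_y):
        """Return 1 if x is reported as preferred over y, else 0."""
        diff = [a - b for a, b in zip(phi_x, phi_y)]
        p = sigmoid(sum(t * z for t, z in zip(self.theta_star, diff)))
        label = 1 if self.rng.random() < p else 0  # true preference label
        # Assumed adversary: flip every label until the budget is used up.
        if self.flips < self.budget:
            label = 1 - label
            self.flips += 1
        return label
```

For example, with `theta_star=[100.0, 0.0]` (so the true label is 1 with probability 1.0 in floating point) and `budget=2`, `[env.duel([1.0, 0.0], [0.0, 0.0]) for _ in range(5)]` returns `[0, 0, 1, 1, 1]`: only the first two labels are corrupted. A greedy "flip early" adversary is just one choice; a stronger adversary would pick which rounds to corrupt.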
We propose an algorithm called robust contextual dueling bandits (RCDB), which is based on uncertainty-weighted maximum likelihood estimation.
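The summary only names the estimator, so the following is a minimal sketch of the general idea of uncertainty-weighted maximum likelihood, not the paper's exact procedure. It assumes a logistic preference model, feature differences $z_t = \phi(x_t) - \phi(y_t)$, and weights of the form $w_t = \min(1, \alpha / \|z_t\|_{\Sigma_{t-1}^{-1}})$ with $\Sigma_{t-1} = \lambda I + \sum_{s<t} w_s z_s z_s^\top$. The helper names (`uncertainty_weight`, `sequential_weights`, `weighted_mle`) and parameters `alpha` and `lam` are hypothetical, plain gradient descent stands in for an exact MLE solver, and arm selection is omitted.

```python
import numpy as np


def sigmoid(z):
    """Logistic link (assumed Bradley-Terry preference model)."""
    return 1.0 / (1.0 + np.exp(-z))


def uncertainty_weight(z, sigma, alpha):
    """w = min(1, alpha / ||z||_{Sigma^{-1}}): high-uncertainty duels get small weight."""
    norm = float(np.sqrt(z @ np.linalg.solve(sigma, z)))
    return 1.0 if norm == 0.0 else min(1.0, alpha / norm)


def sequential_weights(Z, alpha, lam):
    """Online weights, with Sigma_{t-1} = lam*I + sum_{s<t} w_s z_s z_s^T."""
    sigma = lam * np.eye(Z.shape[1])
    weights = np.empty(len(Z))
    for t, z in enumerate(Z):
        weights[t] = uncertainty_weight(z, sigma, alpha)
        sigma += weights[t] * np.outer(z, z)  # weighted Gram matrix update
    return weights


def weighted_mle(Z, labels, weights, lam, steps=500, lr=0.5):
    """Approximate argmin of lam/2*||theta||^2 + sum_t w_t * logistic_loss_t."""
    theta = np.zeros(Z.shape[1])
    for _ in range(steps):
        residual = sigmoid(Z @ theta) - labels           # predicted minus observed label
        grad = lam * theta + Z.T @ (weights * residual)  # gradient of the weighted objective
        theta -= lr * grad / len(labels)                 # step on the per-sample average
    return theta
```

Intuitively, capping each weight at $\alpha / \|z_t\|_{\Sigma_{t-1}^{-1}}$ limits how far a single label on a poorly explored direction, possibly a flipped one, can move the estimate; once a direction is well explored its weight returns to 1. In a full bandit loop the estimate would be refit each round and combined with $\Sigma$ to choose the next pair.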
Our algorithm achieves an $\tilde O(d\sqrt{T}+dC)$ regret bound, where $T$ is the number of rounds, $d$ is the dimension of the context, and $0 \le C \le T$ is the total number of adversarial feedback instances.
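As a quick reading of this bound (ignoring constants and the logarithmic factors hidden by $\tilde O$), the corruption term $dC$ is dominated by $d\sqrt{T}$ whenever $C \le \sqrt{T}$, so in that regime the rate matches the uncorrupted case:

```latex
C \le \sqrt{T} \;\Longrightarrow\; d\sqrt{T} + dC \le 2\,d\sqrt{T}
\;\Longrightarrow\; \tilde O\big(d\sqrt{T} + dC\big) = \tilde O\big(d\sqrt{T}\big).
```

Conversely, when $C \ge \sqrt{T}$ we have $d\sqrt{T} + dC \le 2\,dC$, so the bound is of order $\tilde O(dC)$.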
We also prove a lower bound showing that our regret bound is nearly optimal, both with and without ($C=0$) adversarial feedback.
In addition, we conduct experiments to evaluate the proposed algorithm against various types of adversarial feedback.
The experimental results show that it outperforms state-of-the-art dueling bandit algorithms in the presence of adversarial feedback.

Summary (original)

Learning from human feedback plays an important role in aligning generative models, such as large language models (LLM). However, the effectiveness of this approach can be influenced by adversaries, who may intentionally provide misleading preferences to manipulate the output in an undesirable or harmful direction. To tackle this challenge, we study a specific model within this problem domain–contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary. We propose an algorithm namely robust contextual dueling bandit (RCDB), which is based on uncertainty-weighted maximum likelihood estimation. Our algorithm achieves an $\tilde O(d\sqrt{T}+dC)$ regret bound, where $T$ is the number of rounds, $d$ is the dimension of the context, and $ 0 \le C \le T$ is the total number of adversarial feedback. We also prove a lower bound to show that our regret bound is nearly optimal, both in scenarios with and without ($C=0$) adversarial feedback. Additionally, we conduct experiments to evaluate our proposed algorithm against various types of adversarial feedback. Experimental results demonstrate its superiority over the state-of-the-art dueling bandit algorithms in the presence of adversarial feedback.

arXiv information

Authors	Qiwei Di, Jiafan He, Quanquan Gu
Published	2024-04-16 17:59:55+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
