Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds

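Summary

Bayesian optimization (BO) with preference-based feedback has recently attracted significant attention because of its emerging applications.
This problem is called Bayesian Optimization from Human Feedback (BOHF). It differs from conventional BO in that the best action is learned from a reduced feedback model, in which only the preference between two actions is revealed to the learner at each time step.
The objective is to identify the best action using a limited number of preference queries, which are typically obtained through costly human feedback.
Existing work that adopts the Bradley-Terry-Luce (BTL) feedback model provides regret bounds for the performance of several algorithms.
This work develops tighter performance guarantees within the same framework.
Specifically, it derives regret bounds of $\tilde{\mathcal{O}}(\sqrt{\Gamma(T)T})$, where $\Gamma(T)$ is the maximum information gain (a kernel-specific complexity term) and $T$ is the number of queries.
These results significantly improve on existing bounds.
In particular, for common kernels, the order-optimal sample complexities of conventional BO, which are achieved with richer feedback models, are shown to be recovered.
In other words, the same number of preferential samples as scalar-valued samples is sufficient to find a nearly optimal solution.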
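
The sample-complexity claim can be made concrete with a rough worked step. This is a sketch that is not taken from the paper: it assumes the bound applies to a cumulative regret $R(T)$ over $T$ queries, and it uses the standard information-gain rate for the squared-exponential kernel in dimension $d$ purely as an illustration. The symbols $R(T)$, $C$ (a factor absorbing logarithmic terms), $d$, and $\varepsilon$ are introduced here and do not appear in the abstract.

```latex
R(T) \le C\sqrt{\Gamma(T)\,T}
\;\Longrightarrow\;
\frac{R(T)}{T} \le C\sqrt{\frac{\Gamma(T)}{T}},
\qquad
\Gamma(T) = \mathcal{O}\!\left(\log^{d+1} T\right)
\;\Longrightarrow\;
\frac{R(T)}{T} \le \varepsilon
\ \text{ once } \
T = \tilde{\mathcal{O}}\!\left(\varepsilon^{-2}\right).
```

Under these assumptions, the average regret drops below $\varepsilon$ after roughly $\varepsilon^{-2}$ preference queries, up to logarithmic factors, which is the same order as in the scalar-feedback setting.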

Abstract (original)

Bayesian optimization (BO) with preference-based feedback has recently garnered significant attention due to its emerging applications. We refer to this problem as Bayesian Optimization from Human Feedback (BOHF), which differs from conventional BO by learning the best actions from a reduced feedback model, where only the preference between two actions is revealed to the learner at each time step. The objective is to identify the best action using a limited number of preference queries, typically obtained through costly human feedback. Existing work, which adopts the Bradley-Terry-Luce (BTL) feedback model, provides regret bounds for the performance of several algorithms. In this work, within the same framework, we develop tighter performance guarantees. Specifically, we derive regret bounds of $\tilde{\mathcal{O}}(\sqrt{\Gamma(T)T})$, where $\Gamma(T)$ represents the maximum information gain (a kernel-specific complexity term) and $T$ is the number of queries. Our results significantly improve upon existing bounds. Notably, for common kernels, we show that the order-optimal sample complexities of conventional BO (achieved with richer feedback models) are recovered. In other words, the same number of preferential samples as scalar-valued samples is sufficient to find a nearly optimal solution.
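
The reduced feedback model described above can be illustrated with a minimal sketch of Bradley-Terry-Luce preference feedback, in which the probability that one action is preferred over another is the logistic function of the gap in a latent utility. This is only an illustration of the feedback model, not the algorithm analyzed in the paper; `toy_utility`, `btl_preference_probability`, and `query_preference` are hypothetical names introduced here, and the quadratic utility is an arbitrary assumption.

```python
import math
import random


def btl_preference_probability(f_x: float, f_x_prime: float) -> float:
    """Probability that x is preferred over x' under BTL:
    the logistic (sigmoid) function of the utility gap f(x) - f(x')."""
    return 1.0 / (1.0 + math.exp(-(f_x - f_x_prime)))


def query_preference(f, x, x_prime, rng) -> int:
    """Simulate one preference query: return 1 if x is preferred, else 0.
    The learner only sees this binary outcome, never f(x) or f(x')."""
    p = btl_preference_probability(f(x), f(x_prime))
    return 1 if rng.random() < p else 0


def toy_utility(x: float) -> float:
    # Hypothetical latent utility (an assumption), maximized at x = 0.3.
    return -(x - 0.3) ** 2


rng = random.Random(0)  # fixed seed so the simulation is reproducible

# Probability that the better action 0.3 beats the worse action 0.9.
p = btl_preference_probability(toy_utility(0.3), toy_utility(0.9))

# Repeated noisy comparisons of the same pair of actions.
votes = [query_preference(toy_utility, 0.3, 0.9, rng) for _ in range(2000)]
empirical = sum(votes) / len(votes)

print(f"model probability: {p:.3f}, empirical frequency: {empirical:.3f}")
```

Each query returns a single bit, which is why a preference sample carries less information than a scalar-valued observation of the utility; the abstract's claim is that, despite this, the number of samples needed is of the same order.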

arXiv information

Authors	Aya Kayal,Sattar Vakili,Laura Toni,Da-shan Shiu,Alberto Bernacchia
Published	2025-05-29 17:17:29+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
