Adaptive Discretization against an Adversary: Lipschitz bandits, Dynamic Pricing, and Auction Tuning

Summary

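Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces, such as the $[0,1]$ interval, in which similar actions are guaranteed to have similar rewards.
The central theme is adaptive discretization of the action space, which gradually "zooms in" on its more promising regions.
The goal is to exploit "nicer" problem instances while retaining near-optimal worst-case performance.
The stochastic version of the problem is well understood, but the general version with adversarial rewards is not.
We provide the first algorithm for adaptive discretization in the adversarial version (Adversarial Zooming) and derive instance-dependent regret bounds.
In particular, we recover the worst-case optimal regret bound for the adversarial version and the instance-dependent regret bound for the stochastic version.
We apply the algorithm to several fundamental applications, including dynamic pricing and auction reserve tuning, all under adversarial reward models.
These domains often violate Lipschitzness, but the analysis requires only a weaker version of it, which allows meaningful regret bounds without additional smoothness assumptions.
Notably, the results extend to multi-product dynamic pricing with non-smooth reward structures, a setting that does not satisfy even one-sided Lipschitzness.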

Abstract (original)

Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces such as the $[0,1]$ interval, where similar actions are guaranteed to have similar rewards. A central theme here is the adaptive discretization of the action space, which gradually “zooms in” on the more promising regions thereof. The goal is to take advantage of “nicer” problem instances, while retaining near-optimal worst-case performance. While the stochastic version of the problem is well-understood, the general version with adversarial rewards is not. We provide the first algorithm (\emph{Adversarial Zooming}) for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds. In particular, we recover the worst-case optimal regret bound for the adversarial version, and the instance-dependent regret bound for the stochastic version. We apply our algorithm to several fundamental applications — including dynamic pricing and auction reserve tuning — all under adversarial reward models. While these domains often violate Lipschitzness, our analysis only requires a weaker version thereof, allowing for meaningful regret bounds without additional smoothness assumptions. Notably, we extend our results to multi-product dynamic pricing with non-smooth reward structures, a setting which does not even satisfy one-sided Lipschitzness.
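
To make the "zooming" idea concrete, here is a minimal Python sketch of adaptive discretization on $[0,1]$ in the simpler stochastic setting. It is not the paper's Adversarial Zooming algorithm, whose details the abstract does not give. The interval-bisection scheme, the confidence radius $\sqrt{2\log T / n}$, and the names `Interval`, `zooming_sketch`, and `noisy_peak` are all assumptions made for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Interval:
    left: float
    right: float
    pulls: int = 0
    total: float = 0.0

    @property
    def width(self) -> float:
        return self.right - self.left


def confidence_radius(pulls: int, horizon: int) -> float:
    # Hoeffding-style radius (assumed form); infinite before any sample.
    if pulls == 0:
        return math.inf
    return math.sqrt(2.0 * math.log(horizon) / pulls)


def index(iv: Interval, horizon: int) -> float:
    # Optimistic index: empirical mean + confidence radius + interval width.
    if iv.pulls == 0:
        return math.inf
    return iv.total / iv.pulls + confidence_radius(iv.pulls, horizon) + iv.width


def zooming_sketch(reward_fn, horizon: int, seed: int = 0) -> list[Interval]:
    """Illustrative stochastic zooming by bisection; returns the final intervals."""
    rng = random.Random(seed)
    active = [Interval(0.0, 1.0)]
    for _ in range(horizon):
        # Play the midpoint of the interval with the highest optimistic index.
        i = max(range(len(active)), key=lambda j: index(active[j], horizon))
        chosen = active[i]
        mid = (chosen.left + chosen.right) / 2.0
        chosen.pulls += 1
        chosen.total += reward_fn(mid, rng)
        # Zoom in: once the estimate is as precise as the interval is wide,
        # replace the interval by its two halves (statistics start fresh).
        if confidence_radius(chosen.pulls, horizon) <= chosen.width:
            active.pop(i)
            active.append(Interval(chosen.left, mid))
            active.append(Interval(mid, chosen.right))
    return active


def noisy_peak(x: float, rng: random.Random) -> float:
    # Hypothetical 1-Lipschitz mean reward peaking at x = 0.3, Bernoulli noise.
    mean = 1.0 - abs(x - 0.3)
    return 1.0 if rng.random() < mean else 0.0


if __name__ == "__main__":
    for iv in sorted(zooming_sketch(noisy_peak, horizon=2000), key=lambda v: v.left):
        print(f"[{iv.left:.4f}, {iv.right:.4f}] pulls={iv.pulls}")
```

In this sketch an interval is refined only after it has been played often enough, so regions with low observed reward tend to stay coarse while promising regions are subdivided. Handling adversarial rewards requires a different selection rule, which is the subject of the paper.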

arXiv information

Authors	Chara Podimata, Aleksandrs Slivkins
Published	2025-06-12 17:48:17+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
