Robert D. Kleinberg and Frank Thomson Leighton:
The Value of Knowing a Demand Curve: Bounds on Regret for Online Posted-Price Auctions
This outstanding paper initiated the field of learning from experience with posted prices. It considers a seller who has an unlimited supply of some item (such as a song) and wants to maximize their profit when interacting with a pool of buyers, each of whom wants at most one copy. The benchmark is regret minimization, the difference in expected revenue between a seller who fixes a single price in advance, based on knowledge about how much the buyers will pay, and a seller who sets the price for each buyer using an online strategy. The paper derives upper and lower bounds, which match to within a small factor, in three different settings: when all buyers have the same valuation, when buyers’ valuations are independently chosen from the same probability distribution, and when buyers are adversarial, but oblivious to the random choices made by the seller.
The paper makes a clear connection between posted pricing and the multi-armed bandit problem and started several lines of work in bandit learning, including bandit learning for dynamic pricing with limited supply, bandit learning for contextual dynamic pricing, and bandits with continuous actions. It was the first paper to consider regret minimization in the context of mechanism design, showing the power of this technique in a new and important context.