Second Order Methods for Bandit Optimization and Control
APA
(2025). Second Order Methods for Bandit Optimization and Control. SciVideos. https://scivideos.org/icts-tifr/32502
MLA
"Second Order Methods for Bandit Optimization and Control." SciVideos, 13 Aug. 2025, https://scivideos.org/icts-tifr/32502
BibTex
@misc{scivideos_ICTS:32502, url = {https://scivideos.org/icts-tifr/32502}, language = {en}, title = {Second Order Methods for Bandit Optimization and Control}, year = {2025}, month = {aug}, note = {ICTS:32502, see \url{https://scivideos.org/icts-tifr/32502}} }
Abstract
Bandit convex optimization (BCO) is a powerful framework for sequential decision-making, but existing algorithms with optimal regret guarantees are often too computationally expensive for high-dimensional problems. This talk introduces a simple and practical BCO algorithm, the Bandit Newton Step, which leverages second-order information for decision-making. We will show that our algorithm obtains an optimal $O(T^{1/2})$ regret bound for a large and practical class of functions that satisfy a condition we call “$\kappa$-convexity,” which includes linear, quadratic, and generalized linear losses. In addition to optimal regret, this method is the most efficient known algorithm for several well-studied applications, including bandit logistic regression.
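For context, the regret measure behind the $O(T^{1/2})$ guarantee is the standard one for BCO: the learner plays $x_t$ in a convex set $\mathcal{K}$, observes only the scalar loss value $f_t(x_t)$ (no gradients), and is compared to the best fixed point in hindsight. A standard formulation (notation assumed, not taken from the talk) is:

```latex
% Bandit convex optimization protocol and regret (standard formulation).
% At each round t = 1, ..., T:
%   the learner chooses x_t in a convex set K,
%   the adversary chooses a convex loss f_t : K -> R,
%   the learner observes only the value f_t(x_t).
\[
  \mathrm{Regret}_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x).
\]
% An "optimal O(T^{1/2}) regret bound" means Regret_T = O(\sqrt{T}),
% matching known lower bounds for this feedback model up to constants.
```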
Furthermore, we will discuss the extension of our method to online convex optimization with memory. We show that for loss functions with a certain affine structure, the extended algorithm attains optimal regret. This leads to an optimal-regret algorithm for the bandit Linear-Quadratic (LQ) control problem under a fully adversarial noise model, resolving a key open question in the field. Finally, we contrast this result by proving that bandit problems with more general memory structures are fundamentally harder, establishing a tight $\Omega(T^{2/3})$ lower bound on regret.
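In the with-memory setting referenced above, losses depend on the last few decisions rather than only the current one. A common way to formalize this (the memory length $m$ and comparator are assumptions of this sketch, not details from the talk) is policy regret against the best fixed decision played at every step:

```latex
% Online convex optimization with memory of length m (standard formulation).
% The loss at round t depends on the last m+1 decisions:
%   f_t : K^{m+1} -> R,  evaluated at (x_{t-m}, ..., x_t).
\[
  \mathrm{Regret}_T \;=\; \sum_{t=m+1}^{T} f_t(x_{t-m}, \ldots, x_t)
  \;-\; \min_{x \in \mathcal{K}} \sum_{t=m+1}^{T} f_t(x, \ldots, x).
\]
% The talk's dichotomy: with a certain affine structure (covering bandit
% LQ control), O(T^{1/2}) regret is attainable; for general memory
% structures, Omega(T^{2/3}) regret is unavoidable.
```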