ICTS:32502

Second Order Methods for Bandit Optimization and Control

APA

Suggala, A. S. (2025). Second Order Methods for Bandit Optimization and Control. SciVideos. https://scivideos.org/icts-tifr/32502

MLA

Suggala, Arun Sai. "Second Order Methods for Bandit Optimization and Control." SciVideos, 13 Aug. 2025, https://scivideos.org/icts-tifr/32502

BibTeX

@misc{scivideos_ICTS:32502,
  url = {https://scivideos.org/icts-tifr/32502},
  author = {Suggala, Arun Sai},
  language = {en},
  title = {Second Order Methods for Bandit Optimization and Control},
  year = {2025},
  month = {aug},
  note = {ICTS:32502, see \url{https://scivideos.org/icts-tifr/32502}}
}
Arun Sai Suggala
Talk number: ICTS:32502
Source Repository: ICTS-TIFR

Abstract

Bandit convex optimization is a powerful framework for sequential decision-making, but existing algorithms with optimal regret guarantees are often too computationally expensive for high-dimensional problems. This talk introduces a simple and practical BCO algorithm, the Bandit Newton Step, which leverages second-order information for decision-making. We will show that our algorithm obtains an optimal $O(T^{1/2})$ regret bound for a large and practical class of functions that satisfy a condition we call “$\kappa$-convexity,” which includes linear, quadratic, and generalized linear losses. In addition to optimal regret, this method is the most efficient known algorithm for several well-studied applications including bandit logistic regression.
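To make the second-order idea concrete, here is a minimal illustrative sketch of an Online-Newton-Step-style update driven by one-point bandit feedback. This is not the talk's Bandit Newton Step algorithm: the spherical-sampling gradient estimator, the step sizes, and the quadratic test loss are all assumptions made for illustration only.

```python
import numpy as np

def bandit_newton_sketch(loss, d, T, delta=0.1, eta=0.5, eps=1.0):
    """Illustrative second-order bandit update (NOT the talk's algorithm).

    At each round, play a point perturbed along a random unit direction,
    observe only the scalar loss (bandit feedback), form a one-point
    gradient estimate, and take a Newton-style step preconditioned by an
    accumulated curvature matrix, as in the classic Online Newton Step.
    """
    x = np.zeros(d)
    A = eps * np.eye(d)              # running curvature (Hessian proxy)
    losses = []
    rng = np.random.default_rng(0)
    for _ in range(T):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)       # uniform direction on the unit sphere
        y = x + delta * u            # play a perturbed point
        f = loss(y)                  # single scalar (bandit) observation
        losses.append(f)
        g = (d / delta) * f * u      # one-point gradient estimate
        A += np.outer(g, g)          # accumulate curvature, ONS-style
        x = x - eta * np.linalg.solve(A, g)  # preconditioned (Newton-style) step
    return x, losses

# Usage on a toy quadratic loss (an assumption for the demo):
x_final, history = bandit_newton_sketch(
    lambda z: float(np.sum((z - 0.5) ** 2)), d=2, T=300
)
```

The key contrast with first-order bandit methods is the `np.linalg.solve(A, g)` step: the update direction is rescaled by accumulated curvature rather than by a scalar learning rate alone.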

Furthermore, we will discuss the extension of our method to online convex optimization with memory. We show that for loss functions with a certain affine structure, the extended algorithm attains optimal regret. This yields an optimal-regret algorithm for the bandit Linear-Quadratic (LQ) control problem under a fully adversarial noise model, resolving a key open question in the field. Finally, we contrast this result by proving that bandit problems with more general memory structures are fundamentally harder, establishing a tight $\Omega(T^{2/3})$ lower bound on regret.
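The dichotomy stated in the abstract can be summarized in one display (restating only the bounds given above; the case labels paraphrase the abstract's conditions):

```latex
\mathrm{Regret}(T) =
\begin{cases}
O(T^{1/2}) & \text{$\kappa$-convex losses; memory with affine structure (e.g.\ bandit LQ control)},\\[2pt]
\Omega(T^{2/3}) & \text{bandit problems with more general memory structures.}
\end{cases}
```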