Video URL

https://simons.berkeley.edu/talks/beating-curse-dimensionality-high-dimensional-optimal-stopping

Beating the Curse of Dimensionality in High-Dimensional Optimal Stopping

(2020). Beating the Curse of Dimensionality in High-Dimensional Optimal Stopping. The Simons Institute for the Theory of Computing. https://simons.berkeley.edu/talks/beating-curse-dimensionality-high-dimensional-optimal-stopping

Beating the Curse of Dimensionality in High-Dimensional Optimal Stopping. The Simons Institute for the Theory of Computing, Oct. 29, 2020, https://simons.berkeley.edu/talks/beating-curse-dimensionality-high-dimensional-optimal-stopping

          @misc{ scivideos_16700,
            doi = {},
            url = {https://simons.berkeley.edu/talks/beating-curse-dimensionality-high-dimensional-optimal-stopping},
            author = {},
            keywords = {},
            language = {en},
            title = {Beating the Curse of Dimensionality in High-Dimensional Optimal Stopping},
            publisher = {The Simons Institute for the Theory of Computing},
            year = {2020},
            month = {oct},
            note = {16700 see, \url{https://scivideos.org/index.php/Simons-Institute/16700}}
          }

David Goldberg (Cornell ORIE)

October 29, 2020

Talk number16700

Source RepositorySimons Institute

Subject

Computer Science

Abstract

The sample and computational complexity of reinforcement learning algorithms under the generative model has been a central focus of the literature. Most existing approaches which can guarantee epsilon-optimality require a complexity either dependent on the size of the state space, or which scales exponentially in the time horizon T, in both cases suffering from the curse of dimensionality. Other approaches may be more efficient, but typically either make strong structural assumptions, or do not come with optimality guarantees. We reconcile this dilemma for the celebrated problem of high-dimensional optimal stopping, a special case of RL with important applications to financial engineering, operations research, economics and computer science, and well-known to be computationally challenging when the state space and time horizon are large. We propose the first algorithm with sample and computational complexity polynomial in T, and effectively independent of the underlying state space, which outputs epsilon-optimal stopping policies and value function approximations. Our algorithm is inspired by a novel connection between network flow theory in combinatorial optimization and the classical martingale duality theory for stopping problems, and can be viewed as a higher-order generalization of the celebrated prophet inequalities. Our approach yields an expansion for the value functions as an infinite sum, which is fundamentally different from standard expansions based on backwards induction, with the following properties : 1. each term is the expectation of an elegant recursively defined infimum; 2. the first k terms only require T^k samples to approximate; and 3. truncating at the first k terms yields a (normalized) error of 1/k. The terms of the associated sum can also be viewed as a provably good set of basis functions, which can be efficiently evaluated by simulation, in contrast with past approximate dynamic programming approaches which take a heuristic approach to deriving basis functions. Our work shows that although the state space is exponentially large, one need only explore it to the extent that one can compute these (few) basis functions, which can be done efficiently. Time permitting, we will also explore applications of our approach to other high-dimensional RL problems (including those arising in dynamic pricing and online combinatorial optimization), and discuss connections to parallel and quantum computing. Joint work with Ph.D. student Yilun Chen.

Supported by

Video URL

Beating the Curse of Dimensionality in High-Dimensional Optimal Stopping

Abstract

Intro to Meta-Complexity: Part 2

Intro to Meta-Complexity: Part 1

Aggregative Efficiency of Bayesian Learning in Networks

(Relaxing) Common Belief for Social Networks

Organizing Modular Production

Video URL

Beating the Curse of Dimensionality in High-Dimensional Optimal Stopping

APA

MLA

BibTex

Abstract

Intro to Meta-Complexity: Part 2

Intro to Meta-Complexity: Part 1

Aggregative Efficiency of Bayesian Learning in Networks

(Relaxing) Common Belief for Social Networks

Organizing Modular Production