
A Tutorial on Finite-Sample Guarantees of Contractive Stochastic Approximation With Applications in Reinforcement Learning

APA

Chen, Z. (2022). A Tutorial on Finite-Sample Guarantees of Contractive Stochastic Approximation With Applications in Reinforcement Learning. The Simons Institute for the Theory of Computing. https://old.simons.berkeley.edu/node/22740

MLA

Chen, Zaiwei. "A Tutorial on Finite-Sample Guarantees of Contractive Stochastic Approximation With Applications in Reinforcement Learning." The Simons Institute for the Theory of Computing, Oct. 10, 2022, https://old.simons.berkeley.edu/node/22740.

BibTeX

          @misc{scivideos_22740,
            url = {https://old.simons.berkeley.edu/node/22740},
            author = {Chen, Zaiwei},
            language = {en},
            title = {A Tutorial on Finite-Sample Guarantees of Contractive Stochastic Approximation With Applications in Reinforcement Learning},
            publisher = {The Simons Institute for the Theory of Computing},
            year = {2022},
            month = {oct},
            note = {Talk 22740, see \url{https://scivideos.org/simons-institute/22740}}
          }
          
Zaiwei Chen (Caltech)
Talk number: 22740
Source Repository: Simons Institute

Abstract

Reinforcement learning (RL) is a learning paradigm for large-scale sequential decision-making problems in complex stochastic systems. Many modern RL algorithms solve the underlying Bellman fixed-point equation using stochastic approximation (SA). This two-part tutorial presents an overview of our results on SA and illustrates how they can be used to obtain sample-complexity results for a large class of RL algorithms.

Part I of the tutorial focuses on SA, a popular approach for solving fixed-point equations when the information is corrupted by noise. We consider a class of SA algorithms for operators that are contractive under arbitrary norms (especially the l-infinity norm). We present finite-sample bounds on the mean-square error, which are established using a Lyapunov framework based on the infimal convolution and the generalized Moreau envelope. We then present our recent result on exponential concentration of the tail error, even when the iterates are not bounded by a constant. These tail bounds are obtained using exponential supermartingales in conjunction with the Moreau envelope and bootstrapping.

Part II of the tutorial focuses on RL. We briefly illustrate the connection between RL algorithms and SA for contractive operators, and highlight the importance of the infinity norm. We then exploit the results from Part I to present finite-sample bounds for various RL algorithms, including on-policy and off-policy algorithms, in both the tabular and linear function approximation settings.
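
To make the SA-RL connection described above concrete, the sketch below (not from the talk; the random MDP, the step-size schedule, and the horizon are all assumptions made for illustration) shows synchronous Q-learning written as a stochastic approximation x_{k+1} = x_k + alpha_k (F(x_k) + w_k - x_k), where F is the Bellman optimality operator, a gamma-contraction in the l-infinity norm.

          import numpy as np

          # Minimal illustrative sketch: synchronous Q-learning as contractive SA.
          # The random MDP, step sizes, and iteration count are assumptions.
          rng = np.random.default_rng(0)
          n_states, n_actions, gamma = 5, 3, 0.9

          # Random tabular MDP: P[s, a, s'] transition probabilities, R[s, a] rewards.
          P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
          R = rng.uniform(size=(n_states, n_actions))

          def bellman_optimality(Q):
              """F(Q)(s, a) = R(s, a) + gamma * E_{s'}[max_a' Q(s', a')]."""
              return R + gamma * P @ Q.max(axis=1)

          # Fixed point Q* of F via value iteration (F is a gamma-contraction
          # in the l-infinity norm, so these iterates converge geometrically).
          Q_star = np.zeros((n_states, n_actions))
          for _ in range(1000):
              Q_star = bellman_optimality(Q_star)

          # Stochastic approximation: replace F(Q_k) with a noisy estimate built
          # from one sampled next state per (s, a) pair, then take a small step.
          Q = np.zeros((n_states, n_actions))
          for k in range(1, 20001):
              alpha = 1.0 / (1.0 + k / 100.0)  # diminishing step size (assumed schedule)
              next_states = np.array([
                  [rng.choice(n_states, p=P[s, a]) for a in range(n_actions)]
                  for s in range(n_states)
              ])
              noisy_F = R + gamma * Q[next_states].max(axis=2)  # sample-based estimate of F(Q_k)
              Q += alpha * (noisy_F - Q)                         # SA update x_{k+1} = x_k + alpha_k (F(x_k) + w_k - x_k)

          print("l-infinity error ||Q_k - Q*||_inf:", np.abs(Q - Q_star).max())

The finite-sample bounds discussed in the tutorial quantify how fast the printed l-infinity error shrinks as a function of the number of iterations and the step-size schedule.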