
Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation

APA

Shah, D. (2020). Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation. The Simons Institute for the Theory of Computing. https://simons.berkeley.edu/talks/tbd-252

MLA

Shah, Devavrat. Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation. The Simons Institute for the Theory of Computing, 3 Dec. 2020, https://simons.berkeley.edu/talks/tbd-252

BibTeX

@misc{scivideos_16833,
  url       = {https://simons.berkeley.edu/talks/tbd-252},
  author    = {Shah, Devavrat},
  language  = {en},
  title     = {Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation},
  publisher = {The Simons Institute for the Theory of Computing},
  year      = {2020},
  month     = {dec},
  note      = {Talk 16833; see \url{https://scivideos.org/index.php/Simons-Institute/16833}}
}
          
Devavrat Shah (MIT)
Talk number: 16833
Source Repository: Simons Institute

Abstract

We consider the question of learning the Q-function in a sample-efficient manner for reinforcement learning with continuous state and action spaces under a generative model. If the Q-function is Lipschitz continuous, then the minimal sample complexity for estimating an ε-optimal Q-function is known to scale as O(ε^{-(d1+d2+2)}) by classical non-parametric learning theory, where d1 and d2 denote the dimensions of the state and action spaces, respectively. The Q-function, when viewed as a kernel, induces a Hilbert–Schmidt operator and hence possesses a square-summable spectrum. This motivates us to consider a parametric class of Q-functions parameterized by their "rank" r, which contains all Lipschitz Q-functions as r → ∞. As our key contribution, we develop a simple, iterative learning algorithm that finds an ε-optimal Q-function with sample complexity of O(ε^{-(max(d1,d2)+2)}) when the optimal Q-function has low rank r and the discount factor is below a certain threshold. Thus, this provides an exponential improvement in sample complexity. To enable our result, we develop a novel matrix estimation algorithm that faithfully estimates an unknown low-rank matrix in the max norm even in the presence of arbitrary bounded noise, which might be of interest in its own right. Empirical results on several stochastic control tasks confirm the efficacy of our "low-rank" algorithms. This is based on joint work with Dogyoon Song, Zhi Xu, and Yuzhe Yang. The manuscript is available at https://arxiv.org/abs/2006.06135.
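
As a rough, hypothetical illustration of the idea described above (not the authors' implementation), the Python sketch below discretizes the state and action spaces, queries a noisy Bellman backup for only a random subset of (state, action) entries via the generative model, and completes the Q-matrix with a rank-r estimate in each iteration. Truncated SVD stands in here for the paper's max-norm matrix estimation subroutine, and names such as sample_q, n_states, n_actions, and sample_frac are invented for the example. Note that when d1 = d2 = d, the exponent in the sample complexity drops from 2d + 2 to d + 2, which is the sense in which the improvement is exponential in the dimension.

# A minimal sketch of the "low-rank" idea from the abstract, NOT the authors'
# algorithm: sample noisy Q-values for only a subset of (state, action) pairs
# from the generative model, then fill in the rest with a rank-r estimate
# inside a value-iteration-style loop. All names below are illustrative.

import numpy as np


def rank_r_estimate(Q_obs, mask, r):
    """Fill in a partially observed matrix with a rank-r approximation.

    A stand-in for the paper's max-norm matrix estimation subroutine.
    """
    p = max(mask.mean(), 1e-12)                 # observed fraction of entries
    U, s, Vt = np.linalg.svd(Q_obs * mask / p, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]       # keep the top r components


def low_rank_q_iteration(sample_q, n_states, n_actions, r=3,
                         n_iters=50, sample_frac=0.2, seed=0):
    """Alternate sparse Bellman-backup sampling with low-rank completion.

    sample_q(s, a, Q) is assumed to return a noisy one-step backup
    r(s, a) + gamma * max_a' Q(s', a') obtained from the generative model.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        # Sample only a fraction of the (state, action) grid this iteration.
        mask = rng.random((n_states, n_actions)) < sample_frac
        Q_obs = np.zeros_like(Q)
        for s, a in zip(*np.nonzero(mask)):
            Q_obs[s, a] = sample_q(s, a, Q)
        # Complete the matrix at rank r before the next round of backups.
        Q = rank_r_estimate(Q_obs, mask, r)
    return Q

A caller would supply sample_q backed by a simulator of the discretized control problem; the point is only that each iteration needs roughly sample_frac · n_states · n_actions generative-model queries rather than one per entry, which is where the low-rank assumption on the optimal Q-function does its work.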