
Statistical Efficiency in Offline Reinforcement Learning

APA

Kallus, N. (2020). Statistical Efficiency in Offline Reinforcement Learning. The Simons Institute for the Theory of Computing. https://simons.berkeley.edu/talks/tbd-249

MLA

Kallus, Nathan. "Statistical Efficiency in Offline Reinforcement Learning." The Simons Institute for the Theory of Computing, 3 Dec. 2020, https://simons.berkeley.edu/talks/tbd-249

BibTeX

@misc{scivideos_16830,
  url       = {https://simons.berkeley.edu/talks/tbd-249},
  author    = {Kallus, Nathan},
  language  = {en},
  title     = {Statistical Efficiency in Offline Reinforcement Learning},
  publisher = {The Simons Institute for the Theory of Computing},
  year      = {2020},
  month     = {dec},
  note      = {Talk 16830, see \url{https://scivideos.org/Simons-Institute/16830}}
}
          
Nathan Kallus (Cornell)
Talk number: 16830
Source Repository: Simons Institute

Abstract

Offline RL is crucial in applications where experimentation is limited, such as medicine, but it is also notoriously difficult because the similarity between the observed trajectories and those generated by any proposed policy diminishes exponentially as the horizon grows, a phenomenon known as the curse of horizon. To better understand this limitation, we study the statistical efficiency limits of two central tasks in offline reinforcement learning: estimating the policy value and the policy gradient from off-policy data. The efficiency bounds reveal that the curse is generally insurmountable without assuming additional structure, and as such plagues many standard estimators that work in general problems, but it may be overcome in Markovian settings and even further attenuated in stationary settings. We develop the first estimators achieving the efficiency limits in finite- and infinite-horizon MDPs using a meta-algorithm we term Double Reinforcement Learning (DRL). We provide favorable guarantees for DRL and for off-policy policy optimization via efficiently estimated policy gradient ascent.
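
For intuition, below is a minimal sketch (Python/NumPy) of a classic doubly robust value estimator for finite-horizon off-policy evaluation, the kind of estimator the DRL meta-algorithm builds on. The function name, signature, and array layout are illustrative assumptions, not the talk's implementation; the nuisance estimates q_hat and v_hat are assumed to come from a separate fitting step.

import numpy as np

def doubly_robust_value(rewards, behavior_probs, target_probs,
                        q_hat, v_hat, gamma=1.0):
    # All inputs are arrays of shape (n_trajectories, horizon).
    # q_hat[i, t] estimates Q(s_t, a_t) under the target policy;
    # v_hat[i, t] estimates E_{a ~ pi}[Q(s_t, a)] at state s_t.
    n, horizon = rewards.shape
    ratios = target_probs / behavior_probs            # per-step ratios pi/mu
    cum_ratios = np.cumprod(ratios, axis=1)           # rho_{0:t}
    prev_ratios = np.hstack([np.ones((n, 1)),         # rho_{0:t-1}, with rho_{0:-1} = 1
                             cum_ratios[:, :-1]])
    discounts = gamma ** np.arange(horizon)
    # Model term plus importance-weighted correction of its residuals.
    per_step = discounts * (prev_ratios * v_hat
                            + cum_ratios * (rewards - q_hat))
    return per_step.sum(axis=1).mean()

The cumulative importance ratios (cum_ratios) are exactly the trajectory-level products whose variance grows exponentially with the horizon, i.e., the curse of horizon; the Markovian and stationary structure discussed in the abstract is what allows estimators to avoid depending on these full-trajectory products.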