
Reinforcement Learning (Part II)

APA

Foster, D. (2022, January 28). Reinforcement Learning (Part II). The Simons Institute for the Theory of Computing. https://simons.berkeley.edu/talks/reinforcement-learning-part-ii

MLA

Foster, Dylan. Reinforcement Learning (Part II). The Simons Institute for the Theory of Computing, 28 Jan. 2022, https://simons.berkeley.edu/talks/reinforcement-learning-part-ii.

BibTeX

          @misc{scivideos_19277,
            author    = {Foster, Dylan},
            title     = {Reinforcement Learning (Part II)},
            publisher = {The Simons Institute for the Theory of Computing},
            year      = {2022},
            month     = {jan},
            language  = {en},
            url       = {https://simons.berkeley.edu/talks/reinforcement-learning-part-ii},
            note      = {Talk 19277; see \url{https://scivideos.org/index.php/Simons-Institute/19277}}
          }
          
Dylan Foster (Microsoft Research)
Talk number: 19277
Source Repository: Simons Institute

Abstract

This tutorial will give an overview of the theoretical foundations of reinforcement learning, a promising paradigm for developing AI systems capable of making data-driven decisions in unknown environments. The first part of the tutorial will cover introductory concepts such as problem formulations, planning in Markov decision processes (MDPs), exploration, and generalization; no prior background will be assumed. Building on these concepts, the main aim of the tutorial will be to give a bird's-eye view of the statistical landscape of reinforcement learning (e.g., what modeling assumptions lead to sample-efficient algorithms), with a focus on algorithmic principles and fundamental limits. Topics covered will range from basic challenges and solutions (exploration in tabular RL, policy gradient methods, contextual bandits) to the current frontier of understanding. A running theme will be connections and parallels between supervised learning and reinforcement learning. Time permitting, we will touch on additional topics such as reinforcement learning with offline data.
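
As a quick illustration of one introductory topic named in the abstract, planning in a Markov decision process, the sketch below runs value iteration on a tiny tabular MDP. It is not material from the talk; the two-state transition matrices and rewards are hypothetical placeholders chosen only to make the example runnable.

    import numpy as np

    # Value iteration for a tabular MDP: P[a] is the S x S transition matrix
    # under action a, R is the S x A reward matrix, gamma the discount factor.
    def value_iteration(P, R, gamma=0.9, tol=1e-8):
        S, A = R.shape
        V = np.zeros(S)
        while True:
            # Bellman optimality backup: Q(s, a) = R(s, a) + gamma * E[V(s') | s, a]
            Q = R + gamma * np.stack([P[a] @ V for a in range(A)], axis=1)
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q.argmax(axis=1)  # optimal values and a greedy policy
            V = V_new

    # Hypothetical two-state, two-action MDP used only for illustration.
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.1, 0.9]]])
    R = np.array([[1.0, 0.0],
                  [0.0, 2.0]])
    V_star, pi_star = value_iteration(P, R)
    print("Optimal values:", V_star, "Greedy policy:", pi_star)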