
A Game-Theoretic Approach to Offline Reinforcement Learning

APA

Cheng, C.-A. (2022, October 11). A Game-Theoretic Approach to Offline Reinforcement Learning. The Simons Institute for the Theory of Computing. https://old.simons.berkeley.edu/talks/game-theoretic-approach-offline-reinforcement-learning

MLA

Cheng, Ching-An. "A Game-Theoretic Approach to Offline Reinforcement Learning." The Simons Institute for the Theory of Computing, Oct. 11, 2022, https://old.simons.berkeley.edu/talks/game-theoretic-approach-offline-reinforcement-learning.

BibTeX

@misc{scivideos_22743,
  author    = {Cheng, Ching-An},
  title     = {A Game-Theoretic Approach to Offline Reinforcement Learning},
  publisher = {The Simons Institute for the Theory of Computing},
  year      = {2022},
  month     = {oct},
  language  = {en},
  url       = {https://old.simons.berkeley.edu/talks/game-theoretic-approach-offline-reinforcement-learning},
  note      = {Talk 22743; see \url{https://scivideos.org/index.php/simons-institute/22743}}
}
Ching-An Cheng (Microsoft Research)
Talk number: 22743
Source Repository: Simons Institute

Abstract

Offline reinforcement learning (RL) is a paradigm for designing agents that can learn from existing datasets. Because offline RL can learn policies without collecting new data or expensive expert demonstrations, it offers great potential for solving real-world problems. However, offline RL faces a fundamental challenge: oftentimes real-world data can only be collected by policies that meet certain criteria (e.g., on performance, safety, or ethics). As a result, existing data, though large, can lack diversity and have limited usefulness. In this talk, I will introduce a generic game-theoretic approach to offline RL. It frames offline RL as a two-player game in which a learning agent competes with an adversary that simulates the uncertain decision outcomes arising from missing data coverage. Using this game analogy, I will present a systematic and provably correct framework for designing offline RL algorithms that learn good policies with state-of-the-art empirical performance. In addition, I will show that this framework reveals a natural connection between offline RL and imitation learning, which ensures that the learned policies are never worse than the data-collection policies, regardless of hyperparameter choices.
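
As a rough illustration of the two-player game described in the abstract (a sketch under assumed notation, not taken from the talk), the learner proposes a policy while the adversary responds with the most pessimistic value hypothesis that remains consistent with the offline data. The symbols \(\pi\), \(\Pi\), \(f\), \(\mathcal{F}\), \(\mathcal{D}\), \(\beta\), and \(\mathcal{E}_{\mathcal{D}}\) below are assumptions introduced only for this sketch.

  % Hedged sketch of a max-min (learner vs. adversary) objective for offline RL.
  % D is the offline dataset, F a class of candidate value functions, and
  % E_D(f, pi) an assumed Bellman-consistency penalty of f on the data,
  % weighted by a pessimism coefficient beta.
  \[
    \hat{\pi} \in \operatorname*{arg\,max}_{\pi \in \Pi} \;
    \min_{f \in \mathcal{F}} \;
    \mathbb{E}_{(s,a) \sim \mathcal{D}}\!\big[ f(s, \pi(s)) - f(s, a) \big]
    \; + \; \beta \, \mathcal{E}_{\mathcal{D}}(f, \pi)
  \]

In this form, the first term measures the proposed policy's advantage over the data-collection policy as judged by the adversary's value function, and the penalty term discourages the adversary from choosing value functions that contradict the data. Since choosing \(\pi\) equal to the data-collection policy drives the first term to zero, the learner's game value is bounded below by that of the data policy, which is one way the "no worse than the data-collection policy" guarantee mentioned in the abstract can arise.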