
Multi-Agent Reinforcement Learning In Stochastic Games: From Alphago To Robust Control

APA

Zhang, K. (2022). Multi-Agent Reinforcement Learning In Stochastic Games: From Alphago To Robust Control. The Simons Institute for the Theory of Computing. https://simons.berkeley.edu/talks/multi-agent-reinforcement-learning-stochastic-games-alphago-robust-control

MLA

Zhang, Kaiqing. Multi-Agent Reinforcement Learning In Stochastic Games: From Alphago To Robust Control. The Simons Institute for the Theory of Computing, Feb. 11, 2022, https://simons.berkeley.edu/talks/multi-agent-reinforcement-learning-stochastic-games-alphago-robust-control.

BibTeX

@misc{scivideos_19620,
  doi = {},
  url = {https://simons.berkeley.edu/talks/multi-agent-reinforcement-learning-stochastic-games-alphago-robust-control},
  author = {Zhang, Kaiqing},
  keywords = {},
  language = {en},
  title = {Multi-Agent Reinforcement Learning In Stochastic Games: From Alphago To Robust Control},
  publisher = {The Simons Institute for the Theory of Computing},
  year = {2022},
  month = {feb},
  note = {Talk 19620; see \url{https://scivideos.org/index.php/Simons-Institute/19620}}
}
          
Kaiqing Zhang (MIT)
Talk number: 19620
Source repository: Simons Institute

Abstract

Reinforcement learning (RL) has recently achieved tremendous success in several artificial intelligence applications. Many of the applications at the forefront of RL involve "multiple agents", e.g., playing chess and Go, autonomous driving, and robotics. In this talk, I will introduce several recent works on multi-agent reinforcement learning (MARL) with theoretical guarantees. Specifically, we focus on solving the most basic multi-agent RL setting, infinite-horizon zero-sum stochastic games (Shapley 1953), using three common RL approaches: model-based, value-based, and policy-based methods. We first show that, in the tabular setting, "model-based multi-agent RL" (estimating the model first and then planning) can achieve near-optimal sample complexity when a generative model of the game environment is available. Second, we show that a simple variant of "Q-learning" (value-based) can find the Nash equilibrium of the game, even when the agents run it independently, i.e., in a "fully decentralized" fashion. Third, we show that "policy gradient" methods (policy-based) can solve zero-sum stochastic games with linear dynamics and quadratic costs, which is equivalent to solving a robust and risk-sensitive control problem. Through this connection to robust control, we discover that our policy gradient methods automatically preserve the robustness of the system during the iterations, a phenomenon we refer to as "implicit regularization". Time permitting, I will also discuss some ongoing and future directions along these lines.
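
To make the zero-sum stochastic game setting of the abstract concrete, the following is a minimal sketch (illustrative, not taken from the talk) of Shapley's classical value iteration for a finite two-player zero-sum stochastic game: at each state, the stage game backed up with the current value estimate is solved as a zero-sum matrix game via a standard linear program. The state/action sizes, the transition tensor P, the reward tensor R, the discount factor, and the helper solve_matrix_game are all assumptions made for the example.

# A minimal sketch (illustrative, not the speaker's code) of Shapley's (1953)
# value iteration for a finite two-player zero-sum stochastic game: at every
# state s, the stage game Q(s) = R(s) + gamma * E[V(s')] is solved as a
# zero-sum matrix game, and its value becomes the new V(s).
import numpy as np
from scipy.optimize import linprog


def solve_matrix_game(M):
    """Value of the zero-sum matrix game M for the (maximizing) row player, via an LP."""
    m, n = M.shape
    # Shift payoffs so they are strictly positive; the game value shifts by the same amount.
    shift = 1.0 - M.min()
    A = M + shift
    # Standard LP: minimize sum(y) subject to A^T y >= 1, y >= 0; the game value is 1 / sum(y).
    res = linprog(c=np.ones(m), A_ub=-A.T, b_ub=-np.ones(n), bounds=[(0, None)] * m)
    return 1.0 / res.x.sum() - shift


def shapley_value_iteration(P, R, gamma=0.95, iters=500):
    """P[s, a, b, s'] are transition probabilities; R[s, a, b] is the reward to the max player."""
    n_states = R.shape[0]
    V = np.zeros(n_states)
    for _ in range(iters):
        V_new = np.empty(n_states)
        for s in range(n_states):
            # Stage (matrix) game at state s, backed up with the current value estimate.
            Q = R[s] + gamma * np.einsum("abt,t->ab", P[s], V)
            V_new[s] = solve_matrix_game(Q)
        V = V_new
    return V


if __name__ == "__main__":
    # A small random game instance, purely for illustration.
    rng = np.random.default_rng(0)
    n_states, n_a, n_b = 4, 2, 2
    P = rng.random((n_states, n_a, n_b, n_states))
    P /= P.sum(axis=-1, keepdims=True)
    R = rng.random((n_states, n_a, n_b))
    print(shapley_value_iteration(P, R))

In this sketch the true transition model P is assumed known; the model-based result mentioned in the abstract instead plans with an empirical model estimated from a generative model of the game, while the value-based and policy-based results replace this exact per-state planning with Q-learning-style and policy-gradient updates, respectively.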