Reinforcement learning (RL) has made substantial empirical progress on hard AI challenges in the past few years. Much of this progress (Go, Dota 2, StarCraft II, economic simulation, social behavior learning, and so on) comes from multi-agent RL, that is, sequential decision making involving more than one agent. While the theoretical study of single-agent RL has a long history and rapidly growing recent interest, multi-agent RL theory is arguably a newer and less developed field, with its own unique challenges and opportunities.
In this tutorial, we present an overview of recent theoretical developments in multi-agent RL. We will focus on the model of Markov games, and cover basic formulations and objectives, as well as learning algorithms with their theoretical guarantees. We will discuss the inefficiency of classical algorithms (self-play, fictitious play, etc.), and then proceed to recent advances in provably efficient learning algorithms under various regimes, including two-player zero-sum games, multiplayer general-sum games, exploration, and large state spaces (function approximation).
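To make the objects in the tutorial concrete, a two-player zero-sum Markov game is often written as follows (illustrative notation, not necessarily the tutorial's own):

$$
\mathcal{M} = (\mathcal{S}, \mathcal{A}, \mathcal{B}, H, P, r), \qquad
V^{\mu,\nu}(s) = \mathbb{E}\Big[ \sum_{h=1}^{H} r_h(s_h, a_h, b_h) \;\Big|\; s_1 = s,\; a_h \sim \mu_h(\cdot \mid s_h),\; b_h \sim \nu_h(\cdot \mid s_h) \Big],
$$

where the max-player controls $\mu$, the min-player controls $\nu$, and the learning objective is an (approximate) Nash equilibrium $(\mu^\star, \nu^\star)$ achieving $\max_{\mu} \min_{\nu} V^{\mu,\nu}(s) = \min_{\nu} \max_{\mu} V^{\mu,\nu}(s)$.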
Over the last two decades we have developed a good understanding of how to quantify the impact of strategic user behavior on overall performance in many games (including traffic routing as well as online auctions). Early work focused on evaluating the quality of Nash equilibria. However, it turns out that the resulting bounds extend to learning outcomes in repeated games under fairly general assumptions.
In this talk we will review these results, which mostly assume that learners satisfy versions of the no-regret property after sufficiently long play. We will also discuss the pros and cons of no-regret as a behavioral assumption on learning outcomes, as well as the limitations of taking no-regret as the main condition players achieve as they learn.
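For reference, the no-regret property mentioned above is typically the vanishing of external regret; in one common formulation (notation illustrative),

$$
\mathrm{Reg}_i(T) = \max_{a^\star \in A_i} \sum_{t=1}^{T} u_i(a^\star, a_{-i}^t) - \sum_{t=1}^{T} u_i(a_i^t, a_{-i}^t), \qquad \frac{\mathrm{Reg}_i(T)}{T} \to 0,
$$

and when every player satisfies it, the empirical distribution of play approaches the set of coarse correlated equilibria, which is what allows price-of-anarchy bounds for Nash equilibria (e.g., those proved via smoothness arguments) to carry over to learning outcomes.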
This tutorial will give an overview of the theoretical foundations of reinforcement learning, a promising paradigm for developing AI systems capable of making data-driven decisions in unknown environments. The first part of the tutorial will cover introductory concepts such as problem formulations, planning in Markov decision processes (MDPs), exploration, and generalization; no prior background will be assumed. Building on these concepts, the main aim of the tutorial will be to give a bird's-eye view of the statistical landscape of reinforcement learning (e.g., what modeling assumptions lead to sample-efficient algorithms), with a focus on algorithmic principles and fundamental limits. Topics covered will range from basic challenges and solutions (exploration in tabular RL, policy gradient methods, contextual bandits) to the current frontier of understanding. A running theme will be connections and parallels between supervised learning and reinforcement learning. Time permitting, we will touch on additional topics such as reinforcement learning with offline data.
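As a minimal reference point for the first part, the planning problem in a discounted MDP can be summarized by the Bellman optimality equation (standard notation, used here only for illustration):

$$
V^\star(s) = \max_{a \in \mathcal{A}} \Big[ r(s,a) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s, a)\, V^\star(s') \Big],
$$

and the statistical questions in the tutorial concern how many interactions with an unknown $P$ and $r$ are needed to find a policy whose value is close to $V^\star$, and under what structural assumptions this is possible with a sample complexity that does not scale with the size of the state space.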
Classically, the outcome of a learning algorithm is considered in isolation from the effects it may have on the process that generates the data or on the parties interested in learning. In today's world, an increasing number of people and organizations interact with learning systems, making it necessary to consider these effects. This tutorial will cover the mathematical foundations for addressing learning and learnability in the presence of economic and social forces.
We will cover recent advances in the theory of machine learning and algorithmic economics. In the first half of this tutorial, we will consider strategic and adversarial interactions between learning algorithms and those affected by algorithmic actions. What makes these interactions especially powerful is that they often occur on a long-term basis and can corrupt data patterns that are essential for machine learning. In this part of the talk, we work with online decision processes (such as no-regret learning) and solution concepts (such as Stackelberg and min-max equilibria) to discuss statistical and computational aspects of learning and learnability in the presence of such interactions. In the second half of the tutorial, we will consider collaborative interactions in machine learning. What makes these interactions especially beneficial is that they enable learning across multiple stakeholders' tasks. In this part of the talk, we see how online algorithms act as a medium for effective collaboration. We also discuss the challenges involved in designing efficient data-sharing mechanisms that fully account for learners' incentives.
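As a self-contained illustration of the kind of online decision process used in the first half, here is a minimal sketch of the Hedge (multiplicative-weights) algorithm, a standard no-regret learner; the loss model and step-size tuning below are generic textbook choices, not tied to any particular result in the tutorial:

```python
import numpy as np

def hedge(loss_fn, n_actions, T, eta=None):
    """Minimal Hedge / multiplicative-weights sketch.

    loss_fn(t) returns a length-n_actions array of losses in [0, 1] for round t.
    Returns the learner's external regret and its final mixed strategy.
    """
    if eta is None:
        eta = np.sqrt(2.0 * np.log(n_actions) / T)  # standard step-size tuning
    weights = np.ones(n_actions)
    learner_loss = 0.0
    cumulative_losses = np.zeros(n_actions)
    for t in range(T):
        p = weights / weights.sum()          # current mixed strategy
        losses = loss_fn(t)                  # losses revealed after committing to p
        learner_loss += p @ losses           # expected loss incurred this round
        cumulative_losses += losses
        weights *= np.exp(-eta * losses)     # exponential-weights update
    regret = learner_loss - cumulative_losses.min()  # vs. best fixed action in hindsight
    return regret, weights / weights.sum()

# Toy usage: a fixed sequence of random losses over 5 actions.
L = np.random.default_rng(0).random((1000, 5))
regret, strategy = hedge(lambda t: L[t], n_actions=5, T=1000)
print(f"external regret after 1000 rounds: {regret:.2f}")
```

Running such a learner against the realized play of the other participants is exactly the repeated-game dynamic under which the solution concepts above emerge.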
Reinforcement learning (RL) has recently achieved tremendous successes in several artificial intelligence applications. Many of the frontier applications of RL involve "multiple agents", e.g., playing chess and Go, autonomous driving, and robotics. In this talk, I will introduce several recent works on multi-agent reinforcement learning (MARL) with theoretical guarantees. Specifically, we focus on solving the most basic multi-agent RL setting: infinite-horizon zero-sum stochastic games (Shapley 1953), using three common RL approaches: model-based, value-based, and policy-based ones. We first show that for the tabular setting, "model-based multi-agent RL" (estimating the model first and then planning) can achieve near-optimal sample complexity when a generative model of the game environment is available. Second, we show that a simple variant of "Q-learning" (value-based) can find the Nash equilibrium of the game, even if the agents run it independently, in a "fully decentralized" fashion. Third, we show that "policy gradient" methods (policy-based) can solve zero-sum stochastic games with linear dynamics and quadratic costs, which is equivalent to solving a robust and risk-sensitive control problem. With this connection to robust control, we discover that our policy gradient methods automatically preserve the robustness of the system during the iterations, a phenomenon we refer to as "implicit regularization". Time permitting, I will also discuss some ongoing and future directions along these lines.
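For orientation, the zero-sum stochastic games referred to above are characterized by a minimax Bellman equation (in standard notation, not necessarily that of the talk, treating the reward as the first player's gain and the second player's loss):

$$
Q^\star(s, a, b) = r(s, a, b) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s, a, b)} \big[ V^\star(s') \big], \qquad
V^\star(s) = \max_{x \in \Delta(\mathcal{A})} \min_{y \in \Delta(\mathcal{B})} \; x^\top Q^\star(s, \cdot, \cdot)\, y,
$$

and Shapley's result is that this fixed point exists and yields the value of the game; the model-based, value-based, and policy-based methods in the talk can be viewed as different ways of approximating this fixed point from data.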
In this talk, I will survey some of my dissertation work on algorithmic problems arising in the design and analysis of randomized experiments. I hope to give a sense of the style of problems and technical work that I enjoy. During my dissertation work, I was asking: How can we design sampling algorithms to achieve desired levels of covariate balance in a randomized experiment? How can we estimate the variance of a treatment effect estimator in the presence of general interference? How should we analyze and design so-called "bipartite" experiments where units which receive treatment are distinct from units on which outcomes are measured?
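As a toy illustration of the covariate-balance question (and not the specific sampling algorithms from the dissertation), one classical design is rerandomization: redraw the treatment assignment until a balance statistic, such as the Mahalanobis imbalance between group covariate means, falls below a chosen threshold. The threshold and acceptance rule below are hypothetical choices made only for this sketch:

```python
import numpy as np

def rerandomize(X, threshold=0.1, max_tries=10_000, rng=None):
    """Draw a 50/50 treatment assignment by rerandomization.

    X is an (n, d) covariate matrix. Assignments are redrawn until the
    Mahalanobis imbalance between treated and control covariate means
    is below `threshold` (an illustrative acceptance rule).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, _ = X.shape
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
    for _ in range(max_tries):
        z = np.zeros(n, dtype=bool)
        z[rng.choice(n, size=n // 2, replace=False)] = True   # treat half the units
        diff = X[z].mean(axis=0) - X[~z].mean(axis=0)          # mean covariate difference
        if diff @ cov_inv @ diff < threshold:                  # balance criterion met
            return z
    raise RuntimeError("no assignment met the balance criterion")

# Toy usage: 100 units with 3 covariates.
X = np.random.default_rng(1).normal(size=(100, 3))
assignment = rerandomize(X)
print(f"{assignment.sum()} treated, {len(assignment) - assignment.sum()} control")
```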