I will present recent work exploring how and when confounded offline data can be used to improve online reinforcement learning. We will explore conditions involving partial observability and distribution shift between the offline and online environments, and present results for contextual bandits, imitation learning, and reinforcement learning.

In this talk, I will discuss recent work on reasoning and learning with soft interventions, including the problems of identification, extrapolation/transportability, and structural learning. I will also briefly discuss a new calculus, which generalizes the do-calculus, as well as the corresponding algorithmic and graphical conditions.
Supporting material:
General Transportability of Soft Interventions: Completeness Results.
J. Correa, E. Bareinboim.
In Proceedings of the 34th Annual Conference on Neural Information Processing Systems (NeurIPS), 2020.
https://causalai.net/r68.pdf
Causal Discovery from Soft Interventions with Unknown Targets: Characterization & Learning.
A. Jaber, M. Kocaoglu, K. Shanmugam, E. Bareinboim.
In Proceedings of the 34th Annual Conference on Neural Information Processing Systems (NeurIPS), 2020.
https://causalai.net/r67.pdf
A Calculus for Stochastic Interventions: Causal Effect Identification and Surrogate Experiments.
J. Correa, E. Bareinboim.
In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI), 2020.
https://causalai.net/r55.pdf

Learning causal representations from observational data can be viewed as the task of identifying where and how interventions were applied, which simultaneously reveals information about the causal representations. Given that this learning task is a typical inverse problem, an essential issue is the establishment of identifiability results: one has to guarantee that the learned representations are consistent with the underlying causal process. Dealing with this issue generally involves appropriate assumptions. In this talk, I focus on learning latent causal variables and their causal relations, together with their relations to the measured variables, from observational data. I show which assumptions, together with instantiations of the "minimal change" principle, render the underlying causal representations identifiable across several settings. Specifically, in the i.i.d. case, identifiability benefits from appropriate parametric assumptions on the causal relations and a certain type of "minimality" assumption. Temporal dependence makes it possible to recover latent temporally causal processes from time series data without parametric assumptions, and nonstationarity further improves identifiability. I then draw the connection between recent advances in nonlinear independent component analysis and the minimal change principle. Finally, concerning the nonparametric setting with changing instantaneous causal relations, I show how to learn the latent variables and their changing causal relations in light of the minimal change principle.

We provide a critical assessment of the account of causal emergence presented in Hoel (2017). The account integrates causal and information theoretic concepts to explain under what circumstances there can be causal descriptions of a system at multiple scales of analysis. We show that the causal macro variables implied by this account result in interventions with significant ambiguity, and that the operations of marginalization and abstraction do not commute. Both of these are desiderata that, we argue, any account of multi-scale causal analysis should be sensitive to. The problems we highlight in Hoel's definition of causal emergence derive from the use of various averaging steps and the introduction of a maximum entropy distribution that is extraneous to the system under investigation. (This is joint work with Lin Lin Lee.)

Since their introduction in Goodfellow et al. (2014) as sampling algorithms, Generative Adversarial Networks (GANs) have evolved to produce remarkable results in several tasks, e.g., image generation and text-to-image translation. Statistically, a GAN may be viewed as a density estimate constructed by optimizing over an Integral Probability Metric (IPM) encoded by its discriminator. I will present our work on estimating a nonparametric density under IPMs defined by Besov spaces. Such IPMs are a rich class of losses and include, e.g., the Lp distances, the total variation distance, and generalizations of both the Wasserstein and the Kolmogorov-Smirnov distances. Our results generalize, unify, or improve upon several results, both recent and classical. Consequently, they imply bounds on the statistical error of a GAN, showing that GANs are minimax optimal and, in some cases, strictly outperform the best linear estimator (e.g., the empirical estimator or the kernel density estimator).
Further, we study the above framework of nonparametric density estimation under the Huber contamination model, in which a proportion of the data comes from an unknown outlier distribution. We provide a minimax optimal estimator that adapts to both the unknown contamination proportion and the unknown smoothness of the true density. We use this result to show that certain GAN architectures are robustly minimax optimal.
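For reference, the IPM induced by a discriminator class F has the standard form below (these are textbook definitions, not results specific to the work above):

```latex
d_{\mathcal{F}}(P, Q) \;=\; \sup_{f \in \mathcal{F}}
\Big|\, \mathbb{E}_{X \sim P}\big[f(X)\big] - \mathbb{E}_{Y \sim Q}\big[f(Y)\big] \,\Big|
```

Taking F to be the 1-Lipschitz functions recovers the Wasserstein-1 distance (by Kantorovich-Rubinstein duality); taking F = {f : ||f||_inf <= 1} recovers (twice) the total variation distance; in one dimension, indicators of half-lines yield the Kolmogorov-Smirnov distance. Balls in Besov spaces interpolate between and generalize such classes.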

Motivated by recent advances in both theoretical and applied aspects of multiplayer games, spanning from e-sports to multi-agent generative adversarial networks, we focus on min-max optimization in team zero-sum games. In this class of games, players are split into two teams, with payoffs equal within the same team and of opposite sign across the opposing team. Unlike the textbook two-player zero-sum games, finding a Nash equilibrium in our class can be shown to be CLS-hard, i.e., it is unlikely to admit a polynomial-time algorithm for computing Nash equilibria. Moreover, in this generalized framework, we establish that even asymptotic last-iterate or time-average convergence to a Nash equilibrium is not possible using Gradient Descent Ascent (GDA), its optimistic variant, or the extra-gradient method. Specifically, we present a family of team games whose induced utility is non-multilinear and whose mixed Nash equilibria are not attractive per se, arising as strict saddle points of the underlying optimization landscape. Leveraging techniques from control theory, we complement these negative results by designing a modified GDA that converges locally to Nash equilibria. Finally, we discuss connections of our framework to AI architectures with team competition structures, such as multi-agent generative adversarial networks.
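The convergence failure of GDA can be previewed in the simplest textbook setting: on the bilinear two-player zero-sum game f(x, y) = xy (a standard warm-up, not the team games above), simultaneous GDA provably spirals away from the unique equilibrium at the origin. A minimal sketch, with illustrative step size and initial point:

```python
import numpy as np

# Simultaneous gradient descent-ascent (GDA) on the bilinear zero-sum game
# f(x, y) = x * y; x is the minimizing player, y the maximizing player.
# The unique Nash equilibrium is (0, 0).
eta = 0.1          # step size (illustrative)
x, y = 1.0, 1.0    # initial point (illustrative)
r0 = np.hypot(x, y)                    # initial distance to the equilibrium
for _ in range(100):
    # grad_x f = y and grad_y f = x; both players update simultaneously
    x, y = x - eta * y, y + eta * x
rT = np.hypot(x, y)                    # final distance to the equilibrium
# Each step multiplies the squared distance to the equilibrium by exactly
# (1 + eta**2), so the iterates spiral outward instead of converging:
print(rT / r0)     # (1 + 0.01) ** 50, about 1.64
```

The growth factor (1 + eta**2) per step holds for any step size, so no tuning of eta rescues plain GDA here; this mirrors the stronger negative results for team games stated above.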

What will happen to Y if we do A? A variety of meaningful socio-economic and engineering questions can be formulated this way. To name a few: What will happen to a patient's health if they are given a new therapy? What will happen to a country's economy if policy-makers legislate a new tax? What will happen to a data center's latency if a new congestion control protocol is used? In this talk, we will explore how to answer such counterfactual questions using observational data---which is increasingly available due to digitization and pervasive sensors---and/or very limited experimental data. The two key challenges in doing so are: (i) counterfactual prediction in the presence of latent confounders; (ii) estimation with modern datasets, which are high-dimensional, noisy, and sparse. Towards this goal, the key framework we introduce connects causal inference with tensor completion, a very active area of research across a variety of fields. In particular, we show how to represent the various potential outcomes (i.e., counterfactuals) of interest through an order-3 tensor. The key theoretical results presented are: (i) formal identification results establishing under what patterns of missingness, latent confounding, and structure on the tensor the recovery of unobserved potential outcomes is possible; (ii) novel estimators that recover these unobserved potential outcomes, with proofs that they are finite-sample consistent and asymptotically normal. The efficacy of the proposed estimators is demonstrated on high-impact real-world applications. These include working with: (i) TauRx Therapeutics to propose novel clinical trial designs that reduce the number of patients recruited for a trial and correct for bias from patient dropouts; (ii) Uber Technologies to evaluate the impact of certain driver engagement policies without having to run an A/B test.
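To give a flavor of why low-rank structure enables recovery of counterfactuals, consider a toy rank-1 potential-outcomes matrix (an illustration of the structural idea only, not the talk's estimators or its order-3 tensor setting): any missing entry is algebraically determined by three observed entries.

```python
import numpy as np

# Toy illustration: rows = units, columns = treatments, and M[i, j] is the
# outcome of unit i under treatment j. If M is exactly rank 1, i.e.,
# M = outer(u, v), then M[i, j] * M[l, k] = M[i, k] * M[l, j], so a missing
# entry is pinned down by three observed ones.
def impute_rank1(M, i, j, l, k):
    """Recover entry (i, j) of a rank-1 matrix from entries (i,k), (l,j), (l,k)."""
    return M[i, k] * M[l, j] / M[l, k]

u = np.array([1.0, 2.0, 3.0])        # illustrative unit factors
v = np.array([4.0, 5.0, 6.0])        # illustrative treatment factors
M = np.outer(u, v)                   # fully observed rank-1 matrix

est = impute_rank1(M, 0, 0, 1, 1)    # pretend M[0, 0] was never observed
print(est, M[0, 0])                  # 4.0 4.0
```

In a real potential-outcomes matrix, each unit reveals only the column it was actually assigned; low-rank (and higher-order tensor) structure is what lets the unobserved cells be recovered from the observed pattern, as in the identification results above.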

The ability to learn from data and make decisions in real-time has led to the rapid deployment of machine learning algorithms across many aspects of everyday life. While this has enabled new services and technologies, the fact that algorithms are increasingly interacting with people and other algorithms marks a distinct shift away from the traditional machine learning paradigm. Indeed, little is known about how these algorithms---which were designed to operate in isolation---behave when confronted with strategic behavior on the part of people, or the extent to which strategic agents can game the algorithms to achieve better outcomes. In this talk, I will give an overview of my work on learning in games, learning in the presence of strategic agents, and multi-agent reinforcement learning.

Reinforcement learning (RL) has recently achieved tremendous successes in several artificial intelligence applications. Many of the forefront applications of RL involve multiple agents, e.g., playing chess and Go, autonomous driving, and robotics. In this talk, I will introduce several recent works on multi-agent reinforcement learning (MARL) with theoretical guarantees. Specifically, we focus on solving the most basic multi-agent RL setting: infinite-horizon zero-sum stochastic games (Shapley, 1953), using three common RL approaches: model-based, value-based, and policy-based ones. We first show that, in the tabular setting, model-based multi-agent RL (estimating the model first and then planning) can achieve near-optimal sample complexity when a generative model of the game environment is available. Second, we show that a simple variant of Q-learning (value-based) can find the Nash equilibrium of the game, even if the agents run it independently, i.e., in a fully decentralized fashion. Third, we show that policy gradient methods (policy-based) can solve zero-sum stochastic games with linear dynamics and quadratic costs, which equivalently solves a robust and risk-sensitive control problem. Through this connection to robust control, we discover that our policy gradient methods automatically preserve the robustness of the system during iterations, a phenomenon we refer to as "implicit regularization". Time permitting, I will also discuss some ongoing and future directions along these lines.
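As a self-contained warm-up (independent no-regret learning on a one-shot matrix game, not the model-based, value-based, or policy-based methods of the talk), two agents running multiplicative weights independently on matching pennies have time-averaged strategies that converge to the unique Nash equilibrium (1/2, 1/2); the learning rate and horizon below are illustrative:

```python
import numpy as np

# Matching pennies: the row player's payoff matrix (column player gets -A).
# The unique mixed Nash equilibrium is (1/2, 1/2) for both players.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
eta, T = 0.01, 20000                   # learning rate / horizon (illustrative)
p = np.array([0.9, 0.1])               # row player, started far from equilibrium
q = np.array([0.2, 0.8])               # column player, likewise
p_sum, q_sum = np.zeros(2), np.zeros(2)
for _ in range(T):
    p_sum += p
    q_sum += q
    gp = A @ q                         # row player's expected payoff per action
    gq = -(p @ A)                      # column player's expected payoff per action
    p = p * np.exp(eta * gp)           # multiplicative-weights (Hedge) updates,
    q = q * np.exp(eta * gq)           # computed simultaneously and independently
    p, q = p / p.sum(), q / q.sum()
p_avg, q_avg = p_sum / T, q_sum / T
print(p_avg, q_avg)                    # both close to [0.5, 0.5]
```

Note that only the time averages converge here; the iterates themselves cycle around the equilibrium, which is one reason the stronger last-iterate and decentralized guarantees discussed in the talk are nontrivial.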

In this talk, I will survey some of my dissertation work on algorithmic problems arising in the design and analysis of randomized experiments. I hope to give a sense of the style of problems and technical work that I enjoy. During my dissertation work, I asked: How can we design sampling algorithms to achieve desired levels of covariate balance in a randomized experiment? How can we estimate the variance of a treatment effect estimator in the presence of general interference? How should we analyze and design so-called "bipartite" experiments, where the units that receive treatment are distinct from the units on which outcomes are measured?

What is the best we can do with the amount of data at our disposal for a given learning task? Modern learning problems---with a modest amount of data or subject to data-processing constraints---frequently raise the need to understand fundamental limits and to make judicious use of the available small or imperfect data. This talk will cover several examples of learning in which exploiting key structure, as well as optimally trading off real-world resources, is vital to achieving statistical efficiency.

Current methods for causal discovery typically report a single directed acyclic graph (DAG). Through an example, I hope to convince you that this might not be the best practice. In fact, depending on how two candidate DAGs intersect and on the local geometry at the intersection, the hardness of distinguishing between them can vary dramatically.