Format results
- Ananya Uppal (University of Texas Austin): Nonparametric Density Estimation and Convergence of GANs under Besov IPMs
- Emmanouil Vlatakis (Columbia University): Teamwork Makes The Von Neumann Work: Two Team Zero Sum Games
- Anish Agarwal (MIT EECS): Causal Inference For Socio-Economic & Engineering Systems
- Eric Mazumdar (Caltech): Learning In The Presence Of Strategic Agents: Dynamics, Equilibria, And Convergence
- Christopher Harshaw (Yale): Algorithmic Advances For The Design And Analysis Of Randomized Control Trials
- Yanjun Han (Simons Institute): Fundamental Limits Of Learning In Data-Driven Problems
- Aleksandra Utiralova (MIT): Harish-Chandra bimodules in complex rank
Nonparametric Density Estimation and Convergence of GANs under Besov IPMs
Ananya Uppal (University of Texas Austin)

Since their introduction in Goodfellow et al. (2014) as sampling algorithms, Generative Adversarial Networks (GANs) have evolved to produce remarkable results in several tasks, e.g., image generation and text-to-image translation. Statistically, a GAN may be viewed as a density estimate constructed by optimizing over an Integral Probability Metric (IPM) encoded by its discriminator. I will present our work on estimating a nonparametric density under IPMs defined by Besov spaces. Such IPMs are a rich class of losses and include, e.g., the Lp distances, the total variation distance, and generalizations of both the Wasserstein and the Kolmogorov-Smirnov distances. Our results generalize, unify, or improve several results, both recent and classical. As a consequence, we obtain bounds on the statistical error of a GAN, showing that GANs are minimax optimal and, in some cases, strictly outperform the best linear estimator (e.g., the empirical estimator or the kernel density estimator). Further, we study the above framework of nonparametric density estimation under the Huber contamination model, in which a proportion of the data comes from an unknown outlier distribution. We provide a minimax optimal estimator that adapts to both an unknown contamination proportion and the unknown smoothness of the true density. We use this to show that certain GAN architectures are robustly minimax optimal.
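The IPM at the heart of this statistical view has a simple finite-sample analogue: d_F(P, Q) = sup over f in F of |E_P[f] - E_Q[f]|. The sketch below is purely illustrative (the witness family, clipping thresholds, and samples are invented for the example, not taken from the talk); it estimates the IPM over a small surrogate family of bounded witnesses:

```python
import random

def ipm(sample_p, sample_q, function_class):
    """Empirical Integral Probability Metric between two samples:
    d_F(P, Q) = sup_{f in F} | E_P[f] - E_Q[f] |,
    with the sup taken over a finite surrogate family F."""
    def mean(f, xs):
        return sum(f(x) for x in xs) / len(xs)
    return max(abs(mean(f, sample_p) - mean(f, sample_q)) for f in function_class)

rng = random.Random(0)
p = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # sample from P
q = [rng.gauss(0.5, 1.0) for _ in range(2000)]   # sample from a shifted Q

# A small family of bounded, 1-Lipschitz "witness" functions; richer families
# (indicators, Besov balls, a GAN discriminator) give other IPMs.
witnesses = [lambda x, t=t: max(-1.0, min(1.0, x - t)) for t in (-1.0, 0.0, 1.0)]

print(ipm(p, q, witnesses))   # positive: the witnesses detect the mean shift
```

A GAN discriminator plays the role of the (much richer) family F; swapping in indicator witnesses would instead give a Kolmogorov-Smirnov-type distance.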
Teamwork Makes The Von Neumann Work: Two Team Zero Sum Games
Emmanouil Vlatakis (Columbia University)

Motivated by recent advances in both theoretical and applied aspects of multiplayer games, spanning from e-sports to multi-agent generative adversarial networks, we focus on min-max optimization in team zero-sum games. In this class of games, players are split into two teams, with payoffs equal within the same team and of opposite sign across the opposing team. Unlike the textbook two-player zero-sum games, finding a Nash equilibrium in this class can be shown to be CLS-hard, i.e., it is unlikely that there is a polynomial-time algorithm for computing Nash equilibria. Moreover, in this generalized framework, we establish that even asymptotic last-iterate or time-average convergence to a Nash equilibrium is not possible using Gradient Descent Ascent (GDA), its optimistic variant, or extra gradient. Specifically, we present a family of team games whose induced utility is non-multilinear, with mixed Nash equilibria that are not attractive per se, arising as strict saddle points of the underlying optimization landscape. Leveraging techniques from control theory, we complement these negative results by designing a modified GDA that converges locally to Nash equilibria. Finally, we discuss connections of our framework to AI architectures with team competition structures, such as multi-agent generative adversarial networks.
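The failure of plain GDA can already be seen on the textbook bilinear objective f(x, y) = xy, a far simpler cousin of the non-multilinear team games above (this toy example is standard and is not the paper's construction):

```python
# Simultaneous GDA on f(x, y) = x * y: min over x, max over y.
# The unique Nash equilibrium is (0, 0), yet GDA spirals outward:
# one step maps (x, y) to (x - eta*y, y + eta*x), so the squared distance
# to the equilibrium is multiplied by (1 + eta**2) every iteration.
eta = 0.1
x, y = 1.0, 1.0
radii = []
for t in range(100):
    gx, gy = y, x                        # grad_x f = y, grad_y f = x
    x, y = x - eta * gx, y + eta * gy    # descent in x, ascent in y
    radii.append((x * x + y * y) ** 0.5)

print(radii[0], radii[-1])   # distance to (0, 0) grows monotonically
```

Optimistic and extra-gradient updates do converge on this bilinear example, which is what makes the talk's stronger negative result, that all three methods fail on team games, notable.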
Causal Inference For Socio-Economic & Engineering Systems
Anish Agarwal (MIT EECS)

What will happen to Y if we do A? A variety of meaningful socio-economic and engineering questions can be formulated this way. To name a few: What will happen to a patient's health if they are given a new therapy? What will happen to a country's economy if policy-makers legislate a new tax? What will happen to a data center's latency if a new congestion control protocol is used? In this talk, we will explore how to answer such counterfactual questions using observational data, which is increasingly available due to digitization and pervasive sensors, and/or very limited experimental data. The two key challenges in doing so are: (i) counterfactual prediction in the presence of latent confounders; (ii) estimation with modern datasets which are high-dimensional, noisy, and sparse. Towards this goal, the key framework we introduce connects causal inference with tensor completion, a very active area of research across a variety of fields. In particular, we show how to represent the various potential outcomes (i.e., counterfactuals) of interest through an order-3 tensor. The key theoretical results presented are: (i) formal identification results establishing under what missingness patterns, latent confounding, and structure on the tensor recovery of unobserved potential outcomes is possible; (ii) novel estimators to recover these unobserved potential outcomes, with proofs that they are finite-sample consistent and asymptotically normal. The efficacy of the proposed estimators is shown on high-impact real-world applications. These include working with: (i) TauRx Therapeutics, to propose novel clinical trial designs that reduce the number of patients recruited for a trial and correct for bias from patient dropouts; (ii) Uber Technologies, on evaluating the impact of certain driver engagement policies without having to run an A/B test.
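A minimal sketch of the order-2 (matrix) special case of this idea: treat unobserved potential outcomes as missing entries of an approximately low-rank matrix and recover them spectrally. Everything below (dimensions, rank, noise level, observation probability) is invented for illustration and is not the talk's estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_times = 100, 80
M = np.outer(rng.normal(size=n_units), rng.normal(size=n_times))  # rank-1 truth

mask = rng.random(M.shape) < 0.6          # which potential outcomes are observed
Y = np.where(mask, M + 0.1 * rng.normal(size=M.shape), 0.0)

# Fill missing entries with 0, rescale by the observed fraction (so the matrix
# is unbiased for M), and keep only the top singular direction.
p_hat = mask.mean()
U, s, Vt = np.linalg.svd(Y / p_hat, full_matrices=False)
M_hat = s[0] * np.outer(U[:, 0], Vt[0])

rel_err = np.linalg.norm(M_hat - M) / np.linalg.norm(M)
print(rel_err)   # unobserved entries recovered up to spectral-noise error
```

The framework in the talk generalizes this to an order-3 tensor over (units, time, interventions), with identification results saying when such recovery is even possible.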
Learning In The Presence Of Strategic Agents: Dynamics, Equilibria, And Convergence
Eric Mazumdar (Caltech)

The ability to learn from data and make decisions in real-time has led to the rapid deployment of machine learning algorithms across many aspects of everyday life. While this has enabled new services and technologies, the fact that algorithms are increasingly interacting with people and other algorithms marks a distinct shift away from the traditional machine learning paradigm. Indeed, little is known about how these algorithms, which were designed to operate in isolation, behave when confronted with strategic behaviors on the part of people, nor about the extent to which strategic agents can game the algorithms to achieve better outcomes. In this talk, I will give an overview of my work on learning in games, learning in the presence of strategic agents, and multi-agent reinforcement learning.
Multi-Agent Reinforcement Learning In Stochastic Games: From AlphaGo To Robust Control
Kaiqing Zhang (MIT)

Reinforcement learning (RL) has recently achieved tremendous successes in several artificial intelligence applications. Many of the forefront applications of RL involve "multiple agents", e.g., playing chess and Go games, autonomous driving, and robotics. In this talk, I will introduce several recent works on multi-agent reinforcement learning (MARL) with theoretical guarantees. Specifically, we focus on solving the most basic multi-agent RL setting: infinite-horizon zero-sum stochastic games (Shapley 1953), using three common RL approaches: model-based, value-based, and policy-based ones. We first show that, in the tabular setting, "model-based multi-agent RL" (estimating the model first and then planning) can achieve near-optimal sample complexity when a generative model of the game environment is available. Second, we show that a simple variant of "Q-learning" (value-based) can find the Nash equilibrium of the game, even if the agents run it independently, in a "fully decentralized" fashion. Third, we show that "policy gradient" methods (policy-based) can solve zero-sum stochastic games with linear dynamics and quadratic costs, which equivalently solves a robust and risk-sensitive control problem. With this connection to robust control, we discover that our policy gradient methods automatically preserve the robustness of the system during iterations, a phenomenon we refer to as "implicit regularization". Time permitting, I will also discuss some ongoing and future directions along these lines.
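Shapley's model can be made concrete in a few lines: value iteration where each backup solves a small zero-sum matrix game. The game below (rewards, transitions, discount) is invented for illustration and is not from the talk:

```python
# Value iteration for a tiny two-state, two-action zero-sum stochastic game
# in the sense of Shapley (1953). Illustrative numbers only.

def matrix_game_value(A):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    maximin = max(min(row) for row in A)
    minimax = min(max(col) for col in zip(*A))
    if maximin == minimax:           # pure-strategy saddle point
        return maximin
    (a, b), (c, d) = A               # otherwise the classic mixed-strategy formula
    return (a * d - b * c) / (a + d - b - c)

gamma = 0.9
R = {0: [[1.0, -1.0], [-1.0, 1.0]],          # r(s, a, b) to the max player
     1: [[0.0, 2.0], [2.0, 0.0]]}
P = {0: [[(0.5, 0.5), (0.5, 0.5)], [(0.5, 0.5), (0.5, 0.5)]],
     1: [[(0.9, 0.1), (0.9, 0.1)], [(0.9, 0.1), (0.9, 0.1)]]}  # P[s][a][b]

V = [0.0, 0.0]
for _ in range(200):   # gamma-contraction: converges to the game's value
    V = [matrix_game_value(
            [[R[s][a][b] + gamma * sum(p * V[sp] for sp, p in enumerate(P[s][a][b]))
              for b in range(2)]
             for a in range(2)])
         for s in range(2)]

print(V)   # fixed point of the Shapley operator
```

The model-based approach in the talk effectively runs such a backup on an estimated game; larger action sets would replace the 2x2 closed form with a linear program.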
Algorithmic Advances For The Design And Analysis Of Randomized Control Trials
Christopher Harshaw (Yale)

In this talk, I will survey some of my dissertation work on algorithmic problems arising in the design and analysis of randomized experiments. I hope to give a sense of the style of problems and technical work that I enjoy. During my dissertation work, I was asking: How can we design sampling algorithms to achieve desired levels of covariate balance in a randomized experiment? How can we estimate the variance of a treatment effect estimator in the presence of general interference? How should we analyze and design so-called "bipartite" experiments, where units that receive treatment are distinct from units on which outcomes are measured?
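One classic algorithmic answer to the covariate-balance question is rerandomization: resample the assignment until a balance criterion is met. A one-dimensional sketch (the data, threshold, and acceptance rule are invented for illustration; the designs in the dissertation are more sophisticated):

```python
import random

def rerandomize(covariates, threshold, seed=0):
    """Redraw a 50/50 treatment assignment until the treated-vs-control
    difference in covariate means falls below `threshold` (simple
    acceptance-rejection for covariate balance)."""
    rng = random.Random(seed)
    n = len(covariates)
    while True:
        units = list(range(n))
        rng.shuffle(units)
        treated, control = units[: n // 2], units[n // 2:]
        gap = abs(sum(covariates[i] for i in treated) / len(treated)
                  - sum(covariates[i] for i in control) / len(control))
        if gap < threshold:
            return treated, gap

data_rng = random.Random(1)
covs = [data_rng.gauss(0.0, 1.0) for _ in range(100)]
treated, gap = rerandomize(covs, threshold=0.05)
print(len(treated), gap)   # a balanced 50/50 assignment
```

Note that rejection changes the distribution over assignments, which is precisely why estimators and variance calculations must account for the design.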
Fundamental Limits Of Learning In Data-Driven Problems
Yanjun Han (Simons Institute)

What is the best we can do with the amount of data at our disposal for a given learning task? Modern learning problems, with a modest amount of data or subject to data processing constraints, frequently raise the need to understand the fundamental limits and to make judicious use of the available small or imperfect data. This talk will cover several examples of learning where exploiting the key structure, as well as optimally trading off between real-world resources, is vital to achieving statistical efficiency.
Choosing Between DAGs Can Be Hard
Richard Guo (University of Cambridge)

Current methods for causal discovery typically report a single directed acyclic graph (DAG). Through an example, I hope to convince you that this might not be the best practice. In fact, depending on how two DAGs intersect and on the local geometry at the intersection, the hardness of choosing between them can vary dramatically.
Causal Inference With Corrupted Data: Measurement Error, Missing Values, Discretization, And Differential Privacy
Rahul Singh (MIT)

Even the most carefully curated economic data sets have variables that are noisy, missing, discretized, or privatized. The standard workflow for empirical research involves data cleaning followed by data analysis that typically ignores the bias and variance consequences of data cleaning. We formulate a semiparametric model for causal inference with corrupted data that encompasses both data cleaning and data analysis. We propose a new end-to-end procedure for data cleaning, estimation, and inference, with data cleaning-adjusted confidence intervals. We prove consistency, Gaussian approximation, and semiparametric efficiency for our estimator of the causal parameter via finite-sample arguments. The rate of Gaussian approximation is $n^{-1/2}$ for global parameters such as the average treatment effect, and it degrades gracefully for local parameters such as the heterogeneous treatment effect for a specific demographic. Our key assumption is that the true covariates are approximately low rank. In our analysis, we provide nonasymptotic theoretical contributions to matrix completion, statistical learning, and semiparametric statistics. We verify the coverage of the data cleaning-adjusted confidence intervals in simulations calibrated to resemble differential privacy as implemented in the 2020 US Census.
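The simplest instance of "data cleaning changes the analysis" is classical measurement error in a single covariate, where naive regression is attenuated by a known factor. The sketch below shows the textbook reliability-ratio correction, a far simpler cousin of the semiparametric, low-rank approach in the talk (all numbers are invented for illustration):

```python
import random

# Classical 1-D measurement error: regressing y on the corrupted covariate z
# attenuates the slope by the reliability ratio
#   sigma_x^2 / (sigma_x^2 + sigma_u^2);
# dividing by that ratio restores the true slope.
rng = random.Random(0)
n, beta = 50000, 2.0
sigma_x, sigma_u = 1.0, 1.0

x = [rng.gauss(0.0, sigma_x) for _ in range(n)]      # true covariate
y = [beta * xi + rng.gauss(0.0, 0.5) for xi in x]    # outcome
z = [xi + rng.gauss(0.0, sigma_u) for xi in x]       # observed, corrupted covariate

def slope(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)
    var = sum((a - mu) ** 2 for a in u) / len(u)
    return cov / var

naive = slope(z, y)                                   # about beta / 2 here
reliability = sigma_x**2 / (sigma_x**2 + sigma_u**2)
corrected = naive / reliability                       # about beta
print(naive, corrected)
```

The correction here needs the noise scale to be known; the appeal of the end-to-end semiparametric approach is that it handles cleaning and inference jointly, with confidence intervals that reflect both steps.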
Causal Inference And Data-Driven Decision-Making
Angela Zhou (UC Berkeley)

I will provide a brief overview of previous work on credible inference in the context of causal inference and statistical machine learning, and discuss ongoing directions on causal inference embedded in complex operational systems.
Possibility of causal loops without superluminal signalling -- a general framework
Vilasini Venkatesh (University of York)
Causality is fundamental to science, but it appears in several different forms. One is relativistic causality, which is tied to a space-time structure and forbids signalling outside the future. On the other hand, causality can be defined operationally using causal models, by considering the flow of information within a network of physical systems and interventions on them. From both a foundational and practical viewpoint, it is useful to establish the class of causal models that can coexist with relativistic principles such as no superluminal signalling, noting that causation and signalling are not equivalent. We develop such a general framework, which allows these different notions of causality to be independently defined and connections between them to be established. The framework first provides an operational way to model causation in the presence of cyclic, fine-tuned, and non-classical causal influences. We then consider how a causal model can be embedded in a space-time structure and propose a mathematical condition (compatibility) ensuring that the embedded causal model does not allow signalling outside the space-time future. We identify several distinct classes of causal loops that can arise in our framework, showing that compatibility with a space-time can rule out only some of them. We then demonstrate the mathematical possibility of causal loops embedded in Minkowski space-time that can be operationally detected through interventions, without leading to superluminal signalling. Our framework provides conditions for preventing superluminal signalling within arbitrary (possibly cyclic) causal models and also allows us to model causation in post-quantum theories admitting jamming correlations. Applying our framework to such scenarios, we show that post-quantum jamming can indeed lead to superluminal signalling, contrary to previous claims.
Finally, this work introduces a new causal modelling concept of "higher-order affects relations" and several related technical results, which have applications for causal discovery in fine-tuned causal models.
Harish-Chandra bimodules in complex rank
Aleksandra Utiralova (MIT)
Deligne tensor categories are defined as an interpolation of the categories of representations of groups GL_n, O_n, Sp_{2n} or S_n to the complex values of the parameter n. One can extend many classical representation-theoretic notions and constructions to this context. These complex rank analogs of classical objects provide insights into their stable behavior patterns as n goes to infinity.
I will talk about some of my results on Harish-Chandra bimodules in Deligne categories. It is known that in the classical case simple Harish-Chandra bimodules admit a classification in terms of W-orbits of certain pairs of weights. However, the notion of weight is not well-defined in the setting of Deligne categories. I will explain how in complex rank the above-mentioned classification translates to a condition on the corresponding (left and right) central characters.
Another interesting phenomenon arising in complex rank is that there are two ways to define Harish-Chandra bimodules. That is, one can either require that the center acts locally finitely on a bimodule M, or that M has a finite K-type. The two conditions are known to be equivalent for a semi-simple Lie algebra in the classical setting; however, in Deligne categories this is no longer the case. I will talk about a way to construct examples of Harish-Chandra bimodules of finite K-type using the ultraproduct realization of Deligne categories.

Zoom Link: https://pitp.zoom.us/j/93951304913?pwd=WVk1Uk54ODkyT3ZIT2ljdkwxc202Zz09