Elias Bareinboim (Columbia University), Frederick Eberhardt (Caltech), Kun Zhang (Carnegie Mellon University), and Uri Shalit (Technion - Israel Institute of Technology)

In this talk, I will discuss recent work on reasoning and learning with soft interventions, including the problem of identification, extrapolation/transportability, and structural learning. I will also briefly discuss a new calculus, which generalizes the do-calculus, as well as algorithmic and graphical conditions.
Supporting material:
General Transportability of Soft Interventions: Completeness Results.
J. Correa, E. Bareinboim.
In Proceedings of the 34th Annual Conference on Neural Information Processing Systems (NeurIPS), 2020.
https://causalai.net/r68.pdf
Causal Discovery from Soft Interventions with Unknown Targets: Characterization & Learning.
A. Jaber, M. Kocaoglu, K. Shanmugam, E. Bareinboim.
In Proceedings of the 34th Annual Conference on Neural Information Processing Systems (NeurIPS), 2020.
https://causalai.net/r67.pdf
A Calculus For Stochastic Interventions: Causal Effect Identification and Surrogate Experiments.
J. Correa, E. Bareinboim.
In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI), 2020.
https://causalai.net/r55.pdf

We provide a critical assessment of the account of causal emergence presented in Hoel (2017). The account integrates causal and information theoretic concepts to explain under what circumstances there can be causal descriptions of a system at multiple scales of analysis. We show that the causal macro variables implied by this account result in interventions with significant ambiguity, and that the operations of marginalization and abstraction do not commute. Both of these are desiderata that, we argue, any account of multi-scale causal analysis should be sensitive to. The problems we highlight in Hoel's definition of causal emergence derive from the use of various averaging steps and the introduction of a maximum entropy distribution that is extraneous to the system under investigation. (This is joint work with Lin Lin Lee.)

Complete randomization allows for consistent estimation of the average treatment effect based on the difference in means of the outcomes without strong modeling assumptions on the outcome-generating process. Appropriate use of the pretreatment covariates can further improve the estimation efficiency. However, missingness in covariates is common in experiments and raises an important question: should we adjust for covariates subject to missingness, and if so, how? The unadjusted difference in means is always unbiased. The complete-covariate analysis adjusts for all completely observed covariates and improves the efficiency of the difference in means if at least one completely observed covariate is predictive of the outcome. Then what is the additional gain of adjusting for covariates subject to missingness? A key insight is that the missingness indicators act as fully observed pretreatment covariates as long as missingness is not affected by the treatment, and can thus be used in covariate adjustment to bring additional estimation efficiency. This motivates adding the missingness indicators to the regression adjustment, yielding the missingness-indicator method as a well-known but not so popular strategy in the literature of missing data. We recommend it due to its many advantages. We also propose modifications to the missingness-indicator method based on asymptotic and finite-sample considerations. To reconcile the conflicting recommendations in the missing data literature, we analyze and compare various strategies for analyzing randomized experiments with missing covariates under the design-based framework. This framework treats randomization as the basis for inference and does not impose any modeling assumptions on the outcome-generating process and missing-data mechanism.
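As a rough illustration of the missingness-indicator idea described above, the following minimal sketch fills missing covariate values with zero, appends the missingness indicators as extra regressors, and reads off the treatment coefficient from an ordinary least-squares fit. The function name `missingness_indicator_ate`, the additive (non-interacted) regression, the zero imputation, and the simulated data are all assumptions made for illustration; the abstract does not specify the exact form of the regression or the proposed modifications.

```python
import numpy as np


def missingness_indicator_ate(y, z, x):
    """Treatment coefficient from OLS of y on (1, z, zero-filled x, missingness indicators).

    Hypothetical helper for illustration only. NaN entries in x mark missing values.
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    miss = np.isnan(x)
    # Replace missing covariate values by zero (an illustrative choice).
    x_filled = np.where(miss, 0.0, x)
    # One indicator column per covariate that has at least one missing value.
    indicators = miss[:, miss.any(axis=0)].astype(float)
    design = np.column_stack([np.ones_like(y), z, x_filled, indicators])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]  # coefficient on the treatment indicator


# Simulated completely randomized experiment (assumed data-generating process).
rng = np.random.default_rng(0)
n = 2000
x_full = rng.normal(size=n)
is_missing = rng.random(n) < 0.3           # missingness unrelated to treatment
x_obs = np.where(is_missing, np.nan, x_full)
z = rng.permutation(np.repeat([0, 1], n // 2))
y = 1.0 + 2.0 * z + 3.0 * x_full + rng.normal(size=n)

print("difference in means:   ", y[z == 1].mean() - y[z == 0].mean())
print("missingness-indicator: ", missingness_indicator_ate(y, z, x_obs))
```

Both printed numbers estimate the same average treatment effect (2.0 in this simulation); repeating the simulation over many random assignments would be one way to compare their variability.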

A well-known limitation of modeling causal systems via DAGs is their inability to encode context-specific information. Among the several proposed representations for context-specific causal information are the staged tree models, which are colored probability trees capable of expressing highly diverse context-specific information. The expressive power of staged trees comes at the cost of easy interpretability and the admittance of desirable properties useful in the development of causal discovery algorithms. In this talk, we consider a subfamily of staged trees, which we call CStrees, that admit an alternative representation via a sequence of DAGs. This alternate representation allows us to prove a Verma-Pearl-type characterization of model equivalence for CStrees which extends to the interventional setting, providing a graphical characterization of interventional CStree model equivalence. We will discuss these results and their potential applications to causal discovery algorithms for context-specific models based on interventional and observational data.

Identifying which genetic variants influence medically relevant phenotypes is an important task both for therapeutic development and for risk prediction. In the last decade, genome-wide association studies have been the most widely used instrument to tackle this question. One challenge that they encounter is the interplay between genetic variability and the structure of human populations. In this talk, we will focus on some opportunities that arise when one collects data from diverse populations and present statistical methods that allow us to leverage them. The presentation will be based on joint work with M. Sesia, S. Li, Z. Ren, Y. Romano and E. Candes.

I will present recent work exploring how and when confounded offline data can be used to improve online reinforcement learning. We will explore conditions of partial observability and distribution shifts between the offline and online environments, and present results for contextual bandits, imitation learning and reinforcement learning.

Learning causal representations from observational data can be viewed as the task of identifying where and how the interventions were applied, which at the same time reveals information about the causal representations. Given that this learning task is a typical inverse problem, an essential issue is the establishment of identifiability results: one has to guarantee that the learned representations are consistent with the underlying causal process. Dealing with this issue generally involves appropriate assumptions. In this talk, I focus on learning latent causal variables and their causal relations, together with their relations with the measured variables, from observational data. I show what assumptions, together with instantiations of the "minimal change" principle, render the underlying causal representations identifiable across several settings. Specifically, in the i.i.d. case, the identifiability benefits from appropriate parametric assumptions on the causal relations and a certain type of "minimality" assumption. Temporal dependence makes it possible to recover latent temporally causal processes from time series data without parametric assumptions, and nonstationarity further improves the identifiability. I then draw the connection between recent advances in nonlinear independent component analysis and the minimal change principle. Finally, concerning the nonparametric setting with changing instantaneous causal relations, I show how to learn the latent variables with changing causal relations in light of the minimal change principle.

We consider testing and learning problems on causal Bayesian networks where the variables take values from a bounded domain. We address two problems: (i) Given access to observations and experiments on two unknown environments X and Y, test whether X=Y or X is far from Y. Here, two environments are equal if no intervention can distinguish between them. (ii) Given access to observations and experiments on an unknown environment X, learn a DAG that admits a causal model M such that X is close to M. For problem (i), we show that under natural sparsity assumptions on the underlying DAG, only O(log n) interventions and O~(n) samples per intervention suffice. This is joint work with Jayadev Acharya, Constantinos Daskalakis and Saravanan Kandasamy. For problem (ii), we consider the setting where there are two variables, and the goal is to learn whether X causes Y, Y causes X, or there is a hidden variable confounding the two. Under natural assumptions, we obtain a nearly tight characterization of the sample complexity that is sublinear in k. Moreover, there is a tradeoff between the number of observational samples and interventional samples. This is joint work with Jayadev Acharya, Sourbh Bhadane, Saravanan Kandasamy, and Ziteng Sun.

We consider the problem of counterfactual inference in sequentially designed experiments wherein a collection of units undergo a sequence of interventions based on policies adaptive over time, and outcomes are observed based on the assigned interventions. Our goal is counterfactual inference, i.e., estimating what would have happened if alternate policies were used, a problem that is inherently challenging due to the heterogeneity in the outcomes across users and time. In this work, we identify structural assumptions that allow us to impute the missing potential outcomes in sequential experiments, where the policy is allowed to adapt simultaneously to all users' past data. We prove that under suitable assumptions on the latent factors and temporal dynamics, a variant of the nearest neighbor strategy allows us to impute the missing information using the observed outcomes across time and users. Under mild assumptions on the adaptive policy and the underlying latent factor model, we prove that using data up to time t for N users in the study, our estimate for the missing potential outcome at time t+1 admits a mean squared error that scales as t^{-1/2+\delta} + N^{-1+\delta} for any \delta>0, for any fixed user. We also provide an asymptotic confidence interval for each outcome under suitable growth conditions on N and t, which can then be used to build confidence intervals for individual treatment effects. Our work extends the recent literature on inference with adaptively collected data by allowing for policies that pool across users, the matrix completion literature for missing at random settings by allowing for adaptive sampling mechanisms, and missing data problems in multivariate time series by allowing for a generic non-parametric model.
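To make the nearest-neighbor idea more concrete, here is a minimal sketch of one way such an imputation could look. It is not the authors' estimator: the function name `nn_impute`, the squared-distance criterion, the threshold `eta`, and the noiseless two-type toy data are all assumptions introduced for illustration. For a target user, it looks for other users whose past outcomes under the treatment of interest are close to the target's, and averages the outcomes of those neighbors who actually received that treatment at the time of interest.

```python
import numpy as np


def nn_impute(Y, A, i, t, a, eta):
    """Impute user i's potential outcome at time t under treatment a.

    Y: (N, T) array of observed outcomes; A: (N, T) array of assigned treatments.
    Neighbors are chosen using only data from times before t. Hypothetical helper.
    """
    target_past = A[i, :t] == a
    neighbors = []
    for j in range(Y.shape[0]):
        # A neighbor must be another user who received treatment a at time t.
        if j == i or A[j, t] != a:
            continue
        # Compare outcomes only at past times where both users received treatment a.
        both = target_past & (A[j, :t] == a)
        if not both.any():
            continue
        dist = np.mean((Y[i, :t][both] - Y[j, :t][both]) ** 2)
        if dist <= eta:
            neighbors.append(j)
    if not neighbors:
        return float("nan")  # no usable neighbor found
    return float(np.mean(Y[neighbors, t]))


# Noiseless toy data with two latent user types (assumed model: outcome = u_i * v_t * effect_a).
rng = np.random.default_rng(0)
N, T = 40, 30
u = np.where(np.arange(N) % 2 == 0, 1.0, 3.0)      # latent user factors
v = rng.uniform(1.0, 2.0, size=T)                  # latent time factors
effect = np.array([1.0, 2.0])                      # multiplicative effect of treatment 0 / 1
potential = u[:, None, None] * v[None, :, None] * effect[None, None, :]
A = rng.integers(0, 2, size=(N, T))                # non-adaptive assignment, for simplicity
Y = np.take_along_axis(potential, A[:, :, None], axis=2)[:, :, 0]

a_cf = 1 - A[0, T - 1]                             # the treatment user 0 did not receive
print("imputed:", nn_impute(Y, A, 0, T - 1, a_cf, eta=1e-12))
print("truth:  ", potential[0, T - 1, a_cf])
```

The toy assignment here is not adaptive and the data carry no noise, so the threshold can be tiny; with noisy outcomes the threshold would need to be tuned, which this sketch does not attempt.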

The design of experiments involves an inescapable compromise between covariate balance and robustness. In this talk, we describe a formalization of this trade-off and introduce a new style of experimental design that allows experimenters to navigate it. The design is specified by a robustness parameter that bounds the worst-case mean squared error of an estimator of the average treatment effect. Subject to the experimenter’s desired level of robustness, the design aims to simultaneously balance all linear functions of potentially many covariates. The achieved level of balance is better than previously known possible, considerably better than what a fully random assignment would produce, and close to optimal given the desired level of robustness. We show that the mean squared error of the estimator is bounded by the minimum of the loss function of an implicit ridge regression of the potential outcomes on the covariates. The estimator does not itself conduct covariate adjustment, so one can interpret the approach as regression adjustment by design. Finally, we provide non-asymptotic tail bounds for the estimator, which facilitate the construction of conservative confidence intervals.
