RCTs are potentially useful in many ways other than standard confirmatory intent to treat (ITT) analyses, but to succeed difficult problems must be overcome.I will discuss some or (time-permitting) all of the following problems :
1. The problem of transportability of the trial results to other populations: I will explain why transportability is much more difficult in trials comparing longitudinal dynamic treatment regimes rather than in simple point treatment trials.
2. The problematic use of RCT data in micro-simulation models used in cost-benefit analyses
3.The problem of combining data from large, often confounded, administrative or electronic medical records , with data from smaller underpowered randomized trials in estimating individualized treatment strategies.
4. The problem of using the results of RCTs to benchmark the ability of observational analyses to 'get it right', with the goal of providing evidence that causal analyses of observational data are sufficiently reliable to contribute to decision making
5.The problem noncompliance with assigned protocol in trials in which the per-protocol effect rather than the ITT effect is of substantive importance .
6. The problem of leveraging the prior knowledge that diagnostic tests have "no direct effect on the outcome except through the treatment delivered" to greatly increase the power of trials designed to estimate the cost vs benefit of competing testing strategies.

Since their introduction in Goodfellow et al. (2014) as sampling algorithms, Generative Adversarial Networks (GANs) have evolved to produce remarkable results in several tasks e.g. image generation, text-to-image translation, etc. Statistically, a GAN may be viewed as a density estimate constructed by optimizing over an Integral Probability Metric (IPM) encoded by its discriminator. I will present our work on estimating a nonparametric density under IPMs defined by Besov spaces. Such IPMs are a rich class of losses and include, e.g., Lp distances, the total variation distance, and generalizations of both the Wasserstein and the Kolmogorov-Smirnov distances. Our results generalize, unify, or improve several results, both recent and classical. Consequently, we imply bounds on the statistical error of a GAN, showing that GANs are minimax optimal and in some cases, strictly outperform the best linear estimator (e.g. the empirical estimator, kernel density estimator).
Further, we study the above framework of nonparametric density estimation under the Huber contamination model, in which a proportion of the data comes from an unknown outlier distribution. We provide a minimax optimal estimator that adapts to both an unknown contamination proportion and the unknown smoothness of the true density. We use this to imply that certain GAN architectures are robustly minimax optimal.

The talk will survey recent progress on the computational complexity of binary classification in the presence of benign label noise. In particular, we will will give an overview of the key ideas behind known algorithms and computational hardness results for learning halfspaces (and other concept classes) in both the distribution free and the distribution specific PAC model.

We study sublinear and local computation algorithms for decision trees, focusing on the related tasks of testing and reconstruction. In testing, we would like to determine if an unknown function $f$ is close to a size-$s$ decision tree. In reconstruction, we are given query access to a function $f$ that is promised to be close to a size-$s$ decision tree, and we would like to provide "on-the-fly" query access to a decision tree, ideally of size not much larger than $s$, that is close to $f$.
We give the first reconstruction algorithm for decision trees, and from it we derive a new tester for decision trees. By known relationships, our results yield reconstruction algorithms for numerous other boolean function properties---Fourier degree, randomized and quantum query complexities, certificate complexity, sensitivity, etc.---which in turn yield new testers for these properties.
Joint work with Guy Blanc (Stanford) and Jane Lange (MIT).

Let H be a concept class. We prove that H can be PAC-learned by an (approximate) differentially-private algorithm if and only if it has a finite Littlestone dimension. This implies an equivalence between online learnability and private PAC learnability.

We present a new technique for learning latent variable models with many parameters. The basic idea is to use the method of moments to find a large vector space of polynomials which nearly vanish on all of the parameters of the problem and to use this along with a new structural result to find a small cover of the set of relevant parameters after which exhaustive algorithms can be applied. Using this technique we develop substantially improved algorithms for learning mixtures of spherical Gaussians, mixtures of linear regressions and non-negative linear combinations of ReLUs.

We study the identity testing problem in the context of spin systems where it takes the following form: given the parameter specification of the model M and a sampling oracle for the distribution μ of an unknown model M*, can we efficiently determine if the two models M and M* are the same? We establish hardness results for this problem in two prototypical cases: the q-state ferromagnetic Potts model and Restricted Boltzmann Machines (RBMs). Daskalakis, Dikkala and Kamath (2018) presented a polynomial-time algorithm for identity testing for the ferromagnetic Ising model, which corresponds to the special case where there are only two spins (q=2). In this talk, I will present recent, complementary hardness results for the Potts model when q > 2 and for RBMs (i.e., mixed Ising models on bipartite graphs). Our proofs lay out a methodology to reduce approximate counting to identity testing, utilizing the phase transitions exhibited by the models and using random graphs as gadgets in the reduction.
Based on joint work with Zongchen Chen, Daniel Stefankovic, and Eric Vigoda.

The classical Chow-Liu algorithm estimates a tree-structured graphical model of a distribution using the maximum spanning tree on the pairwise mutual information graph. We show that, if the true distribution P is tree-structured over a discrete domain Σ^n, then the Chow-Liu algorithm returns a distribution Q with D(P||Q) < ε after O~(Σ^3 n / ε) samples, which is nearly optimal in n and ε.
Our analysis of Chow-Liu is based on a new result in conditional independence testing: we prove that for three random variables X,Y,Z each over Σ, testing if I(X;Y∣Z) is 0 or ≥ε is possible with O˜(Σ^3/ε) samples using the empirical mutual information.

Gaussian Graphical models have wide-ranging applications in machine learning and the natural and social sciences where they are one of the most popular ways to model statistical relationships between observed variables. In most of the settings in which they are applied, the number of observed samples is much smaller than the dimension and the goal is to learn the model assuming the underlying model is sparse. While there are a variety of algorithms (e.g. Graphical Lasso, CLIME) that provably recover the graph structure with a logarithmic number of samples, they assume various conditions that require the precision matrix to be in some sense well-conditioned. Here we give the first fixed polynomial-time algorithms for learning attractive GGMs and walk-summable GGMs with a logarithmic number of samples without any such assumptions. In particular, our algorithms can tolerate strong dependencies among the variables. We complement our results with experiments showing that many existing algorithms fail even in some simple settings where there are long dependency chains.

Extreme value statistics concerns the maxima of random variables and relations between the tails of distributions rather than averages and correlations. The vast majority of models are centered around {max-stable distributions}, the Gaussian analogs for extremes. However, max-stable multivariate have an intractable likelihood, severely limiting statistical learning and inference. Directed graphical models for extreme values (aka max-linear Bayesian networks) have only appeared in 2018, but have seen many applications in finance, hydrology and extreme risks modelling. This talk (1) highlights how they differ from usual Bayesian networks, (2) discusses their connections to tropical convex geometry and k-SAT, (3) shows performances of current learning algorithms on various hydrology datasets, and (4) outlines major computational and statistical challenges in fitting such models to data.