Near-Optimal Learning of Tree-Structured Distributions by Chow-Liu
Eric Price (University of Texas, Austin)
The classical Chow-Liu algorithm estimates a tree-structured graphical model of a distribution using the maximum spanning tree on the pairwise mutual information graph. We show that, if the true distribution P is tree-structured over a discrete domain Σ^n, then the Chow-Liu algorithm returns a distribution Q with D(P||Q) < ε after Õ(Σ^3 n / ε) samples, which is nearly optimal in n and ε. Our analysis of Chow-Liu is based on a new result in conditional independence testing: we prove that for three random variables X, Y, Z, each over Σ, testing whether I(X;Y|Z) is 0 or ≥ ε is possible with Õ(Σ^3/ε) samples using the empirical mutual information.
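As a concrete illustration of the estimator analyzed above, here is a minimal sketch of the classical Chow-Liu procedure in Python: it forms empirical pairwise mutual informations from samples and returns a maximum-weight spanning tree over them. Function and variable names are ours, and the plug-in estimator shown is only the simplest variant of the empirical mutual information used in the analysis.

```python
# Minimal Chow-Liu sketch (illustrative, not the authors' code): estimate
# pairwise mutual informations from samples and take a maximum spanning tree
# over the resulting weighted graph.
import itertools
import numpy as np
import networkx as nx

def empirical_mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats from paired samples x, y."""
    n = len(x)
    joint = {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
    px = {a: np.mean(x == a) for a in set(x)}
    py = {b: np.mean(y == b) for b in set(y)}
    mi = 0.0
    for (a, b), c in joint.items():
        pab = c / n
        mi += pab * np.log(pab / (px[a] * py[b]))
    return mi

def chow_liu_tree(samples):
    """samples: (n, d) array over a discrete alphabet; returns tree edges."""
    n, d = samples.shape
    g = nx.Graph()
    for i, j in itertools.combinations(range(d), 2):
        w = empirical_mutual_information(samples[:, i], samples[:, j])
        g.add_edge(i, j, weight=w)
    return list(nx.maximum_spanning_tree(g).edges)

# Example: 5 binary variables forming a Markov chain 0-1-2-3-4.
rng = np.random.default_rng(0)
cols = [rng.integers(0, 2, size=5000)]
for _ in range(4):
    flip = rng.random(5000) < 0.1
    cols.append(np.where(flip, 1 - cols[-1], cols[-1]))
print(chow_liu_tree(np.column_stack(cols)))
```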
Photon Emission from Circular Equatorial Orbiters around Kerr Black Holes
Delilah Gates (Princeton University)
We consider monochromatic and isotropic photon emission from circular equatorial Kerr orbiters. We calculate the critical curve delineating the region of photon escape from that of photon capture in each emitter’s sky, allowing us to derive analytic expressions for the photon escape probability and the redshift-dependent total flux collected on the celestial sphere as a function of emission radius and black hole parameters. This critical curve generalizes to finite orbital radius the usual Kerr critical curve and displays interesting features in the limit of high spin. These results confirm that the near-horizon geometry of a high-spin black hole is in principle observable.
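For reference, the "usual Kerr critical curve" that these results generalize is determined by the unstable spherical photon orbits. A standard Bardeen-type parametrization of their conserved quantities, in units G = c = 1 and in our own notation, is sketched below; it is background to the talk, not the finite-radius generalization derived there.

```latex
% Standard spherical-photon-orbit parametrization (background sketch).
% For a photon with energy-rescaled angular momentum \lambda = L/E and
% Carter constant \eta = Q/E^2, the critical values as functions of the
% orbit radius r are
\begin{align}
  \lambda_c(r) &= -\frac{r^{3} - 3Mr^{2} + a^{2}r + a^{2}M}{a\,(r - M)}, \\
  \eta_c(r)    &= \frac{r^{3}\left[\,4Ma^{2} - r\,(r - 3M)^{2}\,\right]}{a^{2}\,(r - M)^{2}},
\end{align}
% with r ranging over the radii of unstable spherical photon orbits; mapping
% (\lambda_c, \eta_c) onto the emitter's or observer's sky traces the critical curve.
```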
Learning Some Ill-Conditioned Gaussian Graphical Models
Raghu Meka (UCLA)
Gaussian graphical models have wide-ranging applications in machine learning and the natural and social sciences, where they are one of the most popular ways to model statistical relationships between observed variables. In most of the settings in which they are applied, the number of observed samples is much smaller than the dimension, and the goal is to learn the model assuming the underlying model is sparse. While there are a variety of algorithms (e.g. Graphical Lasso, CLIME) that provably recover the graph structure with a logarithmic number of samples, they assume various conditions that require the precision matrix to be in some sense well-conditioned. Here we give the first fixed polynomial-time algorithms for learning attractive GGMs and walk-summable GGMs with a logarithmic number of samples without any such assumptions. In particular, our algorithms can tolerate strong dependencies among the variables. We complement our results with experiments showing that many existing algorithms fail even in some simple settings where there are long dependency chains.
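For context on the baselines mentioned above, here is a minimal, illustrative use of the Graphical Lasso as implemented in scikit-learn; the penalty, problem sizes, and data are placeholders of ours, not the experimental setup of the talk.

```python
# Illustrative baseline only: sparse precision-matrix estimation with the
# Graphical Lasso from scikit-learn (alpha and problem sizes are arbitrary).
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Build a sparse, well-conditioned precision matrix for a chain-structured GGM.
d = 10
precision = np.eye(d)
for i in range(d - 1):
    precision[i, i + 1] = precision[i + 1, i] = 0.4
covariance = np.linalg.inv(precision)

# Draw n samples (n >> d here; the regime of interest in the talk is n << d).
n = 2000
X = rng.multivariate_normal(np.zeros(d), covariance, size=n)

# Fit and inspect the recovered sparsity pattern of the precision matrix.
model = GraphicalLasso(alpha=0.05).fit(X)
support = np.abs(model.precision_) > 1e-2
print(support.astype(int))
```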
Directed Graphical Models for Extreme Value Statistics
Ngoc Tran (University of Texas, Austin)
Extreme value statistics concerns the maxima of random variables and relations between the tails of distributions, rather than averages and correlations. The vast majority of models are centered around max-stable distributions, the Gaussian analogs for extremes. However, max-stable multivariate distributions have an intractable likelihood, severely limiting statistical learning and inference. Directed graphical models for extreme values (also known as max-linear Bayesian networks) only appeared in 2018, but have already seen many applications in finance, hydrology and extreme risk modelling. This talk (1) highlights how they differ from usual Bayesian networks, (2) discusses their connections to tropical convex geometry and k-SAT, (3) shows the performance of current learning algorithms on various hydrology datasets, and (4) outlines major computational and statistical challenges in fitting such models to data.
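To make the model class concrete, the sketch below simulates from a small max-linear Bayesian network in the recursive form commonly used in this literature, where each node is the maximum of scaled parent values and its own innovation. The DAG, edge coefficients, and heavy-tailed noise are illustrative choices of ours.

```python
# Illustrative simulation from a max-linear Bayesian network (our toy example):
# X_j = max( max_{i in pa(j)} c_ij * X_i , Z_j ).
import numpy as np

rng = np.random.default_rng(1)

# DAG on nodes 0 -> 1 -> 3 and 0 -> 2 -> 3 with positive edge coefficients.
parents = {0: [], 1: [(0, 0.8)], 2: [(0, 0.5)], 3: [(1, 0.7), (2, 0.9)]}

def sample_max_linear(n_samples):
    # Heavy-tailed (Pareto, Frechet-domain) innovations, typical for extremes.
    z = 1.0 / rng.random((n_samples, len(parents)))
    x = np.zeros_like(z)
    for j in sorted(parents):                     # topological order
        contributions = [c * x[:, i] for i, c in parents[j]]
        x[:, j] = np.maximum.reduce(contributions + [z[:, j]])
    return x

print(np.round(sample_max_linear(5), 2))
```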
Non-local quantum computation and holography
Alex May (Perimeter Institute)
Relativistic quantum tasks are quantum computations which have inputs and outputs that occur at designated spacetime locations.
Understanding which tasks are possible to complete, and what resources are required to complete them, captures spacetime-specific aspects of quantum information. In this talk we explore the connections between such tasks and quantum gravity, specifically in the context of the AdS/CFT correspondence. We find that tasks reveal a novel connection between causal features of bulk geometry and boundary entanglement.
Further, we find that AdS/CFT suggests that non-local quantum computation, a specific task with relevance to position-based cryptography, can be performed with linear entanglement. This would be an exponential improvement on existing protocols.
Quantum spacetime and deformed relativistic kinematics
Javier Relancio (Universidad de Zaragoza)
In this seminar, I will consider a deformed kinematics that goes beyond special relativity as a way to account for possible low-energy effects of a quantum gravity theory that could lead to experimental evidence. This can be done while keeping a relativity principle, an approach usually known as doubly (or deformed) special relativity. In this context, I will give a simple geometric interpretation of the deformed kinematics and explain how it can be related to a metric in a maximally symmetric curved momentum space. Moreover, this metric can be extended to the whole phase space, leading to a notion of spacetime. This geometrical formalism can also be generalized to take spacetime curvature into account, leading to a momentum deformation of general relativity. I will explain theoretical aspects and possible phenomenological consequences of such a deformation.
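As one standard example of what a deformed but relativity-compatible kinematics looks like (taken from the broader doubly-special-relativity literature, not necessarily the specific construction of this talk), the kappa-Poincaré Casimir in the bicrossproduct basis deforms the dispersion relation as sketched below.

```latex
% A standard DSR example (illustrative): the kappa-Poincare dispersion
% relation in the bicrossproduct basis,
\begin{equation}
  C(p) \;=\; 4\kappa^{2}\,\sinh^{2}\!\left(\frac{p_{0}}{2\kappa}\right)
  \;-\; e^{\,p_{0}/\kappa}\,\vec{p}^{\,2} \;=\; m^{2},
\end{equation}
% which reduces to the special-relativistic relation p_0^2 - \vec{p}^{\,2} = m^2
% as the deformation scale kappa is sent to infinity.
```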
Learning Deep ReLU Networks is Fixed-Parameter Tractable
Sitan Chen (MIT)
We consider the problem of learning an unknown ReLU network with an arbitrary number of layers under Gaussian inputs and obtain the first nontrivial results for networks of depth more than two. We give an algorithm whose running time is a fixed polynomial in the ambient dimension and some (exponentially large) function of only the network's parameters. These results provably cannot be obtained using gradient-based methods and give the first example of a class of efficiently learnable neural networks that gradient descent will fail to learn. In contrast, prior work for learning networks of depth three or higher requires exponential time in the ambient dimension, while prior work for the depth-two case requires well-conditioned weights and/or positive coefficients to obtain efficient run-times. Our algorithm does not require these assumptions. Our main technical tool is a type of filtered PCA that can be used to iteratively recover an approximate basis for the subspace spanned by the hidden units in the first layer. Our analysis leverages new structural results on lattice polynomials from tropical geometry. Based on joint work with Adam Klivans and Raghu Meka.
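The filtered-PCA idea above refines a classical moment-based observation: under Gaussian inputs, a second-moment statistic of the outputs already points at the span of the first-layer weights. The sketch below illustrates only that simpler, unfiltered version, with all-positive output weights chosen so the signal cannot cancel; the names and setup are ours, and the talk's algorithm handles the general case.

```python
# Simplified illustration (not the talk's filtered PCA): for a two-layer ReLU
# network with Gaussian inputs and positive output weights, Stein's identity
# gives E[y (x x^T - I)] = sum_i a_i * const * w_i w_i^T, so ordinary PCA on
# this matrix recovers the span of the first-layer weights.
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 20, 3, 200_000

W = rng.normal(size=(k, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)    # unit-norm hidden units
a = rng.uniform(0.5, 1.5, size=k)                # positive output weights

x = rng.normal(size=(n, d))
y = np.maximum(x @ W.T, 0.0) @ a                 # y = sum_i a_i ReLU(w_i . x)

# Empirical second-moment statistic M ~ E[y (x x^T - I)].
M = (x.T * y) @ x / n - y.mean() * np.eye(d)
eigvals, eigvecs = np.linalg.eigh(M)
U = eigvecs[:, -k:]                              # top-k eigenspace estimate

# Alignment of the recovered subspace with the true row space of W (near 1).
proj = U @ U.T
alignment = np.linalg.norm(W @ proj, "fro") / np.linalg.norm(W, "fro")
print(f"subspace alignment: {alignment:.3f}")
```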
The Fully Constrained Formulation: local uniqueness and numerical accuracy
Isabel Cordero Carrion (University of Valencia)
In this talk I will introduce the Fully Constrained Formulation (FCF) of General Relativity. In this formulation one has a hyperbolic sector and an elliptic one. The constraint equations are encoded in the elliptic sector and are solved at each time step; this set of equations also has to be solved to compute initial data, even if a free evolution scheme is used for the subsequent dynamical evolution. Other formulations (like the XCTS formulation) share a similar elliptic sector. I will comment on the local uniqueness issue of the elliptic sector in the FCF. I will also briefly describe the hyperbolic sector. I will finish with a recent reformulation of the equations which keeps the good local uniqueness properties, improves the numerical accuracy of the system, and provides some additional information.
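As background for the local uniqueness discussion (our illustrative recap, with conventions that may differ from the talk's), the elliptic sector of conformal formulations contains equations of Lichnerowicz type for the conformal factor ψ; whether the maximum principle gives local uniqueness hinges on the signs of the powers of ψ in the source terms, which is why the chosen scaling of the sources matters.

```latex
% Schematic Hamiltonian constraint in conformal form (background sketch; the
% exact FCF/XCTS system and sign conventions may differ from the talk's):
\begin{equation}
  \bar{\Delta}\psi \;=\;
  \frac{1}{8}\bar{R}\,\psi
  \;+\;\frac{1}{12}K^{2}\psi^{5}
  \;-\;\frac{1}{8}\bar{A}_{ij}\bar{A}^{ij}\,\psi^{-7}
  \;-\;2\pi\rho\,\psi^{5}.
\end{equation}
% Local uniqueness via the maximum principle requires the right-hand side to be
% non-decreasing in psi: the psi^{-7} term is benign, while the matter term
% scaling as psi^{5} with a negative coefficient is not, motivating rescaled sources.
```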
The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks in High Dimension
Wei Hu (Princeton University)
Modern neural networks are often regarded as complex black-box functions whose behavior is difficult to understand owing to their nonlinear dependence on the data and the nonconvexity in their loss landscapes. In this work, we show that these common perceptions can be completely false in the early phase of learning. In particular, we formally prove that, for a class of well-behaved input distributions in high dimension, the early-time learning dynamics of a two-layer fully-connected neural network can be mimicked by training a simple linear model on the inputs. We additionally argue that this surprising simplicity can persist in networks with more layers and with convolutional architecture, which we verify empirically. Key to our analysis is to bound the spectral norm of the difference between the Neural Tangent Kernel (NTK) at initialization and an affine transform of the data kernel; however, unlike many previous results utilizing the NTK, we do not require the network to have disproportionately large width, and the network is allowed to escape the kernel regime later in training. Link to paper: https://arxiv.org/abs/2006.14599
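The key technical statement above, that the NTK at initialization is close to an affine transform of the data kernel, can be checked numerically: the sketch below builds the empirical NTK Gram matrix of a randomly initialized two-layer ReLU network on Gaussian data, fits the best combination of the data kernel XX^T, the all-ones matrix, and the identity, and reports the relative spectral-norm residual. Dimensions, width, and the least-squares fit of the affine transform are our illustrative choices rather than the paper's exact constants.

```python
# Numerical check (our illustrative setup): the NTK Gram matrix of a randomly
# initialized two-layer ReLU network is close, in spectral norm, to an affine
# transform of the data kernel X X^T (plus all-ones and identity components).
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 200, 1000, 10_000                     # samples, input dim, width

X = rng.normal(size=(n, d)) / np.sqrt(d)        # rows have norm ~ 1
W = rng.normal(size=(m, d))                     # first-layer weights at init

pre = X @ W.T                                   # (n, m) preactivations
act = np.maximum(pre, 0.0)                      # ReLU features
ind = (pre > 0).astype(float)                   # ReLU derivatives

# Empirical NTK at init (both layers trainable, second-layer weights +-1).
ntk = (ind @ ind.T) * (X @ X.T) / m + act @ act.T / m

# Best affine transform of the data kernel: a*XX^T + b*ones + c*I.
basis = [X @ X.T, np.ones((n, n)), np.eye(n)]
A = np.stack([B.ravel() for B in basis], axis=1)
coef, *_ = np.linalg.lstsq(A, ntk.ravel(), rcond=None)
fit = sum(c * B for c, B in zip(coef, basis))

rel = np.linalg.norm(ntk - fit, 2) / np.linalg.norm(ntk, 2)
print(f"coefficients (a, b, c): {np.round(coef, 3)}")
print(f"relative spectral-norm residual: {rel:.3f}")
```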
Analyzing Optimization and Generalization in Deep Learning via Dynamics of Gradient Descent
Nadav Cohen (Tel-Aviv University)
Understanding deep learning calls for addressing the questions of: (i) optimization --- the effectiveness of simple gradient-based algorithms in solving neural network training programs that are non-convex and thus seemingly difficult; and (ii) generalization --- the phenomenon of deep learning models not overfitting despite having many more parameters than examples to learn from. Existing analyses of optimization and/or generalization typically adopt the language of classical learning theory, abstracting away many details of the setting at hand. In this talk I will argue that a more refined perspective is in order, one that accounts for the dynamics of the optimizer. I will then demonstrate a manifestation of this approach, analyzing the dynamics of gradient descent over linear neural networks. We will derive what is, to the best of my knowledge, the most general guarantee to date for efficient convergence to global minimum of a gradient-based algorithm training a deep network. Moreover, in stark contrast to conventional wisdom, we will see that sometimes, adding (redundant) linear layers to a classic linear model significantly accelerates gradient descent, despite the introduction of non-convexity. Finally, we will show that such addition of layers induces an implicit bias towards low rank (different from any type of norm regularization), and thereby explain generalization of deep linear neural networks for the classic problem of low-rank matrix completion. Works covered in this talk were done in collaboration with Sanjeev Arora, Noah Golowich, Elad Hazan, Wei Hu, Yuping Luo and Noam Razin.
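The analysis above centers on gradient descent over linear neural networks, i.e. models whose end-to-end map is a product of weight matrices. The sketch below sets up the simplest such comparison on a toy regression problem: the same objective optimized once with a single weight vector and once through a depth-3 linear parameterization, so the two optimization trajectories can be compared. Problem sizes, step size, and the near-identity initialization are our illustrative choices; the talk's convergence and acceleration guarantees hold under specific conditions not reproduced here.

```python
# Toy comparison (our setup, not the talk's experiments): gradient descent on a
# linear regression objective, parameterized either as one weight vector or as
# a product of three matrices (an overparameterized "deep linear network").
import numpy as np

rng = np.random.default_rng(0)
d, n, steps, lr = 20, 200, 3000, 0.02

X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star

def loss(w):
    return 0.5 * np.mean((X @ w - y) ** 2)

def grad(w):
    return X.T @ (X @ w - y) / n

# (1) Classic linear model: a single weight vector.
w = np.zeros(d)
# (2) Depth-3 linear network: effective weights are W3 @ W2 @ w1.
W3 = np.eye(d) + 0.01 * rng.normal(size=(d, d))
W2 = np.eye(d) + 0.01 * rng.normal(size=(d, d))
w1 = 0.01 * rng.normal(size=d)

for t in range(steps):
    w -= lr * grad(w)

    w_eff = W3 @ W2 @ w1
    g = grad(w_eff)                         # gradient w.r.t. effective weights
    g3 = np.outer(g, W2 @ w1)               # chain rule through the product
    g2 = np.outer(W3.T @ g, w1)
    g1 = W2.T @ (W3.T @ g)
    W3 -= lr * g3
    W2 -= lr * g2
    w1 -= lr * g1

    if t % 500 == 0 or t == steps - 1:
        deep = loss(W3 @ W2 @ w1)
        print(f"step {t:4d}  shallow loss {loss(w):.2e}  deep loss {deep:.2e}")
```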
SGD Learns One-Layer Networks in WGANs
Qi Lei (Princeton University)
Generative adversarial networks (GANs) are a widely used framework for learning generative models. Wasserstein GANs (WGANs), one of the most successful variants of GANs, require solving a min-max optimization problem to global optimality, but are in practice successfully trained using stochastic gradient descent-ascent. In this talk, we show that, when the generator is a one-layer network, stochastic gradient descent-ascent converges to a global solution with polynomial time and sample complexity.
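As a minimal picture of the min-max dynamics studied here, the sketch below runs plain stochastic gradient descent-ascent on a WGAN-style objective with a linear critic and a one-layer generator that only learns a mean shift, recovering the mean of a Gaussian; the quadratic critic penalty standing in for the Lipschitz constraint, the dimensions, and the step sizes are our illustrative simplifications, not the exact model class or assumptions of the talk.

```python
# Toy WGAN-style min-max problem (our simplification): linear critic f(x)=v.x,
# generator G(z) = z + b, quadratic critic penalty in place of the Lipschitz
# constraint, trained by stochastic gradient descent-ascent.
import numpy as np

rng = np.random.default_rng(0)
d, batch, steps = 5, 64, 3000
lr_critic, lr_gen, penalty = 0.1, 0.05, 0.5

mu_star = rng.normal(size=d)          # true data mean
b = np.zeros(d)                       # generator parameter
v = np.zeros(d)                       # critic parameter

for t in range(steps):
    real = mu_star + rng.normal(size=(batch, d))
    fake = b + rng.normal(size=(batch, d))

    # Ascent on the critic: maximize E[f(real)] - E[f(fake)] - penalty*||v||^2.
    grad_v = real.mean(axis=0) - fake.mean(axis=0) - 2.0 * penalty * v
    v += lr_critic * grad_v

    # Descent on the generator: minimize the same objective over b.
    b += lr_gen * v

    if t % 500 == 0 or t == steps - 1:
        print(f"step {t:4d}  ||b - mu*|| = {np.linalg.norm(b - mu_star):.3f}")
```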
Recent Progress in Algorithmic Robust Statistics via the Sum-of-Squares Method
Pravesh Kothari (CMU)
The past five years have witnessed a sequence of successes in designing efficient algorithms for statistical estimation tasks when the input data is corrupted with a constant fraction of fully malicious outliers. The Sum-of-Squares (SoS) method has been an integral part of this story and is behind robust learning algorithms for tasks such as estimating the mean, covariance, and higher moment tensors of a broad class of distributions, clustering and parameter estimation for spherical and non-spherical mixture models, linear regression, and list-decodable learning. In this talk, I will attempt to demystify this (unreasonable?) effectiveness of the SoS method in robust statistics. I will argue that the utility of the SoS algorithm in robust statistics can be directly attributed to its capacity (via low-degree SoS proofs) to "reason about" analytic properties of probability distributions such as sub-gaussianity, hypercontractivity, and anti-concentration. I will discuss precise formulations of such statements, show how they lead to a principled blueprint for problems in robust statistics including the applications mentioned above, and point out natural gaps in our understanding of analytic properties within SoS which, if resolved, would yield improved guarantees for basic tasks in robust statistics.
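To give a flavor of what "reasoning about analytic properties via low-degree SoS proofs" means, here is the kind of definition used in this line of work, written in our own notation (the precise constants and degrees vary across papers): a distribution has certifiable hypercontractivity if the defining moment inequality is not merely true but admits a low-degree sum-of-squares proof.

```latex
% Illustrative definition in the spirit of this line of work (our notation;
% precise parameters vary by paper). A distribution D on R^n is k-certifiably
% C-hypercontractive for degree-2 polynomials if, for every degree-2
% polynomial p, the inequality
\begin{equation}
  \mathbb{E}_{x \sim D}\!\left[p(x)^{4}\right]
  \;\le\;
  C \left(\mathbb{E}_{x \sim D}\!\left[p(x)^{2}\right]\right)^{2}
\end{equation}
% holds with a degree-k sum-of-squares proof in the coefficients of p. Such
% certificates are what allow SoS relaxations to exploit hypercontractivity
% algorithmically in the robust-estimation blueprint described above.
```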