Directed Graphical Models for Extreme Value Statistics
Ngoc Tran (University of Texas, Austin)
Extreme value statistics concerns the maxima of random variables and relations between the tails of distributions rather than averages and correlations. The vast majority of models are centered around max-stable distributions, the Gaussian analogs for extremes. However, multivariate max-stable distributions have an intractable likelihood, severely limiting statistical learning and inference. Directed graphical models for extreme values (aka max-linear Bayesian networks) only appeared in 2018, but have already seen many applications in finance, hydrology and extreme-risk modelling. This talk (1) highlights how they differ from usual Bayesian networks, (2) discusses their connections to tropical convex geometry and k-SAT, (3) shows the performance of current learning algorithms on various hydrology datasets, and (4) outlines major computational and statistical challenges in fitting such models to data.
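To make the model class concrete, here is a minimal sketch of sampling from a max-linear Bayesian network, the structural-equation form behind the directed graphical models above. The function name, the unit-Fréchet choice of innovations and the small example DAG are my illustrative choices, not taken from the talk.

# Minimal sketch: sampling from a max-linear Bayesian network on a DAG.
import numpy as np

rng = np.random.default_rng(0)

def sample_mlbn(dag_weights, n_samples, rng):
    """Sample from X_i = max( max_{j -> i} c_{ji} * X_j , Z_i ),
    with independent unit-Frechet innovations Z_i.
    dag_weights[i] is a dict {parent_index: positive edge weight};
    nodes are assumed to be listed in a topological order."""
    d = len(dag_weights)
    # Unit-Frechet innovations: Z = -1 / log(U), U ~ Uniform(0, 1).
    Z = -1.0 / np.log(rng.uniform(size=(n_samples, d)))
    X = np.empty((n_samples, d))
    for i in range(d):
        X[:, i] = Z[:, i]
        for j, c in dag_weights[i].items():
            X[:, i] = np.maximum(X[:, i], c * X[:, j])
    return X

# A small DAG: 0 -> 2, 1 -> 2, 2 -> 3 (nodes already topologically ordered).
weights = [{}, {}, {0: 0.7, 1: 0.4}, {2: 0.9}]
print(sample_mlbn(weights, n_samples=5, rng=rng))

The key contrast with a linear Gaussian Bayesian network is that maxima replace sums, which is what links these models to tropical arithmetic and makes likelihood-based inference awkward.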
Non-local quantum computation and holography
Alex May Perimeter Institute
Relativistic quantum tasks are quantum computations which have inputs and outputs that occur at designated spacetime locations.
Understanding which tasks are possible to complete, and what resources are required to complete them, captures spacetime-specific aspects of quantum information. In this talk we explore the connections between such tasks and quantum gravity, specifically in the context of the AdS/CFT correspondence. We find that tasks reveal a novel connection between causal features of bulk geometry and boundary entanglement.
Further, we find that AdS/CFT suggests non-local quantum computation, a specific task with relevance to position-based cryptography, can be performed with linear entanglement. This would be an exponential improvement on existing protocols.
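For readers unfamiliar with the task, here is a schematic form of non-local quantum computation as it is usually defined in the position-based cryptography literature (my notation and paraphrase, not the speaker's). A unitary $U_{AB}$ on two input registers must be reproduced as

\[
\Big(W^{(2)}_{A'M_B}\otimes V^{(2)}_{B'M_A}\Big)\Big(W^{(1)}_{AL\to A'M_A}\otimes V^{(1)}_{BR\to B'M_B}\Big)\,|\psi\rangle_{AB}\,|\Phi\rangle_{LR}\;\approx\;\big(U_{AB}|\psi\rangle_{AB}\big)\otimes|\text{junk}\rangle,
\]

that is, one round of local operations on each input together with half of a pre-shared entangled resource $|\Phi\rangle_{LR}$, a single simultaneous exchange of the message registers $M_A$ and $M_B$, and a final round of local operations, up to discarded ancillas. The resource being counted is the size of $|\Phi\rangle$; general-purpose protocols known today consume an amount of entanglement exponential in the input size, which is the benchmark for the exponential improvement mentioned above.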
Quantum spacetime and deformed relativistic kinematics
Javier Relancio Universidad de Zaragoza
In this seminar, I will consider a deformed kinematics that goes beyond special relativity as a way to account for possible low-energy effects of a quantum gravity theory that could lead to some experimental evidence. This can be done while keeping a relativity principle, an approach usually known as doubly (or deformed) special relativity. In this context, I will give a simple geometric interpretation of the deformed kinematics and explain how it can be related to a metric in a maximally symmetric curved momentum space. Moreover, this metric can be extended to the whole phase space, leading to a notion of spacetime. Also, this geometrical formalism can be generalized to take spacetime curvature into account, leading to a momentum deformation of general relativity. I will explain theoretical aspects and possible phenomenological consequences of such a deformation.
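As one concrete and standard example of the kind of deformed kinematics meant here (my choice of illustration, in one common convention; the talk may use a different basis or model), the κ-Poincaré dispersion relation in the bicrossproduct basis reads

\[
\left(2\kappa\,\sinh\frac{p_0}{2\kappa}\right)^{2} - e^{\,p_0/\kappa}\,\vec p^{\;2} \;=\; m^{2},
\]

which recovers $p_0^{2}-\vec p^{\;2}=m^{2}$ as the deformation scale $\kappa\to\infty$. In the geometric picture referred to above, such a Casimir is tied to the distance from the origin in a maximally symmetric (de Sitter) momentum space; for instance one may take a momentum-space line element of the form

\[
ds^{2} \;=\; dp_0^{2} - e^{\,2p_0/\kappa}\,d\vec p^{\;2},
\]

with curvature scale set by $\kappa$, and read off deformed dispersion relations and composition laws from its geodesic distances and isometries.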
Learning Deep ReLU Networks is Fixed-Parameter Tractable
Sitan Chen (MIT)
We consider the problem of learning an unknown ReLU network with an arbitrary number of layers under Gaussian inputs and obtain the first nontrivial results for networks of depth more than two. We give an algorithm whose running time is a fixed polynomial in the ambient dimension and some (exponentially large) function of only the network's parameters. These results provably cannot be obtained using gradient-based methods and give the first example of a class of efficiently learnable neural networks that gradient descent will fail to learn. In contrast, prior work for learning networks of depth three or higher requires exponential time in the ambient dimension, while prior work for the depth-two case requires well-conditioned weights and/or positive coefficients to obtain efficient run-times. Our algorithm does not require these assumptions. Our main technical tool is a type of filtered PCA that can be used to iteratively recover an approximate basis for the subspace spanned by the hidden units in the first layer. Our analysis leverages new structural results on lattice polynomials from tropical geometry. Based on joint work with Adam Klivans and Raghu Meka.
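The following is not the filtered PCA of the paper, but a generic moment-based illustration of the subproblem it targets: under Gaussian inputs, a Stein-type second-moment matrix built from the labels lies (in expectation, barring cancellations) in the span of the first-layer weight vectors, so its top eigenvectors estimate that subspace. All names and parameter choices are mine, and the depth-two teacher below is much simpler than the deep networks handled in the paper.

import numpy as np

rng = np.random.default_rng(1)
d, k, n = 50, 3, 200_000

W = rng.normal(size=(k, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)    # unit-norm first-layer weights
a = rng.normal(size=k)                           # second-layer weights

X = rng.normal(size=(n, d))                      # Gaussian inputs
y = np.maximum(X @ W.T, 0.0) @ a                 # labels from the teacher network

# Empirical Stein-type moment matrix  M = (1/n) sum_i y_i (x_i x_i^T - I).
M = (X.T * y) @ X / n - y.mean() * np.eye(d)
eigvals, eigvecs = np.linalg.eigh(M)
idx = np.argsort(-np.abs(eigvals))[:k]           # k largest eigenvalues in magnitude
U = eigvecs[:, idx]                              # estimated basis of span{w_1, ..., w_k}

# Each true weight vector should lie almost entirely inside the estimated subspace.
print("projections of the true w_j onto the estimate:", np.linalg.norm(W @ U, axis=1))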
The Fully Constrained Formulation: local uniqueness and numerical accuracy
Isabel Cordero Carrion University of Valencia
In this talk I will introduce the Fully Constrained Formulation (FCF) of General Relativity. In this formulation one has a hyperbolic sector and an elliptic one. The constraint equations are solved at each time step and are encoded in the elliptic sector; this set of equations has to be solved to compute initial data even if a free evolution scheme is used for the subsequent dynamical evolution. Other formulations (like the XCTS formulation) share a similar elliptic sector. I will comment on the local uniqueness issue of the elliptic sector in the FCF. I will also briefly describe the hyperbolic sector. I will finish with a recent reformulation of the equations which keeps the good local uniqueness properties, improves the numerical accuracy of the system and gives some additional information.
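To make the local-uniqueness issue concrete, here is the schematic shape of the Hamiltonian constraint in a conformally flat, maximally sliced setting (my illustration in the standard conformal decomposition, not the full FCF system):

\[
\Delta\psi \;=\; -2\pi\,\psi^{5}\,E \;-\; \frac{1}{8}\,\psi^{-7}\,\hat A_{ij}\hat A^{ij},
\]

where $\psi$ is the conformal factor, $E$ the energy density measured by Eulerian observers and $\hat A_{ij}$ the conformal traceless extrinsic curvature. The $-2\pi\psi^{5}E$ source term violates the monotonicity condition needed for the maximum principle, which is at the root of the local non-uniqueness seen in XCTS-like elliptic sectors; rescaling the matter sources, e.g. $E^{*}=\psi^{6}E$, turns it into $-2\pi\,\psi^{-1}E^{*}$, for which the maximum principle applies and local uniqueness can be argued. This is only a sketch of the mechanism behind the reformulation mentioned above.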
The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks in High Dimension
Wei Hu (Princeton University)
Modern neural networks are often regarded as complex black-box functions whose behavior is difficult to understand owing to their nonlinear dependence on the data and the nonconvexity in their loss landscapes. In this work, we show that these common perceptions can be completely false in the early phase of learning. In particular, we formally prove that, for a class of well-behaved input distributions in high dimension, the early-time learning dynamics of a two-layer fully-connected neural network can be mimicked by training a simple linear model on the inputs. We additionally argue that this surprising simplicity can persist in networks with more layers and with convolutional architecture, which we verify empirically. Key to our analysis is to bound the spectral norm of the difference between the Neural Tangent Kernel (NTK) at initialization and an affine transform of the data kernel; however, unlike many previous results utilizing the NTK, we do not require the network to have disproportionately large width, and the network is allowed to escape the kernel regime later in training. Link to paper: https://arxiv.org/abs/2006.14599
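A rough toy setup for the kind of comparison meant above: train a two-layer ReLU network and a plain linear model on the same Gaussian data and track how far apart their predictions drift during the first steps. The hyperparameters, the 4x learning-rate calibration, and the choice to train only the first layer are all my simplifications; this is only meant to make the statement concrete, not to reproduce the paper's theorem or experiments.

import numpy as np

rng = np.random.default_rng(2)
d, m, n, steps = 100, 512, 2000, 61
lr_net, lr_lin = 0.2, 0.05   # 4x ratio: a rough calibration of the network's
                             # tangent-kernel scale at this initialization (my choice)

X = rng.normal(size=(n, d))                      # high-dimensional Gaussian inputs
y = np.sign(X[:, 0] + 0.5 * X[:, 1])             # a simple target

# Two-layer ReLU network; symmetric initialization makes the output exactly 0 at step 0.
m_half = m // 2
W_half = rng.normal(size=(m_half, d))
W = np.vstack([W_half, W_half.copy()])
a = np.concatenate([np.ones(m_half), -np.ones(m_half)]) / np.sqrt(m)

# Plain linear model trained on the same data.
w_lin, b_lin = np.zeros(d), 0.0

for t in range(steps):
    # Network: full-batch gradient descent on squared loss (first layer only, for simplicity).
    H = np.maximum(X @ W.T, 0.0)
    f = H @ a
    r = f - y
    grad_W = ((r[:, None] * (H > 0)) * a).T @ X / n
    W -= lr_net * grad_W

    # Linear model: gradient descent on the same loss.
    g = X @ w_lin + b_lin
    r_lin = g - y
    w_lin -= lr_lin * (X.T @ r_lin) / n
    b_lin -= lr_lin * r_lin.mean()

    if t % 10 == 0:
        print(f"step {t:3d}   net loss {np.mean(r ** 2):.3f}   "
              f"linear loss {np.mean(r_lin ** 2):.3f}   "
              f"mean squared prediction gap {np.mean((f - g) ** 2):.4f}")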
Analyzing Optimization and Generalization in Deep Learning via Dynamics of Gradient Descent
Nadav Cohen (Tel-Aviv University)
Understanding deep learning calls for addressing the questions of: (i) optimization --- the effectiveness of simple gradient-based algorithms in solving neural network training programs that are non-convex and thus seemingly difficult; and (ii) generalization --- the phenomenon of deep learning models not overfitting despite having many more parameters than examples to learn from. Existing analyses of optimization and/or generalization typically adopt the language of classical learning theory, abstracting away many details on the setting at hand. In this talk I will argue that a more refined perspective is in order, one that accounts for the dynamics of the optimizer. I will then demonstrate a manifestation of this approach, analyzing the dynamics of gradient descent over linear neural networks. We will derive what is, to the best of my knowledge, the most general guarantee to date for efficient convergence to global minimum of a gradient-based algorithm training a deep network. Moreover, in stark contrast to conventional wisdom, we will see that sometimes, adding (redundant) linear layers to a classic linear model significantly accelerates gradient descent, despite the introduction of non-convexity. Finally, we will show that such addition of layers induces an implicit bias towards low rank (different from any type of norm regularization), and by this explain generalization of deep linear neural networks for the classic problem of low rank matrix completion. Works covered in this talk were in collaboration with Sanjeev Arora, Noah Golowich, Elad Hazan, Wei Hu, Yuping Luo and Noam Razin.
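The matrix-completion claim at the end is easy to set up as a toy (the code below is my own minimal sketch, not the speaker's): run plain gradient descent on a depth-3 linear factorization W3 W2 W1 fitted only to the observed entries of a low-rank matrix, with small initialization and no explicit regularization, then inspect the effective rank and the error on the unobserved entries.

import numpy as np

rng = np.random.default_rng(3)
n, true_rank, depth, lr, steps = 30, 2, 3, 0.02, 20_000

# Ground-truth low-rank matrix and a random observation mask.
M = rng.normal(size=(n, true_rank)) @ rng.normal(size=(true_rank, n)) / np.sqrt(n)
mask = rng.uniform(size=(n, n)) < 0.4

# Depth-3 linear factorization W3 @ W2 @ W1 with small initialization.
Ws = [0.05 * rng.normal(size=(n, n)) for _ in range(depth)]

def product(factors):
    P = factors[0]
    for W in factors[1:]:
        P = W @ P
    return P

for t in range(steps):
    P = product(Ws)
    R = (P - M) * mask                       # residual on observed entries only
    grads = []
    for i in range(depth):
        left = product(Ws[i + 1:]) if i + 1 < depth else np.eye(n)
        right = product(Ws[:i]) if i > 0 else np.eye(n)
        grads.append(left.T @ R @ right.T)   # gradient of 0.5 * ||R||_F^2 w.r.t. Ws[i]
    for W, g in zip(Ws, grads):
        W -= lr * g

P = product(Ws)
rel_err = np.linalg.norm((P - M) * ~mask) / np.linalg.norm(M * ~mask)
print("relative error on unobserved entries:", round(float(rel_err), 3))
print("top singular values of the solution:", np.round(np.linalg.svd(P, compute_uv=False)[:5], 3))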
SGD Learns One-Layer Networks in WGANs
Qi Lei (Princeton University)
Generative adversarial networks (GANs) are a widely used framework for learning generative models. Wasserstein GANs (WGANs), one of the most successful variants of GANs, require solving a min-max optimization problem to global optimality but are in practice successfully trained using stochastic gradient descent-ascent. In this talk, we show that, when the generator is a one-layer network, stochastic gradient descent-ascent converges to a global solution with polynomial time and sample complexity.
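To make the min-max setup concrete, here is a toy stochastic gradient descent-ascent loop on a WGAN-style objective with a one-layer linear generator and a regularized quadratic critic. This is my own much-simplified illustration of the kind of dynamics discussed in the talk, not the paper's model or proof setting; in particular the quadratic critic means only first and second moments are being matched.

import numpy as np

rng = np.random.default_rng(4)
d, k, batch, steps = 8, 3, 256, 20_000
lr_g, lr_c, lam = 0.02, 0.05, 1.0

A_star = rng.normal(size=(d, k)) / np.sqrt(d)   # "true" one-layer generator
A = 0.1 * rng.normal(size=(d, k))               # learned generator
V = np.zeros((d, d))                            # quadratic part of the critic
v = np.zeros(d)                                 # linear part of the critic

for t in range(steps):
    z_real = rng.normal(size=(batch, k))
    z_fake = rng.normal(size=(batch, k))
    x = z_real @ A_star.T                       # "real" samples
    g = z_fake @ A.T                            # generated samples

    # Ascent step on the critic f(x) = x^T V x + v^T x (with l2 regularization).
    grad_V = (x.T @ x - g.T @ g) / batch - lam * V
    grad_v = x.mean(axis=0) - g.mean(axis=0) - lam * v
    V += lr_c * grad_V
    v += lr_c * grad_v

    # Descent step on the generator, which enters only through -E_z[f(A z)].
    S = V + V.T
    grad_A = -(S @ g.T @ z_fake + np.outer(v, z_fake.sum(axis=0))) / batch
    A -= lr_g * grad_A

err = np.linalg.norm(A @ A.T - A_star @ A_star.T) / np.linalg.norm(A_star @ A_star.T)
print("relative error in the matched covariance A A^T:", round(float(err), 3))

Since the noise is Gaussian, a linear generator is identifiable only up to an orthogonal transformation of z, so the natural error measure is on A A^T rather than on A itself.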
Recent Progress in Algorithmic Robust Statistics via the Sum-of-Squares Method
Pravesh Kothari (CMU)
The past five years have witnessed a sequence of successes in designing efficient algorithms for statistical estimation tasks when the input data is corrupted with a constant fraction of fully malicious outliers. The Sum-of-Squares (SoS) method has been an integral part of this story and is behind robust learning algorithms for tasks such as estimating the mean, covariance, and higher moment tensors of a broad class of distributions, clustering and parameter estimation for spherical and non-spherical mixture models, linear regression, and list-decodable learning. In this talk, I will attempt to demystify this (unreasonable?) effectiveness of the SoS method in robust statistics. I will argue that the utility of the SoS algorithm in robust statistics can be directly attributed to its capacity (via low-degree SoS proofs) to "reason about" analytic properties of probability distributions such as sub-gaussianity, hypercontractivity, and anti-concentration. I will discuss precise formulations of such statements, show how they lead to a principled blueprint for problems in robust statistics including the applications mentioned above, and point out natural gaps in our understanding of analytic properties within SoS which, if resolved, would yield improved guarantees for basic tasks in robust statistics.
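To give one concrete instance of the analytic properties with low-degree SoS proofs mentioned above (my paraphrase of a standard definition from this literature): a distribution $D$ on $\mathbb{R}^{d}$ is $k$-certifiably $C$-hypercontractive in the degree-4 case if the inequality behind sub-gaussianity of one-dimensional marginals,

\[
\mathbb{E}_{x\sim D}\,\langle x,u\rangle^{4} \;\le\; C\,\Big(\mathbb{E}_{x\sim D}\,\langle x,u\rangle^{2}\Big)^{2}\qquad\text{for all } u\in\mathbb{R}^{d},
\]

admits a degree-$k$ sum-of-squares proof, i.e. there are polynomials $q_{1},\dots,q_{m}$ in $u$ of degree at most $k/2$ with

\[
C\,\Big(\mathbb{E}_{x\sim D}\,\langle x,u\rangle^{2}\Big)^{2} \;-\; \mathbb{E}_{x\sim D}\,\langle x,u\rangle^{4} \;=\; \sum_{j=1}^{m} q_{j}(u)^{2}.
\]

Gaussians satisfy such a certificate trivially (with $C=3$ the difference is identically zero), and it is hypotheses of this kind that the robust estimation algorithms above can exploit.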
Phase Transitions for Detecting Latent Geometry in Random Graphs
Dheeraj Nagaraj (MIT)
Random graphs with latent geometric structure are popular models of social and biological networks, with applications ranging from network user profiling to circuit design. These graphs are also of purely theoretical interest within computer science, probability and statistics. A fundamental initial question regarding these models is: when are these random graphs affected by their latent geometry and when are they indistinguishable from simpler models without latent structure, such as the Erdős-Rényi graph G(n,p)? We address this question for two of the most well-studied models of random graphs with latent geometry -- the random intersection and random geometric graph. Joint work with Matt Brennan and Dheeraj Nagaraj.
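A small experiment (mine, for illustration only) makes the detection question tangible: compare triangle counts in a random geometric graph on the sphere and in an Erdős-Rényi graph of the same edge density. When the latent dimension is small relative to n, the geometric graph has noticeably more triangles; as the dimension grows the two become hard to tell apart, which is the kind of phase transition the talk is about.

import numpy as np

rng = np.random.default_rng(5)

def triangles(adj):
    # Number of triangles = trace(A^3) / 6 for a simple undirected graph.
    return int(round(np.trace(adj @ adj @ adj) / 6))

def random_geometric_graph(n, d, p, rng):
    # n points uniform on the unit sphere S^{d-1}; connect pairs whose inner
    # product exceeds the threshold chosen so the edge density is (about) p.
    X = rng.normal(size=(n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    G = X @ X.T
    iu = np.triu_indices(n, k=1)
    tau = np.quantile(G[iu], 1.0 - p)
    adj = (G > tau).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj

def erdos_renyi(n, p, rng):
    upper = np.triu(rng.uniform(size=(n, n)) < p, k=1).astype(float)
    return upper + upper.T

n, p = 400, 0.05
for d in (3, 10, 100, 1000):
    t_geo = triangles(random_geometric_graph(n, d, p, rng))
    t_er = triangles(erdos_renyi(n, p, rng))
    print(f"d = {d:4d}   triangles: geometric {t_geo:6d}   Erdos-Renyi {t_er:6d}")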
Low-Degree Hardness of Random Optimization Problems
Alex Wein (New York University)
In high-dimensional statistical problems (including planted clique, sparse PCA, community detection, etc.), the class of "low-degree polynomial algorithms" captures many leading algorithmic paradigms such as spectral methods, approximate message passing, and local algorithms on sparse graphs. As such, lower bounds against low-degree algorithms constitute concrete evidence for average-case hardness of statistical problems. This method has been widely successful at explaining and predicting statistical-to-computational gaps in these settings. While prior work has understood the power of low-degree algorithms for problems with a "planted" signal, we consider here the setting of "random optimization problems" (with no planted signal), including the problem of finding a large independent set in a random graph, as well as the problem of optimizing the Hamiltonian of mean-field spin glass models. Focusing on the independent set problem, I will define low-degree algorithms in this setting, argue that they capture the best known algorithms, and explain new proof techniques that give sharp lower bounds against low-degree algorithms in this setting. The proof involves a generalization of the so-called "overlap gap property", which is a structural property of the solution space. Based on arXiv:2004.12063 (joint with David Gamarnik and Aukosh Jagannath) and arXiv:2010.06563.
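For concreteness, here is roughly what a low-degree polynomial algorithm means in the independent-set setting above (my paraphrase of the framework, not a verbatim definition from the talk). Writing $Y=(Y_{jk})_{j<k}$ for the edge indicators of the random graph, a degree-$D$ algorithm outputs a vector $f(Y)\in\mathbb{R}^{n}$ each of whose coordinates is a polynomial of degree at most $D$,

\[
f_i(Y) \;=\; \sum_{\substack{S\subseteq\binom{[n]}{2}\\ |S|\le D}} \hat c_{i,S}\,\prod_{\{j,k\}\in S} Y_{jk},
\]

which is then rounded to an independent set (coordinates near 1 are kept, with a small number of violations repaired). Spectral methods, approximate message passing and local algorithms on sparse graphs all fit this template for modest $D$, which is why lower bounds against such $f$ are taken as evidence that no efficient algorithm does substantially better.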
A voyage through undulating dark matter and the GUTs of u(48)
Joseph Tooby-Smith University of Cambridge
This talk will be split into two distinct halves. The first half is based on the paper arXiv:2007.03662 and suggests that an interplay between microscopic and macroscopic physics can lead to an undulating dark matter signal on time scales not related to celestial dynamics. By searching for such undulations, the discovery potential of light DM search experiments can be enhanced.
The second half will look at some currently unpublished work on finding all the semi-simple subalgebras of u(48) which contain the SM. Such algebras (in a loose sense of the term) form GUTs, and studying them has relevance to family unification, proton decay, etc. Although there has been previous work on the classification of GUTs, we believe this is the first time this broad question has been answered.
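For context (my gloss, with the usual counting rather than anything stated in the abstract): $\mathfrak{u}(48)$ is presumably the algebra acting on the flavour space of the $48 = 3\times 16$ Weyl fermions of three Standard Model generations, each completed by a right-handed neutrino, and "containing the SM" means containing an embedding

\[
\mathfrak{su}(3)\oplus\mathfrak{su}(2)\oplus\mathfrak{u}(1) \;\hookrightarrow\; \mathfrak{g} \;\subseteq\; \mathfrak{u}(48)
\]

under which the 48-dimensional representation decomposes into the observed quark and lepton representations. Familiar GUT algebras such as $\mathfrak{su}(5)$, $\mathfrak{so}(10)$ and Pati-Salam $\mathfrak{su}(4)\oplus\mathfrak{su}(2)\oplus\mathfrak{su}(2)$ would then arise as special cases of such subalgebras.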