Learning Deep ReLU Networks is Fixed-Parameter Tractable
Sitan Chen (MIT)

We consider the problem of learning an unknown ReLU network with an arbitrary number of layers under Gaussian inputs and obtain the first nontrivial results for networks of depth more than two. We give an algorithm whose running time is a fixed polynomial in the ambient dimension and some (exponentially large) function of only the network's parameters. These results provably cannot be obtained using gradient-based methods and give the first example of a class of efficiently learnable neural networks that gradient descent will fail to learn. In contrast, prior work for learning networks of depth three or higher requires exponential time in the ambient dimension, while prior work for the depth-two case requires well-conditioned weights and/or positive coefficients to obtain efficient run-times. Our algorithm does not require these assumptions. Our main technical tool is a type of filtered PCA that can be used to iteratively recover an approximate basis for the subspace spanned by the hidden units in the first layer. Our analysis leverages new structural results on lattice polynomials from tropical geometry. Based on joint work with Adam Klivans and Raghu Meka.
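The paper's filtered PCA is beyond a short snippet, but the structural fact it exploits is easy to demonstrate: for a one-hidden-layer ReLU network, every input-gradient lies in the span of the first-layer weight vectors, so plain PCA on sampled gradients already recovers that subspace. A minimal numpy sketch (the teacher network and all dimensions are illustrative assumptions, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 20, 3, 2000              # ambient dim, hidden units, gradient samples

# hypothetical "teacher" network f(x) = sum_i a_i relu(w_i . x)
W = rng.standard_normal((k, d))    # first-layer weights; rows span the target subspace
a = rng.standard_normal(k)         # second-layer coefficients

def input_gradient(x):
    # grad_x f(x) = sum_i a_i 1{w_i . x > 0} w_i -- always lies in span(rows of W)
    active = (W @ x > 0).astype(float)
    return (a * active) @ W

G = np.array([input_gradient(x) for x in rng.standard_normal((n, d))])

# PCA: the top-k right singular vectors of the gradient matrix span span(W)
_, _, Vt = np.linalg.svd(G, full_matrices=False)
basis = Vt[:k]                     # estimated orthonormal basis

# projecting W onto the estimated subspace loses essentially nothing
residual = np.linalg.norm(W - (W @ basis.T) @ basis) / np.linalg.norm(W)
print(residual)
```

Since every sampled gradient lies exactly in the k-dimensional span, the residual is at machine precision; the talk's filtered variant handles deeper networks where this clean picture no longer holds directly.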
The Fully Constrained Formulation: local uniqueness and numerical accuracy
Isabel Cordero-Carrión (University of Valencia)
In this talk I will introduce the Fully Constrained Formulation (FCF) of General Relativity. This formulation has a hyperbolic sector and an elliptic one. The constraint equations are encoded in the elliptic sector and solved at each time step; this set of equations also has to be solved to compute initial data, even if a free evolution scheme is used for the subsequent dynamical evolution. Other formulations (such as the XCTS formulation) share a similar elliptic sector. I will comment on the local uniqueness issue of the elliptic sector in the FCF, and briefly describe the hyperbolic sector. I will finish with a recent reformulation of the equations which keeps the good local uniqueness properties, improves the numerical accuracy of the system, and provides some additional information.
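The local-uniqueness issue can be illustrated on a schematic scalar model problem (this is not the actual FCF or XCTS system; the powers and signs below are only illustrative):

```latex
% model constraint-like elliptic equation for a positive scalar u
\Delta u = h\, u^{p}, \qquad u > 0 .
% linearizing about a solution u_0, a perturbation \delta u satisfies
\Delta(\delta u) - p\, h\, u_0^{\,p-1}\, \delta u = 0 .
% if p\,h \ge 0, the maximum principle forces \delta u = 0 and the solution is
% locally unique; if p\,h < 0, the linearized operator can develop a kernel and
% local uniqueness may fail.
```

The sign structure of the nonlinear source in the linearized operator is what separates well-behaved elliptic sectors from problematic ones.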
The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks in High Dimension
Wei Hu (Princeton University)

Modern neural networks are often regarded as complex black-box functions whose behavior is difficult to understand owing to their nonlinear dependence on the data and the nonconvexity of their loss landscapes. In this work, we show that these common perceptions can be completely false in the early phase of learning. In particular, we formally prove that, for a class of well-behaved input distributions in high dimension, the early-time learning dynamics of a two-layer fully connected neural network can be mimicked by training a simple linear model on the inputs. We additionally argue that this surprising simplicity can persist in networks with more layers and with convolutional architectures, which we verify empirically. Key to our analysis is bounding the spectral norm of the difference between the Neural Tangent Kernel (NTK) at initialization and an affine transform of the data kernel; however, unlike many previous results utilizing the NTK, we do not require the network to have disproportionately large width, and the network is allowed to escape the kernel regime later in training. Link to paper: https://arxiv.org/abs/2006.14599
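As a toy illustration of the claim (not the paper's construction; the data model, learning rates, and paired initialization below are my own assumptions), one can train a small two-layer ReLU network and a plain linear model side by side and watch their early-time predictions move together:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 200, 50, 512                   # samples, input dim, hidden width
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = X @ rng.standard_normal(d)           # simple target, for illustration only

# two-layer ReLU net f(x) = v . relu(Wx) / sqrt(m); first layer is trained.
# Paired initialization (duplicated rows, opposite signs) makes f(x) = 0 at init.
W0 = rng.standard_normal((m // 2, d))
W = np.vstack([W0, W0])
v = np.concatenate([np.ones(m // 2), -np.ones(m // 2)])
net = lambda W: np.maximum(X @ W.T, 0.0) @ v / np.sqrt(m)

beta = np.zeros(d)                       # comparison linear model g(x) = beta . x
lr, T = 0.1, 20
f0 = net(W)                              # exactly zero by construction
loss0 = np.mean((f0 - y) ** 2)
for _ in range(T):
    r = net(W) - y                       # network residuals
    act = (X @ W.T > 0).astype(float)    # ReLU activation pattern
    W = W - lr * ((act * (r[:, None] * v / np.sqrt(m))).T @ X) / n
    beta = beta - lr * (X.T @ (X @ beta - y)) / n

f_net, f_lin = net(W), X @ beta
loss_net = np.mean((f_net - y) ** 2)
loss_lin = np.mean((f_lin - y) ** 2)
# early-time movement of the net's predictions vs. the linear model's predictions
corr = np.corrcoef(f_net - f0, f_lin)[0, 1]
print(loss0, loss_net, corr)
```

With these scales the two trajectories are strongly correlated over the first few steps, which is the qualitative content of the theorem; the paper makes the coupling precise.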
Analyzing Optimization and Generalization in Deep Learning via Dynamics of Gradient Descent
Nadav Cohen (Tel-Aviv University)

Understanding deep learning calls for addressing two questions: (i) optimization --- the effectiveness of simple gradient-based algorithms in solving neural network training programs that are non-convex and thus seemingly difficult; and (ii) generalization --- the phenomenon of deep learning models not overfitting despite having many more parameters than examples to learn from. Existing analyses of optimization and/or generalization typically adopt the language of classical learning theory, abstracting away many details of the setting at hand. In this talk I will argue that a more refined perspective is in order, one that accounts for the dynamics of the optimizer. I will then demonstrate a manifestation of this approach, analyzing the dynamics of gradient descent over linear neural networks. We will derive what is, to the best of my knowledge, the most general guarantee to date for efficient convergence to global minimum of a gradient-based algorithm training a deep network. Moreover, in stark contrast to conventional wisdom, we will see that sometimes, adding (redundant) linear layers to a classic linear model significantly accelerates gradient descent, despite the introduction of non-convexity. Finally, we will show that such addition of layers induces an implicit bias towards low rank (different from any type of norm regularization), and thereby explain generalization of deep linear neural networks on the classic problem of low-rank matrix completion. Works covered in this talk were done in collaboration with Sanjeev Arora, Noah Golowich, Elad Hazan, Wei Hu, Yuping Luo and Noam Razin.
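A minimal sketch of the setting analyzed here, under illustrative assumptions (near-identity initialization, a linearly realizable target, my own step size): gradient descent on a depth-3 *linear* network drives a non-convex objective to a global minimum.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 10
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d)           # linearly realizable target

# depth-3 linear network x -> (W1 W2 w3) . x: overparameterized, non-convex loss
W1 = np.eye(d) + 0.01 * rng.standard_normal((d, d))
W2 = np.eye(d) + 0.01 * rng.standard_normal((d, d))
w3 = 0.01 * rng.standard_normal(d)

loss = lambda: 0.5 * np.mean((X @ (W1 @ W2 @ w3) - y) ** 2)
loss0, lr = loss(), 0.01
for _ in range(2000):
    g = X.T @ (X @ (W1 @ W2 @ w3) - y) / n   # grad wrt the end-to-end weights
    g1 = np.outer(g, W2 @ w3)                # chain rule through each factor
    g2 = np.outer(W1.T @ g, w3)
    g3 = W2.T @ (W1.T @ g)
    W1 -= lr * g1; W2 -= lr * g2; w3 -= lr * g3
final_loss = loss()
print(loss0, final_loss)
```

The loss drops by orders of magnitude despite the factored, non-convex parameterization; the talk's results give conditions under which such convergence is guaranteed, and when the extra layers actually accelerate it.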
SGD Learns One-Layer Networks in WGANs
Qi Lei (Princeton University)

Generative adversarial networks (GANs) are a widely used framework for learning generative models. Wasserstein GANs (WGANs), one of the most successful variants of GANs, require solving a min-max optimization problem to global optimality, but are in practice successfully trained using stochastic gradient descent-ascent. In this talk, we show that, when the generator is a one-layer network, stochastic gradient descent-ascent converges to a global solution with polynomial time and sample complexity.
Recent Progress in Algorithmic Robust Statistics via the Sum-of-Squares Method
Pravesh Kothari (CMU)

The past five years have witnessed a sequence of successes in designing efficient algorithms for statistical estimation tasks in which the input data is corrupted with a constant fraction of fully malicious outliers. The Sum-of-Squares (SoS) method has been an integral part of this story and is behind robust learning algorithms for tasks such as estimating the mean, covariance, and higher moment tensors of a broad class of distributions, clustering and parameter estimation for spherical and non-spherical mixture models, linear regression, and list-decodable learning. In this talk, I will attempt to demystify this (unreasonable?) effectiveness of the SoS method in robust statistics. I will argue that the utility of the SoS algorithm in robust statistics can be directly attributed to its capacity (via low-degree SoS proofs) to "reason about" analytic properties of probability distributions such as sub-gaussianity, hypercontractivity, and anti-concentration. I will discuss precise formulations of such statements, show how they lead to a principled blueprint for problems in robust statistics including the applications mentioned above, and point out natural gaps in our understanding of analytic properties within SoS which, if resolved, would yield improved guarantees for basic tasks in robust statistics.
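For concreteness, here is the robust mean-estimation setting with a deliberately naive trimming heuristic -- not the SoS method, just a baseline that shows why the empirical mean fails under malicious outliers and what a filtering-style algorithm aims to achieve (all parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, eps = 5, 1000, 0.1                   # dimension, samples, outlier fraction
true_mean = np.ones(d)
inliers = true_mean + rng.standard_normal((int(n * (1 - eps)), d))
outliers = 50.0 + rng.standard_normal((int(n * eps), d))   # adversarial cluster
X = np.vstack([inliers, outliers])

# naive filtering: repeatedly drop the points furthest from the current mean,
# then re-estimate (the SoS algorithms replace this with certified filtering)
est = X.copy()
for _ in range(10):
    mu = est.mean(axis=0)
    dist = np.linalg.norm(est - mu, axis=1)
    est = est[dist <= np.quantile(dist, 0.95)]   # trim the worst 5% each round
robust_mean = est.mean(axis=0)

naive_err = np.linalg.norm(X.mean(axis=0) - true_mean)   # ~ eps * 50 per coordinate
robust_err = np.linalg.norm(robust_mean - true_mean)
print(naive_err, robust_err)
```

Even this crude trimming recovers the mean to within sampling error here, but it has no guarantees against cleverer corruptions; the SoS machinery is what makes filtering provably robust for the broad distribution classes listed in the abstract.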
Phase Transitions for Detecting Latent Geometry in Random Graphs
Dheeraj Nagaraj (MIT)

Random graphs with latent geometric structure are popular models of social and biological networks, with applications ranging from network user profiling to circuit design. These graphs are also of purely theoretical interest within computer science, probability and statistics. A fundamental initial question regarding these models is: when are these random graphs affected by their latent geometry, and when are they indistinguishable from simpler models without latent structure, such as the Erdős-Rényi graph G(n,p)? We address this question for two of the most well-studied models of random graphs with latent geometry -- the random intersection graph and the random geometric graph. Joint work with Matt Brennan and Guy Bresler.
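The detection question can be made concrete with a simple statistic: low-dimensional latent geometry inflates the triangle count relative to a G(n,p) of the same edge density. A sketch under illustrative parameters (this particular statistic and threshold construction are my own choices, not the talk's):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 300, 0.1

def triangles(A):
    # number of triangles = trace(A^3) / 6 for a 0/1 symmetric adjacency matrix
    return int(round(np.trace(A @ A @ A) / 6))

# Erdos-Renyi G(n, p)
A_er = np.triu((rng.random((n, n)) < p).astype(float), 1)
A_er += A_er.T

# random geometric graph: latent points on the sphere in dimension 3, connect
# pairs whose inner product exceeds the threshold giving edge density ~ p
Z = rng.standard_normal((n, 3))
Z /= np.linalg.norm(Z, axis=1, keepdims=True)
S = Z @ Z.T
tau = np.quantile(S[np.triu_indices(n, 1)], 1 - p)
A_geo = np.triu((S > tau).astype(float), 1)
A_geo += A_geo.T

tri_er, tri_geo = triangles(A_er), triangles(A_geo)
print(tri_er, tri_geo)     # geometry inflates the triangle count
```

In low latent dimension the gap is dramatic; the talk's phase transitions describe how it closes as the latent dimension grows with n, at which point the geometric graph becomes statistically indistinguishable from G(n,p).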
Low-Degree Hardness of Random Optimization Problems
Alex Wein (New York University)

In high-dimensional statistical problems (including planted clique, sparse PCA, community detection, etc.), the class of "low-degree polynomial algorithms" captures many leading algorithmic paradigms such as spectral methods, approximate message passing, and local algorithms on sparse graphs. As such, lower bounds against low-degree algorithms constitute concrete evidence for average-case hardness of statistical problems. This method has been widely successful at explaining and predicting statistical-to-computational gaps in these settings. While prior work has understood the power of low-degree algorithms for problems with a "planted" signal, we consider here the setting of "random optimization problems" (with no planted signal), including the problem of finding a large independent set in a random graph, as well as the problem of optimizing the Hamiltonian of mean-field spin glass models. Focusing on the independent set problem, I will define low-degree algorithms in this setting, argue that they capture the best known algorithms, and explain new proof techniques that give sharp lower bounds against low-degree algorithms in this setting. The proof involves a generalization of the so-called "overlap gap property", a structural property of the solution space. Based on arXiv:2004.12063 (joint with David Gamarnik and Aukosh Jagannath) and arXiv:2010.06563.
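As a concrete baseline for the independent-set problem mentioned above, here is the simple sequential greedy algorithm on a sparse Erdős-Rényi graph (an illustrative local baseline of my own choosing; the talk's low-degree framework captures far richer algorithms of this flavor):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d_avg = 2000, 5.0

# sparse Erdos-Renyi graph G(n, d_avg/n), stored as adjacency lists
ii, jj = np.where(np.triu(rng.random((n, n)) < d_avg / n, 1))
adj = [[] for _ in range(n)]
for i, j in zip(ii.tolist(), jj.tolist()):
    adj[i].append(j)
    adj[j].append(i)

# greedy: scan vertices in a fixed order, keep any vertex with no kept neighbor
chosen = np.zeros(n, dtype=bool)
for v in range(n):
    if not any(chosen[u] for u in adj[v]):
        chosen[v] = True
indep_size = int(chosen.sum())

# sanity check: the chosen set really is independent
is_independent = all(not (chosen[i] and chosen[j])
                     for i, j in zip(ii.tolist(), jj.tolist()))
print(indep_size, is_independent)
```

Greedy finds an independent set of size roughly n·ln(d+1)/d here; the low-degree lower bounds in the talk show that no algorithm in this class can do much better than a constant factor beyond the best known methods.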
A voyage through undulating dark matter and the GUTs of u(48)
Joseph Tooby-Smith (University of Cambridge)
This talk will be split into two distinct halves. The first half will be based on the paper arXiv:2007.03662 and suggest that an interplay between microscopic and macroscopic physics can lead to an undulation in the dark matter (DM) signal on time scales not related to celestial dynamics. By searching for such undulations, the discovery potential of light DM search experiments can be enhanced.
The second half will look at some currently unpublished work on finding all the semi-simple subalgebras of u(48) which contain the SM. Such algebras (in a loose sense of the term) form GUTs, and studying them has relevance to family unification, proton decay, etc. Although there has been previous work on the classification of GUTs, we believe this is the first time this broad a question has been answered.
Counting and Sampling Subgraphs in Sublinear Time
Talya Eden (MIT)

In this talk I will briefly survey recent developments in approximate subgraph counting and sampling in sublinear time. Counting and sampling small subgraphs are basic primitives, well studied both in theory and in practice. We consider these problems in the sublinear-time setting, where access to the graph $G$ is given via queries. We will consider both general graphs and graphs of bounded arboricity, which can be viewed as "sparse everywhere" graphs, and we will see how we can use this property to obtain substantially faster algorithms.
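As a minimal illustration of the query model (a hypothetical graph and the simplest possible estimator, not the algorithms surveyed in the talk): sampling vertex degrees already gives a sublinear-time edge-count estimate when the degree sequence is well-behaved.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5000

# hypothetical degree sequence: a ring plus random chords
deg = np.full(n, 2)
for i, j in rng.integers(0, n, size=(4 * n, 2)):
    if i != j:
        deg[i] += 1
        deg[j] += 1
m_true = int(deg.sum()) // 2

# query model: ask for the degree of a uniformly random vertex.
# The sample mean estimates 2m/n, so (n/2) * mean(sampled degrees) estimates m.
s = 500                                   # number of queries, far below n
sample = rng.integers(0, n, size=s)
m_est = n * deg[sample].mean() / 2
print(m_true, m_est)
```

Naive degree sampling breaks down for heavy-tailed degree sequences; the surveyed algorithms use more careful query strategies (and structural assumptions such as bounded arboricity) to get provable sublinear bounds for general subgraph counts.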
Learning and Testing for Gradient Descent
Emmanuel Abbe (EPFL)

We present lower bounds for the generalization error of gradient descent on free initializations, reducing the problem to testing the algorithm's output under different data models. We then discuss lower bounds on random initialization and present the problem of learning communities in the pruned-block-model, where it is conjectured that GD fails.
Generalized entropy in topological string theory
Gabriel Wong (Harvard University)
The Ryu-Takayanagi formula identifies the area of extremal surfaces in AdS with the entanglement entropy of the boundary CFT. However, the bulk microstate interpretation of the extremal area remains mysterious. Progress along this direction requires understanding how to define entanglement entropy in the bulk closed string theory. As a toy model for AdS/CFT, we study the entanglement entropy of closed strings in the topological A-model in the context of Gopakumar-Vafa duality. We give a self-consistent factorization of the closed string Hilbert space which leads to string edge modes transforming under a q-deformed surface symmetry group. Compatibility with this symmetry requires a q-deformed definition of entanglement entropy. Using the topological vertex formalism, we define the Hartle-Hawking state for the resolved conifold and compute its q-deformed entropy directly from the closed string reduced density matrix. We show that this is the same as the generalized entropy, defined by prescribing a contractible replica manifold for the closed string theory on the resolved conifold. We then apply Gopakumar-Vafa duality to reproduce the closed string entropy from the Chern-Simons dual using the un-deformed definition of entanglement entropy. Finally, we relate non-local aspects of our factorization map to analogous phenomena recently found in JT gravity.