Learning Staircases
Emmanuel Abbe (École polytechnique fédérale de Lausanne), Enric Boix (MIT), Theodor Misiakiewicz (Stanford University)
Large Scale Structure Beyond the 2-Point Function
Oliver Philcox (Columbia University)
Sharp Matrix Concentration
March Boedihardjo (University of California, Irvine)
Non-Parametric Convergence Rates for Plain Vanilla Stochastic Gradient Descent
Raphaël Berthier (École polytechnique fédérale de Lausanne)
Quantum Scientific Computation
Jin-Peng Liu (University of New Mexico)
Self-Training Converts Weak Learners to Strong Learners in Mixture Models
Spencer Frei (UC Berkeley)
Graphon Neural Networks and the Transferability of Graph Neural Networks
Luiz Chamon (UC Berkeley)
Adaptive Wavelet Distillation from DNNs and Dictionary Learning
Bin Yu (UC Berkeley)
Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting
Frederic Koehler (Simons Institute)
Realizing a dynamical topological phase without symmetry protection in trapped ions
Andrew Potter (University of British Columbia)
Probing topological invariants from a ground state wave function
Ze-Pei Cian (University of New Mexico)
Equivariant Machine Learning Structured Like Classical Physics
Soledad Villar (Johns Hopkins)
Learning Staircases
Emmanuel Abbe (École polytechnique fédérale de Lausanne), Enric Boix (MIT), Theodor Misiakiewicz (Stanford University)
It is known that arbitrary poly-size neural networks trained by GD/SGD can learn any class learnable in SQ/PAC. This is, however, not expected to hold for more regular architectures and initializations. Recently, the staircase property emerged as a condition that seems both necessary and sufficient for certain regular networks to learn with high accuracy, with the positive result established for sparse homogeneous initializations. In this talk, we show that standard two-layer architectures can also learn staircases, with features being learned over time. It is also shown that kernels cannot learn staircases of growing degree. Joint work with Enric Boix-Adsera and Theodor Misiakiewicz.
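For intuition, the canonical example of a staircase function on the hypercube (a special case of the definition in the paper, stated here informally) is a sum of nested monomials:

\[
f(x) = x_1 + x_1 x_2 + x_1 x_2 x_3 + \cdots + x_1 x_2 \cdots x_k, \qquad x \in \{-1,+1\}^d,
\]

where each monomial extends the previous one by a single coordinate, so gradient descent can learn the relevant coordinates one "step" at a time.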
Large Scale Structure Beyond the 2-Point Function
Oliver Philcox (Columbia University)
Quantum fluctuations in inflation provide the seeds for the large scale distribution of matter today. According to the standard paradigm, these fluctuations induce density perturbations that are adiabatic and Gaussian distributed. In this limit, all the information is contained within the two-point correlation function, or equivalently, the power spectrum. Today, the distribution of matter is far from Gaussian, with structures forming across a vast range of scales. Despite this, almost all analyses of observational data are performed using two-point functions. This raises the question: what information lies in higher-point statistics?
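To fix notation (standard cosmological conventions, with $\delta$ the matter density contrast and $\delta_D$ the Dirac delta), the power spectrum is defined by the two-point function in Fourier space, and the leading higher-point statistic is the bispectrum:

\[
\langle \delta(\mathbf{k})\,\delta(\mathbf{k}')\rangle = (2\pi)^3\,\delta_D(\mathbf{k}+\mathbf{k}')\,P(k), \qquad
\langle \delta(\mathbf{k}_1)\,\delta(\mathbf{k}_2)\,\delta(\mathbf{k}_3)\rangle = (2\pi)^3\,\delta_D(\mathbf{k}_1+\mathbf{k}_2+\mathbf{k}_3)\,B(k_1,k_2,k_3).
\]

For a Gaussian field, $B$ and all higher connected correlators vanish, which is why the power spectrum is exhaustive in that limit.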
In this seminar, I will present a pedagogical overview of the non-Gaussian correlation functions, and demonstrate how they can be used both to sharpen constraints on known physical parameters and to provide stringent tests of new physics occurring in the early Universe. One of the major barriers to constraining cosmology from the higher-point functions is computational: measuring the statistics with conventional techniques is infeasible for current and future datasets. I will discuss new methods capable of reducing the computational cost by orders of magnitude, and show how this facilitates a number of exciting new tests of the cosmological model.
Sharp Matrix Concentration
March Boedihardjo (University of California, Irvine)
Classical matrix concentration inequalities are sharp up to a logarithmic factor. This logarithmic factor is necessary in the commutative case but unnecessary in many classical noncommutative cases. We will present some matrix concentration results that are sharp in many cases, where we overcome this logarithmic factor by using an easily computable quantity that captures noncommutativity. Joint work with Afonso Bandeira and Ramon van Handel. Paper: https://arxiv.org/abs/2108.06312
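For context on the logarithmic factor, the classical bound in question (noncommutative Khintchine, in standard notation) concerns a matrix Gaussian series $X = \sum_i g_i A_i$ with i.i.d. standard Gaussians $g_i$ and deterministic symmetric $d \times d$ matrices $A_i$:

\[
\sigma(X) \,\lesssim\, \mathbb{E}\,\|X\| \,\lesssim\, \sigma(X)\sqrt{\log d}, \qquad \sigma(X) = \Big\|\sum_i A_i^2\Big\|^{1/2},
\]

and the results presented here identify an easily computable correction term that closes this gap in many noncommutative cases.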
Non-Parametric Convergence Rates for Plain Vanilla Stochastic Gradient Descent
Raphaël Berthier (École polytechnique fédérale de Lausanne)
Most theoretical guarantees for stochastic gradient descent (SGD) assume that the iterates are averaged, that the step sizes are decreasing, and/or that the objective is regularized. However, practice shows that these tricks are less necessary than theoretically expected. I will present an analysis of SGD that uses none of these tricks: we analyze the behavior of the last iterate of fixed-step-size, non-regularized SGD. Our results apply to kernel regression, i.e., infinite-dimensional linear regression. As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the observation of its value at randomly sampled points.
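As a concrete illustration of the last sentence, here is a minimal sketch of last-iterate, fixed-step-size kernel SGD for estimating a function on the unit interval from noisy random samples. The target function, kernel, step size, and sample count are my own illustrative choices, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    return np.sin(2 * np.pi * x)          # unknown function to be estimated

def kernel(x, y):
    return np.exp(-(x - y) ** 2 / 0.02)   # Gaussian kernel on [0, 1] (illustrative choice)

gamma, n = 0.5, 2000                      # FIXED step size, number of online samples
centers, coefs = [], []                   # iterate f_t = sum_j coefs[j] * kernel(centers[j], .)

for _ in range(n):
    x = rng.uniform()                     # point sampled uniformly at random on [0, 1]
    y = target(x) + 0.1 * rng.normal()    # noisy observation of the target at x
    pred = sum(c * kernel(cx, x) for c, cx in zip(coefs, centers))  # current prediction f_t(x)
    # plain SGD step on the squared loss: f_{t+1} = f_t - gamma * (f_t(x) - y) * kernel(x, .)
    centers.append(x)
    coefs.append(-gamma * (pred - y))

# evaluate the LAST iterate: no averaging, no decaying steps, no regularization
xs = np.linspace(0, 1, 200)
fhat = np.array([sum(c * kernel(cx, xx) for c, cx in zip(coefs, centers)) for xx in xs])
print("max error:", np.abs(fhat - target(xs)).max())
```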
Quantum Scientific Computation
Jin-Peng Liu (University of New Mexico)
Quantum computers are expected to dramatically outperform classical computers for certain computational problems. While there has been extensive previous work for linear dynamics and discrete models, for more complex realistic problems arising in physical and social science, engineering, and medicine, the capability of quantum computing is far from well understood. One fundamental challenge is the substantial difference between the linear dynamics of a system of qubits and real-world systems with continuum, stochastic, and nonlinear behaviors. Utilizing advanced linear algebra techniques and nonlinear analysis, I attempt to build a bridge between classical and quantum mechanics, understand and optimize the power of quantum computation, and discover new quantum speedups over classical algorithms with provable guarantees. In this talk, I would like to cover quantum algorithms for scientific computational problems, including topics such as linear, nonlinear, and stochastic differential equations, with applications in areas such as quantum dynamics, biology and epidemiology, fluid dynamics, and finance.
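To give one concrete mechanism from the references below, the dissipative nonlinear ODE result is based on Carleman linearization: a quadratic system is embedded into an infinite, upper-triangular linear system in the tensor powers of the state, which is then truncated and passed to a quantum linear-ODE solver. Schematically,

\[
\frac{du}{dt} = F_1 u + F_2\,(u \otimes u)
\quad\Longrightarrow\quad
\frac{d}{dt}\,u^{\otimes j} = A_j^{\,j}\,u^{\otimes j} + A_j^{\,j+1}\,u^{\otimes (j+1)}, \qquad j = 1, 2, \dots,
\]

where the blocks $A_j^{\,j}$ and $A_j^{\,j+1}$ are built from $F_1$ and $F_2$.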
References:
Quantum spectral methods for differential equations, Communications in Mathematical Physics 375, 1427-1457 (2020), https://arxiv.org/abs/1901.00961
High-precision quantum algorithms for partial differential equations, Quantum 5, 574 (2021), https://arxiv.org/abs/2002.07868
Efficient quantum algorithm for dissipative nonlinear differential equations, Proceedings of the National Academy of Sciences 118, 35 (2021), https://arxiv.org/abs/2011.03185
Quantum-accelerated multilevel Monte Carlo methods for stochastic differential equations in mathematical finance, Quantum 5, 481 (2021), https://arxiv.org/abs/2012.06283
Self-Training Converts Weak Learners to Strong Learners in Mixture Models
Spencer Frei (UC Berkeley)
We consider a binary classification problem where the data comes from a mixture of two rotationally symmetric log-concave distributions. We analyze the dynamics of a nonconvex gradient-based self-training algorithm that utilizes a pseudolabeler to generate labels for unlabeled data and updates the pseudolabeler based on a "supervised" loss that treats the pseudolabels as if they were ground-truth labels. We show that, provided the initial pseudolabeler has classification error smaller than some absolute constant, self-training produces classifiers with error at most epsilon greater than that of the Bayes-optimal classifier using O(d/epsilon^2) unlabeled examples. That is, self-training converts weak learners to strong learners using only unlabeled examples. We show that gradient descent on the logistic loss with O(d) labeled examples (i.e., independent of epsilon) suffices to produce a pseudolabeler such that the aforementioned results hold.
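A minimal sketch of this self-training loop on a toy Gaussian mixture (the dimension, noise level, and step size below are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_unlabeled = 20, 5000
mu = np.ones(d) / np.sqrt(d)              # mean direction of the two-component mixture

# unlabeled data: x = z * mu + Gaussian noise, with hidden label z in {-1, +1}
z = rng.choice([-1, 1], size=n_unlabeled)
X = z[:, None] * mu + rng.normal(size=(n_unlabeled, d))

# "weak" initial pseudolabeler: the true direction corrupted by noise
w = mu + 0.5 * rng.normal(size=d)

lr = 0.1
for _ in range(200):
    yhat = np.sign(X @ w)                 # pseudolabels from the current classifier
    margins = np.clip(yhat * (X @ w), -50, 50)  # clip to avoid overflow in exp
    # gradient of the logistic loss, treating pseudolabels as ground-truth labels
    grad = -(X * (yhat / (1 + np.exp(margins)))[:, None]).mean(axis=0)
    w -= lr * grad

# alignment with the Bayes-optimal direction (1.0 would be perfect)
print("cosine(w, mu):", w @ mu / (np.linalg.norm(w) * np.linalg.norm(mu)))
```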
Graphon Neural Networks and the Transferability of Graph Neural Networks
Luiz Chamon (UC Berkeley)
Graph neural networks (GNNs) generalize convolutional neural networks (CNNs) by using graph convolutions that enable information extraction from non-Euclidean domains, e.g., network data. These graph convolutions combine information from adjacent nodes using coefficients that are shared across all nodes. Since these coefficients do not depend on the graph, one can envision using the same coefficients to define a GNN on a different graph. In this talk, I will tackle this problem by introducing graphon neural networks as limit objects of sequences of GNNs and bounding the difference between the output of a GNN and that of its limit graphon neural network. This bound vanishes as the number of nodes grows, as long as the graph convolutional filters are bandlimited in the graph spectral domain. This establishes a tradeoff between the discriminability and transferability of GNNs and sheds light on the effect of training using smaller (possibly corrupted) graphs.
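For concreteness, the graph convolutions in question can be written as polynomials in a graph shift operator $S$ (e.g., the adjacency matrix), $y = \sum_k h_k S^k x$, with the coefficients $h_k$ shared across nodes. Below is a minimal sketch (the random graphs and filter taps are my own illustration) showing that the same coefficients define a filter on graphs of different sizes:

```python
import numpy as np

rng = np.random.default_rng(2)

def graph_conv(S, x, h):
    """Polynomial graph filter: y = sum_k h[k] * S^k @ x."""
    y, Skx = np.zeros_like(x), x.copy()
    for hk in h:
        y += hk * Skx
        Skx = S @ Skx                     # advance to the next power of S
    return y

h = np.array([1.0, 0.5, 0.25])            # filter taps: independent of any particular graph

for n in (50, 500):                       # the SAME taps applied on graphs of different sizes
    A = (rng.random((n, n)) < 0.1).astype(float)
    S = np.triu(A, 1) + np.triu(A, 1).T   # symmetric adjacency as the shift operator
    x = rng.normal(size=n)                # graph signal: one value per node
    print(n, graph_conv(S, x, h)[:3])
```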
Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting
Frederic Koehler (Simons Institute)
We consider interpolation learning in high-dimensional linear regression with Gaussian data, and prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class in terms of the class's Gaussian width. Applying the generic bound to Euclidean norm balls recovers the consistency result of Bartlett et al. (2020) for minimum-norm interpolators, and confirms a prediction of Zhou et al. (2020) for near-minimal-norm interpolators in the special case of Gaussian data. We demonstrate the generality of the bound by applying it to the simplex, obtaining a novel consistency result for minimum l1-norm interpolators (basis pursuit). Our results show how norm-based generalization bounds can explain and be used to analyze benign overfitting, at least in some settings. Joint work with Lijia Zhou, Danica Sutherland, and Nathan Srebro.
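For reference, the complexity measure in the title is the Gaussian width of the hypothesis class $K \subseteq \mathbb{R}^d$,

\[
W(K) = \mathbb{E}_{g \sim \mathcal{N}(0, I_d)} \, \sup_{w \in K} \langle g, w \rangle,
\]

so, loosely speaking, the guarantee controls the population error of every interpolator in $K$ once the sample size is large relative to $W(K)^2$ (see the paper for the precise statement).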
Realizing a dynamical topological phase without symmetry protection in trapped ions
Andrew Potter (University of British Columbia)
In thermal equilibrium, 1d bosonic systems (e.g., spin or qubit chains) cannot support intrinsically topological phases without symmetry protection. For example, the edge states of the Haldane spin chain are fragile to magnetic fields, in contrast to the absolutely stable Majorana edge states of a topological superconducting wire of fermionic electrons. This fragility is a serious drawback to harnessing topological edge states as protected quantum memories in existing AMO and qubit platforms for quantum simulation and information processing. In this talk, I will present evidence for a non-equilibrium topological phase of quasiperiodically driven trapped-ion chains that exhibits topological edge states protected purely by emergent dynamical symmetries, which cannot be broken by microscopic perturbations. This represents both the first experimental realization of a non-equilibrium quantum phase and the first example of a 1d bosonic topological phase that does not rely on symmetry protection.
Probing topological invariants from a ground state wave function
Ze-Pei Cian (University of New Mexico)
With the rapid development of programmable quantum simulators, quantum states can be controlled with unprecedented precision. This opens a new opportunity to explore strongly correlated phases of matter on new quantum technology platforms. In quantum simulators, one can engineer interactions between the microscopic degrees of freedom and create exotic phases of matter that are presumably beyond the reach of natural materials. Moreover, quantum states can be measured directly, instead of probing physical properties indirectly via the optical and electrical responses of materials as is done in traditional condensed matter physics. It is therefore pressing to develop new approaches to efficiently prepare and characterize desired quantum states on these novel platforms.
In this talk, I will introduce our recent work on characterizing topological invariants from a ground state wave function of a topologically ordered phase, and on implementations in noisy intermediate-scale quantum devices. First, using topological field theory and tensor network simulations, we demonstrate how to extract the many-body Chern number (MBCN) given the bulk of a fractional quantum Hall wave function [1]. We then propose an ancilla-free experimental scheme for measuring the MBCN that does not require any knowledge of the Hamiltonian; specifically, we use the statistical correlations of randomized measurements to infer the MBCN of a wave function [2]. Finally, I will present an unbiased numerical optimization scheme that systematically finds the Wilson loop operators given a ground state wave function of a gapped, translationally invariant Hamiltonian on a disk. We then show how these Wilson loop operators can be cut and glued, through further optimization, to give operators that can create, move, and annihilate anyon excitations. We then use these operators to determine the braiding statistics and topological twists of the anyons, yielding a way to fully characterize topological order from the bulk of a ground state wave function [3].
[1] H. Dehghani, Z.P. Cian, M. Hafezi, and M. Barkeshli, Phys. Rev. B 103, 075102
[2] Z.P. Cian, H. Dehghani, A. Elben, B. Vermersch, G. Zhu, M. Barkeshli, P. Zoller, and M. Hafezi, Phys. Rev. Lett. 126, 050501
[3] Z.P. Cian, M. Hafezi, and M. Barkeshli, manuscript in preparation.
Equivariant Machine Learning Structured Like Classical Physics
Soledad Villar (Johns Hopkins)
There has been enormous progress in the last few years in designing neural networks that respect the fundamental symmetries and coordinate freedoms of physical law. Some of these frameworks make use of irreducible representations, some make use of high-order tensor objects, and some apply symmetry-enforcing constraints. Different physical laws obey different combinations of fundamental symmetries, but a large fraction (possibly all) of classical physics is equivariant to translation, rotation, reflection (parity), boost (relativity), and permutations. Here we show that it is simple to parameterize universally approximating polynomial functions that are equivariant under these symmetries, or under the Euclidean, Lorentz, and Poincaré groups, at any dimensionality d. The key observation is that nonlinear O(d)-equivariant (and related-group-equivariant) functions can be universally expressed in terms of a lightweight collection of scalars -- scalar products and scalar contractions of the scalar, vector, and tensor inputs. We complement our theory with numerical examples that show that the scalar-based method is simple, efficient, and scalable.
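Concretely, the key observation can be stated as follows for vector inputs (the paper treats the general tensor case): every O(d)-equivariant vector-valued function of vectors $v_1, \dots, v_n \in \mathbb{R}^d$ can be written as

\[
h(v_1, \dots, v_n) \;=\; \sum_{i=1}^{n} f_i\big(\langle v_j, v_k \rangle_{1 \le j \le k \le n}\big)\, v_i,
\]

where the $f_i$ are arbitrary functions of the pairwise scalar products: the "lightweight collection of scalars" referenced above.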