It is known that arbitrary poly-size neural networks trained by GD/SGD can learn any class that is learnable in SQ/PAC. This is, however, not expected to hold for more regular architectures and initializations. Recently, the staircase property emerged as a condition that seems both necessary and sufficient for certain regular networks to learn with high accuracy, with the positive result established for sparse homogeneous initializations. In this talk, we show that standard two-layer architectures can also learn staircases, with features being learned over time. It is also shown that kernels cannot learn staircases of growing degree. Joint work with Enric Boix-Adsera and Theodor Misiakiewicz.
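For intuition, the canonical example of a staircase function (recalled here only as an illustration; the precise class considered in the talk may differ) is a sum of nested monomials on the Boolean hypercube, where each term extends the previous one by a single coordinate:

```latex
S_d(x) \;=\; x_1 \;+\; x_1 x_2 \;+\; x_1 x_2 x_3 \;+\; \cdots \;+\; x_1 x_2 \cdots x_d,
\qquad x \in \{\pm 1\}^d .
```

Because each new coordinate only becomes informative once the earlier ones have been picked up, such targets lend themselves to being learned incrementally, which matches the "features learned over time" picture above.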
Classical matrix concentration inequalities are sharp up to a logarithmic factor. This logarithmic factor is necessary in the commutative case but unnecessary in many classical noncommutative cases. We will present some matrix concentration results that are sharp in many cases, where we overcome this logarithmic factor by using an easily computable quantity that captures noncommutativity. Joint work with Afonso Bandeira and Ramon van Handel. Paper: https://arxiv.org/abs/2108.06312
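For reference, a standard form of the classical bound (stated here as textbook background, not as the talk's new result) concerns a Gaussian matrix series X = sum_i g_i A_i with self-adjoint d x d coefficients A_i and i.i.d. standard Gaussians g_i:

```latex
\sigma(X) := \Big\| \sum_i A_i^2 \Big\|^{1/2},
\qquad
c\,\sigma(X) \;\le\; \mathbb{E}\,\|X\| \;\le\; C\,\sigma(X)\,\sqrt{\log d}.
```

The sqrt(log d) factor on the right is the logarithmic gap in question: it is needed when the A_i commute (e.g., diagonal coefficients, where the norm is a maximum of d Gaussians), but it can be removed in many noncommutative situations, which is what the easily computable quantity capturing noncommutativity makes precise.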
Most theoretical guarantees for stochastic gradient descent (SGD) assume that the iterates are averaged, that the stepsizes are decreasing, and/or that the objective is regularized. However, practice shows that these tricks are less necessary than theoretically expected. I will present an analysis of SGD that uses none of these tricks: we analyze the behavior of the last iterate of fixed-stepsize, non-regularized SGD. Our results apply to kernel regression, i.e., infinite-dimensional linear regression. As a special case, we analyze an online algorithm for estimating a real function on the unit interval from observations of its values at randomly sampled points.
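As a concrete (and deliberately simplified) illustration of that last setting, here is a minimal sketch of last-iterate, fixed-stepsize, unregularized kernel SGD on the unit interval; the Gaussian kernel, the target function, the noise level, and the stepsize are all illustrative assumptions, not the setting analyzed in the talk.

```python
import numpy as np

# Minimal sketch: last-iterate, fixed-stepsize, unregularized kernel SGD for
# estimating a real function on [0, 1] from noisy values at random points.
# Kernel, target, noise level and stepsize below are illustrative assumptions.

rng = np.random.default_rng(0)
kernel = lambda x, z: np.exp(-((x - z) ** 2) / (2 * 0.05 ** 2))  # assumed RKHS kernel
f_star = lambda x: np.sin(2 * np.pi * x)                         # unknown target
gamma, T = 0.5, 1000                                             # fixed stepsize, #samples

xs, alphas = np.empty(T), np.empty(T)  # iterate: f_t(x) = sum_i alphas[i] * kernel(xs[i], x)
for t in range(T):
    x_t = rng.uniform(0.0, 1.0)
    y_t = f_star(x_t) + 0.1 * rng.standard_normal()
    pred = alphas[:t] @ kernel(xs[:t], x_t)          # last iterate's prediction at x_t
    # SGD step on the squared loss: f <- f - gamma * (f(x_t) - y_t) * K(x_t, .)
    xs[t], alphas[t] = x_t, -gamma * (pred - y_t)

# evaluate the last iterate (no averaging, no regularization) on a grid
grid = np.linspace(0.0, 1.0, 200)
f_hat = np.array([alphas @ kernel(xs, g) for g in grid])
print("mean squared error on the grid:", np.mean((f_hat - f_star(grid)) ** 2))
```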
We consider a binary classification problem where the data comes from a mixture of two rotationally symmetric log-concave distributions. We analyze the dynamics of a nonconvex gradient-based self-training algorithm that utilizes a pseudolabeler to generate labels for unlabeled data and updates the pseudolabeler based on a "supervised" loss that treats the pseudolabels as if they were ground-truth labels. We show that, provided the initial pseudolabeler has classification error smaller than some absolute constant, self-training produces classifiers with error at most epsilon greater than that of the Bayes-optimal classifier using O(d/epsilon^2) unlabeled examples. That is, self-training converts weak learners to strong learners using only unlabeled examples. We show that gradient descent on the logistic loss with O(d) labeled examples (i.e., independent of epsilon) suffices to produce a pseudolabeler such that the aforementioned results hold.
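A minimal sketch of the self-training loop described above, under illustrative assumptions (a spherical Gaussian mixture, a linear pseudolabeler, and full-batch gradient steps on the logistic loss with pseudolabels treated as ground truth); it is not the exact algorithm or regime analyzed in the talk.

```python
import numpy as np
from scipy.special import expit  # logistic sigmoid

# Minimal sketch of self-training with a linear pseudolabeler on a spherical
# Gaussian mixture; the data model and all constants are illustrative assumptions.

rng = np.random.default_rng(0)
d, n_unlabeled = 20, 5000
mu = 2.0 * np.ones(d) / np.sqrt(d)                 # class mean direction (assumed)
y_true = rng.choice([-1.0, 1.0], size=n_unlabeled) # hidden labels, never used for training
X = y_true[:, None] * mu + rng.standard_normal((n_unlabeled, d))

w = mu + 0.5 * rng.standard_normal(d)              # weak (but non-trivial) initial pseudolabeler
eta = 0.1
for _ in range(200):
    pseudo = np.sign(X @ w)                        # pseudolabels from the current classifier
    margins = pseudo * (X @ w)
    # gradient of the logistic loss, treating the pseudolabels as ground-truth labels
    grad = -(X * (pseudo * expit(-margins))[:, None]).mean(axis=0)
    w -= eta * grad

print("error of self-trained classifier:", np.mean(np.sign(X @ w) != y_true))
```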
Graph neural networks (GNNs) generalize convolutional neural networks (CNNs) by using graph convolutions that enable information extraction from non-Euclidean domains, e.g., network data. These graph convolutions combine information from adjacent nodes using coefficients that are shared across all nodes. Since these coefficients do not depend on the graph, one can envision using the same coefficients to define a GNN on a different graph. In this talk, I will tackle this problem by introducing graphon NNs as limit objects of sequences of GNNs and characterizing the difference between the output of a GNN and that of its limit graphon NN. This bound vanishes as the number of nodes grows, as long as the graph convolutional filters are bandlimited in the graph spectral domain. This establishes a tradeoff between discriminability and transferability of GNNs and sheds light on the effect of training GNNs on smaller (possibly corrupted) graphs.
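To make the "same coefficients, different graph" point concrete, here is a minimal sketch of a graph convolutional filter whose taps are shared across nodes and independent of the graph; the random graphs, the normalization, and the filter taps are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a graph convolutional filter: the coefficients h[k] are shared
# across all nodes and do not depend on the graph, so the same filter can be
# applied to graphs of different sizes.

def graph_filter(S, x, h):
    """Apply y = sum_k h[k] * S^k x, where S is a graph shift operator
    (e.g., a normalized adjacency matrix) and x is a graph signal."""
    y = np.zeros_like(x)
    Skx = x.copy()
    for hk in h:
        y += hk * Skx
        Skx = S @ Skx          # one more hop of neighborhood aggregation
    return y

rng = np.random.default_rng(0)
h = np.array([1.0, 0.5, 0.25])                     # filter taps, shared across nodes

for n in (50, 500):                                # same coefficients, different graphs
    A = (rng.random((n, n)) < 0.1).astype(float)
    A = np.triu(A, 1); A = A + A.T                 # symmetric adjacency, no self-loops
    S = A / max(A.sum(axis=1).max(), 1.0)          # simple normalization (assumed)
    x = rng.standard_normal(n)
    print(n, graph_filter(S, x, h)[:3])
```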
We consider interpolation learning in high-dimensional linear regression with Gaussian data, and prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class in terms of the class's Gaussian width. Applying the generic bound to Euclidean norm balls recovers the consistency result of Bartlett et al. (2020) for minimum-norm interpolators, and confirms a prediction of Zhou et al. (2020) for near-minimal-norm interpolators in the special case of Gaussian data. We demonstrate the generality of the bound by applying it to the simplex, obtaining a novel consistency result for minimum l1-norm interpolators (basis pursuit). Our results show how norm-based generalization bounds can explain and be used to analyze benign overfitting, at least in some settings. Joint work with Lijia Zhou, Danica Sutherland, and Nathan Srebro.
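For concreteness, the two interpolators mentioned above can be written down directly; the sketch below computes both on synthetic Gaussian data (an illustrative setup chosen for brevity, not the paper's exact regime), using the closed form for the minimum l2-norm interpolator and a linear program for basis pursuit.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative sketch: minimum l2-norm and minimum l1-norm (basis pursuit)
# interpolators on synthetic Gaussian data; dimensions, noise and sparsity
# below are assumptions made for the example.

rng = np.random.default_rng(0)
n, d = 50, 200                                     # overparameterized: d > n
X = rng.standard_normal((n, d))
beta_star = np.zeros(d); beta_star[:5] = 1.0       # sparse ground truth (assumed)
y = X @ beta_star + 0.1 * rng.standard_normal(n)

# minimum l2-norm interpolator: beta = X^T (X X^T)^{-1} y
beta_l2 = X.T @ np.linalg.solve(X @ X.T, y)

# minimum l1-norm interpolator (basis pursuit), as an LP in (beta_plus, beta_minus):
# minimize sum(beta_plus + beta_minus) s.t. X (beta_plus - beta_minus) = y, both >= 0
c = np.ones(2 * d)
A_eq = np.hstack([X, -X])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * d))
beta_l1 = res.x[:d] - res.x[d:]

print("both interpolate:", np.allclose(X @ beta_l2, y), np.allclose(X @ beta_l1, y, atol=1e-6))
print("l2 norms:", np.linalg.norm(beta_l2), np.linalg.norm(beta_l1))
```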
Symmetries play a unifying role in physics and many other sciences. In deep learning, symmetries have been incorporated into neural networks through the concept of equivariance. One of the major benefits is that equivariance reduces the number of parameters through parameter sharing and, as such, allows learning from less data. In this talk I will ask the question: can equivariance also help in RL? Besides the obvious idea of using equivariant value functions, we explore the idea of deep equivariant policies. We make a connection between equivariance and MDP homomorphisms, and generalize to distributed multi-agent settings. Joint work with Elise van der Pol (main contributor), Herke van Hoof and Frans Oliehoek.
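As a generic illustration of what an equivariant policy constraint means (a group-averaging sketch under an assumed C4 rotation symmetry; this is not the MDP-homomorphism construction from the talk), rotating the observation should permute the policy's action preferences accordingly:

```python
import numpy as np

# Generic sketch: build a C4 (90-degree rotation) equivariant policy from an
# arbitrary base network by group averaging, then check equivariance numerically.
# The 5x5 grid observation, the 4 directional actions, and the linear base
# network are illustrative assumptions.

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 25))                  # arbitrary (non-equivariant) base network

def base_policy(obs):                             # obs: 5x5 grid -> logits over 4 actions
    return W @ obs.ravel()

def equivariant_policy(obs):
    # group averaging: pi(obs) = (1/4) * sum_k rho_out(k)^{-1} base(rho_in(k) obs),
    # where rho_in rotates the grid and rho_out cyclically shifts the action logits
    logits = np.zeros(4)
    for k in range(4):
        logits += np.roll(base_policy(np.rot90(obs, k)), -k)
    return logits / 4.0

obs = rng.standard_normal((5, 5))
for k in range(4):
    lhs = equivariant_policy(np.rot90(obs, k))    # rotating the observation...
    rhs = np.roll(equivariant_policy(obs), k)     # ...permutes the action logits
    assert np.allclose(lhs, rhs)
print("equivariance check passed")
```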
We report on the geometric structure of symmetry adapted PSD cones and symmetry adapted Gram spectrahedra of symmetric polynomials. We determine the dimension of symmetry adapted PSD cones, describe their extreme rays, and discuss the structure of their matrix representations. We also focus on symmetry adapted Gram spectrahedra of symmetric binary forms, quadrics, ternary quartics and sextics. In particular, for symmetric binary forms we characterize the rank-two extreme points of these spectrahedra, and we report what we know about the facial structure of the same spectrahedra. The talk will be based on two collaborations, one with Alex Heaton and Isabelle Shankar, and another one with Matthew Heid.
Tensor network states form a variational ansatz class widely used in the study of quantum many-body systems. Geometrically, these states form an algebraic variety of tensors with rich representation theoretic structure. It is known that tensors on the "boundary" of this variety can provide more efficient representations for states of physical interest, but the pathological geometric properties of the boundary make it difficult to extend the classical optimization methods. In recent work, we introduced a new ansatz class which includes states at the boundary of the tensor network variety. I will present some of the geometric features of this class and explain how it can be used in the variational approach. This is based on joint work with M. Christandl, D. Stilck-Franca and A. Werner.
Geometrically, a linear program is a polyhedron together with an orientation of its graph. A simplex method determines a path from any given starting vertex to the sink. The centerpiece of any simplex algorithm is the pivot rule that successively selects outgoing edges along the path. We introduce normalized-weight pivot rules, a class of pivot rules that are memoryless, that subsume many of the most-used pivot rules, and that can be parametrized in a natural continuous manner. We introduce two polytopes that capture the behavior of normalized-weight pivot rules on polytopes and linear programs. On the one hand, this gives a new perspective on the performance of pivot rules. On the other hand, our constructions generalize classical constructions (e.g. monotone path polytopes) and objects (e.g. permutahedra, associahedra, multiplihedra) from geometric combinatorics. This is joint work with Alex Black, Jesus De Loera, and Niklas Lütjeharms.