APA
Non-Parametric Convergence Rates for Plain Vanilla Stochastic Gradient Descent. (2021, December 6). The Simons Institute for the Theory of Computing. https://simons.berkeley.edu/talks/non-parametric-convergence-rates-plain-vanilla-stochastic-gradient-descent
MLA
Non-Parametric Convergence Rates for Plain Vanilla Stochastic Gradient Descent. The Simons Institute for the Theory of Computing, 6 Dec. 2021, https://simons.berkeley.edu/talks/non-parametric-convergence-rates-plain-vanilla-stochastic-gradient-descent.
BibTeX
@misc{scivideos_18843,
  url       = {https://simons.berkeley.edu/talks/non-parametric-convergence-rates-plain-vanilla-stochastic-gradient-descent},
  language  = {en},
  title     = {Non-Parametric Convergence Rates for Plain Vanilla Stochastic Gradient Descent},
  publisher = {The Simons Institute for the Theory of Computing},
  year      = {2021},
  month     = {dec},
  note      = {See \url{https://scivideos.org/Simons-Institute/18843}}
}
Most theoretical guarantees for stochastic gradient descent (SGD) assume that the iterates are averaged, that the step sizes are decreasing, and/or that the objective is regularized. However, practice shows that these tricks are less necessary than the theory suggests. I will present an analysis of SGD that uses none of them: we analyze the behavior of the last iterate of fixed step-size, non-regularized SGD. Our results apply to kernel regression, i.e., infinite-dimensional linear regression. As a special case, we analyze an online algorithm for estimating a real function on the unit interval from observations of its values at randomly sampled points.
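To make the setting concrete, here is a minimal sketch (not the analysis from the talk) of the algorithm described above: last-iterate, fixed step-size, non-regularized SGD for kernel regression, run online on the unit-interval example. The Gaussian kernel, its bandwidth, the step size, and the target function below are assumptions made purely for illustration.

import numpy as np

def gaussian_kernel(x, y, bandwidth=0.1):
    # Kernel on the unit interval; the Gaussian choice and bandwidth are
    # illustrative assumptions, not prescribed by the talk.
    return np.exp(-(x - y) ** 2 / (2 * bandwidth ** 2))

def kernel_sgd_last_iterate(xs, ys, step_size=0.5, kernel=gaussian_kernel):
    # Plain vanilla SGD for kernel (infinite-dimensional linear) regression:
    # fixed step size, no regularization, no iterate averaging.
    # The iterate is stored via its kernel expansion
    #   f_t(x) = sum_i coefs[i] * kernel(support[i], x).
    coefs, support = [], []
    for x_t, y_t in zip(xs, ys):
        # Evaluate the current iterate at the freshly sampled point.
        f_xt = sum(a * kernel(x_i, x_t) for a, x_i in zip(coefs, support))
        # SGD step on the squared loss (f(x_t) - y_t)^2 / 2:
        #   f_{t+1} = f_t - step_size * (f_t(x_t) - y_t) * kernel(x_t, .)
        coefs.append(-step_size * (f_xt - y_t))
        support.append(x_t)
    # Return the *last* iterate, not an average of the iterates.
    def f_last(x):
        return sum(a * kernel(x_i, x) for a, x_i in zip(coefs, support))
    return f_last

# Toy usage: estimate a real function on [0, 1] from its values at random points
# (the sine target and the noise level are also illustrative assumptions).
rng = np.random.default_rng(0)
xs = rng.uniform(0.0, 1.0, size=500)
ys = np.sin(2 * np.pi * xs) + 0.1 * rng.standard_normal(xs.shape)
f_hat = kernel_sgd_last_iterate(xs, ys)
print(f_hat(0.25), np.sin(2 * np.pi * 0.25))  # estimate vs. ground truth at x = 0.25

Each observation is processed once, and the returned estimate is the final iterate itself: no averaging, no decreasing step sizes, and no regularization term, matching the "plain vanilla" regime the talk analyzes.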