
Non-Parametric Convergence Rates for Plain Vanilla Stochastic Gradient Descent

APA

Berthier, R. (2021). Non-Parametric Convergence Rates for Plain Vanilla Stochastic Gradient Descent. The Simons Institute for the Theory of Computing. https://simons.berkeley.edu/talks/non-parametric-convergence-rates-plain-vanilla-stochastic-gradient-descent

MLA

Berthier, Raphaël. "Non-Parametric Convergence Rates for Plain Vanilla Stochastic Gradient Descent." The Simons Institute for the Theory of Computing, 6 Dec. 2021, https://simons.berkeley.edu/talks/non-parametric-convergence-rates-plain-vanilla-stochastic-gradient-descent

BibTeX

@misc{scivideos_18843,
  author    = {Berthier, Rapha\"{e}l},
  title     = {Non-Parametric Convergence Rates for Plain Vanilla Stochastic Gradient Descent},
  publisher = {The Simons Institute for the Theory of Computing},
  year      = {2021},
  month     = {dec},
  language  = {en},
  url       = {https://simons.berkeley.edu/talks/non-parametric-convergence-rates-plain-vanilla-stochastic-gradient-descent},
  note      = {Talk 18843, see \url{https://scivideos.org/Simons-Institute/18843}}
}
          
Raphaël Berthier (École polytechnique fédérale de Lausanne)
Talk number: 18843
Source repository: Simons Institute

Abstract

Most theoretical guarantees for stochastic gradient descent (SGD) assume that the iterates are averaged, that the step sizes are decreasing, and/or that the objective is regularized. However, practice shows that these tricks are less necessary than theoretically expected. I will present an analysis of SGD that uses none of these tricks: we analyze the behavior of the last iterate of fixed-step-size, non-regularized SGD. Our results apply to kernel regression, i.e., infinite-dimensional linear regression. As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the observation of its value at randomly sampled points.
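
For intuition, the following is a minimal sketch of the setting the abstract describes: fixed-step-size, non-regularized SGD in a reproducing kernel Hilbert space, where the final estimate is the last iterate rather than an average of iterates, used to estimate a real function on the unit interval from observations at randomly sampled points. The Gaussian kernel, the step size, the noise level, and all names below are illustrative assumptions, not choices made in the talk.

import numpy as np

def gaussian_kernel(x, xs, bandwidth=0.1):
    """Gaussian kernel on [0, 1]; an illustrative choice, not taken from the talk."""
    return np.exp(-((xs - x) ** 2) / (2.0 * bandwidth ** 2))

def kernel_sgd_last_iterate(target_fn, n_steps=2000, step_size=0.5,
                            noise_std=0.1, seed=0):
    """Plain-vanilla kernel SGD: fixed step size, no averaging, no regularization.

    The iterate is stored as a kernel expansion f_t(.) = sum_i a_i K(x_i, .)
    and updated with one stochastic gradient step of the squared loss per sample.
    """
    rng = np.random.default_rng(seed)
    xs = np.empty(n_steps)      # sampled design points
    coeffs = np.empty(n_steps)  # expansion coefficients a_i

    for t in range(n_steps):
        x = rng.uniform(0.0, 1.0)                        # random point on [0, 1]
        y = target_fn(x) + noise_std * rng.standard_normal()
        pred = coeffs[:t] @ gaussian_kernel(x, xs[:t]) if t else 0.0
        xs[t] = x
        coeffs[t] = -step_size * (pred - y)              # f <- f - gamma*(f(x)-y)*K(x,.)

    def f_hat(x):
        """Evaluate the last iterate (no averaging of iterates)."""
        return coeffs @ gaussian_kernel(x, xs)

    return f_hat

if __name__ == "__main__":
    # Estimate f(x) = sin(2*pi*x) from its values at uniformly sampled points.
    f_hat = kernel_sgd_last_iterate(lambda x: np.sin(2 * np.pi * x))
    grid = np.linspace(0.0, 1.0, 200)
    err = max(abs(f_hat(x) - np.sin(2 * np.pi * x)) for x in grid)
    print(f"max abs error on a grid of 200 points: {err:.3f}")

The returned estimator evaluates only the final iterate; averaging the iterates or decaying the step size would be exactly the kind of trick whose necessity the talk's analysis questions.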