APA
Non-Parametric Convergence Rates for Plain Vanilla Stochastic Gradient Descent. (2021, December 6). The Simons Institute for the Theory of Computing. https://simons.berkeley.edu/talks/non-parametric-convergence-rates-plain-vanilla-stochastic-gradient-descent
MLA
Non-Parametric Convergence Rates for Plain Vanilla Stochastic Gradient Descent. The Simons Institute for the Theory of Computing, 6 Dec. 2021, https://simons.berkeley.edu/talks/non-parametric-convergence-rates-plain-vanilla-stochastic-gradient-descent.
BibTeX
@misc{scivideos_18843,
  url       = {https://simons.berkeley.edu/talks/non-parametric-convergence-rates-plain-vanilla-stochastic-gradient-descent},
  language  = {en},
  title     = {Non-Parametric Convergence Rates for Plain Vanilla Stochastic Gradient Descent},
  publisher = {The Simons Institute for the Theory of Computing},
  year      = {2021},
  month     = {dec},
  note      = {See \url{https://scivideos.org/Simons-Institute/18843}}
}
Most theoretical guarantees for stochastic gradient descent (SGD) assume that the iterates are averaged, that the step sizes are decreasing, and/or that the objective is regularized. However, practice shows that these tricks are less necessary than the theory suggests. I will present an analysis of SGD that uses none of them: we analyze the behavior of the last iterate of fixed step-size, non-regularized SGD. Our results apply to kernel regression, i.e., infinite-dimensional linear regression. As a special case, we analyze an online algorithm for estimating a real function on the unit interval from observations of its values at randomly sampled points.
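To make the setting concrete, here is a minimal sketch (not the analysis from the talk) of the algorithm described above: last-iterate, fixed step-size, non-regularized SGD for kernel regression, run online on the unit-interval example. The Gaussian kernel, its bandwidth, the step size, and the target function below are assumptions made purely for illustration.

import numpy as np

def gaussian_kernel(x, y, bandwidth=0.1):
    # Kernel on the unit interval; the Gaussian choice and bandwidth are
    # illustrative assumptions, not prescribed by the talk.
    return np.exp(-(x - y) ** 2 / (2 * bandwidth ** 2))

def kernel_sgd_last_iterate(xs, ys, step_size=0.5, kernel=gaussian_kernel):
    # Plain vanilla SGD for kernel (infinite-dimensional linear) regression:
    # fixed step size, no regularization, no iterate averaging.
    # The iterate is stored via its kernel expansion
    #   f_t(x) = sum_i coefs[i] * kernel(support[i], x).
    coefs, support = [], []
    for x_t, y_t in zip(xs, ys):
        # Evaluate the current iterate at the freshly sampled point.
        f_xt = sum(a * kernel(x_i, x_t) for a, x_i in zip(coefs, support))
        # SGD step on the squared loss (f(x_t) - y_t)^2 / 2:
        #   f_{t+1} = f_t - step_size * (f_t(x_t) - y_t) * kernel(x_t, .)
        coefs.append(-step_size * (f_xt - y_t))
        support.append(x_t)
    # Return the *last* iterate, not an average of the iterates.
    def f_last(x):
        return sum(a * kernel(x_i, x) for a, x_i in zip(coefs, support))
    return f_last

# Toy usage: estimate a real function on [0, 1] from its values at random points
# (the sine target and the noise level are also illustrative assumptions).
rng = np.random.default_rng(0)
xs = rng.uniform(0.0, 1.0, size=500)
ys = np.sin(2 * np.pi * xs) + 0.1 * rng.standard_normal(xs.shape)
f_hat = kernel_sgd_last_iterate(xs, ys)
print(f_hat(0.25), np.sin(2 * np.pi * 0.25))  # estimate vs. ground truth at x = 0.25

Each observation is processed once, and the returned estimate is the final iterate itself: no averaging, no decreasing step sizes, and no regularization term, matching the "plain vanilla" regime the talk analyzes.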