Video URL

https://simons.berkeley.edu/talks/analyzing-optimization-and-generalization-deep-learning-dynamics-…

Analyzing Optimization and Generalization in Deep Learning via Dynamics of Gradient Descent

Cohen, N. (2020). Analyzing Optimization and Generalization in Deep Learning via Dynamics of Gradient Descent. The Simons Institute for the Theory of Computing. https://simons.berkeley.edu/talks/analyzing-optimization-and-generalization-deep-learning-dynamics-gradient-descent

Cohen, Nadav. Analyzing Optimization and Generalization in Deep Learning via Dynamics of Gradient Descent. The Simons Institute for the Theory of Computing, Dec. 16, 2020, https://simons.berkeley.edu/talks/analyzing-optimization-and-generalization-deep-learning-dynamics-gradient-descent

@misc{scivideos_16875,
  doi = {},
  url = {https://simons.berkeley.edu/talks/analyzing-optimization-and-generalization-deep-learning-dynamics-gradient-descent},
  author = {Cohen, Nadav},
  keywords = {},
  language = {en},
  title = {Analyzing Optimization and Generalization in Deep Learning via Dynamics of Gradient Descent},
  publisher = {The Simons Institute for the Theory of Computing},
  year = {2020},
  month = {dec},
  note = {Talk number 16875, see \url{https://scivideos.org/Simons-Institute/16875}}
}

Nadav Cohen (Tel-Aviv University)

December 16, 2020

Talk number: 16875

Source Repository: Simons Institute

Subject

Computer Science

Abstract

Understanding deep learning calls for addressing two questions: (i) optimization, that is, the effectiveness of simple gradient-based algorithms in solving neural network training programs that are non-convex and thus seemingly difficult; and (ii) generalization, that is, the phenomenon of deep learning models not overfitting despite having many more parameters than examples to learn from. Existing analyses of optimization and/or generalization typically adopt the language of classical learning theory, abstracting away many details of the setting at hand. In this talk I will argue that a more refined perspective is in order, one that accounts for the dynamics of the optimizer. I will then demonstrate a manifestation of this approach, analyzing the dynamics of gradient descent over linear neural networks. We will derive what is, to the best of my knowledge, the most general guarantee to date for efficient convergence to a global minimum of a gradient-based algorithm training a deep network. Moreover, in stark contrast to conventional wisdom, we will see that sometimes, adding (redundant) linear layers to a classic linear model significantly accelerates gradient descent, despite the introduction of non-convexity. Finally, we will show that such addition of layers induces an implicit bias towards low rank (different from any type of norm regularization), and by this explain generalization of deep linear neural networks for the classic problem of low-rank matrix completion. Works covered in this talk were in collaboration with Sanjeev Arora, Noah Golowich, Elad Hazan, Wei Hu, Yuping Luo and Noam Razin.
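For readers unfamiliar with the setting named in the abstract, the following is a minimal sketch of gradient descent over a linear neural network in its simplest (scalar) form. It is not taken from the talk: the function name `train_deep_linear_scalar`, the target value, the depth, the step size, and the choice to start all layers at the same value are assumptions made only for illustration. The model is a product of `depth` scalar weights, so the end-to-end map is linear in the input while the loss is non-convex in the weights.

```python
from math import prod


def train_deep_linear_scalar(target, depth=3, init=1.0, lr=0.05, steps=500):
    """Gradient descent on 0.5 * (w_1 * ... * w_depth - target)^2.

    Hypothetical toy example: every layer is a single scalar weight,
    and all layers start at the same value `init` (an assumption).
    """
    weights = [init] * depth
    losses = []
    for _ in range(steps):
        end_to_end = prod(weights)      # the linear map the network computes
        residual = end_to_end - target
        losses.append(0.5 * residual ** 2)
        # d(loss)/d(w_j) = residual * product of all the other weights
        grads = [
            residual * prod(weights[:j] + weights[j + 1:])
            for j in range(depth)
        ]
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights, losses


weights, losses = train_deep_linear_scalar(target=2.0)
print("end-to-end weight:", prod(weights))
print("first loss:", losses[0], "last loss:", losses[-1])
```

With these assumed settings the product of the weights approaches the target even though the objective is non-convex in the individual weights. The sketch says nothing about the acceleration or low-rank results mentioned in the abstract, which concern matrix-valued layers.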
