Chasing the Long Tail: What Neural Networks Memorize and Why

APA

(2022). Chasing the Long Tail: What Neural Networks Memorize and Why. The Simons Institute for the Theory of Computing. https://old.simons.berkeley.edu/node/22921

MLA

Chasing the Long Tail: What Neural Networks Memorize and Why. The Simons Institute for the Theory of Computing, Nov. 07, 2022, https://old.simons.berkeley.edu/node/22921

BibTeX

@misc{scivideos_22921,
  doi = {},
  url = {https://old.simons.berkeley.edu/node/22921},
  author = {},
  keywords = {},
  language = {en},
  title = {Chasing the Long Tail: What Neural Networks Memorize and Why},
  publisher = {The Simons Institute for the Theory of Computing},
  year = {2022},
  month = {nov},
  note = {22921 see, \url{https://scivideos.org/simons-institute/22921}}
}
          
Vitaly Feldman (Apple ML Research)
Talk number: 22921
Source Repository: Simons Institute

Abstract

Deep learning algorithms that achieve state-of-the-art results on image and text recognition tasks tend to fit the entire training dataset (nearly) perfectly, including mislabeled examples and outliers. This propensity to memorize seemingly useless data and the resulting large generalization gap have puzzled many practitioners and are not explained by existing theories of machine learning. We provide a simple conceptual explanation and a theoretical model demonstrating that memorization of outliers and mislabeled examples is necessary for achieving close-to-optimal generalization error when learning from long-tailed data distributions. Image and text data are known to follow such distributions, and therefore our results establish a formal link between these empirical phenomena. We then demonstrate the utility of memorization and support our explanation empirically. These results rely on a new technique for efficiently estimating the memorization and influence of training data points. Our results allow us to quantify the cost of limiting memorization in learning and explain the disparate effects that privacy and model compression have on different subgroups.
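
The abstract does not spell out the estimation technique. As a rough illustration only, the sketch below assumes that the memorization of a training example is measured as the gap between a model's accuracy on that example when it is included in the training set and when it is held out, averaged over models trained on random subsets. The function name `estimate_memorization`, the array layout, and the toy data are hypothetical and are not taken from the talk.

```python
import numpy as np


def estimate_memorization(in_subset, correct):
    """Per-example memorization estimate from repeated subsampled training runs.

    in_subset[t, i]: True if example i was in the training subset of run t.
    correct[t, i]:   True if the model from run t predicts example i's label.

    Returns, for each example, accuracy when included minus accuracy when
    held out. The result is NaN for an example that was always included or
    always held out, since one of the two averages is then undefined.
    """
    in_subset = np.asarray(in_subset, dtype=bool)
    correct = np.asarray(correct, dtype=float)
    n_in = in_subset.sum(axis=0)        # runs that trained on example i
    n_out = (~in_subset).sum(axis=0)    # runs that held example i out
    with np.errstate(invalid="ignore", divide="ignore"):
        acc_in = (correct * in_subset).sum(axis=0) / n_in
        acc_out = (correct * ~in_subset).sum(axis=0) / n_out
    return acc_in - acc_out


# Toy data (assumed, not measured): 4 training runs, 2 examples.
# Example 0 is predicted correctly whether or not it was trained on.
# Example 1 is predicted correctly only by runs that trained on it.
in_subset = [[1, 1], [1, 0], [0, 1], [0, 0]]
correct = [[1, 1], [1, 0], [1, 1], [1, 0]]
scores = estimate_memorization(in_subset, correct)
```

Under this assumed definition, a score near 0 suggests the label can be predicted from the rest of the data, while a score near 1 suggests the model fits the example only by memorizing it, as would be expected for outliers and mislabeled points. Reusing the same set of subsampled runs for every example is what keeps the cost far below retraining once per held-out example.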