PIRSA:23100100

The Quantization Model of Neural Scaling

APA

Michaud, E. (2023). The Quantization Model of Neural Scaling. Perimeter Institute for Theoretical Physics. https://pirsa.org/23100100

MLA

Michaud, Eric. The Quantization Model of Neural Scaling. Perimeter Institute for Theoretical Physics, Oct. 20, 2023, https://pirsa.org/23100100

BibTeX

          @misc{ scivideos_PIRSA:23100100,
            doi = {10.48660/23100100},
            url = {https://pirsa.org/23100100},
            author = {Michaud, Eric},
            keywords = {Other Physics},
            language = {en},
            title = {The Quantization Model of Neural Scaling},
            publisher = {Perimeter Institute for Theoretical Physics},
            year = {2023},
            month = {oct},
            note = {PIRSA:23100100, see \url{https://scivideos.org/index.php/pirsa/23100100}}
          }
          

Eric Michaud, Massachusetts Institute of Technology (MIT)

Talk number PIRSA:23100100
Source Repository PIRSA
Talk Type Scientific Series
Subject Other Physics

Abstract

The performance of neural networks like large language models (LLMs) is governed by "scaling laws": the error of the network, averaged across the whole dataset, drops as a power law in the number of network parameters and the amount of data the network was trained on. While the mean error drops smoothly and predictably, scaled-up LLMs seem to have qualitatively different (emergent) capabilities from smaller versions when evaluated on specific tasks. So how does scaling change what neural networks learn? We propose the "quantization model" of neural scaling, where smooth power laws in mean loss are understood as averaging over many small discrete jumps in network performance. Inspired by Max Planck's assumption in 1900 that energy is quantized, we assume that the knowledge or skills that networks must learn are quantized, coming in discrete chunks which we call "quanta". In our model, neural networks can be understood as implicitly being composed of a large number of modules, and scaling simply adds modules to the network. In this talk, I will discuss evidence for and against this hypothesis, its implications for interpretability and for further scaling, and how it fits in with a broader vision for a "science of deep learning".
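To make the averaging argument concrete, the following is a minimal numerical sketch written for this page, not code from the talk. It assumes quanta use frequencies follow a Zipf-like power law (quantum k is needed with probability proportional to k^-(alpha+1)) and that a network which has learned the n most frequently used quanta only errs on examples requiring an unlearned quantum. Under those assumptions the mean loss falls off as a smooth power law in n even though each individual quantum is learned in a discrete jump. All names and parameter values (alpha, num_quanta, loss_per_miss) are illustrative assumptions.

# Toy illustration: a smooth power law in mean loss emerges from
# averaging over many discrete "quanta" of skill.
# Parameter choices below are illustrative, not values from the talk.

import numpy as np

alpha = 0.5            # assumed Zipf-like exponent for quanta use frequencies
num_quanta = 100_000   # total pool of quanta (skills)
loss_per_miss = 1.0    # loss on an example whose quantum has not been learned

# Probability that an example needs quantum k, p_k proportional to k^-(alpha+1)
k = np.arange(1, num_quanta + 1)
p = k ** (-(alpha + 1.0))
p /= p.sum()

def mean_loss(n_learned: int) -> float:
    """Mean loss if the network has learned the n most frequently used quanta."""
    # Examples requiring an unlearned quantum contribute loss_per_miss; others ~0.
    return loss_per_miss * p[n_learned:].sum()

# Mean loss vs. "scale" (number of quanta learned) on a log-spaced grid
ns = np.array([10, 100, 1_000, 10_000])
losses = np.array([mean_loss(n) for n in ns])
for n, loss in zip(ns, losses):
    print(f"n = {n:>6d}   mean loss = {loss:.4e}")

# Estimate the power-law exponent from the log-log slope: L(n) ~ n^-alpha
slope = np.polyfit(np.log(ns), np.log(losses), 1)[0]
print(f"fitted log-log slope: {slope:.2f} (expected about -{alpha})")

Running this prints a fitted slope close to -alpha, illustrating how a sum of many small, discretely acquired contributions can look like a smooth power law in aggregate.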

---

Zoom link https://pitp.zoom.us/j/93886741739?pwd=NzJrcTBNS2xEUUhXajgyak94LzVvdz09