PIRSA:23100100


The Quantization Model of Neural Scaling

Michaud, E. (2023). The Quantization Model of Neural Scaling. Perimeter Institute for Theoretical Physics. https://pirsa.org/23100100

Michaud, Eric. The Quantization Model of Neural Scaling. Perimeter Institute for Theoretical Physics, Oct. 20, 2023, https://pirsa.org/23100100

          @misc{ scivideos_PIRSA:23100100,
            doi = {10.48660/23100100},
            url = {https://pirsa.org/23100100},
            author = {Michaud, Eric},
            keywords = {Other Physics},
            language = {en},
            title = {The Quantization Model of Neural Scaling},
            publisher = {Perimeter Institute for Theoretical Physics},
            year = {2023},
            month = {oct},
            note = {PIRSA:23100100 see, \url{https://scivideos.org/pirsa/23100100}}
          }

Eric Michaud Massachusetts Institute of Technology (MIT)

October 20, 2023

Talk number: PIRSA:23100100

DOI: 10.48660/23100100

Source Repository: PIRSA

Collection: Machine Learning Initiative

Talk Type: Scientific Series

Subject: Physics

Abstract

The performance of neural networks like large language models (LLMs) is governed by "scaling laws": the error of the network, averaged across the whole dataset, drops as a power law in the number of network parameters and the amount of data the network was trained on. While the mean error drops smoothly and predictably, scaled-up LLMs seem to have qualitatively different (emergent) capabilities than smaller versions when one evaluates them on specific tasks. So how does scaling change what neural networks learn? We propose the "quantization model" of neural scaling, in which smooth power laws in mean loss are understood as averages over many small, discrete jumps in network performance. Inspired by Max Planck's assumption in 1900 that energy is quantized, we assume that the knowledge or skills that networks must learn are quantized, coming in discrete chunks which we call "quanta". In our model, a neural network can be understood as implicitly consisting of a large number of modules, and scaling simply adds modules to the network. In this talk, I will discuss evidence for and against this hypothesis, its implications for interpretability and for further scaling, and how it fits into a broader vision for a "science of deep learning".
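The averaging argument can be illustrated with a minimal toy sketch. It assumes (this is not stated in the abstract) that quanta are needed with power-law frequencies p_k proportional to k^-(alpha+1), and that a network holding n quanta has learned exactly the n most frequent ones. Each sample's loss is then a step function (1 before its quantum is learned, 0 after), yet the dataset-averaged loss falls smoothly, roughly as n^-alpha. The function names, alpha = 0.5, and the 200,000 quanta are hypothetical choices made only for illustration.

```python
import math

# Hypothetical toy model of the "quanta" picture. The power-law use
# frequencies are an illustrative assumption, not a result from the talk.


def quanta_frequencies(num_quanta: int, alpha: float) -> list[float]:
    """Assumed probability that a sample needs quantum k (k = 1..num_quanta),
    proportional to k^-(alpha + 1) and normalized to sum to 1."""
    weights = [k ** -(alpha + 1.0) for k in range(1, num_quanta + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_loss(quantum_index: int, n_learned: int) -> float:
    """All-or-nothing loss on a sample that needs quantum `quantum_index`
    (1-based): 0 once that quantum is learned, 1 before. Quanta are assumed
    to be learned in order of decreasing frequency."""
    return 0.0 if quantum_index <= n_learned else 1.0


def mean_loss(n_learned: int, freqs: list[float]) -> float:
    """Dataset-averaged loss: the frequency-weighted mean of the step losses."""
    return sum(p * sample_loss(k, n_learned) for k, p in enumerate(freqs, start=1))


def log_log_slope(ns: list[int], losses: list[float]) -> float:
    """Least-squares slope of log(loss) against log(n), i.e. the fitted
    power-law exponent (negative for a decreasing loss)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(loss) for loss in losses]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    freqs = quanta_frequencies(200_000, alpha=0.5)
    ns = [10, 20, 50, 100, 200, 500, 1000]
    losses = [mean_loss(n, freqs) for n in ns]
    for n, loss in zip(ns, losses):
        print(f"n={n:5d}  mean loss={loss:.4f}")
    print("fitted slope:", round(log_log_slope(ns, losses), 3))
```

With these settings the fitted slope comes out close to -alpha (about -0.5), slightly steeper because the list of quanta is finite. Individually, every sample's loss jumps discontinuously, which is the toy analogue of an "emergent" capability appearing at a particular scale.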

