Search results from ICTS-TIFR
-
Strongly correlated particle systems: a toolbox for machine intelligence
Subhro Ghosh (ICTS:32495)
The classical paradigm of randomness in the sciences is that of i.i.d. random variables, and going beyond i.i.d. is often considered a difficulty and a challenge to be overcome. In this talk, we will explore a new perspective, wherein strongly constrained random systems in fact help to understand fundamental problems in machine learning. In particular, we will discuss strongly correlated particle systems that are well-motivated from statistical and quantum physics, including in particular determinantal probability measures. These will be used to shed important light on questions of fundamental interest in learning theory, focussing on applications to novel sampling techniques and advances in stochastic gradient descent.
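A minimal numerical illustration of the determinantal structure mentioned above (a sketch with assumptions of our own, not a result from the talk): for a determinantal point process with marginal kernel K, inclusion probabilities are principal minors of K, which forces negative correlation between points and hence "diversity".

    import numpy as np

    # Sketch: inclusion probabilities of a DPP with marginal kernel K satisfy
    #   P(S subset of Y) = det(K_S),
    # so any pair of points is negatively correlated.
    rng = np.random.default_rng(0)

    # Build a valid marginal kernel: symmetric with eigenvalues in (0, 1).
    A = rng.standard_normal((5, 5))
    L = A @ A.T                                  # PSD "likelihood" kernel
    K = L @ np.linalg.inv(L + np.eye(5))         # marginal kernel

    i, j = 0, 1
    p_i = K[i, i]                                # P(i in Y)
    p_j = K[j, j]                                # P(j in Y)
    p_ij = np.linalg.det(K[np.ix_([i, j], [i, j])])  # P(i and j in Y)

    # Determinantal structure gives p_ij = p_i * p_j - K_ij**2 <= p_i * p_j.
    print(p_ij, p_i * p_j, p_ij <= p_i * p_j)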
-
What does guidance do? (Online)
Sitan Chen (ICTS:32499)
When sampling from a base measure tilted by a reward model, a popular trick is to approximate the score of the tilted measure with the sum of the base score and the gradient of the reward. It is well-known that this does not sample from the base distribution but nevertheless seems to do something interesting and useful, e.g., classifier-free guidance (CFG) and diffusion posterior sampling (DPS). In this talk, I provide some theoretical perspectives on what this method actually samples from, focusing on a simple mixture model setting. In the first part, I will rigorously characterize the dynamics of CFG, proving that it generates archetypal and low-diversity samples in a certain precise sense. In the second part, I will show that for linear inverse problems, DPS with a careful choice of initialization simultaneously boosts reward and likelihood under the prior. I will then describe some experiments demonstrating that DPS with this initialization scheme achieves strong performance on hard image restoration tasks like large box inpainting. Based on https://arxiv.org/abs/2409.13074 and https://arxiv.org/abs/2506.10955
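A toy sketch of the basic combination rule described above (assumptions of our own, not the talk's CFG or DPS analysis): for a standard Gaussian base and a linear reward r(x) = a*x, the tilted density is N(a, 1) and its score is exactly the base score plus the reward gradient, so Langevin dynamics with the combined score recovers the tilted measure. The subtleties studied in the talk arise when the same trick is applied at intermediate noise levels of a diffusion sampler.

    import numpy as np

    rng = np.random.default_rng(0)
    a, step, n_steps, n_chains = 2.0, 1e-2, 5000, 2000

    x = rng.standard_normal(n_chains)            # initialize at the base measure
    for _ in range(n_steps):
        score = -x + a                           # base score + gradient of reward
        x = x + step * score + np.sqrt(2 * step) * rng.standard_normal(n_chains)

    print(x.mean(), x.std())                     # approx 2.0 and 1.0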
-
New research directions in vector search
Kiran Shiragur (ICTS:32498)
Vector search is a fundamental problem with numerous applications in machine learning, computer vision, recommendation systems, and more. While vector search has been extensively studied, modern applications have introduced new requirements such as diversity, multivector queries, and multifilter constraints, among others. In this talk, we explore these emerging research directions, with a focus on diversity and multivector embeddings in vector search.
For both problems, we propose the first provable graph-based algorithms that efficiently return approximate solutions. Our algorithms leverage popular graph-based methods, enabling us to build on existing, efficient implementations. Experimental results show that our algorithms outperform other approaches.
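A rough sketch of the two standard ingredients the abstract builds on (our own illustration, not the talk's algorithms): greedy best-first search on a precomputed k-NN proximity graph to gather candidates, followed by a simple maximal-marginal-relevance style re-ranking that trades off relevance against diversity.

    import numpy as np

    def knn_graph(X, k=8):
        # Brute-force k-NN graph; real systems build this approximately.
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        return np.argsort(d, axis=1)[:, :k]

    def greedy_search(X, graph, q, start=0, steps=50):
        # Walk to the neighbor closest to the query until no improvement.
        cur = start
        for _ in range(steps):
            cand = graph[cur]
            best = cand[np.argmin(np.linalg.norm(X[cand] - q, axis=1))]
            if np.linalg.norm(X[best] - q) >= np.linalg.norm(X[cur] - q):
                break
            cur = best
        return graph[cur]                        # candidate pool near q

    def diversify(X, q, candidates, m=3, lam=0.5):
        # Greedily pick items balancing closeness to q and spread among picks.
        chosen, pool = [], list(candidates)
        while pool and len(chosen) < m:
            def score(i):
                rel = -np.linalg.norm(X[i] - q)
                div = min((np.linalg.norm(X[i] - X[j]) for j in chosen), default=0.0)
                return lam * rel + (1 - lam) * div
            best = max(pool, key=score)
            chosen.append(best)
            pool.remove(best)
        return chosen

    X = np.random.default_rng(0).standard_normal((200, 16))
    q = np.zeros(16)
    print(diversify(X, q, greedy_search(X, knn_graph(X), q)))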
-
Mean-Field Theory Insights into Neural Feature Dynamics, Infinite-Scale Limits, and Scaling Laws
Cengiz Pehlevan (ICTS:32497)
When a neural network becomes extremely wide or deep, its learning dynamics simplify and can be described by the same “mean-field” ideas that explain magnetism and fluids. I will walk through these ideas step-by-step, showing how they suggest practical recipes for initialization and optimization that scale smoothly from small models to cutting-edge transformers. I will also discuss neural scaling laws—empirical power-law rules that relate model size, data, and compute—and illustrate them with solvable toy models.
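A minimal sketch of the power-law rules mentioned above (synthetic data, assumptions of our own): scaling laws are usually summarized as L(N) ~ a * N**(-b) (plus an irreducible term, ignored here), so the exponent b can be read off a straight-line fit in log-log coordinates.

    import numpy as np

    rng = np.random.default_rng(0)
    N = np.logspace(6, 9, 10)                    # model sizes (parameters)
    loss = 50.0 * N**(-0.3) * np.exp(0.02 * rng.standard_normal(N.size))

    slope, intercept = np.polyfit(np.log(N), np.log(loss), 1)
    print("fitted exponent b:", -slope, "prefactor a:", np.exp(intercept))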
-
Turing Lecture: Overparametrized models: linear theory and its limits
Andrea Montanari (ICTS:32491)
The success of modern AI models defies classical theoretical wisdom. Classical theory recommended the use of convex optimization, and yet AI models learn by optimizing highly non-convex functions. Classical theory prescribed controlling model complexity, and yet AI models are very complex, so complex that they often memorize the training data. Classical wisdom recommended a careful and interpretable choice of model architecture, and yet modern architectures rarely offer a parsimonious representation of a target distribution class.
The discovery that learning can take place in completely unexpected scenarios poses beautiful conceptual challenges. I will try to survey recent work towards addressing them.
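A small numerical illustration of the memorization point in the linear setting (our own example, not from the lecture): with more parameters than samples, the minimum-norm least-squares fit interpolates the training data exactly, even when the labels are pure noise.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 50, 500                                  # overparametrized: p >> n
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)                      # pure-noise labels

    theta, *_ = np.linalg.lstsq(X, y, rcond=None)   # minimum-norm solution
    print("training error:", np.linalg.norm(X @ theta - y))   # ~ 0: interpolation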
-
Sandbox for the Blackbox: How LLMs learn Structured Data
Ashok Makkuva (ICTS:32490)
In recent years, large language models (LLMs) have achieved unprecedented success across various disciplines, including natural language processing, computer vision, and reinforcement learning. This success has spurred a flourishing body of research aimed at understanding these models, from both theoretical perspectives such as representation and optimization, and scientific approaches such as interpretability.
To understand LLMs, an important research theme in the machine learning community is to model the input as mathematically structured data (e.g. Markov chains), where we have complete knowledge and control of the data properties. The goal is to use this controlled input to gain valuable insights into what solutions LLMs learn and how they learn them (e.g. induction head). This understanding is crucial, given the increasing ubiquity of the models, especially in safety-critical applications, and our limited understanding of them.
While the aforementioned works using this structured approach provide valuable insights into the inner workings of LLMs, the breadth and diversity of the field make it increasingly challenging for both experts and non-experts to stay abreast. To address this, our tutorial aims to provide a unifying perspective on recent advances in the analysis of LLMs, from a representational-cum-learning viewpoint. To this end, we focus on the two predominant classes of language models that have driven the AI revolution: transformers and recurrent models such as state-space models (SSMs). For these models, we discuss several concrete results, including their representational capacities, optimization landscape, and mechanistic interpretability. Building upon these perspectives, we outline several important future directions in this field, aiming to foster a clearer understanding of language models and to aid in the creation of more efficient architectures.
References and a detailed explanation of our tutorial are available here: https://capricious-comb-7a3tbssph.notion.site/NeurIPS-2024-Tutorial-San…
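A minimal sketch of the "structured data" sandbox mentioned above (details our own): sample token sequences from a known first-order Markov chain, so the Bayes-optimal next-token predictor is simply the transition matrix P and a trained sequence model can be compared against it directly.

    import numpy as np

    rng = np.random.default_rng(0)
    V = 4                                            # vocabulary size
    P = rng.dirichlet(alpha=np.ones(V), size=V)      # row-stochastic transitions

    def sample_sequence(length=128):
        seq = [rng.integers(V)]
        for _ in range(length - 1):
            seq.append(rng.choice(V, p=P[seq[-1]]))  # next token from current row
        return np.array(seq)

    batch = np.stack([sample_sequence() for _ in range(32)])
    print(batch.shape)          # (32, 128): training data with known ground truth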
-
Computationally efficient reductions between some statistical models (Online)
Ashwin Pananjady (ICTS:32494)
Can a sample from one parametric statistical model (the source) be transformed into a sample from a different (target) model? Versions of this question were asked as far back as 1950, and a beautiful asymptotic theory of equivalence of experiments emerged in the latter half of the 20th century. Motivated by problems spanning information-computation gaps and differentially private data analysis, we ask the analogous non-asymptotic question in high-dimensional problems and with algorithmic considerations. We show how a single observation from some source models can be approximately transformed to a single observation from a large class of target models by a computationally efficient algorithm. I will present several such reductions and discuss their applications to the aforementioned problems.
This is joint work with Mengqi Lou and Guy Bresler.
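A toy instance of the general question above (our own example, far simpler than the talk's reductions): a single observation X ~ N(theta, 1) from the source model can be turned into a single observation from the target model N(theta, sigma2), sigma2 > 1, by adding independent noise. The map never needs to know theta, which is what makes it a valid reduction between statistical models.

    import numpy as np

    rng = np.random.default_rng(0)
    theta, sigma2 = 3.0, 4.0

    def reduce_gaussian(x, sigma2):
        # Add independent noise to inflate the variance from 1 to sigma2.
        return x + np.sqrt(sigma2 - 1.0) * rng.standard_normal()

    samples = np.array([reduce_gaussian(theta + rng.standard_normal(), sigma2)
                        for _ in range(100000)])
    print(samples.mean(), samples.var())            # approx theta and sigma2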
-
Strongly correlated particle systems: a toolbox for machine intelligence
Subhro Ghosh (ICTS:32493)
The classical paradigm of randomness in the sciences is that of i.i.d. random variables, and going beyond i.i.d. is often considered a difficulty and a challenge to be overcome. In this talk, we will explore a new perspective, wherein strongly constrained random systems in fact help to understand fundamental problems in machine learning. In particular, we will discuss strongly correlated particle systems that are well-motivated from statistical and quantum physics, including in particular determinantal probability measures. These will be used to shed important light on questions of fundamental interest in learning theory, focussing on applications to novel sampling techniques and advances in stochastic gradient descent.
-
Posterior Sampling for Image Personalization and Editing
Sanjay Shakkottai (ICTS:32492)
This talk will consist of two parts. In the first part, we will present an overview of posterior sampling with diffusion models and motivate the connection to inverse problems. Specific topics that we will cover include Gibbs sampling, importance sampling, and approximations for test-time optimization (aka training-free approaches such as DPS) with diffusion models. In the second part, we will discuss algorithms for image editing, stylization, and related tasks that are in production in large-scale settings. Specifically, we will discuss both diffusion- and flow-based algorithms (PSLD, STSL, RB Modulation, RF Inversion) that operate in the latent space of SOTA foundation models (such as Stable Diffusion or Flux).
Videos from the diffusions class are posted on YouTube (a link to the lecture notes is also provided in the video captions). Link: https://www.youtube.com/@ifml9883/playlists
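A minimal sketch of the posterior-sampling-for-inverse-problems connection in the first part (assumptions of our own, not the production algorithms listed above): for a linear inverse problem y = A x + noise with a Gaussian prior on x, unadjusted Langevin dynamics driven by the posterior score, i.e. the prior score plus the likelihood score, produces approximate posterior samples.

    import numpy as np

    rng = np.random.default_rng(0)
    d, m, s2 = 20, 10, 0.1
    A = rng.standard_normal((m, d)) / np.sqrt(d)
    x_true = rng.standard_normal(d)
    y = A @ x_true + np.sqrt(s2) * rng.standard_normal(m)

    x, step = np.zeros(d), 1e-3
    for _ in range(20000):
        score = -x + A.T @ (y - A @ x) / s2      # prior score + likelihood score
        x = x + step * score + np.sqrt(2 * step) * rng.standard_normal(d)

    print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))   # relative error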
-
Posterior Sampling for Image Personalization and Editing
Sanjay Shakkottai (ICTS:32483)
This talk will consist of two parts. In the first part, we will present an overview of posterior sampling with diffusion models and motivate the connection to inverse problems. Specific topics that we will cover include Gibbs sampling, importance sampling, and approximations for test-time optimization (aka training-free approaches such as DPS) with diffusion models. In the second part, we will discuss algorithms for image editing, stylization, and related tasks that are in production in large-scale settings. Specifically, we will discuss both diffusion- and flow-based algorithms (PSLD, STSL, RB Modulation, RF Inversion) that operate in the latent space of SOTA foundation models (such as Stable Diffusion or Flux).
Videos from the diffusions class are posted on YouTube (a link to the lecture notes is also provided in the video captions). Link: https://www.youtube.com/@ifml9883/playlists
-
Sandbox for the Blackbox: How LLMs learn Structured Data
Ashok Makkuva (ICTS:32482)
In recent years, large language models (LLMs) have achieved unprecedented success across various disciplines, including natural language processing, computer vision, and reinforcement learning. This success has spurred a flourishing body of research aimed at understanding these models, from both theoretical perspectives such as representation and optimization, and scientific approaches such as interpretability.
To understand LLMs, an important research theme in the machine learning community is to model the input as mathematically structured data (e.g. Markov chains), where we have complete knowledge and control of the data properties. The goal is to use this controlled input to gain valuable insights into what solutions LLMs learn and how they learn them (e.g. induction head). This understanding is crucial, given the increasing ubiquity of the models, especially in safety-critical applications, and our limited understanding of them.
While the aforementioned works using this structured approach provide valuable insights into the inner workings of LLMs, the breadth and diversity of the field make it increasingly challenging for both experts and non-experts to stay abreast. To address this, our tutorial aims to provide a unifying perspective on recent advances in the analysis of LLMs, from a representational-cum-learning viewpoint. To this end, we focus on the two predominant classes of language models that have driven the AI revolution: transformers and recurrent models such as state-space models (SSMs). For these models, we discuss several concrete results, including their representational capacities, optimization landscape, and mechanistic interpretability. Building upon these perspectives, we outline several important future directions in this field, aiming to foster a clearer understanding of language models and to aid in the creation of more efficient architectures.
References and a detailed explanation of our tutorial are available here: https://capricious-comb-7a3tbssph.notion.site/NeurIPS-2024-Tutorial-San…
-
Turing Lecture: The mathematics of large machine learning models
Andrea Montanari (ICTS:32487)
The success of modern AI models defies classical theoretical wisdom. Classical theory recommended the use of convex optimization, and yet AI models learn by optimizing highly non-convex functions. Classical theory prescribed controlling model complexity, and yet AI models are very complex, so complex that they often memorize the training data. Classical wisdom recommended a careful and interpretable choice of model architecture, and yet modern architectures rarely offer a parsimonious representation of a target distribution class.
The discovery that learning can take place in completely unexpected scenarios poses beautiful conceptual challenges. I will try to survey recent work towards addressing them.