Search results from ICTS-TIFR
Mean-Field Theory Insights into Neural Feature Dynamics, Infinite-Scale Limits, and Scaling Laws
Cengiz Pehlevan (ICTS:32497)
When a neural network becomes extremely wide or deep, its learning dynamics simplify and can be described by the same “mean-field” ideas that explain magnetism and fluids. I will walk through these ideas step by step, showing how they suggest practical recipes for initialization and optimization that scale smoothly from small models to cutting-edge transformers. I will also discuss neural scaling laws (empirical power-law rules that relate model size, data, and compute) and illustrate them with solvable toy models.
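As a concrete feel for the kind of power-law rule referred to above, here is a minimal Python sketch of a Chinchilla-style scaling law; the functional form and all constants are illustrative assumptions, not results from the lecture.

```python
import numpy as np

# Hypothetical Chinchilla-style loss surface L(N, D) = E + A / N^alpha + B / D^beta.
# All constants below are illustrative assumptions, not values from the lecture.
E, A, B, alpha, beta = 1.7, 400.0, 1800.0, 0.34, 0.28

def loss(n_params, n_tokens):
    """Predicted loss for a model with n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Sweep model size at a fixed compute budget C ~ 6 * N * D and pick the best trade-off.
compute = 1e20                      # FLOPs budget (illustrative)
sizes = np.logspace(7, 11, 50)      # 10M to 100B parameters
tokens = compute / (6.0 * sizes)    # tokens affordable at each model size
losses = loss(sizes, tokens)
best = np.argmin(losses)
print(f"best size ~ {sizes[best]:.2e} params, tokens ~ {tokens[best]:.2e}, loss ~ {losses[best]:.3f}")
```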
-
Turing Lecture: Overparametrized models: linear theory and its limits
Andrea Montanari (ICTS:32491)
The success of modern AI models defies classical theoretical wisdom. Classical theory recommended the use of convex optimization, and yet AI models learn by optimizing highly non-convex functions. Classical theory prescribed controlling model complexity, and yet AI models are very complex, so complex that they often memorize the training data. Classical wisdom recommended a careful and interpretable choice of model architecture, and yet modern architectures rarely offer a parsimonious representation of a target distribution class.
The discovery that learning can take place in completely unexpected scenarios poses beautiful conceptual challenges. I will try to survey recent work towards addressing them.
-
Sandbox for the Blackbox: How LLMs learn Structured Data
Ashok Makkuva (ICTS:32490)
In recent years, large language models (LLMs) have achieved unprecedented success across various disciplines, including natural language processing, computer vision, and reinforcement learning. This success has spurred a flourishing body of research aimed at understanding these models, from both theoretical perspectives such as representation and optimization, and scientific approaches such as interpretability.
To understand LLMs, an important research theme in the machine learning community is to model the input as mathematically structured data (e.g. Markov chains), where we have complete knowledge and control of the data properties. The goal is to use this controlled input to gain valuable insights into what solutions LLMs learn and how they learn them (e.g. induction head). This understanding is crucial, given the increasing ubiquity of the models, especially in safety-critical applications, and our limited understanding of them.
While the aforementioned works using this structured approach provide valuable insights into the inner workings of LLMs, the breadth and diversity of the field make it increasingly challenging for both experts and non-experts to stay abreast. To address this, our tutorial aims to provide a unifying perspective on recent advances in the analysis of LLMs, from a representational-cum-learning viewpoint. To this end, we focus on the two predominant classes of language models that have driven the AI revolution: transformers and recurrent models such as state-space models (SSMs). For these models, we discuss several concrete results, including their representational capacities, optimization landscape, and mechanistic interpretability. Building upon these perspectives, we outline several important future directions in this field, aiming to foster a clearer understanding of language models and to aid in the creation of more efficient architectures.
References and a detailed explanation of our tutorial are here: https://capricious-comb-7a3tbssph.notion.site/NeurIPS-2024-Tutorial-San…
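To make the "structured data" idea above concrete, the following minimal Python sketch generates sequences from a fully known first-order Markov chain, the kind of controlled input against which a trained model's next-token predictions can be checked exactly. The specific chain and binary vocabulary are assumptions made for illustration, not the tutorial's setup.

```python
import numpy as np

# A tiny first-order Markov chain over the binary vocabulary {0, 1}; the transition
# matrix is fully known, so every statistic of the training data is under our control.
P = np.array([[0.9, 0.1],    # P(next | current = 0)
              [0.2, 0.8]])   # P(next | current = 1)

def sample_sequence(length, rng):
    """Draw one sequence from the chain, starting from a uniform initial state."""
    seq = np.empty(length, dtype=np.int64)
    seq[0] = rng.integers(2)
    for t in range(1, length):
        seq[t] = rng.choice(2, p=P[seq[t - 1]])
    return seq

rng = np.random.default_rng(0)
data = np.stack([sample_sequence(64, rng) for _ in range(1000)])

# The Bayes-optimal next-token predictor is a lookup into P, so a trained model's
# predictions can be compared against this exact baseline.
est = [float(np.mean(data[:, 1:][data[:, :-1] == s])) for s in (0, 1)]
print("estimated P(next = 1 | current = s) for s = 0, 1:", est)
```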
-
Computationally efficient reductions between some statistical models (Online)
Ashwin Pananjady (ICTS:32494)
Can a sample from one parametric statistical model (the source) be transformed into a sample from a different (target) model? Versions of this question were asked as far back as 1950, and a beautiful asymptotic theory of equivalence of experiments emerged in the latter half of the 20th century. Motivated by problems spanning information-computation gaps and differentially private data analysis, we ask the analogous non-asymptotic question in high-dimensional problems and with algorithmic considerations. We show how a single observation from some source models can be approximately transformed to a single observation from a large class of target models by a computationally efficient algorithm. I will present several such reductions and discuss their applications to the aforementioned problems.
This is joint work with Mengqi Lou and Guy Bresler.
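For intuition about what such a reduction can look like in the simplest case, here is a classical textbook example sketched in Python (an illustration only, not one of the talk's constructions): a draw from N(theta, sigma_source^2) is mapped to a draw from N(theta, sigma_target^2), for any sigma_target >= sigma_source, by adding independent noise, without the algorithm ever knowing theta.

```python
import numpy as np

def reduce_gaussian(x, sigma_source, sigma_target, rng):
    """Map draws from N(theta, sigma_source^2) to draws from N(theta, sigma_target^2)
    (sigma_target >= sigma_source) by adding independent noise; theta is never used."""
    assert sigma_target >= sigma_source
    extra_std = np.sqrt(sigma_target**2 - sigma_source**2)
    return x + rng.normal(0.0, extra_std, size=np.shape(x))

rng = np.random.default_rng(1)
theta = 3.0                                       # unknown to the reduction
source = rng.normal(theta, 1.0, size=100_000)     # observations from the source model
target = reduce_gaussian(source, 1.0, 2.0, rng)   # observations from the target model
print("target mean ~", target.mean(), " target std ~", target.std())
# Both are close to theta = 3.0 and sigma_target = 2.0, as required.
```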
-
Strongly correlated particle systems: a toolbox for machine intelligence
Subhro Ghosh (ICTS:32493)
The classical paradigm of randomness in the sciences is that of i.i.d. random variables, and going beyond i.i.d. is often considered a difficulty and a challenge to be overcome. In this talk, we will explore a new perspective, wherein strongly constrained random systems in fact help to understand fundamental problems in machine learning. In particular, we will discuss strongly correlated particle systems that are well-motivated from statistical and quantum physics, including in particular determinantal probability measures. These will be used to shed important light on questions of fundamental interest in learning theory, focussing on applications to novel sampling techniques and advances in stochastic gradient descent.
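For readers unfamiliar with determinantal measures, the following Python sketch samples a subset from a discrete L-ensemble determinantal point process using the standard spectral algorithm of Hough et al.; the toy kernel over points on a line is an illustrative assumption, and the sampling techniques discussed in the talk may differ.

```python
import numpy as np

def sample_dpp(L, rng):
    """Sample a subset from the L-ensemble DPP with symmetric PSD kernel L,
    via the spectral algorithm of Hough et al. (2006)."""
    eigvals, eigvecs = np.linalg.eigh(L)
    # Phase 1: keep eigenvector i independently with probability lambda_i / (1 + lambda_i).
    keep = rng.random(len(eigvals)) < eigvals / (1.0 + eigvals)
    V = eigvecs[:, keep]
    items = []
    while V.shape[1] > 0:
        # Pick an item with probability proportional to the squared row norms of V.
        probs = np.sum(V**2, axis=1)
        probs = probs / probs.sum()
        i = int(rng.choice(len(probs), p=probs))
        items.append(i)
        # Restrict V to the subspace of vectors whose i-th coordinate is zero.
        col = int(np.argmax(np.abs(V[i])))   # a column with a nonzero entry in row i
        v = V[:, col] / V[i, col]
        V = V - np.outer(v, V[i])            # zero out row i in every column
        V = np.delete(V, col, axis=1)
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)           # re-orthonormalize the remaining columns
    return sorted(items)

# Toy kernel: Gaussian similarity between 10 points on a line; nearby points repel each other.
x = np.linspace(0.0, 1.0, 10)
L = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.05)
rng = np.random.default_rng(2)
print("DPP sample:", sample_dpp(L, rng))
```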
-
Posterior Sampling for Image Personalization and Editing
Sanjay Shakkottai (ICTS:32492)
This talk will consist of two parts. In the first part, we will present an overview of posterior sampling with diffusion models and motivate the connection to inverse problems. Specific topics that we will cover include Gibbs sampling, importance sampling, and approximations for test-time optimization (also known as training-free approaches, such as DPS) with diffusion models. In the second part, we will discuss algorithms for image editing, stylization, etc., that are in production in large-scale settings. Specifically, we will discuss both diffusion- and flow-based algorithms (PSLD, STSL, RB Modulation, RF Inversion) that operate in the latent space of SOTA foundation models (such as Stable Diffusion or Flux).
Diffusions class videos are posted on YouTube (and lecture notes link is also posted in the video caption). Link: https://www.youtube.com/@ifml9883/playlists
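As a rough illustration of the training-free posterior-sampling idea behind approaches such as DPS, here is a self-contained Python toy for a linear inverse problem. The Gaussian prior, closed-form denoiser, noise schedule, and step sizes are all assumptions chosen so the example runs without a learned diffusion model; it sketches the mechanism rather than the production algorithms covered in the talk.

```python
import numpy as np

# Toy DPS-style loop for a linear inverse problem y = A @ x0 + noise. A standard normal
# N(0, I) prior is assumed so that the ideal denoiser E[x0 | x_t] = sqrt(abar_t) * x_t has
# a closed form; a real system would plug in a learned diffusion model instead.
rng = np.random.default_rng(3)
d, m = 8, 4
A = rng.normal(size=(m, d))
x_true = rng.normal(size=d)
y = A @ x_true + 0.05 * rng.normal(size=m)

T = 200
betas = np.linspace(1e-4, 0.02, T)      # DDPM-style noise schedule (assumed)
alphas = 1.0 - betas
abars = np.cumprod(alphas)

zeta = 0.5                              # measurement-guidance step size (assumed)
x = rng.normal(size=d)                  # start from pure noise x_T
for t in range(T - 1, -1, -1):
    abar = abars[t]
    x0_hat = np.sqrt(abar) * x          # closed-form denoiser for the N(0, I) prior
    # Ancestral DDPM step using the denoiser's implied noise estimate.
    eps_hat = (x - np.sqrt(abar) * x0_hat) / np.sqrt(1.0 - abar)
    mean = (x - betas[t] / np.sqrt(1.0 - abar) * eps_hat) / np.sqrt(alphas[t])
    x = mean + (np.sqrt(betas[t]) * rng.normal(size=d) if t > 0 else 0.0)
    # DPS-style guidance: step against the gradient of ||y - A @ x0_hat||^2 w.r.t. x_t
    # (here d x0_hat / d x_t = sqrt(abar) * I), normalized by the residual as in DPS.
    residual = y - A @ x0_hat
    grad = -np.sqrt(abar) * (A.T @ residual)
    x = x - zeta * grad / (np.linalg.norm(residual) + 1e-8)

print("||y - A @ x|| after guided sampling:", np.linalg.norm(y - A @ x))
print("||y - A @ x_true|| (noise floor)   :", np.linalg.norm(y - A @ x_true))
```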
-
Posterior Sampling for Image Personalization and Editing
Sanjay Shakkottai (ICTS:32483)
This talk will consist of two parts. In the first part, we will present an overview of posterior sampling with diffusion models and motivate the connection to inverse problems. Specific topics that we will cover include Gibbs sampling, importance sampling, and approximations for test-time optimization (also known as training-free approaches, such as DPS) with diffusion models. In the second part, we will discuss algorithms for image editing, stylization, etc., that are in production in large-scale settings. Specifically, we will discuss both diffusion- and flow-based algorithms (PSLD, STSL, RB Modulation, RF Inversion) that operate in the latent space of SOTA foundation models (such as Stable Diffusion or Flux).
Diffusions class videos are posted on YouTube (and lecture notes link is also posted in the video caption). Link: https://www.youtube.com/@ifml9883/playlists
-
Sandbox for the Blackbox: How LLMs learn Structured Data
Ashok Makkuva (ICTS:32482)
In recent years, large language models (LLMs) have achieved unprecedented success across various disciplines, including natural language processing, computer vision, and reinforcement learning. This success has spurred a flourishing body of research aimed at understanding these models, from both theoretical perspectives such as representation and optimization, and scientific approaches such as interpretability.
To understand LLMs, an important research theme in the machine learning community is to model the input as mathematically structured data (e.g. Markov chains), where we have complete knowledge and control of the data properties. The goal is to use this controlled input to gain valuable insights into what solutions LLMs learn and how they learn them (e.g. induction head). This understanding is crucial, given the increasing ubiquity of the models, especially in safety-critical applications, and our limited understanding of them.
While the aforementioned works using this structured approach provide valuable insights into the inner workings of LLMs, the breadth and diversity of the field make it increasingly challenging for both experts and non-experts to stay abreast. To address this, our tutorial aims to provide a unifying perspective on recent advances in the analysis of LLMs, from a representational-cum-learning viewpoint. To this end, we focus on the two predominant classes of language models that have driven the AI revolution: transformers and recurrent models such as state-space models (SSMs). For these models, we discuss several concrete results, including their representational capacities, optimization landscape, and mechanistic interpretability. Building upon these perspectives, we outline several important future directions in this field, aiming to foster a clearer understanding of language models and to aid in the creation of more efficient architectures.
References and a detailed explanation of our tutorial are here: https://capricious-comb-7a3tbssph.notion.site/NeurIPS-2024-Tutorial-San…
-
Turing lecture: The mathematics of large machine learning models
Andrea Montanari (ICTS:32487)
The success of modern AI models defies classical theoretical wisdom. Classical theory recommended the use of convex optimization, and yet AI models learn by optimizing highly non-convex functions. Classical theory prescribed controlling model complexity, and yet AI models are very complex, so complex that they often memorize the training data. Classical wisdom recommended a careful and interpretable choice of model architecture, and yet modern architectures rarely offer a parsimonious representation of a target distribution class.
The discovery that learning can take place in completely unexpected scenarios poses beautiful conceptual challenges. I will try to survey recent work towards addressing them.
-
Collaborative Prediction via Tractable Agreement Protocols
Surbhi Goel (ICTS:32485)
Designing effective collaboration between humans and AI systems is crucial for leveraging their complementary abilities in complex decision tasks. But how should agents possessing unique, private knowledge, like a human expert and an AI model, interact to reach decisions better than either could alone? If they were perfect Bayesians with a shared prior, Aumann's classical agreement theorem suggests that conversation leads, via agreement, to an accuracy-improving prediction. However, this relies on implausible assumptions about their knowledge and computational power.
We show how to recover and generalize these guarantees using only computationally and statistically tractable assumptions. We develop efficient "collaboration protocols" in which parties iteratively exchange only low-dimensional information (their current predictions or best-response actions) without needing to share underlying features. These protocols are grounded in conditions like conversation calibration and swap regret, which relax full Bayesian rationality and can be enforced in a computationally efficient manner. First, we prove that this simple interaction leads to fast convergence to agreement, generalizing quantitative bounds even to high-dimensional and action-based settings. Second, we introduce a weak learning condition under which this agreement process inherently aggregates the parties' distinct information; that is, agents following our protocols arrive at final predictions that are provably competitive with an optimal predictor having access to their joint features. Together, these results offer a new, practical foundation for building systems that achieve the power of pooled knowledge through tractable interaction alone.
This talk is based on joint work with the amazing Natalie Collina, Varun Gupta, Ira Globus-Harris, Aaron Roth, Mirah Shi.
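To see the shape of such a protocol, here is a toy Python simulation of the classical Aumann/Geanakoplos-Polemarchakis agreement dialogue, in which two agents with a common prior repeatedly exchange only their current posterior predictions of an event until the announcements coincide. The finite state space, partitions, and event are illustrative assumptions, and the toy assumes the full Bayesian rationality that the protocols in the talk deliberately relax.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 24
states = np.arange(n)
event = rng.random(n) < 0.4      # the event A both parties are predicting

alice0 = states % 4              # Alice's initial partition labels (her private feature)
bob0 = (states // 6) % 3         # Bob's initial partition labels (his private feature)

def posterior(cells):
    """P(A | agent's cell), evaluated at every state, under the common uniform prior."""
    return np.array([event[cells == cells[w]].mean() for w in states])

def refine(cells, announcement):
    """Intersect an agent's partition with the level sets of the other's announcement."""
    pairs = list(zip(cells.tolist(), np.round(announcement, 12).tolist()))
    labels = {p: i for i, p in enumerate(dict.fromkeys(pairs))}
    return np.array([labels[p] for p in pairs])

alice_cells, bob_cells = alice0.copy(), bob0.copy()
for rounds in range(1, 50):
    alice_post = posterior(alice_cells)
    bob_cells = refine(bob_cells, alice_post)      # Bob conditions on Alice's announcement
    bob_post = posterior(bob_cells)
    alice_cells = refine(alice_cells, bob_post)    # Alice conditions on Bob's announcement
    if np.allclose(posterior(alice_cells), bob_post):
        break                                      # announcements coincide: agreement

w = 7                                              # the realized state of the world
pooled = posterior(alice0 * 10 + bob0)             # predictor that sees both private features
print(f"agreement reached after {rounds} round(s)")
print(f"agreed prediction at the realized state: {bob_post[w]:.3f}")
print(f"pooled-information prediction there    : {pooled[w]:.3f}")
```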