Search results from ICTS-TIFR
Format results
-
-
-
Generic regularity for minimizing hypersurfaces up to dimension 11 - I (Online)
Felix SchulzeICTS:32608 -
-
Mean-Field Theory Insights into Neural Feature Dynamics, Infinite-Scale Limits, and Scaling Laws
Cengiz PehlevanICTS:32500 -
Asymptotic optimality of confidence interval based algorithms for fixed confidence MABs
Jayakrishnan NairICTS:32504 -
Learning Causal World Models from Acting and Seeing Using Score Functions
Karthikeyan ShanmugamICTS:32503 -
-
-
-
-
-
Decompositions of Scherk-Type Zero Mean Curvature Surfaces
Subham PaulICTS:32599Using a special Euler–Ramanujan identity and Wick rotation, we reveal how Scherk-type zero mean curvature surfaces in Lorentz–Minkowski space can be decomposed into superpositions of dilated helicoids and hyperbolic helicoids. These decompositions also extend to maximal codimension-2 surfaces, linking them to weakly untrapped and ∗-surfaces.
-
Construction of maxfaces with infitely many swallowtails and planar ends.
Anu DhochakICTS:32647In this talk, I will discuss the existence of a one-parameter family of infinite genus maxfaces exhibiting infinitely many planar spacelike ends.
-
Generic regularity for minimizing hypersurfaces up to dimension 11 - I (Online)
Felix SchulzeICTS:32608We will give an introduction to our recent joint work with Otis Chodosh, Christos Mantoulidis and Zhihan Wang on generic regularity for area minimizing hypersurfaces up to dimension 11.
-
n-Step Temporal Difference Learning with Optimal n
Shalabh BhatnagarICTS:32501We consider the problem of finding the optimal value of n in the n-step temporal difference (TD) learning algorithm. Our objective function for the optimization problem is the average root mean squared error (RMSE). We find the optimal n by resorting to a model-free optimization technique involving a one-simulation simultaneous perturbation stochastic approximation (SPSA) based procedure. Whereas SPSA is a zeroth-order continuous optimization procedure, we adapt it to the discrete optimization setting by using a random projection operator. We prove the asymptotic convergence of the recursion by showing that the sequence of n-updates obtained using zeroth-order stochastic gradient search converges almost surely to an internally chain transitive invariant set of an associated differential inclusion. This results in convergence of the discrete parameter sequence to the optimal n in n-step TD. Through experiments, we show that the optimal value of n is achieved with our algorithm for arbitrary initial values. We further show using numerical evaluations that the proposed algorithm outperforms a well known discrete parameter stochastic optimization algorithm ‘Optimal Computing Budget Allocation (OCBA)’ on benchmark RL tasks.
-
Mean-Field Theory Insights into Neural Feature Dynamics, Infinite-Scale Limits, and Scaling Laws
Cengiz PehlevanICTS:32500When a neural network becomes extremely wide or deep, its learning dynamics simplify and can be described by the same “mean-field” ideas that explain magnetism and fluids. I will walk through these ideas step-by-step, showing how they suggest practical recipes for initialization and optimization that scale smoothly from small models to cutting-edge transformers. I will also discuss neural scaling laws—empirical power-law rules that relate model size, data, and compute—and illustrate them with solvable toy models.
-
Asymptotic optimality of confidence interval based algorithms for fixed confidence MABs
Jayakrishnan NairICTS:32504In this work, we address the challenge of identifying the optimal arm in a stochastic multi-armed bandit scenario with the minimum number of arm pulls, given a predefined error probability in a fixed confidence setting. Our focus is on examining the asymptotic behavior of sample complexity and the distribution of arm weights upon termination, as the error threshold is scaled to zero, under confidence-interval based algorithms. Specifically, we analyze the asymptotic sample complexity and termination weight fractions for the well-known LUCB algorithm, and introduce a new variant, the LUCB Greedy algorithm. We demonstrate that the upper bounds on the sample complexities for both algorithms are asymptotically within a (universal) constant factor of the established lower bounds.
-
Learning Causal World Models from Acting and Seeing Using Score Functions
Karthikeyan ShanmugamICTS:32503n causal inference, true causal order and the graph of causal interaction can be uniquely determined if you have sufficient interventional data. Interventions are local (randomized control trials) RCTs done where different variables, taken few or one at a time, in a causal graph are randomized. We consider a harder problem when the causal variables are not directly observed and are "latent". Instead, we observe a high dimensional transformation (as images etc.) of the true causal variables. Central problem in causal representation learning is to invert the unknown transformation between true causal variables and the observations up to coordinate wise scaling and permutation. We show that this is possible with enough interventional diversity by exploiting two key ideas: a) Represent interventional distributions in terms of their scores (gradient of likelihoods). b) The encoder-decoder pair that minimizes reconstruction loss and sparsifies the score difference in the latent space is the optimal pair. We show various versions of these results for linear transforms and general transforms with mild regularity assumptions on the diversity of interventions. We also will discuss empirical results on some simple image datasets.
Joint work with Burak Varici (CMU), Emre Acarturk (RPI), Abhishek Kumar (Amazon, ex-GDM), Ali Tajer (RPI)
-
Second Order Methods for Bandit Optimization and Control
Arun Sai SuggalaICTS:32502Bandit convex optimization is a powerful framework for sequential decision-making, but existing algorithms with optimal regret guarantees are often too computationally expensive for high-dimensional problems. This talk introduces a simple and practical BCO algorithm, the Bandit Newton Step, which leverages second-order information for decision-making. We will show that our algorithm obtains an optimal $O(T^{1/2})$ regret bound for a large and practical class of functions that satisfy a condition we call “$\kappa$-convexity,” which includes linear, quadratic, and generalized linear losses. In addition to optimal regret, this method is the most efficient known algorithm for several well-studied applications including bandit logistic regression.
Furthermore, we'll discuss the extension of our method to online convex optimization with memory. We show that for loss functions with a certain affine structure, the extended algorithm attains optimal regret. This leads to an optimal regret algorithm for the bandit Linear-Quadratic (LQ) control problem under a fully adversarial noise model, resolving a key open question in the field. Finally, we contrast this result by proving that bandit problems with more general memory structures are fundamentally harder, establishing a tight $\Omega(T^{2/3})$ lower bound on regret.
-
Turing lecture: Dynamical phenomena in nonlinear learning
Andrea MontanariICTS:32496The success of modern AI models defies classical theoretical wisdom. Classical theory recommended the use of convex optimization, and yet AI models learn by optimizing highly non-convex function. Classical theory prescribed to control model complexity and yet AI models are very complex, so complex that they often memorize the training data. Classical wisdom recommends a careful and interpretable choice of model architecture, and yet modern architectures rarely offer a parsimonious representation of a target distribution class.
The discovery that learning can take place in completely unexpected scenario poses beautiful conceptual challenges. I will try to survey recent work towards addressing them.
-
Strongly correlated particle systems: a toolbox for machine intelligence
Subhro GhoshICTS:32495The classical paradigm of randomness in the sciences is that of i.i.d. random variables, and going beyond i.i.d. is often considered a difficulty and a challenge to be overcome. In this talk, we will explore a new perspective, wherein strongly constrained random systems in fact help to understand fundamental problems in machine learning. In particular, we will discuss strongly correlated particle systems that are well-motivated from statistical and quantum physics, including in particular determinantal probability measures. These will be used to shed important light on questions of fundamental interest in learning theory, focussing on applications to novel sampling techniques and advances in stochastic gradient descent.
-
What does guidance do? (Online)
Sitan ChenICTS:32499When sampling from a base measure tilted by a reward model, a popular trick is to approximate the score of the tilted measure with the sum of the base score and the gradient of the reward. It is well-known that this does not sample from the base distribution but nevertheless seems to do something interesting and useful, e.g., classifier-free guidance (CFG) and diffusion posterior sampling (DPS). In this talk, I provide some theoretical perspectives on what this method actually samples from, focusing on a simple mixture model setting. In the first part, I will rigorously characterize the dynamics of CFG, proving that it generates archetypal and low-diversity samples in a certain precise sense. In the second part, I will show that for linear inverse problems, DPS with a careful choice of initialization simultaneously boosts reward and likelihood under the prior. I will then describe some experiments demonstrating that DPS with this initialization scheme achieves strong performance on hard image restoration tasks like large box inpainting. Based on https://arxiv.org/abs/2409.13074 and https://arxiv.org/abs/2506.10955
-
New research directions in vector search
Kiran ShiragurICTS:32498Vector search is a fundamental problem with numerous applications in machine learning, computer vision, recommendation systems, and more. While vector search has been extensively studied, modern applications have introduced new requirements, such as diversity, multivector, multifilter, and others. In this talk, we explore these emerging research directions, with a focus on diversity and multivector embeddings in vector search.
For both problems, we propose the first provable graph-based algorithms that efficiently return approximate solutions. Our algorithms leverage popular graph-based methods, enabling us to build on existing, efficient implementations. Experimental results show that our algorithms outperform other approaches.