Computer Science

Displaying 73 - 84 of 384

On The Exploration In Load-Balancing Under Unknown Service Rates

Weina Wang (CMU)
October 07, 2022

22710
Computer Science
Sample Complexity Of Policy-Based Methods Under Off-Policy Sampling And Linear Function Approximation

Siva Theja Maguluri (Georgia Institute of Technology)
October 07, 2022

22709
Computer Science
The Compensated Coupling (or Why the Future is the Best Guide for the Present)

Sid Banerjee (Cornell University)
October 07, 2022

22708
Computer Science
Higher-Dimensional Expansion of Random Geometric Complexes

Tselil Schramm (Stanford University)
October 07, 2022

22707
Computer Science
On the Power of Preconditioning in Sparse Linear Regression

Raghu Meka (UCLA)
October 07, 2022

22706
Computer Science
What Functions Do Transformers Prefer to Represent?

Surbhi Goel (Microsoft Research and University of Pennsylvania)
October 07, 2022

22705
Computer Science
Optimality of Variational Inference for Stochastic Block Model

Olga Klopp (ESSEC Business School)
September 30, 2022

22618
Computer Science
Machine Learning on Large-Scale Graphs

Luana Ruiz (University of Pennsylvania)
September 30, 2022

22611
Computer Science
Survey on Sparse Graph Limits + A Toy Example

Mei Yin (University of Denver)
September 30, 2022

22616
Computer Science
Long Range Dependence in Evolving Networks

Shankar Bhamidi (University of North Carolina, Chapel Hill)
September 30, 2022

22614
Computer Science
Stochastic Processes on Sparse Graphs: Hydrodynamic Limits and Markov Approximations

Kavita Ramanan (Brown University)
September 30, 2022

22613
Computer Science
Large Deviation Principle for the Norm of the Adjacency Matrix and the Laplacian Matrix of Inhomogeneous Erd\H{o}s-R\'enyi Random Graphs

Frank den Hollander (Leiden University)
September 30, 2022

22612
Computer Science

On The Exploration In Load-Balancing Under Unknown Service Rates

Weina Wang (CMU)
October 07, 2022

22710
Computer Science
Today’s computing systems consist of servers that offer highly heterogeneous service rates due to the diverse machine types and the dynamic computing environment. In such systems, to guarantee low latency, load-balancing algorithms need to consider not only the amount of work on servers but also their service rates. However, because of the dynamic nature of the computing environment, the service rates cannot be obtained beforehand and thus have to be learned on the fly. In this talk, we consider a load-balancing system that consists of multiple servers with unknown service rates. Each incoming job needs to be assigned to one of the servers upon arrival. Our goal is to develop learning-integrated job-assigning algorithms that perform comparably to their counterparts under known service rates. Although some facets of this problem resemble the well-studied multi-armed bandit problem, our findings reveal that the exploration component is fundamentally different. In particular, we develop a meta-algorithm that integrates learning into a large class of load-balancing algorithms. The proposed algorithm uses almost no exploration, yet it achieves a constant regret in the accumulated latency of jobs. A similar free exploration phenomenon has been observed in a prior paper, but in a different setting with a single server. This is joint work with Yifei Huang and Zhiyuan Tang, both from Tsinghua University.
Sample Complexity Of Policy-Based Methods Under Off-Policy Sampling And Linear Function Approximation

Siva Theja Maguluri (Georgia Institute of Technology)
October 07, 2022

22709
Computer Science
In this work, we study policy-space methods for solving the reinforcement learning (RL) problem. We first consider the setting where the model is known (MDP setting) and show that the Natural Policy Gradient (NPG) algorithm (and other variants) have linear (geometric) convergence without the need of any regularization. In contrast to optimization approaches used in the literature, our approach is based on approximate dynamic programing. We then consider a linear function approximation variant of it and establish its convergence guarantees. Finally, we consider the RL setting where a critic is deployed for policy evaluation when the model is unknown. We consider a critic that is based on TD learning, and uses off-policy sampling with linear function approximation. This leads to the infamous deadly-triad issue. We propose a generic algorithm framework of a single time-scale multi-step TD-learning with generalized importance sampling ratios that enables us to overcome the high variance issue in off-policy learning, and establish its finite-sample guarantees. We show that this leads to an overall sample complexity of O(epsilon^-2).
The Compensated Coupling (or Why the Future is the Best Guide for the Present)

Sid Banerjee (Cornell University)
October 07, 2022

22708
Computer Science
What makes online decision-making different from other decision-making/optimization problems? While it seems clear that the unique features are the sequential nature of taking actions and uncertainty in future outcomes, most techniques for solving such problems tend to obfuscate these features - so are these the best ways to think about these settings? I will present the compensated coupling: a simple paradigm for reasoning about and designing online decision-making policies, based on a sample-pathwise accounting of their performance compared to some benchmark policy. This approach generalizes many standard results used in studying Markov decision processes and reinforcement learning, but also gives us new policies which are much simpler and more effective than existing heuristics. For a large class of widely-studied control problems including online resource-allocation, dynamic pricing, generalized assignment, online bin packing, and bandits with knapsacks, I will illustrate how these new algorithms achieve constant regret (i.e., additive loss compared to the hindsight optimal which is independent of the horizon and state-space) under a wide range of conditions. Time permitting, I will try and describe how we can use this technique to incorporate side information and historical data in these settings, and achieve constant regret with as little as a single data trace.
Higher-Dimensional Expansion of Random Geometric Complexes

Tselil Schramm (Stanford University)
October 07, 2022

22707
Computer Science
A graph is said to be a (1-dimensional) expander if the second eigenvalue of its adjacency matrix is bounded away from 1, or almost-equivalently, if it has no sparse vertex cuts. There are several natural ways to generalize the notion of expansion to hypergraphs/simplicial complexes, but one such way is 2-dimensional spectral expansion, in which the expansion of vertex neighborhoods (remarkably) witnesses global expansion. While 1-dimensional expansion is known to be achieved by, e.g., random regular graphs, very few constructions of sparse 2-dimensional expanders are known, and at present all are algebraic. It is an open question whether sparse 2-dimensional expanders are "abundant" and "natural" or "rare." In this talk, we'll give some evidence towards abundance: we show that the set of triangles in a random geometric graph on a high-dimensional sphere yields an expanding simplicial complex of arbitrarily small polynomial degree.
On the Power of Preconditioning in Sparse Linear Regression

Raghu Meka (UCLA)
October 07, 2022

22706
Computer Science
Sparse linear regression is a fundamental problem in high-dimensional statistics, but strikingly little is known about how to solve it efficiently without restrictive conditions on the design matrix. This talk will focus on the (correlated) random design setting, where the covariates are independently drawn from a multivariate Gaussian and the ground-truth signal is sparse. Information theoretically, one can achieve strong error bounds with O(klogn) samples for arbitrary covariance matrices and sparsity k; however, no efficient algorithms are known to match these guarantees even with o(n) samples in general. As for hardness, computational lower bounds are only known with worst-case design matrices. Random-design instances are known which are hard for the Lasso, but these instances can generally be solved by Lasso after a simple change-of-basis (i.e. preconditioning). In the talk, we will discuss new upper and lower bounds clarifying the power of preconditioning in sparse linear regression. First, we show that the preconditioned Lasso can solve a large class of sparse linear regression problems nearly optimally: it succeeds whenever the dependency structure of the covariates, in the sense of the Markov property, has low treewidth -- even if the covariance matrix is highly ill-conditioned. Second, we construct (for the first time) random-design instances which are provably hard for an optimally preconditioned Lasso. Based on joint works with Jonathan Kelner, Frederic Koehler, Dhruv Rohatgi.
What Functions Do Transformers Prefer to Represent?

Surbhi Goel (Microsoft Research and University of Pennsylvania)
October 07, 2022

22705
Computer Science
Over the past few years, Transformers have revolutionized deep learning, leading to advances in natural language processing and beyond. These models discard recurrence and convolutions, in favor of "self-attention" which directly and globally models interactions within the input context. Despite their success, currently there is limited understanding of why they work. In this talk, I will present our recent results on rigorously quantifying the statistical and representational properties of Transformers which shed light on their ability to capture long range dependencies efficiently. First, I will show how bounded-norm self-attention layers can represent arbitrary sparse functions of the input sequence, with sample complexity scaling only logarithmically with the context length, akin to sparse regression. Subsequently, I will briefly show how this ability of self-attention to compute sparse functions along with its ability to compute averages can be used to construct Transformers that exactly replicate the dynamics of a recurrent model of computation depth $T$ with only $o(T)$ depth. I will conclude the talk with experimental results on synthetic tasks based on learning Boolean functions and automata. Based on joint works with Jordan T. Ash, Ben L. Edelman, Sham M. Kakade, Akshay Krishnamurthy, Bingbin Liu, and Cyril Zhang.
Optimality of Variational Inference for Stochastic Block Model

Olga Klopp (ESSEC Business School)
September 30, 2022

22618
Computer Science
Abstract Variational methods are extremely popular in the analysis of network data. Statistical guarantees obtained for these methods typically provide asymptotic normality for the problem of estimation of global model parameters under the stochastic block model. In the present work, we consider the case of networks with missing links that is important in application and show that the variational approximation to the maximum likelihood estimator converges at the minimax rate. This provides the first minimax optimal and tractable estimator for the problem of parameter estimation for the stochastic block model. We complement our results with numerical studies of simulated and real networks, which confirm the advantages of this estimator over current methods.
Machine Learning on Large-Scale Graphs

Luana Ruiz (University of Pennsylvania)
September 30, 2022

22611
Computer Science
Abstract Graph neural networks (GNNs) are successful at learning representations from most types of network data but suffer from limitations in large graphs, which do not have the Euclidean structure that time and image signals have in the limit. Yet, large graphs can often be identified as being similar to each other in the sense that they share structural properties. Indeed, graphs can be grouped in families converging to a common graph limit -- the graphon. A graphon is a bounded symmetric kernel which can be interpreted as both a random graph model and a limit object of a convergent sequence of graphs. Graphs sampled from a graphon almost surely share structural properties in the limit, which implies that graphons describe families of similar graphs. We can thus expect that processing data supported on graphs associated with the same graphon should yield similar results. In this talk, I formalize this intuition by showing that the error made when transferring a GNN across two graphs in a graphon family is small when the graphs are sufficiently large. This enables large-scale graph machine learning by transference: training GNNs on moderate-scale graphs and executing them on large-scale graphs.
Survey on Sparse Graph Limits + A Toy Example

Mei Yin (University of Denver)
September 30, 2022

22616
Computer Science
Abstract The theory of graph limits is an important tool in understanding properties of large networks. We begin the talk with a survey of this theory, concentrating in particular on the sparse setting. We then investigate a power-law random graph model and cast it in the sparse graph limit theory framework. The distinctively different structures of the limit graph are explored in detail in the sub-critical and super-critical regimes. In the sub-critical regime, the graph is empty with high probability, and in the rare event that it is non-empty, it consists of a single edge. Contrarily, in the super-critical regime, a non-trivial random graph exists in the limit, and it serves as an uncovered boundary case between different types of graph convergence.
Long Range Dependence in Evolving Networks

Shankar Bhamidi (University of North Carolina, Chapel Hill)
September 30, 2022

22614
Computer Science
Abstract The goal of this talk is to describe probabilistic approaches to three major problems for dynamic networks, both of which are intricately connected to long range dependence in the evolution of such models: 1.Nonparametric change point detection: Consider models of growing networks which evolve via new vertices attaching to the pre-existing network according to one attachment function $f$ till the system grows to size $τ(n) < n$, when new vertices switch their behavior to a different function g till the system reaches size n. The goal is to estimate the change point given the observation of the networks over time with no knowledge of the functions driving the dynamics. We will describe non-parametric estimators for such problems. 2. Detecting the initial seed which resulted in the current state of the network: Imagine observing a static time slice of the network after some large time $n$ started with an initial seed. Suppose all one gets to see is the current topology of the network (without any label or age information). Developing probably efficient algorithms for estimating the initial seed has inspired intense activity over the last few years in the probability community. We will describe recent developments in addressing such questions including robustness results such as the fixation of so called hub vertices as time evolves. 3. Co-evolving networks: models of networks where dynamics on the network (directed random walks towards the root) modifies the structure of the network which then impacts behavior of subsequent dynamics. We will describe non-trivial phase transitions of condensation around the root and its connection to quasi-stationary distributions of 1-d random walks.
Stochastic Processes on Sparse Graphs: Hydrodynamic Limits and Markov Approximations

Kavita Ramanan (Brown University)
September 30, 2022

22613
Computer Science
Abstract We consider interacting particle systems on suitable convergent sequences of sparse (or heterogeneous graphs) and show that the limiting dynamics of the associated neighborhood empirical measure process (the so-called hydrodynamic limit) can be autonomously characterized in terms of a non-Markovian process. We then describe Markovian approximations to the latter and provide examples where they are exact. This includes joint work with G. Cocomello and A. Ganguly.
Large Deviation Principle for the Norm of the Adjacency Matrix and the Laplacian Matrix of Inhomogeneous Erd\H{o}s-R\'enyi Random Graphs

Frank den Hollander (Leiden University)
September 30, 2022

22612
Computer Science
Abstract We consider an inhomogeneous Erd\H{o}s-R\'enyi random graph $G_N$ with vertex set $[N] = \{1,\dots,N\}$ for which the pair of vertices $i,j \in [N]$, $i\neq j$, is connected by an edge with probability $r(\tfrac{i}{N},\tfrac{j}{N})$, independently of other pairs of vertices. Here, $r\colon\,[0,1]^2 \to (0,1)$ is a symmetric function that plays the role of a reference graphon. (1) Let $\lambda_N$ be the maximal eigenvalue of the \emph{adjacency matrix} of $G_N$. It is known that $\lambda_N/N$ satisfies an LDP with rate $N$ as $N \to \infty$. The associated rate function $\psi_r$ is given by a variational formula that involves the rate function $I_r$ of a large deviation principle on graphon space. We analyse this variational formula in order to identify the basic properties of $\psi_r$, especially when the reference graphon $r$ is of rank 1. (2) Let $\hat\lambda_N$ be the maximal eigenvalue of the \emph{Laplacian matrix} of $G_N$. We show that $\hat\lambda_N/N$ satisfies a downward LDP with rate $\binom{N}{2}$ and an upward LDP with rate $N$. The associated rate functions $\hat\psi_r$ and $\hat\psi^*_r$ are those of the downward LDP and upward LDP for the maximal degree in the graph. We identify the basic properties of both $hat\psi_r$ and $\hat\psi^*_r$.

Title	Speaker(s)	Date	Info link
On The Exploration In Load-Balancing Under Unknown Service Rates	Weina Wang (CMU)	2022-10-07	View details
Sample Complexity Of Policy-Based Methods Under Off-Policy Sampling And Linear Function Approximation	Siva Theja Maguluri (Georgia Institute of Technology)	2022-10-07	View details
The Compensated Coupling (or Why the Future is the Best Guide for the Present)	Sid Banerjee (Cornell University)	2022-10-07	View details
Higher-Dimensional Expansion of Random Geometric Complexes	Tselil Schramm (Stanford University)	2022-10-07	View details
On the Power of Preconditioning in Sparse Linear Regression	Raghu Meka (UCLA)	2022-10-07	View details
What Functions Do Transformers Prefer to Represent?	Surbhi Goel (Microsoft Research and University of Pennsylvania)	2022-10-07	View details
Optimality of Variational Inference for Stochastic Block Model	Olga Klopp (ESSEC Business School)	2022-09-30	View details
Machine Learning on Large-Scale Graphs	Luana Ruiz (University of Pennsylvania)	2022-09-30	View details
Survey on Sparse Graph Limits + A Toy Example	Mei Yin (University of Denver)	2022-09-30	View details
Long Range Dependence in Evolving Networks	Shankar Bhamidi (University of North Carolina, Chapel Hill)	2022-09-30	View details
Stochastic Processes on Sparse Graphs: Hydrodynamic Limits and Markov Approximations	Kavita Ramanan (Brown University)	2022-09-30	View details
Large Deviation Principle for the Norm of the Adjacency Matrix and the Laplacian Matrix of Inhomogeneous Erd\H{o}s-R\'enyi Random Graphs	Frank den Hollander (Leiden University)	2022-09-30	View details

Supported by

Format results

On The Exploration In Load-Balancing Under Unknown Service Rates

Sample Complexity Of Policy-Based Methods Under Off-Policy Sampling And Linear Function Approximation

The Compensated Coupling (or Why the Future is the Best Guide for the Present)

Higher-Dimensional Expansion of Random Geometric Complexes

On the Power of Preconditioning in Sparse Linear Regression

What Functions Do Transformers Prefer to Represent?

Optimality of Variational Inference for Stochastic Block Model

Machine Learning on Large-Scale Graphs

Survey on Sparse Graph Limits + A Toy Example

Long Range Dependence in Evolving Networks

Stochastic Processes on Sparse Graphs: Hydrodynamic Limits and Markov Approximations

Large Deviation Principle for the Norm of the Adjacency Matrix and the Laplacian Matrix of Inhomogeneous Erd\H{o}s-R\'enyi Random Graphs

On The Exploration In Load-Balancing Under Unknown Service Rates

Sample Complexity Of Policy-Based Methods Under Off-Policy Sampling And Linear Function Approximation

The Compensated Coupling (or Why the Future is the Best Guide for the Present)

Higher-Dimensional Expansion of Random Geometric Complexes

On the Power of Preconditioning in Sparse Linear Regression

What Functions Do Transformers Prefer to Represent?

Optimality of Variational Inference for Stochastic Block Model

Machine Learning on Large-Scale Graphs

Survey on Sparse Graph Limits + A Toy Example

Long Range Dependence in Evolving Networks

Stochastic Processes on Sparse Graphs: Hydrodynamic Limits and Markov Approximations

Large Deviation Principle for the Norm of the Adjacency Matrix and the Laplacian Matrix of Inhomogeneous Erd\H{o}s-R\'enyi Random Graphs