Format results
- Akshay Krishnamurthy (Microsoft Research)
Special Topics in Astrophysics - Numerical Hydrodynamics - Lecture 13
Daniel Siegel University of Greifswald
Multiplayer Bandit Learning - From Competition to Cooperation
Simina Branzei (Purdue University)Multi-Player Multi-Armed Bandit: Can We Still Collaborate at Homes Without "Zoom"?
Yuanzhi Li (Carnegie Mellon University)A note on dual gravitational charges
Roberto Olivieri Observatoire de Paris
Country-Scale Bandit Implementation for Targeted COVID-19 Testing
Hamsa Bastani (Wharton School of the University of Pennsylvania)Borcherds algebras and 2d string theory
Sarah Harrison Stanford University
Moving Closer to a Detection of nHz-frequency Gravitational Waves with NANOGrav
Scott Ransom National Radio Astronomy Observatory (NRAO)
Beating the Curse of Dimensionality in High-Dimensional Optimal Stopping
David Goldberg (Cornell ORIE)Nilpotent Slodowy slices and W-algebras
Anne Moreau University of Poitiers
Measurement of quantum fields in curved spacetimes
Chris Fewster University of York
Representation Learning and Exploration in Reinforcement Learning
Akshay Krishnamurthy (Microsoft Research)I will discuss new provably efficient algorithms for reinforcement in rich observation environments with arbitrarily large state spaces. Both algorithms operate by learning succinct representations of the environment, which they use in an exploration module to acquire new information. The first algorithm, called Homer, operates in a block MDP model and uses a contrastive learning objective to learn the representation. On the other hand, the second algorithm, called FLAMBE, operates in a much richer class of low rank MDPs and is model based. Both algorithms accommodate nonlinear function approximation and enjoy provable sample and computational efficiency guarantees.Special Topics in Astrophysics - Numerical Hydrodynamics - Lecture 13
Daniel Siegel University of Greifswald
Multiplayer Bandit Learning - From Competition to Cooperation
Simina Branzei (Purdue University)The stochastic multi-armed bandit model captures the tradeoff between exploration and exploitation. We study the effects of competition and cooperation on this tradeoff. Suppose there are k arms and two players, Alice and Bob. In every round, each player pulls an arm, receives the resulting reward, and observes the choice of the other player but not their reward. Alice's utility is ΓA+λΓB (and similarly for Bob), where ΓA is Alice's total reward and λ∈[−1,1] is a cooperation parameter. At λ=−1 the players are competing in a zero-sum game, at λ=1, they are fully cooperating, and at λ=0, they are neutral: each player's utility is their own reward. The model is related to the economics literature on strategic experimentation, where usually players observe each other's rewards. With discount factor β, the Gittins index reduces the one-player problem to the comparison between a risky arm, with a prior μ, and a predictable arm, with success probability p. The value of p where the player is indifferent between the arms is the Gittins index g=g(μ,β)>m, where m is the mean of the risky arm. We show that competing players explore less than a single player: there is p∗∈(m,g) so that for all p>p∗, the players stay at the predictable arm. However, the players are not myopic: they still explore for some p>m. On the other hand, cooperating players explore more than a single player. We also show that neutral players learn from each other, receiving strictly higher total rewards than they would playing alone, for all p∈(p∗,g), where p∗ is the threshold from the competing case. Finally, we show that competing and neutral players eventually settle on the same arm in every Nash equilibrium, while this can fail for cooperating players. This is based on a joint work with Yuval Peres.Multi-Player Multi-Armed Bandit: Can We Still Collaborate at Homes Without "Zoom"?
Yuanzhi Li (Carnegie Mellon University)Multi-armed bandit is a well-established area in online decision making: Where one player makes sequential decisions in a non-stationary environment to maximize his/her accumulative rewards. The multi-armed bandit problem becomes significantly more challenging when there are multiple players in the same environment, while only one piece of reward is presented at a time for each arm. In this setting, if two players pick the same arm at the same round, they are only able to get one piece of reward instead of two. To maximize the reward, players need to collaborate to avoid "collision" -- i.e. they need to make sure that they do not all rush to the same arm (even if it has the highest reward). We consider the even more challenging setting where communications between players are completely disabled: e.g. they are separated in different places of the world without any "Zoom". We show that nearly optimal regret can still be obtained in this setting: Players can actually collaborate in a non-stationary environment without any communication (of course, without quantum entanglement either :))A note on dual gravitational charges
Roberto Olivieri Observatoire de Paris
Dual gravitational charges (DGCs) have been originally computed in the first-order formalism by means of covariant phase space methods using tetrad variables. I show i) why DGCs do not arise using the metric variables and ii) how they can be set to zero by exploiting the freedom to add exact 3-forms to the symplectic potential.
Without exploiting that freedom, DGCs can be understood as Hamiltonian charges associated to the Kosmann variation. I then discuss the implications of this observation for asymptotic symmetries and comment about subleading contributions thereof.
Finally, I also show that DGCs can be equally derived by means of cohomological methods. In this case, DGCs depends on the order of the Lagrangian: they exist only in the first-order formalism.
Country-Scale Bandit Implementation for Targeted COVID-19 Testing
Hamsa Bastani (Wharton School of the University of Pennsylvania)In collaboration with the Greek government, we use machine learning to manage the threat of COVID-19. With tens of thousands of international visitors every day, Greece cannot test each visitor to ensure that they are not a carrier of COVID-19. We developed a bandit policy that balances allocating scarce tests to (i) continuously monitor the dynamic infection risk of passengers from different locations (exploration), and (ii) preferentially target risky tourist profiles for testing (exploitation). Our solution is currently deployed across all ports of entry to Greece. I will describe a number of technical challenges, including severely imbalanced outcomes, batched/delayed feedback, high-dimensional arms, port-specific testing constraints, and transferring knowledge from (unreliable) public epidemiological data. Joint work with Kimon Drakopoulos and Vishal Gupta.Borcherds algebras and 2d string theory
Sarah Harrison Stanford University
Borcherds Kac-Moody (BKM) algebras are a generalization of familiar Kac-Moody algebras with imaginary simple roots. On the one hand, they were invented by Borcherds in his proof of the monstrous moonshine conjectures and have many interesting connections to new moonshines, number theory and the theory of automorphic forms. On the other hand, there is an old conjecture of Harvey and Moore that BPS states in string theory form an algebra that is in some cases a BKM algebra and which is based on certain signatures of BKMs observed in 4d threshold corrections and black hole physics. I will talk about the construction of new BKMs superalgebras arising from self-dual vertex operator algebras of central charge 12, and comment on their relation to physical string theories in 2 dimensions. Based on work with N. Paquette, D. Persson, and R. Volpato.
Online Learning via Offline Greedy Algorithms: Applications in Market Design and Optimization
Negin Golrezaei (MIT)Motivated by online decision-making in time-varying combinatorial environments, we study the problem of transforming offline algorithms to their online counterparts. We focus on offline combinatorial problems that are amenable to a constant factor approximation using a greedy algorithm that is robust to local errors. For such problems, we provide a general framework that efficiently transforms offline robust greedy algorithms to online ones using Blackwell approachability. We show that the resulting online algorithms have O(T^{1/2}) (approximate) regret under the full information setting. We further introduce a bandit extension of Blackwell approachability that we call Bandit Blackwell approachability. We leverage this notion to transform greedy robust offline algorithms into an O(T^{2/3})(approximate) regret in the bandit setting. Demonstrating the flexibility of our framework, we apply our offline-to-online transformation to several problems at the intersection of revenue management, market design, and online optimization, including product ranking optimization in online platforms, reserve price optimization in auctions, and submodular maximization. We show that our transformation, when applied to these applications, leads to new regret bounds or improves the current known bounds.Moving Closer to a Detection of nHz-frequency Gravitational Waves with NANOGrav
Scott Ransom National Radio Astronomy Observatory (NRAO)
Millisecond Pulsars (MSPs) have become reliable and
extremely stable workhorses of modern astronomy and physics. The
North American Nanohertz Observatory for Gravitational Waves, or
NANOGrav, has been observing growing numbers of these systems for over
15 years, and the data look great. High precision timing of almost 80
MSPs has provided unprecedented sensitivity to the gravitational wave
Universe at nHz-frequencies, where our upper limits are already
constraining the population of super-massive black hole binaries. But
our sensitivity is increasing each year as we continue to add MSPs to
our timing array and develop new techniques to remove systematics due
to the interstellar medium and the uncertain solar system ephemerides.
Meanwhile, though, our observations provide a wide variety of
astrophysics, such as new neutron star mass measurements and
constraints of the dense matter equation of state.Beating the Curse of Dimensionality in High-Dimensional Optimal Stopping
David Goldberg (Cornell ORIE)The sample and computational complexity of reinforcement learning algorithms under the generative model has been a central focus of the literature. Most existing approaches which can guarantee epsilon-optimality require a complexity either dependent on the size of the state space, or which scales exponentially in the time horizon T, in both cases suffering from the curse of dimensionality. Other approaches may be more efficient, but typically either make strong structural assumptions, or do not come with optimality guarantees. We reconcile this dilemma for the celebrated problem of high-dimensional optimal stopping, a special case of RL with important applications to financial engineering, operations research, economics and computer science, and well-known to be computationally challenging when the state space and time horizon are large. We propose the first algorithm with sample and computational complexity polynomial in T, and effectively independent of the underlying state space, which outputs epsilon-optimal stopping policies and value function approximations. Our algorithm is inspired by a novel connection between network flow theory in combinatorial optimization and the classical martingale duality theory for stopping problems, and can be viewed as a higher-order generalization of the celebrated prophet inequalities. Our approach yields an expansion for the value functions as an infinite sum, which is fundamentally different from standard expansions based on backwards induction, with the following properties : 1. each term is the expectation of an elegant recursively defined infimum; 2. the first k terms only require T^k samples to approximate; and 3. truncating at the first k terms yields a (normalized) error of 1/k. The terms of the associated sum can also be viewed as a provably good set of basis functions, which can be efficiently evaluated by simulation, in contrast with past approximate dynamic programming approaches which take a heuristic approach to deriving basis functions. Our work shows that although the state space is exponentially large, one need only explore it to the extent that one can compute these (few) basis functions, which can be done efficiently. Time permitting, we will also explore applications of our approach to other high-dimensional RL problems (including those arising in dynamic pricing and online combinatorial optimization), and discuss connections to parallel and quantum computing. Joint work with Ph.D. student Yilun Chen.Nilpotent Slodowy slices and W-algebras
Anne Moreau University of Poitiers
To any vertex algebra one can attach in a canonical way a certain Poisson variety, called the associated variety. Nilpotent Slodowy slices appear as associated varieties of admissible (simple) W-algebras. They also appear as Higgs branches of the Argyres-Douglas theories in 4d N=2 SCFT’s. These two facts are linked by the so-called Higgs branch conjecture. In this talk I will explain how to exploit the geometry of nilpotent Slodowy slices to study some properties of W-algebras whose motivation stems from physics. This is a joint work with Tomoyuki Arakawa and Jethro van Ekeren (still in preparation).
Measurement of quantum fields in curved spacetimes
Chris Fewster University of York
A standard account of the measurement chain in quantum mechanics involves a probe (itself a quantum system) coupled temporarily to the system of interest. Once the coupling is removed, the probe is measured and the results are interpreted as the measurement of a system observable. Measurement schemes of this type have been studied extensively in Quantum Measurement Theory, but they are rarely discussed in the context of quantum fields and still less on curved spacetimes.
In this talk I will describe how measurement schemes may be formulated for quantum fields on curved spacetime within the general setting of algebraic QFT. This allows the discussion of the localisation and properties of the system observable induced by a probe measurement, and the way in which a system state can be updated thereafter. The framework is local and fully covariant, allowing the consistent description of measurements made in spacelike separated regions. Furthermore, specific models can be given in which the framework may be exemplified by concrete calculations.
I will also explain how this framework can shed light on an old problem due to Sorkin concerning "impossible measurements" in which measurement apparently conflicts with causality.
The talk is based on work with Rainer Verch [Leipzig], (Comm. Math. Phys. 378, 851–889(2020), arXiv:1810.06512; see also arXiv:1904.06944 for a summary) and a recent preprint arXiv:2003.04660 with Henning Bostelmann and Maximilian H. Ruep [York].