Entropy-Area Law from Interior Semi-classical Degrees of Freedom
Yuki Yokokura RIKEN
Can degrees of freedom in the interior of black holes be responsible for the entropy-area law? If so, what spacetime appears? In this talk, I answer these questions at the semi-classical level. Specifically, a black hole is considered as a bound state consisting of many semi-classical degrees of freedom which exist uniformly inside and have maximum gravity. The distribution of their information determines the interior metric through the semi-classical Einstein equation. The interior is then a continuous stacking of AdS_2 × S^2 without horizon or singularity and behaves like a local thermal state. Evaluating the entropy density from thermodynamic relations and integrating it over the interior volume, we obtain the area law with the factor 1/4 for any interior degrees of freedom. Here, the dynamics of gravity plays an essential role in changing the entropy from the volume law to the area law. This should help us clarify the holographic property of black-hole entropy. [arXiv:2207.14274]
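Schematically, the counting described above amounts to (notation ours, not taken from the talk)
\[
S \;=\; \int_{\text{interior}} s\, dV \;=\; \frac{\mathcal{A}}{4\, l_p^2},
\]
where s is the local entropy density obtained from thermodynamic relations, the integral runs over the interior volume, and \mathcal{A} is the surface area; the dynamics of gravity is what turns the naive volume scaling into the area law.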
Zoom link: https://pitp.zoom.us/j/99386433635?pwd=VzlLV2U4T1ZOYmRVbG9YVlFIemVVZz09
Towards Explicit Discrete Holography: Aperiodic Spin Chains from Hyperbolic Tilings
Giuseppe Di Giulio University of Würzburg
The AdS/CFT correspondence is one of the most important breakthroughs of recent decades in theoretical physics. A recently proposed way to gain insight into various features of this duality is to discretize the Anti-de Sitter spacetime. Within this program, we consider the Poincaré disk and discretize it by introducing a regular hyperbolic tiling on it. The features of this discretization are expected to be reflected in the quantum theory living on the boundary of the hyperbolic tiling. In this talk, we discuss how a class of boundary Hamiltonians can be naturally obtained in this discrete geometry via an inflation rule that allows the tiling to be constructed from concentric layers of tiles. The models in this class are aperiodic spin chains. Using strong-disorder renormalization group techniques, we study the entanglement entropy of these boundary theories, identifying a logarithmic growth with the subsystem size, with a coefficient that depends on the bulk discretization parameters.
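The reported scaling has the form familiar from strong-disorder RG studies of aperiodic chains (notation ours),
\[
S(\ell) \;\simeq\; \frac{c_{\mathrm{eff}}}{3}\, \ln \ell \;+\; \text{const},
\]
where \ell is the subsystem size and the effective central charge c_{\mathrm{eff}} depends on the bulk discretization parameters of the hyperbolic tiling.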
Zoom link: https://pitp.zoom.us/j/95849965965?pwd=eEx5Q0gxR2orR0dzS2pQbG8rR09oUT09
A Game-Theoretic Approach to Offline Reinforcement Learning
Ching-An Cheng (Microsoft Research)
Offline reinforcement learning (RL) is a paradigm for designing agents that can learn from existing datasets. Because offline RL can learn policies without collecting new data or expensive expert demonstrations, it offers great potential for solving real-world problems. However, offline RL faces a fundamental challenge: oftentimes real-world data can only be collected by policies meeting certain criteria (e.g., on performance, safety, or ethics). As a result, existing datasets, though large, can lack diversity and have limited usefulness. In this talk, I will introduce a generic game-theoretic approach to offline RL. It frames offline RL as a two-player game where a learning agent competes with an adversary that simulates the uncertain decision outcomes due to missing data coverage. Using this game analogy, I will present a systematic and provably correct framework for designing offline RL algorithms that learn good policies with state-of-the-art empirical performance. In addition, I will show that this framework reveals a natural connection between offline RL and imitation learning, which ensures that the learned policies are never worse than the data-collection policies, regardless of hyperparameter choices.
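A schematic way to write such a two-player formulation (notation ours, not necessarily the exact objective used in the talk) is that the learner maximizes its worst-case relative value over the set of models consistent with the data,
\[
\hat{\pi} \;\in\; \arg\max_{\pi \in \Pi}\; \min_{M \in \mathcal{M}_D}\; \big[\, J_M(\pi) - J_M(\mu) \,\big],
\]
where \mathcal{M}_D is a version space of models (or critics) consistent with the dataset and \mu is the data-collection policy. Since the inner minimum equals 0 at \pi = \mu, the maximizer is guaranteed a non-negative value, i.e., it is no worse than \mu under every data-consistent model; this is one way to see the imitation-learning connection mentioned above.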
Where are Milky Way’s Hadronic PeVatrons?
Takahiro Sudo Ohio State University
Observations indicate the existence of natural particle accelerators in the Milky Way, capable of producing PeV cosmic rays (“PeVatrons”). Observations also indicate the existence of extreme sources in the Milky Way, capable of producing gamma-ray radiation above 100 TeV. If these gamma-ray sources are hadronic cosmic-ray accelerators, then they must also be neutrino sources. However, no neutrino sources have been detected. How can we consistently understand the observations of cosmic rays, gamma rays, and neutrinos? We point out that two extreme scenarios are allowed: (1) the hadronic cosmic-ray accelerators and the gamma-ray sources are the same objects, so that neutrino sources exist and improved telescopes can detect them, versus (2) the hadronic cosmic-ray accelerators and the gamma-ray sources are distinct, so that there are no detectable neutrino sources. We discuss the nature of the Milky Way’s highest-energy gamma-ray sources and outline future prospects toward understanding the origin of hadronic cosmic rays.
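The gamma-ray–neutrino connection invoked here follows from pion production in hadronic (pp) interactions; as an order-of-magnitude relation (our notation),
\[
E_\nu \approx \tfrac{1}{2} E_\gamma, \qquad E_\nu^2\,\Phi_\nu(E_\nu) \sim E_\gamma^2\,\Phi_\gamma(E_\gamma) \quad \text{(up to order-unity factors)},
\]
since roughly equal numbers of neutral and charged pions are produced, with \pi^0 \to \gamma\gamma and \pi^\pm decays yielding neutrinos. A source emitting hadronic gamma rays at 100 TeV therefore implies neutrinos around 50 TeV with a comparable energy flux.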
Zoom link: https://pitp.zoom.us/j/91390039665?pwd=dGJ2b3VCbVFhUVpSelpjYzJHdk1Gdz09
The Statistical Complexity of Interactive Decision Making
Dylan Foster (Microsoft Research)
A fundamental challenge in interactive learning and decision making, ranging from bandit problems to reinforcement learning, is to provide sample-efficient, adaptive learning algorithms that achieve near-optimal regret. This question is analogous to the classical problem of optimal (supervised) statistical learning, where there are well-known complexity measures (e.g., VC dimension and Rademacher complexity) that govern the statistical complexity of learning. However, characterizing the statistical complexity of interactive learning is substantially more challenging due to the adaptive nature of the problem. In this talk, we will introduce a new complexity measure, the Decision-Estimation Coefficient, which is necessary and sufficient for sample-efficient interactive learning. In particular, we will provide (1) a lower bound on the optimal regret for any interactive decision making problem, establishing the Decision-Estimation Coefficient as a fundamental limit, and (2) a unified algorithm design principle, Estimation-to-Decisions, which attains a regret bound matching our lower bound, thereby achieving optimal sample-efficient learning as characterized by the Decision-Estimation Coefficient. Taken together, these results give a theory of learnability for interactive decision making. When applied to reinforcement learning settings, the Decision-Estimation Coefficient recovers essentially all existing hardness results and lower bounds.
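For reference, one commonly stated form of the Decision-Estimation Coefficient (our transcription; the talk may use a slightly different variant) for a model class \mathcal{M} and reference model \bar{M} is
\[
\mathrm{dec}_\gamma(\mathcal{M}, \bar{M}) \;=\; \min_{p \in \Delta(\Pi)}\; \max_{M \in \mathcal{M}}\; \mathbb{E}_{\pi \sim p}\Big[ f^{M}(\pi_M) - f^{M}(\pi) \;-\; \gamma\, D_{\mathrm{H}}^2\big(M(\pi), \bar{M}(\pi)\big) \Big],
\]
where f^{M}(\pi) is the value of decision \pi under model M, \pi_M is the optimal decision for M, and D_{\mathrm{H}} is the Hellinger distance between the observation distributions induced by \pi. The coefficient trades off instantaneous regret against the information gained for distinguishing models.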
The reconstruction of the CMB lensing bispectrum
Alba Kalaja University of Groningen
Weak gravitational lensing by the intervening large-scale structure (LSS) of the Universe is the leading non-linear effect on the anisotropies of the cosmic microwave background (CMB). The integrated line-of-sight gravitational potential that causes the distortion can be reconstructed from the lensed temperature and polarization anisotropies via estimators quadratic in the CMB modes. While previous studies have focused on the lensing power spectrum, upcoming experiments will be sensitive to the bispectrum of the lensing field, sourced by the non-linear evolution of structure. The detection of such a signal would provide additional information on late-time cosmological evolution, complementary to the power spectrum.
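A schematic form of the temperature-only quadratic estimator underlying such a reconstruction (our notation, with details of the weights omitted) is
\[
\hat{\phi}(\mathbf{L}) \;=\; A_L \int \frac{d^2\boldsymbol{\ell}}{(2\pi)^2}\; g(\boldsymbol{\ell}, \mathbf{L})\; T(\boldsymbol{\ell})\, T(\mathbf{L}-\boldsymbol{\ell}),
\]
where g is a weight function chosen to make the estimator unbiased and minimum-variance and A_L is a normalization. An estimate of the lensing bispectrum can then be built from the three-point correlation of such reconstructions, \langle \hat{\phi}\,\hat{\phi}\,\hat{\phi} \rangle, after subtracting the relevant noise biases.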
Zoom link: https://pitp.zoom.us/j/94880169487?pwd=dzRWcVRwQ2dVdWZ3N2RjOWU2RDUyZz09
Quantum Field Theory I - Lecture 221011
Gang Xu Perimeter Institute for Theoretical Physics
PIRSA:22100048
Relativity - Lecture 221011
PIRSA:22100075
A Tutorial on Finite-Sample Guarantees of Contractive Stochastic Approximation With Applications in Reinforcement Learning
Zaiwei Chen (Caltech) and Siva Theja Maguluri (Georgia Institute of Technology)
Reinforcement learning (RL) is a learning paradigm for large-scale sequential decision-making problems in complex stochastic systems. Many modern RL algorithms solve the underlying Bellman fixed-point equation using stochastic approximation (SA). This two-part tutorial presents an overview of our results on SA and illustrates how they can be used to obtain sample-complexity results for a large class of RL algorithms. Part I of the tutorial focuses on SA, a popular approach for solving fixed-point equations when the information is corrupted by noise. We consider a class of SA algorithms for operators that are contractive under arbitrary norms (especially the l-infinity norm). We present finite-sample bounds on the mean-square error, established using a Lyapunov framework based on infimal convolution and the generalized Moreau envelope. We then present our recent result on exponential concentration of the tail error, even when the iterates are not bounded by a constant. These tail bounds are obtained using exponential supermartingales in conjunction with the Moreau envelope and bootstrapping. Part II of the tutorial focuses on RL. We briefly illustrate the connection between RL algorithms and SA of contractive operators, and highlight the importance of the infinity norm. We then exploit the results from Part I to present finite-sample bounds for various RL algorithms, including on-policy and off-policy algorithms, in both tabular and linear function approximation settings.
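As a minimal, self-contained illustration of contractive SA in the l-infinity norm (the MDP, step sizes, and constants below are hypothetical and chosen purely for the demo, not taken from the tutorial): synchronous Q-learning is SA applied to the Bellman optimality operator, which is a gamma-contraction in the sup norm, and the sup-norm error of the iterates can be tracked against the fixed point.

import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a] = distribution over next states
R = rng.uniform(0.0, 1.0, size=(nS, nA))        # deterministic rewards

# Reference fixed point Q* of the Bellman optimality operator (a gamma-contraction
# in the l-infinity norm), computed by plain value iteration.
Q_star = np.zeros((nS, nA))
for _ in range(2000):
    Q_star = R + gamma * P @ Q_star.max(axis=1)

# Synchronous Q-learning as stochastic approximation:
#   Q_{k+1} = Q_k + alpha_k * (F(Q_k) + noise - Q_k),
# where F is the Bellman optimality operator and the noise comes from sampling next states.
Q = np.zeros((nS, nA))
for k in range(1, 200001):
    alpha = 1.0 / (1.0 + k / 1000.0)            # diminishing step size
    s_next = np.array([[rng.choice(nS, p=P[s, a]) for a in range(nA)] for s in range(nS)])
    noisy_bellman = R + gamma * Q[s_next].max(axis=2)
    Q += alpha * (noisy_bellman - Q)
    if k % 50000 == 0:
        # l-infinity error to the fixed point: the quantity the finite-sample bounds control
        print(f"iter {k}: sup-norm error {np.max(np.abs(Q - Q_star)):.4f}")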
Stochastic Bin Packing with Time-Varying Item Sizes
Weina Wang (Carnegie Mellon University)
In today's computing systems, there is strong tension between achieving high server utilization and accommodating the time-varying resource requirements of jobs. Motivated by this problem, we study a stochastic bin packing formulation with time-varying item sizes, where bins and items correspond to servers and jobs, respectively. Our goal is to answer the following fundamental question: how can we minimize the number of active servers (servers running at least one job) given a budget for the cost associated with resource overcommitment on servers? We propose a novel framework for designing job-dispatching policies, which reduces the problem to a policy design problem in a single-server system through policy conversions. Through this framework, we develop a policy that is asymptotically optimal as the job arrival rate increases. This is joint work with Yige Hong at Carnegie Mellon University and Qiaomin Xie at University of Wisconsin–Madison.
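In symbols (our notation, not the talk's), the design question above can be phrased as a constrained control problem:
\[
\min_{\text{dispatching policy}} \;\; \mathbb{E}\big[\,\#\{\text{active servers}\}\,\big] \qquad \text{subject to} \qquad \text{overcommitment cost} \;\le\; \text{budget},
\]
where a server is active if it runs at least one job, and the asymptotic-optimality guarantee is stated in the regime where the job arrival rate grows.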
Constant Regret in Exchangeable Action Models: Overbooking, Bin Packing, and Beyond
Daniel Freund (MIT)
Many problems in online decision-making can be viewed through a lens of exchangeable actions: over a long time horizon of length T, some arrival process allows for many opportunities to take a particular action, but the exact timing of actions is irrelevant for the eventual outcome. Examples of this phenomenon include bin packing (where it does not matter when we put an item of a given size into a given bin), knapsack (where it does not matter when we accept an item of a given value), and a range of matching problems (where it does not matter when we assign a request type to a given resource), among others. In this talk we survey a number of results that capture the conditions under which such problems give rise to algorithms with uniform loss guarantees, i.e., where the additive loss relative to an optimal solution in hindsight is bounded independently of the horizon length. Our conditions characterize permissible geometries of the objective function, minimal requirements on the arrival process, and uncertainty about the underlying arrival processes, including the length of the time horizon. Based on joint work with Jiayu (Kamessi) Zhao, Chamsi Hssaine, Sid Banerjee, Thodoris Lykouris, Wentao Weng, Elisabeth Paulson, and Brad Sturt.
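In symbols (our notation), a uniform loss guarantee of the kind surveyed here states that
\[
\mathbb{E}\big[\,\mathrm{OPT}(T) - \mathrm{ALG}(T)\,\big] \;\le\; C \qquad \text{for all horizons } T,
\]
with C independent of T: the additive loss relative to the optimal solution in hindsight stays bounded by a constant rather than growing like \sqrt{T} or \log T.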