
The Compensated Coupling (or Why the Future is the Best Guide for the Present)

APA

Banerjee, S. (2022). The Compensated Coupling (or Why the Future is the Best Guide for the Present). The Simons Institute for the Theory of Computing. https://old.simons.berkeley.edu/talks/compensated-coupling-or-why-future-best-guide-present

MLA

Banerjee, Sid. The Compensated Coupling (or Why the Future is the Best Guide for the Present). The Simons Institute for the Theory of Computing, 7 Oct. 2022, https://old.simons.berkeley.edu/talks/compensated-coupling-or-why-future-best-guide-present

BibTeX

@misc{scivideos_22708,
  author    = {Banerjee, Sid},
  title     = {The Compensated Coupling (or Why the Future is the Best Guide for the Present)},
  publisher = {The Simons Institute for the Theory of Computing},
  year      = {2022},
  month     = {oct},
  language  = {en},
  url       = {https://old.simons.berkeley.edu/talks/compensated-coupling-or-why-future-best-guide-present},
  note      = {Talk 22708, see \url{https://scivideos.org/simons-institute/22708}}
}
          
Sid Banerjee (Cornell University)
Talk number: 22708
Source repository: Simons Institute

Abstract

What makes online decision-making different from other decision-making/optimization problems? While it seems clear that the unique features are the sequential nature of taking actions and uncertainty in future outcomes, most techniques for solving such problems tend to obscure these features - so are these the best ways to think about these settings? I will present the compensated coupling: a simple paradigm for reasoning about and designing online decision-making policies, based on a sample-pathwise accounting of their performance compared to some benchmark policy. This approach generalizes many standard results used in studying Markov decision processes and reinforcement learning, but also yields new policies which are much simpler and more effective than existing heuristics. For a large class of widely studied control problems including online resource allocation, dynamic pricing, generalized assignment, online bin packing, and bandits with knapsacks, I will illustrate how these new algorithms achieve constant regret (i.e., additive loss compared to the hindsight optimal that is independent of the horizon and state space) under a wide range of conditions. Time permitting, I will try to describe how we can use this technique to incorporate side information and historical data in these settings, and achieve constant regret with as little as a single data trace.
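
To make the "future guides the present" idea concrete, here is a minimal Python sketch of a resolve-and-mimic policy of the flavor the abstract alludes to (policies of this kind are called Bayes selectors in related work by the speaker). It is an illustrative assumption, not the talk's exact algorithm: at each step it solves the expected hindsight relaxation of a toy single-resource allocation problem and copies the decision that benchmark takes for the current arrival. All names and the toy instance are hypothetical.

import numpy as np

def hindsight_lp_fractions(expected_arrivals, rewards, budget):
    """Fractional hindsight optimum for a single unit-demand resource:
    fill the budget greedily with the highest-reward types and return
    each type's fractional accept rate."""
    accept = np.zeros(len(rewards))
    remaining = float(budget)
    for i in np.argsort(rewards)[::-1]:          # best rewards first
        take = min(expected_arrivals[i], remaining)
        if expected_arrivals[i] > 0:
            accept[i] = take / expected_arrivals[i]
        remaining -= take
    return accept

def bayes_selector_step(budget, periods_left, arrival_type, rewards, probs):
    """Accept the current arrival iff the expected hindsight benchmark would."""
    if budget < 1:
        return False
    # Expected future arrivals per type; the current arrival is realized.
    expected = periods_left * np.asarray(probs, dtype=float)
    expected[arrival_type] += 1.0
    accept = hindsight_lp_fractions(expected, rewards, budget)
    return accept[arrival_type] >= 0.5           # round the LP's decision

# Tiny simulation: horizon T, budget B, three customer types (toy numbers).
rng = np.random.default_rng(0)
rewards, probs = np.array([5.0, 2.0, 1.0]), np.array([0.2, 0.3, 0.5])
B, T, collected = 20, 100, 0.0
for t in range(T):
    i = rng.choice(3, p=probs)
    if bayes_selector_step(B, T - t - 1, i, rewards, probs):
        B -= 1
        collected += rewards[i]
print("reward collected:", collected)

The compensated coupling then accounts for regret sample path by sample path: roughly, such a policy only loses value at steps where the hindsight benchmark is forced to revise its planned action, and bounding the expected total of these "compensations" is what drives the constant-regret guarantees mentioned in the abstract.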