
The Compensated Coupling (or Why the Future is the Best Guide for the Present)

APA

Banerjee, S. (2022). The Compensated Coupling (or Why the Future is the Best Guide for the Present). The Simons Institute for the Theory of Computing. https://old.simons.berkeley.edu/talks/compensated-coupling-or-why-future-best-guide-present

MLA

Banerjee, Sid. The Compensated Coupling (or Why the Future is the Best Guide for the Present). The Simons Institute for the Theory of Computing, 7 Oct. 2022, https://old.simons.berkeley.edu/talks/compensated-coupling-or-why-future-best-guide-present

BibTeX

@misc{scivideos_22708,
  author    = {Banerjee, Sid},
  title     = {The Compensated Coupling (or Why the Future is the Best Guide for the Present)},
  publisher = {The Simons Institute for the Theory of Computing},
  year      = {2022},
  month     = {oct},
  language  = {en},
  url       = {https://old.simons.berkeley.edu/talks/compensated-coupling-or-why-future-best-guide-present},
  note      = {Talk 22708, see \url{https://scivideos.org/simons-institute/22708}}
}
          
Sid Banerjee (Cornell University)
Talk number: 22708
Source repository: Simons Institute

Abstract

What makes online decision-making different from other decision-making/optimization problems? While it seems clear that the unique features are the sequential nature of taking actions and uncertainty in future outcomes, most techniques for solving such problems tend to obscure these features - so are these the best ways to think about these settings? I will present the compensated coupling: a simple paradigm for reasoning about and designing online decision-making policies, based on a sample-pathwise accounting of their performance compared to some benchmark policy. This approach generalizes many standard results used in studying Markov decision processes and reinforcement learning, but also yields new policies which are much simpler and more effective than existing heuristics. For a large class of widely studied control problems including online resource allocation, dynamic pricing, generalized assignment, online bin packing, and bandits with knapsacks, I will illustrate how these new algorithms achieve constant regret (i.e., additive loss compared to the hindsight optimal that is independent of the horizon and state space) under a wide range of conditions. Time permitting, I will try to describe how we can use this technique to incorporate side information and historical data in these settings, and achieve constant regret with as little as a single data trace.
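
To make the "future guides the present" idea concrete, here is a minimal Python sketch of a resolve-and-mimic policy of the flavor the abstract alludes to (policies of this kind are called Bayes selectors in related work by the speaker). It is an illustrative assumption, not the talk's exact algorithm: at each step it solves the expected hindsight relaxation of a toy single-resource allocation problem and copies the decision that benchmark takes for the current arrival. All names and the toy instance are hypothetical.

import numpy as np

def hindsight_lp_fractions(expected_arrivals, rewards, budget):
    """Fractional hindsight optimum for a single unit-demand resource:
    fill the budget greedily with the highest-reward types and return
    each type's fractional accept rate."""
    accept = np.zeros(len(rewards))
    remaining = float(budget)
    for i in np.argsort(rewards)[::-1]:          # best rewards first
        take = min(expected_arrivals[i], remaining)
        if expected_arrivals[i] > 0:
            accept[i] = take / expected_arrivals[i]
        remaining -= take
    return accept

def bayes_selector_step(budget, periods_left, arrival_type, rewards, probs):
    """Accept the current arrival iff the expected hindsight benchmark would."""
    if budget < 1:
        return False
    # Expected future arrivals per type; the current arrival is realized.
    expected = periods_left * np.asarray(probs, dtype=float)
    expected[arrival_type] += 1.0
    accept = hindsight_lp_fractions(expected, rewards, budget)
    return accept[arrival_type] >= 0.5           # round the LP's decision

# Tiny simulation: horizon T, budget B, three customer types (toy numbers).
rng = np.random.default_rng(0)
rewards, probs = np.array([5.0, 2.0, 1.0]), np.array([0.2, 0.3, 0.5])
B, T, collected = 20, 100, 0.0
for t in range(T):
    i = rng.choice(3, p=probs)
    if bayes_selector_step(B, T - t - 1, i, rewards, probs):
        B -= 1
        collected += rewards[i]
print("reward collected:", collected)

The compensated coupling then accounts for regret sample path by sample path: roughly, such a policy only loses value at steps where the hindsight benchmark is forced to revise its planned action, and bounding the expected total of these "compensations" is what drives the constant-regret guarantees mentioned in the abstract.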