Video URL

https://simons.berkeley.edu/talks/alternative-softmax-operator-reinforcement-learning

An Alternative Softmax Operator for Reinforcement Learning

(2020). An Alternative Softmax Operator for Reinforcement Learning. The Simons Institute for the Theory of Computing. https://simons.berkeley.edu/talks/alternative-softmax-operator-reinforcement-learning

An Alternative Softmax Operator for Reinforcement Learning. The Simons Institute for the Theory of Computing, Oct. 30, 2020, https://simons.berkeley.edu/talks/alternative-softmax-operator-reinforcement-learning

@misc{ scivideos_16708,
  doi = {},
  url = {https://simons.berkeley.edu/talks/alternative-softmax-operator-reinforcement-learning},
  author = {},
  keywords = {},
  language = {en},
  title = {An Alternative Softmax Operator for Reinforcement Learning},
  publisher = {The Simons Institute for the Theory of Computing},
  year = {2020},
  month = {oct},
  note = {16708 see, \url{https://scivideos.org/Simons-Institute/16708}}
}

Michael Littman (Brown University)

October 30, 2020

Talk number: 16708

Source Repository: Simons Institute

Subject

Computer Science

Abstract

A softmax operator applied to a set of values acts somewhat like the maximization function and somewhat like an average. In sequential decision making, softmax is often used in settings where it is necessary to maximize utility but also to hedge against problems that arise from putting all of one's weight behind a single maximum-utility decision. The Boltzmann softmax operator is the most commonly used softmax operator in this setting, but we show that this operator is prone to misbehavior. In this work, we study a differentiable softmax operator that, among other properties, is a non-expansion, which ensures convergent behavior in learning and planning. We introduce a variant of the SARSA algorithm that, by utilizing the new operator, computes a Boltzmann policy with a state-dependent temperature parameter. We show that the algorithm is convergent and that it performs favorably in practice. (With Kavosh Asadi.)
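The contrast drawn in the abstract can be illustrated with a small numerical sketch. The abstract does not name the alternative operator; the code below assumes it is the mellowmax operator from the Asadi and Littman paper of the same title, mm_ω(x) = log((1/n) Σ exp(ω x_i)) / ω. The function names, the two-action value vectors, and the parameter value 2.0 are illustrative choices, not taken from the talk. The sketch checks the non-expansion property in the max norm: an operator f is a non-expansion if |f(x) − f(y)| ≤ max_i |x_i − y_i|.

```python
import numpy as np


def boltzmann_softmax(values, beta):
    """Boltzmann-weighted average of `values` with inverse temperature `beta`."""
    x = np.asarray(values, dtype=float)
    # Subtract the max before exponentiating for numerical stability;
    # the shift cancels when the weights are normalized.
    weights = np.exp(beta * (x - x.max()))
    weights /= weights.sum()
    return float(np.dot(weights, x))


def mellowmax(values, omega):
    """Assumed alternative operator: log(mean(exp(omega * x))) / omega, omega != 0."""
    x = np.asarray(values, dtype=float)
    # Same stability trick: factor exp(omega * max) out of the mean.
    shift = x.max()
    return float(shift + np.log(np.mean(np.exp(omega * (x - shift)))) / omega)


def max_norm_gap(a, b):
    """Largest absolute coordinate-wise difference between two value vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.max(np.abs(a - b)))


if __name__ == "__main__":
    # Hypothetical action values for one state, before and after a small update.
    q_a = [0.0, 1.0]
    q_b = [-0.1, 1.1]
    gap = max_norm_gap(q_a, q_b)

    boltz_change = abs(boltzmann_softmax(q_b, 2.0) - boltzmann_softmax(q_a, 2.0))
    mellow_change = abs(mellowmax(q_b, 2.0) - mellowmax(q_a, 2.0))

    print("input gap (max norm):", gap)
    print("Boltzmann output change:", boltz_change, "exceeds gap:", boltz_change > gap)
    print("mellowmax output change:", mellow_change, "exceeds gap:", mellow_change > gap)
```

On this pair of inputs the Boltzmann operator's output moves by more than the inputs do, while the mellowmax output does not. For large positive `omega`, mellowmax approaches the maximum of the values; as `omega` approaches zero, it approaches their mean.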
