
On the Global Convergence and Approximation Benefits of Policy Gradient Methods

APA

Russo, D. (2020). On the Global Convergence and Approximation Benefits of Policy Gradient Methods. The Simons Institute for the Theory of Computing. https://simons.berkeley.edu/talks/global-convergence-and-approximation-benefits-policy-gradient-methods

MLA

Russo, Daniel. "On the Global Convergence and Approximation Benefits of Policy Gradient Methods." The Simons Institute for the Theory of Computing, 30 Oct. 2020, https://simons.berkeley.edu/talks/global-convergence-and-approximation-benefits-policy-gradient-methods.

BibTex

@misc{scivideos_16707,
  url       = {https://simons.berkeley.edu/talks/global-convergence-and-approximation-benefits-policy-gradient-methods},
  author    = {Russo, Daniel},
  language  = {en},
  title     = {On the Global Convergence and Approximation Benefits of Policy Gradient Methods},
  publisher = {The Simons Institute for the Theory of Computing},
  year      = {2020},
  month     = {oct},
  note      = {Talk 16707; see \url{https://scivideos.org/index.php/Simons-Institute/16707}}
}
          
Daniel Russo (Columbia University)
Talk number: 16707
Source Repository: Simons Institute

Abstract

Policy gradient methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies. Unfortunately, due to the multi-period nature of the objective, policy gradient algorithms face non-convex optimization problems and can get stuck in suboptimal local minima even for extremely simple problems. This talk will discuss structural properties, shared by several canonical control problems, that guarantee the policy gradient objective function has no suboptimal stationary points despite being non-convex. Time permitting, I'll then zoom in on the special case of state-aggregated policies and a proof showing that policy gradient converges to better policies than its relative, approximate policy iteration.
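
To make the setup in the first sentence concrete, below is a minimal REINFORCE-style sketch of policy gradient: stochastic gradient ascent over a softmax-parameterized policy on a toy two-state MDP. The MDP, step size, horizon, and iteration count are illustrative assumptions for this sketch, not details from the talk.

# Minimal policy gradient (REINFORCE) sketch on an assumed toy 2-state MDP.
# All problem data below are illustrative, not from the talk.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, horizon = 2, 2, 10
# Illustrative dynamics and rewards: action 1 tends to move toward state 1,
# and state 1 pays reward 1 regardless of action.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],   # P[s, a, s']
              [[0.9, 0.1], [0.1, 0.9]]])
R = np.array([[0.0, 0.0], [1.0, 1.0]])    # R[s, a]

theta = np.zeros((n_states, n_actions))   # softmax policy parameters

def policy(theta, s):
    # Softmax action probabilities at state s.
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def sample_episode(theta):
    # Roll out one finite-horizon trajectory starting from state 0.
    s, traj = 0, []
    for _ in range(horizon):
        p = policy(theta, s)
        a = rng.choice(n_actions, p=p)
        traj.append((s, a, R[s, a]))
        s = rng.choice(n_states, p=P[s, a])
    return traj

def reinforce_gradient(theta, traj):
    # Sum over t of (return-to-go) * grad log pi(a_t | s_t).
    g = np.zeros_like(theta)
    rewards = [r for _, _, r in traj]
    for t, (s, a, _) in enumerate(traj):
        ret = sum(rewards[t:])
        grad_log = -policy(theta, s)
        grad_log[a] += 1.0
        g[s] += ret * grad_log
    return g

# Stochastic gradient ascent on the expected-return objective.
for _ in range(2000):
    traj = sample_episode(theta)
    theta += 0.05 * reinforce_gradient(theta, traj)

print("learned policy at state 0:", policy(theta, 0))
print("learned policy at state 1:", policy(theta, 1))

On this toy instance the learned policy puts most of its probability on action 1 in both states; the point of the talk is that, because the objective is non-convex, such an iteration need not reach a globally optimal policy in general unless additional structural properties hold.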