
On the Global Convergence and Approximation Benefits of Policy Gradient Methods

APA

Russo, D. (2020). On the Global Convergence and Approximation Benefits of Policy Gradient Methods. The Simons Institute for the Theory of Computing. https://simons.berkeley.edu/talks/global-convergence-and-approximation-benefits-policy-gradient-methods

MLA

Russo, Daniel. "On the Global Convergence and Approximation Benefits of Policy Gradient Methods." The Simons Institute for the Theory of Computing, 30 Oct. 2020, https://simons.berkeley.edu/talks/global-convergence-and-approximation-benefits-policy-gradient-methods.

BibTeX

@misc{scivideos_16707,
  author    = {Russo, Daniel},
  title     = {On the Global Convergence and Approximation Benefits of Policy Gradient Methods},
  publisher = {The Simons Institute for the Theory of Computing},
  year      = {2020},
  month     = {oct},
  language  = {en},
  url       = {https://simons.berkeley.edu/talks/global-convergence-and-approximation-benefits-policy-gradient-methods},
  note      = {Talk 16707, see \url{https://scivideos.org/Simons-Institute/16707}}
}
          
Daniel Russo (Columbia University)
Talk number: 16707
Source repository: Simons Institute

Abstract

Policy gradient methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies. Unfortunately, due to the multi-period nature of the objective, policy gradient algorithms face non-convex optimization problems and can get stuck in suboptimal local minima even for extremely simple problems. This talk will discuss structural properties – shared by several canonical control problems – that guarantee the policy gradient objective function has no suboptimal stationary points despite being non-convex. Time permitting, I’ll then zoom in on the special case of state-aggregated policies and a proof showing that policy gradient converges to better policies than its relative, approximate policy iteration.
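
The following is an illustrative sketch, not material from the talk: a minimal REINFORCE-style policy gradient loop (Python/NumPy) on a hypothetical two-state chain MDP, to make the setup in the abstract concrete. The environment (step), the horizon, the step size alpha, and all variable names are assumptions chosen for this example; the update is stochastic gradient ascent on the expected multi-period return (equivalently, stochastic gradient descent on its negative).

import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, horizon = 2, 2, 5
theta = np.zeros((n_states, n_actions))   # softmax logits, one row per state
alpha = 0.05                              # step size (illustrative choice)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def step(s, a):
    # Hypothetical 2-state chain: in state 0, action 1 moves to state 1;
    # state 1 is absorbing and pays reward 1 per period, state 0 pays 0.
    if s == 0:
        return (1 if a == 1 else 0), 0.0
    return 1, 1.0

for episode in range(5000):
    s, traj, ret = 0, [], 0.0
    for t in range(horizon):
        probs = softmax(theta[s])
        a = rng.choice(n_actions, p=probs)
        s_next, r = step(s, a)
        traj.append((s, a, probs))
        ret += r
        s = s_next
    # REINFORCE / score-function update: for a softmax policy,
    # grad log pi(a|s) = e_a - probs; scale by the trajectory return.
    for s, a, probs in traj:
        grad_log_pi = -probs
        grad_log_pi[a] += 1.0
        theta[s] += alpha * ret * grad_log_pi

print("policy at state 0:", softmax(theta[0]))  # should come to favor action 1

This toy chain is far too simple to exhibit the suboptimal stationary points the talk is concerned with; it only shows where the non-convex, multi-period objective and its stochastic gradient estimate enter the algorithm.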