APA
(2020). On the Global Convergence and Approximation Benefits of Policy Gradient Methods. The Simons Institute for the Theory of Computing. https://simons.berkeley.edu/talks/global-convergence-and-approximation-benefits-policy-gradient-methods
MLA
On the Global Convergence and Approximation Benefits of Policy Gradient Methods. The Simons Institute for the Theory of Computing, Oct. 30, 2020, https://simons.berkeley.edu/talks/global-convergence-and-approximation-benefits-policy-gradient-methods.
BibTeX
@misc{ scivideos_16707,
doi = {},
url = {https://simons.berkeley.edu/talks/global-convergence-and-approximation-benefits-policy-gradient-methods},
author = {},
keywords = {},
language = {en},
title = {On the Global Convergence and Approximation Benefits of Policy Gradient Methods},
publisher = {The Simons Institute for the Theory of Computing},
year = {2020},
month = {oct},
note = {16707, see \url{https://scivideos.org/index.php/Simons-Institute/16707}}
}
Policy gradient methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies. Unfortunately, due to the multi-period nature of the objective, policy gradient algorithms face non-convex optimization problems and can get stuck in suboptimal local minima even for extremely simple problems. This talk will discuss structural properties, shared by several canonical control problems, that guarantee the policy gradient objective function has no suboptimal stationary points despite being non-convex. Time permitting, I'll then zoom in on the special case of state-aggregated policies and a proof showing that policy gradient converges to better policies than its relative, approximate policy iteration.
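To make the setup concrete, here is a minimal, self-contained sketch (not code from the talk) of the approach the abstract describes: stochastic gradient ascent on expected total reward (equivalently, descent on its negative) over a softmax-parameterized policy, with the gradient estimated from REINFORCE-style rollouts in a small, made-up tabular MDP. The transition matrix P, reward table R, horizon, and step size below are illustrative assumptions, not details from the talk.

# Minimal policy gradient sketch on a hypothetical tabular MDP (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state, 2-action MDP: P[s, a] is a next-state distribution,
# R[s, a] an expected reward. All numbers are made up for illustration.
n_states, n_actions, horizon = 3, 2, 10
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

theta = np.zeros((n_states, n_actions))  # softmax policy parameters

def policy(s):
    """Action probabilities pi_theta(. | s) under a softmax parameterization."""
    z = theta[s] - theta[s].max()
    p = np.exp(z)
    return p / p.sum()

def rollout():
    """Sample one trajectory; return a REINFORCE gradient estimate and the return."""
    s = 0
    grad = np.zeros_like(theta)
    ret = 0.0
    for _ in range(horizon):
        probs = policy(s)
        a = rng.choice(n_actions, p=probs)
        # Accumulate grad log pi_theta(a | s) for the softmax parameterization.
        grad[s] -= probs
        grad[s, a] += 1.0
        ret += R[s, a]
        s = rng.choice(n_states, p=P[s, a])
    # Unbiased (if high-variance) estimator: total return times sum of score functions.
    return ret * grad, ret

# Stochastic gradient ascent on the (non-convex) multi-period objective.
step_size = 0.05
for it in range(2000):
    g, ret = rollout()
    theta += step_size * g
    if it % 500 == 0:
        print(f"iter {it:4d}  sampled return {ret:.3f}")

Even in a toy example like this, the objective is non-convex in theta; the structural conditions discussed in the talk are what rule out suboptimal stationary points for certain canonical control problems.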