APA
(2020). Zap Q-learning with Nonlinear Function Approximation. The Simons Institute for the Theory of Computing. https://simons.berkeley.edu/talks/tbd-244
MLA
Zap Q-learning with Nonlinear Function Approximation. The Simons Institute for the Theory of Computing, 2 Dec. 2020, https://simons.berkeley.edu/talks/tbd-244.
BibTeX
@misc{scivideos_16824,
url = {https://simons.berkeley.edu/talks/tbd-244},
language = {en},
title = {Zap Q-learning with Nonlinear Function Approximation},
publisher = {The Simons Institute for the Theory of Computing},
year = {2020},
month = {dec},
note = {See \url{https://scivideos.org/Simons-Institute/16824}}
}
Zap Q-learning is a recent class of reinforcement learning algorithms, motivated primarily as a means to accelerate convergence. Stability theory has been absent outside of two restrictive classes: the tabular setting and optimal stopping. This paper introduces a new framework for the analysis of a more general class of recursive algorithms known as stochastic approximation. Based on this general theory, it is shown that Zap Q-learning is consistent under a non-degeneracy assumption, even when the function approximation architecture is nonlinear. Zap Q-learning with neural network function approximation emerges as a special case, and is tested on examples from OpenAI Gym. Based on multiple experiments with a range of neural network sizes, it is found that the new algorithms converge quickly and are robust to the choice of function approximation architecture.
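At its core, the Zap approach is a two-timescale stochastic approximation recursion with a matrix gain: a Jacobian estimate is tracked on a faster timescale, and its inverse multiplies the update, giving a stochastic Newton-Raphson step. The Python sketch below illustrates this generic recursion on a synthetic linear root-finding problem. It is not the authors' implementation; the problem data (A_true, b), the noise model, and the step-size exponents are assumptions chosen purely for illustration.

import numpy as np

# Minimal sketch of the Zap stochastic approximation (SA) recursion that
# underlies Zap Q-learning: solve f_bar(theta) = 0 given only noisy samples
# f(theta, W) of the function and A(theta, W) of its Jacobian.
# Problem data, noise model, and step sizes are illustrative assumptions.

rng = np.random.default_rng(0)
d = 3

# Synthetic linear root-finding problem: f_bar(theta) = A_true @ theta - b,
# with root theta_star = A_true^{-1} b.
A_true = np.array([[-2.0,  0.3,  0.0],
                   [ 0.1, -1.5,  0.2],
                   [ 0.0,  0.4, -1.0]])
b = np.array([1.0, -0.5, 2.0])
theta_star = np.linalg.solve(A_true, b)

theta = np.zeros(d)
A_hat = -np.eye(d)  # running estimate of the mean Jacobian

for n in range(1, 20001):
    alpha = 1.0 / n        # slow step size for the parameter
    gamma = 1.0 / n**0.85  # faster step size: gamma / alpha -> infinity

    f_n = A_true @ theta - b + 0.1 * rng.standard_normal(d)  # noisy sample
    A_n = A_true + 0.1 * rng.standard_normal((d, d))         # noisy Jacobian

    A_hat += gamma * (A_n - A_hat)                # fast timescale: track Jacobian
    theta -= alpha * np.linalg.pinv(A_hat) @ f_n  # slow timescale: Newton-like step

print("Zap SA estimate:", theta)
print("true root:      ", theta_star)

Roughly speaking, in Zap Q-learning f_n corresponds to the temporal-difference update term for the Q-function approximation and A_n to its Jacobian in the parameters; the non-degeneracy assumption mentioned above is what keeps this matrix gain well defined.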