This post explores how many of the most popular gradient-based optimization algorithms actually work.

Note: If you are looking for a review paper, this blog post is also available as an article on arXiv.

Update 20.03.2020: Added a note on recent optimizers.

Update 09.02.2018: Added AMSGrad.

Update 24.11.2017: Most of the content in this article
