This textbook has received 5,000+ stars! I am glad it has been helpful to so many readers.
My English open course is now online. You can click the figure above or the link here to go to our YouTube channel, or click the following links to jump directly to specific lecture videos.
- Overview of Reinforcement Learning in 30 Minutes
- L1: Basic Concepts (P1-State, action, policy, …)
- L1: Basic Concepts (P2-Reward, return, Markov decision process)
- L2: Bellman Equation (P1-Motivating examples)
- L2: Bellman Equation (P2-State value)
- L2: Bellman Equation (P3-Bellman equation-Derivation)
- L2: Bellman Equation (P4-Matrix-vector form and solution)
- L2: Bellman Equation (P5-Action value)
- L3: Bellman Optimality Equation (P1-Motivating example)
- L3: Bellman Optimality Equation (P2-Optimal policy)
- L3: Bellman Optimality Equation (P3-More on BOE)
- L3: Bellman Optimality Equation (P4-Interesting properties)
- L4: Value Iteration and Policy Iteration (P1-Value iteration)
- L4: Value Iteration and Policy Iteration (P2-Policy iteration)
- L4: Value Iteration and Policy Iteration (P3-Truncated policy iteration)
- L5: Monte Carlo Learning (P1-Motivating examples)
- L5: Monte Carlo Learning (P2-MC Basic-introduction)
- L5: Monte Carlo Learning (P3-MC Basic-examples)
- L5: Monte Carlo Learning (P4-MC Exploring Starts)
- L5: Monte Carlo Learning (P5-MC Epsilon-Greedy-introduction)
- L5: Monte Carlo Learning (P6-MC Epsilon-Greedy-examples)
- L6: Stochastic Approximation and SGD (P1-Motivating example)
- L6: Stochastic Approximation and SGD (P2-RM algorithm: introduction)
- L6: Stochastic Approximation and SGD (P3-RM algorithm: convergence)
- L6: Stochastic Approximation and SGD (P4-SGD algorithm: introduction)
- L6: Stochastic Approximation and SGD (P5-SGD algorithm: examples)
- L6: Stochastic Approximation and SGD (P6-SGD algorithm: properties)
- L6: Stochastic Approximation and SGD (P7-SGD algorithm: comparison)
- L7: Temporal-Difference Learning (P1-Motivating example)
- L7: Temporal-Difference Learning (P2-TD algorithm: introduction)
- L7: Temporal-Difference Learning (P3-TD algorithm: convergence)
- L7: Temporal-Difference Learning (P4-Sarsa)
- L7: Temporal-Difference Learning (P5-Expected Sarsa & n-step Sarsa)
- L7: Temporal-Difference Learning (P6-Q-learning: introduction)
- L7: Temporal-Difference Learning (P7-Q-learning: pseudo code)
- L7: Temporal-Difference Learning (P8-Unified viewpoint and summary)
- L8: Value Function Approximation (P1-Motivating example–curve fitting)
- L8: Value Function Approximation (P2-Objective function)
- L8: Value Function Approximation (P3-Optimization algorithm)
- L8: Value Function Approximation (P4-Illustrative examples and analysis)
- L8: Value Function Approximation (P5-Sarsa and Q-learning)
- L8: Value Function Approximation (P6-DQN–basic idea)
- L8: Value Function Approximation (P7-DQN–experience replay)
- L8: Value Function Approximation (P8-DQN–implementation and example)
- L9: Policy Gradient Methods (P1-Basic idea)
- L9: Policy Gradient Methods (P2-Metric 1–Average value)
- L9: Policy Gradient Methods (P3-Metric 2–Average reward)
- L9: Policy Gradient Methods (P4-Gradients of the metrics)
- L9: Policy Gradient Methods (P5-Gradient-based algorithms & REINFORCE)
- The lecture videos of the last chapter (Chapter 10) will be uploaded shortly. Please stay tuned!
You are very welcome to check out the English videos and see whether they are helpful!
This book aims to provide a mathematical but friendly introduction to the fundamental concepts, basic problems, and classic algorithms in reinforcement learning. Some essential features of this book are highlighted as follows.
- The book introduces reinforcement learning from a mathematical point of view. Hopefully, readers will not only know the procedure of an algorithm but also understand why it was designed in the first place and why it works effectively.
- The depth of the mathematics is carefully controlled to an adequate level, and the mathematics is presented in a carefully designed manner to keep the book friendly to read. Readers can selectively read the materials presented in gray boxes according to their interests.
- Many illustrative examples are given to help readers better understand the topics. All the examples in this book are based on a grid world task, which is easy to understand and helpful for illustrating concepts and algorithms (a minimal code sketch of such a grid world is shown after this list).
- When introducing an algorithm, the book aims to separate its core idea from complications that may be distracting. In this way, readers can better grasp the core idea of an algorithm.
- The contents of the book are coherently organized: each chapter builds on the preceding one and lays the necessary foundation for the next.
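To make the grid world setting concrete, below is a minimal sketch of such an environment in Python. It is not the book's official code; the grid size, cell layout, and reward values (a penalty for hitting the boundary or a forbidden cell, a positive reward for reaching the target, zero otherwise) are illustrative assumptions only.

```python
# A minimal grid-world sketch (illustrative only, not the book's official code):
# a deterministic MDP on a size x size grid with four moves, one target cell,
# and a few forbidden cells. The reward values below are assumptions.

class GridWorld:
    """Tiny deterministic grid world for trying out RL algorithms."""

    ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def __init__(self, size=5, target=(3, 2), forbidden=((1, 1), (2, 2))):
        self.size = size
        self.target = target
        self.forbidden = set(forbidden)

    def step(self, state, action):
        """Apply an action to a (row, col) state; return (next_state, reward)."""
        dr, dc = self.ACTIONS[action]
        r, c = state[0] + dr, state[1] + dc
        if not (0 <= r < self.size and 0 <= c < self.size):
            return state, -1.0   # bumping into the boundary keeps the agent in place
        if (r, c) in self.forbidden:
            return (r, c), -1.0  # entering a forbidden cell is penalized
        if (r, c) == self.target:
            return (r, c), 1.0   # reaching the target is rewarded
        return (r, c), 0.0       # any other move gets zero reward


env = GridWorld()
print(env.step((0, 0), "right"))  # ((0, 1), 0.0)
```

A tabular method such as value iteration or Q-learning can then be run on top of this `step` function by sweeping over all (row, col) states and the four actions.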
The topics addressed in the book are shown in the figure below. The book contains ten chapters, which fall into two parts: the first part covers basic tools, and the second part covers algorithms. The ten chapters are closely related, and in general the earlier chapters should be studied before the later ones.
This book is designed for senior undergraduate students, graduate students, researchers, and practitioners interested in reinforcement learning.
It does not require readers to have any background in reinforcement learning because it starts by introducing the most basic concepts. If the reader already has some background in reinforcement learning, I believe the book can help them understand some topics more deeply or provide different perspectives.
This book, however, requires the reader to have some knowledge of probability theory and linear algebra. Some basics of the required mathematics are also included in the appendix of this book.
I believe that combining the book with my lecture videos will help you study more effectively.
- Chinese lecture videos: You can check the Bilibili channel or the YouTube channel. The lecture videos have received 1,300,000+ views (as of February 2025) across the Internet and very good feedback!
- English lecture videos: The English lecture videos have been uploaded to YouTube. Please see the links and details earlier in this document.
You can find my information on my homepage https://www.shiyuzhao.net/ (Google Sites) and my research group website https://shiyuzhao.westlake.edu.cn
7 Comments
dualofdual
The best lectures on Reinforcement Learning and related topics are by Dimitris Bertsekas: https://web.mit.edu/dimitrib/www/home.html
lemonlym
Another great resource on RL is Mykel Kochenderfer's suite of textbooks:
https://algorithmsbook.com/
kristjansson
Also worth mentioning Murphy's WIP textbook[0] focused entirely on RL, which is an outgrowth of his excellent ML textbooks.
[0]: https://arxiv.org/abs/2412.05265
ivanbelenky
Awesome resource. In case someone is interested, I implemented most of Sutton's book here: https://github.com/ivanbelenky/RL
monadicmonad
I don't know how to go from understanding this material to having a job in the field. Just stuck as a SWE for now.
jgord
Highly recommended ... even the main contents diagram is a great visual overview of RL in general, as is the 30-minute intro YT video.
I'm expecting to see a lot of hyper-growth startups using RL to solve real-world problems in engineering / logistics / medicine.
LLMs currently attract all the hype for good reasons, but I'm surprised VCs don't seem to be looking at RL companies specifically.