Hi! Since 2020, we’ve been developing a reinforcement learning (RL) agent to beat the 1996 game Pokémon Red.
As of February 2025, we are able to beat Pokémon Red with reinforcement learning using a <10 million parameter policy (60500x smaller than DeepSeekV3) and with minimal simplifications. The output is not a policy capable of beating Pokémon, but a technique for producing solutions to Pokémon. This website describes the system’s current state. All code is open sourced and available for you, the reader, to try.
As improvements to the codebase are made, the changelog will be updated.
What is Pokémon Red?
Pokémon Red, released in 1996, is a single-player Japanese role-playing game (JRPG) that follows the journey of a new “Pokémon Trainer.” Players capture Pokémon “creatures” to battle against opposing Pokémon, explore the world, and progress through the game’s storyline. Pokémon has two goals:
- Catch all possible Pokémon species.
- Become the “champion.”
We focused on the second (more popular) goal, becoming the champion.
Why Pokémon Red
Why do we care about developing an agent to beat Pokémon with machine learning?
The answer is really a bit higher level. We believe solving JRPGs with reinforcement learning provides extremely difficult challenges not present in current RL environments. It is our hope that JRPGs will provide a great benchmark for improving AI. JRPGs:
- Can be just as complex as games like Go, StarCraft II or Minecraft.
- Involve complex reasoning and decision making.
- Are nonlinear.
- Can be long, with > 24 hours average human gameplay. Pokémon Red takes 25 hours on average for a new player to complete.
- Require mult
15 Comments
jononor
Very nice! Nice to see demonstrations of reinforcement learning being used to solve non-trivial tasks.
xinpw8
This is a first-in-world, isn't it?
worble
Heads up, clicking "Next Page" just takes you to an empty screen, you have to use the navigation links on the left if you want to read past the first screen.
bee_rider
Ah, very neat.
Maybe some day the “rival” character in Pokemon can be played by a RL system, haha. That way you can have a “real player (simulated)” for your rival.
modeless
Can't Pokemon be beaten by almost random play?
bubblyworld
What an awesome project! I'm curious – I would have thought that rewarding unique coordinates would be enough to get the agent to (eventually) explore all areas, including the key ones. What did the agents end up doing before key areas got an extra reward?
(and how on earth did you port Pokémon red to a RL environment? O.o)
rvz
Note: What makes this interesting is that this is a pre-LLM project which shows that in some projects you don't need an "LLM" for this. All you need is just a plain old reinforcement learning algorithm and a deep neural network which is perfect for this.
This is what I want to see more of and goes against the hype of LLMs. What a great RL project.
Meanwhile, "Claude" is still stuck somewhere in the game. Imagine the costs of running that vs this project.
mclau156
Could you have used the decompilations of pokemon on github? https://github.com/pret/pokered
levocardia
Really cool work. It seems like some critical areas (team rocket, safari zone) rely on encoding game knowledge into the reward function somehow, which "smuggles in" external intelligence about the game. A lot of these are related to planning, which makes me wonder whether you could "bolt on" an LLM to do things like steer the RL agent, dynamically choose what to reward, or even do some of the planning itself. Do you think there's any low-hanging fruit on this front?
differintegral
This is very cool, congrats!
I wonder, does anyone have a sense of the approximate raw number of button presses required to beat the game? Mostly curious to see how that compares to the parameter count.
benopal64
Incredible work. I am just learning about PyBoy from your project, and it made me think of many fun ways to use that library to play Pokemon autonomously.
kerkeslager
Are there any uses for AI yet that aren't either:
1. Doing things humans do for fun.
2. Doing things that AI is horribly terrible at.
?
nimish
Considering how many things are less complicated than Pokemon, this is very cool
novia
Please stream the gameplay to twitch so people can compare.
endofreach
> Pokémon Red takes 25 hours on average for a new player to complete.
Seriously? I've never really played video games, but i remember spending so much time on pokemon red when i was young. Not sure if i ever really finished more than once. But i'm pretty sure i must have played for more than 50h or so before even close to finish. My memory might trick me though.
Not sure which pokemon version it was, but i got so hooked trying to get this "secret" pokemon which was just a bunch of pixels. Some kind of bug (of the game, not the type of pokemon). You had to do specific things in a park and other things and then surf up and down x-times on the right shore of an island… or something like that.
I had no idea how it worked and got so hooked, i must have spent most of my playing time on things like that.
Oh boy, memories…