Hey all, I’ve been enjoying Wordle recently and wondered how a program might do at it.
My first step was to extract the word lists from the Wordle site. Interestingly, there is a “target” list of 2315 words (the words that may be the answer), plus a list of 10657 additional guesses: words that users can enter as valid guesses but that will never be the answer. If you want these lists, there’s a pair of Python-formatted sets representing them in the repo below.
My first thought was to use the frequency of letters in the English language to inform my guess strategy. However, I realized there was a better way: use the frequency of letters in the target list! That’s what really matters, yeah? No etaoin shrdlu for me!
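To make the idea concrete, here is a minimal sketch of counting letter frequency over a word list. It assumes a tiny made-up stand-in list (`SAMPLE_TARGETS`) rather than the real 2315-word target list, and the function name `letter_frequency` is just for illustration.

```python
from collections import Counter

# Hypothetical stand-in for the real target list (assumption, not the actual data).
SAMPLE_TARGETS = ["crane", "slate", "trace", "crate", "pious"]


def letter_frequency(words):
    """Count how often each letter occurs across all the given words."""
    counts = Counter()
    for word in words:
        counts.update(word)  # a string updates the counter letter by letter
    return counts


freq = letter_frequency(SAMPLE_TARGETS)
# The five most common letters in the sample list, with their counts.
print(freq.most_common(5))
```

Swapping in the real target set should give the letter ranking that actually matters for the game, which may differ from general English frequency.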
I also thought I could measure the frequency of letters by position. For the first position of a 5-letter word, perhaps consonants are more frequent than vowels? Or perhaps I am wrong there.
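A rough sketch of per-position counting, again on a made-up sample list (`SAMPLE_TARGETS`, `positional_frequency`, and `VOWELS` are illustrative names, and every word is assumed to be exactly 5 letters):

```python
from collections import Counter

# Hypothetical stand-in for the real target list (assumption, not the actual data).
SAMPLE_TARGETS = ["crane", "slate", "trace", "about", "pious"]


def positional_frequency(words, length=5):
    """Return one Counter per position, counting the letters seen there."""
    counters = [Counter() for _ in range(length)]
    for word in words:
        for i, letter in enumerate(word):
            counters[i][letter] += 1
    return counters


pos_freq = positional_frequency(SAMPLE_TARGETS)

# Compare vowels and consonants in the first position.
VOWELS = set("aeiou")
first = pos_freq[0]
vowel_starts = sum(n for ch, n in first.items() if ch in VOWELS)
consonant_starts = sum(first.values()) - vowel_starts
print("vowel starts:", vowel_starts, "consonant starts:", consonant_starts)
```

On the real target list, the same comparison would answer the consonant-versus-vowel question for position 1 directly.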
I then thought: why not re-measure this frequency over the remaining possible targets after each guess, pruning the word list as each guess rules words out?
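One way the pruning could work, as a sketch rather than a definitive implementation: score a guess against a candidate the way Wordle does, then keep only the candidates that would have produced the feedback actually seen. The `feedback` and `prune` names and the `g`/`y`/`-` pattern encoding are assumptions made for this example.

```python
from collections import Counter


def feedback(guess, target):
    """Wordle-style feedback: 'g' = green, 'y' = yellow, '-' = grey."""
    result = ["-"] * len(guess)
    remaining = Counter()  # target letters not matched exactly
    # First pass: mark exact matches and collect the unmatched target letters.
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            result[i] = "g"
        else:
            remaining[t] += 1
    # Second pass: a yellow uses up one unmatched copy of that letter.
    for i, g in enumerate(guess):
        if result[i] == "-" and remaining[g] > 0:
            result[i] = "y"
            remaining[g] -= 1
    return "".join(result)


def prune(words, guess, pattern):
    """Keep only the words that would yield `pattern` for `guess`."""
    return [w for w in words if feedback(guess, w) == pattern]


# Hypothetical sample list (assumption, not the real target list).
candidates = ["crane", "slate", "trace", "about", "pious"]
survivors = prune(candidates, "slate", "--g-g")
print(survivors)
```

The frequency counts can then be recomputed over the surviving words before choosing the next guess.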
I eventually wrote all of this up as a program with the following steps:
- Figure out, for the target list of words, the frequency of each letter in each of positions 1-5.
- Pick the word that is most likel