Using the Wolfram Language
Published Jan 27, 2022
Every time I play Wordle I am reminded of this quote from Arthur Conan Doyle’s character Sherlock Holmes, when he solves another mysterious conundrum:
How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth? — Sherlock Holmes
Wordle is a new daily word game that challenges you to guess a five-letter word in six rounds or fewer. After each guess, the game highlights in green the letters that are correct and in the right place, and in yellow the letters that are in the word but in the wrong place; letters that are not in the word stay gray. Using this information you can eliminate large numbers of words and make a better guess each round. And like Sherlock Holmes, you’re also racing against the clock, because you have only six rounds to solve the puzzle.
In this post, I will use computational methods to explore ways to solve this game efficiently. While I don’t use these methods directly to solve the daily Wordle game myself (where would be the fun in that?!), doing this analysis has given me insights into how the game works “behind the scenes”, and I feel these insights help me play the game better.
Every round of Wordle gives you feedback on your guess, so you will be able to deduce more and more information about the solution. Specifically, you learn the following information:
- Letters that are in the solution
- Letters that are not in the solution
- Places where certain letters do appear
- Places where certain letters do not appear
- Number of times a letter appears in the solution
To explain the last item on this list: if you guess the word “RADAR” and Wordle marks only one of its two A’s green or yellow (the other one stays gray), then you know that the solution contains exactly one “A”. It’s a good strategy to make your first guess a word with five distinct letters, to cover as much ground as possible. But solution words often do contain repeated letters, so at some point during the game you need to switch to guessing a likely repeated letter.
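To make the repeated-letter rule concrete, here is a minimal sketch of how the feedback for a guess could be computed. It is a hypothetical helper (wordleFeedback is not part of the original article or of any library), and it assumes the usual rule: greens are assigned first, then yellows are handed out left to right, but only while unmatched copies of that letter remain in the solution.

```wolfram
(* Hypothetical helper, not from the original article.
   Returns "G" (green), "Y" (yellow) or "-" (gray) for each letter of guess. *)
wordleFeedback[guess_String, solution_String] :=
 Module[{g = Characters[guess], s = Characters[solution], result, remaining},
  (* first pass: letters in the right place are green *)
  result = MapThread[If[#1 === #2, "G", "-"] &, {g, s}];
  (* solution letters not already matched by a green, with multiplicity *)
  remaining = Counts[Pick[s, result, "-"]];
  (* second pass, left to right: a letter turns yellow only while an
     unmatched copy of it is still left in the solution *)
  Do[
   If[result[[i]] === "-" && Lookup[remaining, g[[i]], 0] > 0,
    result[[i]] = "Y";
    remaining[g[[i]]] = remaining[g[[i]]] - 1],
   {i, Length[g]}];
  result]

wordleFeedback["radar", "bread"]
(* {"Y", "-", "Y", "G", "-"} *)
```

The second A comes back gray because the only A in “bread” is already matched by the green one, which is exactly how you learn that the solution has a single A.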
There are many ways to solve Wordle. Most people I know start with a random word and go from there, based on the clues they get. Others always use the same two initial guesses, for example “RAISE” and “CLOUD”, which contain a lot of letters that are very common in English.
Choosing starting words made of common letters relies on tables of letter frequencies. For example, the word “RATES” contains more common letters than the word “EPOXY”. Starting with common letters is a good approach, because it lets you quickly eliminate many words:
If you pick a word with the most common letters and one or more of those letters is not in the solution, then you have ruled out a large number of words in one go. This greatly reduces the number of possible solutions, which makes picking the correct solution in the next round more likely.
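One way to quantify “ruling out a large number of words” is the share of a word list that contains a given letter. The sketch below uses a six-word stand-in list (an assumption for illustration; the full list is loaded later in this post), and eliminatedFraction is a hypothetical helper name. If a guess contains the letter once and it comes back gray, this is the share of the list that drops out.

```wolfram
(* small stand-in list, for illustration only *)
sample = {"rates", "epoxy", "sores", "mount", "pound", "cares"};

(* fraction of words that contain the letter at least once *)
eliminatedFraction[list_List, letter_String] :=
 Count[list, w_String /; StringContainsQ[w, letter]]/Length[list]

eliminatedFraction[sample, "e"]  (* 2/3 *)
eliminatedFraction[sample, "x"]  (* 1/6 *)
```

A gray “E” would remove two thirds of this toy list, while a gray “X” would remove only one word, which is why common letters make better probes.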
So focusing on the letter frequency should be part of a winning strategy.
Letter frequency tables exist for English words in general (or for any other language, for that matter), but here we only have to consider a very specific subset of the English dictionary: the five-letter words that Wordle accepts.
Wordle has its own five-letter word dictionary that contains exactly 12,972 words. You can conveniently obtain this list in the Wolfram Language by loading the data from the following Wolfram Data Repository object:
words = ResourceData[
ResourceObject["Wordle Word List"]
]
And we can examine what sort of words are in this list:
RandomSample[words, 10]
Out[] = {
  "mawns", "alary", "maare",
  "lower", "cesta", "muggy",
  "mayed", "broth", "taler",
  "brags"
}
There are some obvious English words like “lower”, “muggy”, “broth”, and “brags”, but this Wordle list also includes some very obscure words like “maare” and “alary”. I don’t know the source of this Wordle word list, so we’ll just take this list at face value and use it to play the game. I should point out that there is a subset of this list which contains the actual solution words for the next couple of years. In this post I am choosing to ignore that subset of solution words, since I consider that too much of a “cheat”.
Using the Wolfram Language, we can get the overall letter distribution of the Wordle words by joining all the words into a single very long string of 64,860 letters (12,972 words of 5 letters each) and using the CharacterCounts function to see how often each letter occurs:
KeySort[CharacterCounts[StringJoin[words]]]
Out[] = <|
  "a" -> 5990, "b" -> 1627, "c" -> 2028, "d" -> 2453,
  "e" -> 6662, "f" -> 1115, "g" -> 1644, "h" -> 1760,
  "i" -> 3759, "j" -> 291, "k" -> 1505, "l" -> 3371,
  "m" -> 1976, "n" -> 2952, "o" -> 4438, "p" -> 2019,
  "q" -> 112, "r" -> 4158, "s" -> 6665, "t" -> 3295,
  "u" -> 2511, "v" -> 694, "w" -> 1039, "x" -> 288,
  "y" -> 2074, "z" -> 434
|>
So: StringJoin smushes all 12,972 words together into a single string, CharacterCounts counts how many times each letter appears, and KeySort sorts the result alphabetically by letter.
We can see that the letter Q is uncommon (it appears only 112 times across all words) and the letter E is very common (6,662 times, second only to S with 6,665). Here is a BarChart that shows all the letter counts:
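A chart like this can be drawn with BarChart. The following is a minimal sketch, run on a small stand-in list rather than the full word list; for the real data you would pass the KeySort[CharacterCounts[StringJoin[words]]] result from above.

```wolfram
(* letter counts for a small stand-in list, sorted alphabetically *)
counts = KeySort[CharacterCounts[StringJoin[{"sores", "cooly", "gonna", "pound", "mount"}]]];

(* one bar per letter, labelled with that letter *)
chart = BarChart[Values[counts], ChartLabels -> Keys[counts]]
```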
We can go one step further and look at how common each letter is in a specific position in a five-letter word. For example, how common is the letter “E” in the third position? Here is the Wolfram Language code that achieves that:
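Dataset[
  Map[
    (* start with a zero for every letter, then overwrite with the actual counts *)
    Join[
      AssociationThread[CharacterRange["a", "z"] -> 0],
      Counts[#]
    ] &,
    (* one list per position: all first letters, all second letters, ... *)
    Transpose[Map[Characters, words]]
  ],
  Alignment -> Right,
  MaxItems -> {All, All}
]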
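This code looks a little more complicated, but it just counts how many times each letter occurs at each position of a five-letter word: Characters splits every word into letters, Transpose groups all first letters, all second letters, and so on, Counts tallies each group, and joining with an all-zero association makes letters that never occur at a position show up with a count of 0. It gives the following output in a notebook: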
So, for example, in the Wordle word list the letter “E” occurs in the third position 882 times. Here is a visually more appealing version of this table:
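To spot-check a single entry without building the whole table, a short sketch like the following takes the character at one position of every word and counts matches. letterAtPosition is a hypothetical helper name, and the five-word list is a stand-in; applied to the full words list with "e" and 3, it should agree with the corresponding table entry.

```wolfram
(* how often a letter appears at one position across a word list *)
letterAtPosition[list_List, letter_String, pos_Integer] :=
 Count[StringTake[list, {pos}], letter]

letterAtPosition[{"sores", "cares", "mount", "bread", "sheep"}, "e", 3]
(* 2 *)
```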
Now that we have this table of values, we can assign a score to every five-letter word based on how common each of its letters is at that letter’s position. For example, the word “RAISE” has a score of 628+2263+1051+516+1522=5980. Each term comes from looking up the letter in the table row for its position: the fourth letter of the word, S, has a value of 516 in the fourth row of the table.
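In symbols, writing N_i(c) for the number of words in the list whose i-th letter is c (notation introduced here for illustration, not used in the original), the score of a word w with letters w_1, …, w_5 and the “RAISE” example read:

```latex
\begin{aligned}
\operatorname{score}(w) &= \sum_{i=1}^{5} N_i(w_i) \\
\operatorname{score}(\text{raise}) &= N_1(\text{r}) + N_2(\text{a}) + N_3(\text{i}) + N_4(\text{s}) + N_5(\text{e}) \\
&= 628 + 2263 + 1051 + 516 + 1522 = 5980
\end{aligned}
```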
Assigning a score to each word gives you a way to rank words. Words with high scores have common letters in the positions where those letters occur most often.
We can do this score computation for every word in any list of five-letter words, and the following Wolfram Language code does exactly that. It looks a little bit more complicated, but all it does is build the position table from the list you give the function and then compute the score of every word in that list:
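wordScores[words_] := Module[{letters, a, e},
  letters = CharacterRange["a", "z"];
  (* a is a 5 x 26 matrix: a[[i, j]] is how often the j-th letter
     of the alphabet occurs at position i in the given word list *)
  a = Map[
    Values[
      Join[
        AssociationThread[letters -> 0],
        KeySort[Counts[#]]]] &,
    Transpose[Map[Characters, words]]
  ];
  (* display the table as a side effect *)
  Echo[Grid[a]];
  (* score of a word = sum of a[[position, letter index]] over its letters *)
  Map[
    Function[{word},
      e = Transpose[{
        Range[5],
        Flatten[Position[letters, #] & /@ Characters[word]]
      }];
      word -> Total[Extract[a, e]]
    ],
    words
  ]
]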
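For comparison, the same scoring idea can be sketched more compactly by keeping one association of counts per position and summing lookups. This is an alternative formulation, not the code that produced the results in this post; positionCounts and compactScores are hypothetical names.

```wolfram
(* one association of letter counts per position *)
positionCounts[list_List] := Counts /@ Transpose[Characters[list]]

(* score each word by summing the count of each of its letters at its position *)
compactScores[list_List] := Module[{tables = positionCounts[list]},
  Map[
   Function[{word},
    word -> Total[MapThread[Lookup[#1, #2, 0] &, {tables, Characters[word]}]]],
   list]]

compactScores[{"fount", "mount"}]
(* {"fount" -> 9, "mount" -> 9} *)
```

For this two-word list the result matches the score of 9 that wordScores reports for both words near the end of the game.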
When we call this “wordScores” function with the initial list of words, we get the following:
wordleScores = wordScores[words];
Take[ReverseSortBy[wordleScores, Last], 25]
Out[] = {
  "sores" -> 11144, "sanes" -> 11077, "sales" -> 10961,
  "sones" -> 10910, "soles" -> 10794, "sates" -> 10729,
  "seres" -> 10676, "cares" -> 10668, "bares" -> 10655,
  "sames" -> 10624, "pares" -> 10605, "tares" -> 10561,
  "sades" -> 10503, "cores" -> 10501, "bores" -> 10488,
  "sages" -> 10477, "sabes" -> 10448, "senes" -> 10442,
  "mares" -> 10439, "pores" -> 10438, "canes" -> 10434,
  "sires" -> 10431, "dares" -> 10431, "banes" -> 10421,
  "tores" -> 10394
}
The highest-scoring word is “SORES”, which makes sense because all its letters are very common and sit in very common positions (for example, a lot of words end with the letter S). In this article I do not filter out words with repeated letters, like “SORES”. Some people prefer five distinct letters in their initial guesses, which is only a slight variation on the approach I am taking here. The highest-scoring word with no repeated letters is “CARES”.
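If you do prefer five distinct letters, a filter like the following sketch keeps only such words (distinctLetterQ is a hypothetical helper name). Here it runs on four of the scores listed above; applying the same Select to ReverseSortBy[wordleScores, Last] and taking the first element should give the “cares” entry.

```wolfram
(* True when no letter occurs more than once in the word *)
distinctLetterQ[word_String] := DuplicateFreeQ[Characters[word]]

Select[
 {"sores" -> 11144, "sanes" -> 11077, "cares" -> 10668, "bares" -> 10655},
 distinctLetterQ[First[#]] &]
(* {"cares" -> 10668, "bares" -> 10655} *)
```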
So now we can use “SORES” as our starting point for an actual game of Wordle:
The conclusion we can draw from the game feedback is that the letters S, R, and E do not appear in the word, but the letter O does appear and it appears in the second position. We can write a little snippet of code to remove all the impossible solutions and get a list of 520 words that can still be a solution:
words2 = Select[words,
  And[
    Not[StringContainsQ[#, "s" | "r" | "e"]],
    StringTake[#, {2}] == "o"
  ] &
];
Length[words2]
Out[] = 520
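The same two constraints can also be written as a single string pattern, which some may find easier to extend from round to round. A sketch, assuming Except[...] inside a string pattern matches one character that is not S, R, or E; allowed and pattern are illustrative names and the five-word list is a stand-in.

```wolfram
(* one character that is not s, r or e *)
allowed = Except["s" | "r" | "e"];

(* an O in the second position, no S, R or E anywhere *)
pattern = allowed ~~ "o" ~~ allowed ~~ allowed ~~ allowed;

Select[{"sores", "mount", "pound", "cooly", "robin"}, StringMatchQ[#, pattern] &]
(* {"mount", "pound", "cooly"} *)
```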
We can again compute the highest scoring word, but now based on those 520 remaining words:
scores2 = wordScores[words2];
Take[ReverseSortBy[scores2, Last], 25]
Out[] = {
  "cooly" -> 882, "booay" -> 862, "colly" -> 856,
  "cooky" -> 853, "mooly" -> 850, "dooly" -> 849,
  "conky" -> 848, "copay" -> 845, "gooly" -> 844,
  "coomy" -> 844, "ponty" -> 842, "hooly" -> 841,
  "booty" -> 839, "moony" -> 836, "monty" -> 834,
  "polly" -> 832, "coaly" -> 831, "bonny" -> 831,
  "loony" -> 830, "hooty" -> 830, "goony" -> 830,
  "donny" -> 830, "pongy" -> 829, "poncy" -> 829,
  "wooly" -> 827
}
Playing the top-scoring word, “COOLY”, gives the following result:
There are no new “good” letters, but now we can additionally eliminate the letters C, L, and Y. And because the second O in “COOLY” was not highlighted, we also know that there is only one O in the word:
words3 = Select[words2,
  And[
    Not[StringContainsQ[#, "c" | "l" | "y"]],
    StringCount[#, "o"] == 1
  ] &
];
Length[words3]
Out[] = 117
So we have further reduced the list of possible words to 117. The best scoring guesses are now:
scores3 = wordScores[words3];
Take[ReverseSortBy[scores3, Last], 25]
Out[] = {
  "gonna" -> 210, "donna" -> 208, "gonia" -> 201,
  "ponga" -> 199, "donga" -> 196, "tonga" -> 195,
  "honan" -> 194, "honda" -> 193, "gonad" -> 192,
  "downa" -> 191, "wonga" -> 190, "tonka" -> 189,
  "pound" -> 189, "monad" -> 189, "hound" -> 189,
  "bonza" -> 187, "podia" -> 186, "fonda" -> 186,
  "mound" -> 185, "bound" -> 185, "donah" -> 184,
  "douma" -> 183, "zonda" -> 182, "tomia" -> 182,
  "found" -> 182
}
We pick the word with the highest score, “GONNA”, which gives us the following feedback:
The additional information is that the letters G and A do not appear in the word. The letter N appears in the fourth position, and it is also the only letter N in the word. This leaves us with only 13 possible words:
words4 = Select[words3,
  And[
    Not[StringContainsQ[#, "g" | "a"]],
    StringTake[#, {4}] == "n",
    StringCount[#, "n"] == 1
  ] &
]
Out[] = {
  "boink", "bound", "found", "fount",
  "hound", "joint", "mound", "mount",
  "poind", "point", "pound", "pownd",
  "wound"
}
Length[words4]
Out[] = 13
The highest scoring words are:
scores4 = wordScores[words4];
ReverseSortBy[scores4, Last]
Out[] = {
  "pound" -> 46, "mound" -> 44, "found" -> 44,
  "bound" -> 44, "wound" -> 43, "hound" -> 43,
  "poind" -> 42, "mount" -> 40, "fount" -> 40,
  "pownd" -> 39, "point" -> 38, "joint" -> 35,
  "boink" -> 33
}
So let’s use “POUND” as the next guess:
We now have three known letters (OUN) and two letters that we can exclude (P and D). This whittles the possibilities down to two words:
words5 = Select[words4,
  And[
    Not[StringContainsQ[#, "p" | "d"]],
    StringTake[#, 2 ;; 4] == "oun"
  ] &
]
Out[] = {"fount", "mount"}
Both words have the same score of 9, so we can pick either one:
scores5 = wordScores[words5];
ReverseSortBy[scores5, Last]
Out[] = {"mount" -> 9, "fount" -> 9}
Going with “MOUNT”, we get the final solution:
If “MOUNT” had been wrong, we would have been left with “FOUNT” as the only possibility, so either way we would have solved the game within six turns!
And that’s it!
“Excellent!” I cried. “Elementary,” said he. — Dr. Watson & Sherlock Holmes
To summarize this strategy: we loaded the list of allowed Wordle words into the Wolfram Language. We then computed a table of values representing how often each letter occurs in each position of a five-letter word. Using that table we computed a score for every word and picked high-scoring words, in the hope of guessing the word (or nailing down as many letters as possible).
At every round of the game we removed the “impossible” words based on the game’s feedback, and then picked a new “most likely” word based on the remaining word list. Using this approach we systematically whittled down the possibilities from 12,972, to 520, to 117, to 13, and finally to 2 equally scoring words. It’s interesting to see how important the first guess is: it eliminated more than 95% of the possibilities. Even a disappointing first guess (no matching letters at all) still eliminates many words.
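The whole procedure can be sketched as a loop. This is a hypothetical implementation, not the code used in this post: instead of writing each filter by hand, it keeps only the candidates that would have produced exactly the same feedback as the real solution, which is one way of “eliminating the impossible”. The names feedback, scores, and solveWordle are illustrative, the helpers are repeated so the block runs on its own, and ties between equal scores fall to whatever order ReverseSortBy returns, so on the full list it may not reproduce the exact sequence of guesses above.

```wolfram
(* feedback: "G" green, "Y" yellow, "-" gray; greens first, then
   yellows left to right while unmatched copies of the letter remain *)
feedback[guess_String, solution_String] :=
 Module[{g = Characters[guess], s = Characters[solution], r, left},
  r = MapThread[If[#1 === #2, "G", "-"] &, {g, s}];
  left = Counts[Pick[s, r, "-"]];
  Do[
   If[r[[i]] === "-" && Lookup[left, g[[i]], 0] > 0,
    r[[i]] = "Y";
    left[g[[i]]] = left[g[[i]]] - 1],
   {i, Length[g]}];
  r]

(* positional letter-frequency score, computed from the given list *)
scores[list_List] := Module[{tables = Counts /@ Transpose[Characters[list]]},
  Map[
   Function[{w},
    w -> Total[MapThread[Lookup[#1, #2, 0] &, {tables, Characters[w]}]]],
   list]]

(* guess the top-scoring candidate, then keep only the candidates
   that would have produced exactly the same feedback *)
solveWordle[list_List, solution_String] :=
 Module[{candidates = list, guesses = {}, guess, fb},
  While[True,
   guess = First[First[ReverseSortBy[scores[candidates], Last]]];
   AppendTo[guesses, guess];
   If[guess === solution, Break[]];
   fb = feedback[guess, solution];
   candidates = Select[candidates, feedback[guess, #] === fb &]];
  guesses]

solveWordle[{"sores", "cooly", "gonna", "pound", "mount", "fount"}, "mount"]
```

The loop always ends on the solution when the solution is in the list: the solution is always consistent with its own feedback, and every wrong guess removes itself from the candidates.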
While there is no guarantee this will always yield success in six rounds of the game, it does do a good job of systematically removing a lot of words in each round.