The Best Wordle Starting Word
Wordle is one of those games that took our post-pandemic brains by storm. It captivated (and distracted) millions around the world, at a time when we needed it. I remember sharing my results with friends and family, and feeling a competitive spark to beat the puzzle in fewer guesses than them. So naturally, my engineering instincts kicked in as I challenged myself to find the best starting word.
The code I wrote for this article is available on GitHub.
An Overview of Wordle
(Feel free to skip this section if you already know the rules.)
Wordle is a guessing game. Your task is to guess the random 5-letter word of the day by entering one guess at a time, up to a total of 6 guesses. Each time you guess a word, any letters that your guess has in common with the answer are revealed in either yellow or green: yellow if the letter is correct but the position is not, and green if both the letter and the position are correct. Because each guess eliminates future possibilities, it's important to use a good starting word. For example, you wouldn't want to start with a word like ZULUS, as it has the uncommon letter Z and a repeated vowel, U.
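As a rough sketch, the coloring rules can be written as a small function. The name `scoreGuess` and the "gray" label for unmatched letters are illustrative choices, and the handling of repeated letters follows the commonly described behavior of the game, which is an assumption rather than something spelled out above.

```javascript
// Returns one mark per letter of the guess:
//   "green"  = right letter, right position
//   "yellow" = right letter, wrong position
//   "gray"   = no (remaining) copy of this letter in the answer
function scoreGuess(guess, answer) {
  const marks = Array(guess.length).fill("gray");
  const unmatched = {}; // answer letters not claimed by a green mark

  // First pass: mark greens and tally the answer letters left over.
  for (let i = 0; i < guess.length; i++) {
    if (guess[i] === answer[i]) {
      marks[i] = "green";
    } else {
      unmatched[answer[i]] = (unmatched[answer[i]] || 0) + 1;
    }
  }

  // Second pass: a non-green letter is yellow only while copies remain.
  for (let i = 0; i < guess.length; i++) {
    if (marks[i] === "green") continue;
    if (unmatched[guess[i]] > 0) {
      marks[i] = "yellow";
      unmatched[guess[i]] -= 1;
    }
  }
  return marks;
}

console.log(scoreGuess("arose", "salet").join(" "));
// yellow gray gray yellow yellow
```

The two passes matter for repeated letters: greens are claimed first, so a second copy of a letter in the guess is only marked yellow if the answer still has an unclaimed copy.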
A Starting Point
The first guess is the most important. The board is a blank canvas, waiting for you to start eliminating possibilities. To come up with the ideal starting word, we need a word whose letters are the most likely to match those of a randomly chosen 5-letter word. To do this, we can start by counting how often each letter appears in the set of 5-letter words:
Those of you with a keen eye may have noticed that the first few letters match up closely with the starting letters in the final round of Wheel of Fortune: R S T L N E. Though A appears to be more common than E in the chart above, this is only because we're limiting ourselves to 5-letter words. If we count again using all English words of any length, we find that E occurs roughly 27.2% more often than A!
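The letter count described above could be sketched along these lines. The function names and the three-word sample are hypothetical stand-ins; the real input would be the full list of 5-letter words.

```javascript
// Counts how often each letter appears across a list of words.
function countLetters(words) {
  const counts = {};
  for (const word of words) {
    for (const letter of word.toLowerCase()) {
      counts[letter] = (counts[letter] || 0) + 1;
    }
  }
  return counts;
}

// Lists the letters from most to least common.
function rankLetters(counts) {
  return Object.entries(counts)
    .sort(([, a], [, b]) => b - a)
    .map(([letter]) => letter);
}

// Tiny illustrative sample, not the real word list.
const sampleWords = ["arose", "salet", "zulus"];
const sampleCounts = countLetters(sampleWords);
console.log(sampleCounts.s, rankLetters(sampleCounts)[0]);
```

Note that this counts every occurrence, so a word with a repeated letter contributes more than once to that letter's total.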
Breaking Down the Results
A is the most common vowel, coming in at 8,392 uses. S is the most common consonant, clocking in at 6,537 appearances. The next most common consonant is R, at 5,143 uses. We could stop right here and come up with a word that contains all of the top letters, A E S O R, such as AROSE. That's a pretty good starting word, but we can do better.
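One way to find words built from exactly the top letters is to compare sorted letters. This is a minimal sketch; `wordsUsingLetters` and the short word list are made up for illustration.

```javascript
// Returns the words that use each of the given letters exactly once.
function wordsUsingLetters(words, letters) {
  const target = [...letters].sort().join("");
  return words.filter((word) => [...word].sort().join("") === target);
}

// Hypothetical excerpt of a word list.
const someWords = ["arose", "soare", "aeros", "salet"];
console.log(wordsUsingLetters(someWords, "aesor"));
```

Every word this returns scores the same under a pure letter count, which is why position has to come into play next.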
Why We Can't "Arise" to Victory
Why isn't AROSE the best word? After all, it uses all of the top five letters we just calculated. It turns out we overlooked one important detail: letter positions. If every letter had an equal 1/5 (20%) chance of appearing in any given position in a word, "arose" would be one of the best words. However, this is not the case. For example, S is almost 6% more likely to appear as the first letter than as the fourth letter in a word.
The chances of the letter S appearing in each letter position of a word.
The chances of each letter in "arose" appearing in their respective positions for a randomly selected word.
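The positional counts behind these charts could be gathered with something like the following sketch. The names `countPositions` and `positionShares` are assumptions for illustration, and the four-word sample is not the real data.

```javascript
// Builds a map of letter -> [count at position 0, ..., count at position 4].
function countPositions(words, wordLength = 5) {
  const frequencies = {};
  for (const word of words) {
    if (word.length !== wordLength) continue; // skip entries of the wrong length
    for (let i = 0; i < wordLength; i++) {
      const letter = word[i];
      if (!frequencies[letter]) frequencies[letter] = Array(wordLength).fill(0);
      frequencies[letter][i] += 1;
    }
  }
  return frequencies;
}

// Converts one letter's counts into the share of words that have
// that letter in each position.
function positionShares(frequencies, letter, wordCount) {
  return frequencies[letter].map((count) => count / wordCount);
}

// Tiny illustrative sample.
const positionSample = ["salet", "sores", "arose", "cares"];
const positionCounts = countPositions(positionSample);
console.log(positionShares(positionCounts, "s", positionSample.length));
```

The result has one array of five counts per letter, with index 0 standing for the first letter position.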
A Less Naive Approach
What if we assigned each word a score based on the combined total of each letter's likelihood of appearing in its respective position?
// A hash map of every letter along with its positional frequencies
// (index 0 is the first letter position, index 4 is the last)
const letterFrequencies = { a: [3000, 1500, 500, 400, 1000], /* ... */ };

// Generates a score for a word by adding up how often each of its
// letters appears in that same position across all words
function calculateWordScore(word, letterFrequencies) {
  let score = 0;
  for (let i = 0; i < word.length; i++) {
    const frequencies = letterFrequencies[word[i]];
    if (!frequencies) continue; // skip characters with no frequency data
    score += frequencies[i];
  }
  return score;
}
Using this approach on every 5-letter word gets us these top five words:
Word | Score |
---|---|
sanes | 11579 |
sales | 11401 |
sores | 11295 |
cares | 11268 |
bares | 11213 |
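A ranking like the one above could be produced by scoring every word and sorting. In this sketch, `scoreWord` repeats the scoring idea so the block runs on its own, `topWords` is a hypothetical helper, and the frequencies are made-up numbers rather than the article's data.

```javascript
// Sums the positional frequency of each letter in the word.
function scoreWord(word, letterFrequencies) {
  let score = 0;
  for (let i = 0; i < word.length; i++) {
    const counts = letterFrequencies[word[i]];
    if (counts) score += counts[i];
  }
  return score;
}

// Returns the `limit` highest-scoring words, best first.
function topWords(words, letterFrequencies, limit = 5) {
  return words
    .map((word) => ({ word, score: scoreWord(word, letterFrequencies) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Made-up frequencies for illustration only.
const toyFrequencies = {
  s: [5, 0, 1, 1, 4],
  a: [1, 4, 2, 1, 0],
  e: [0, 1, 1, 4, 2],
};
console.log(topWords(["asses", "eases", "sates"], toyFrequencies, 2));
```

Letters missing from the frequency map, such as the "t" in this toy example, simply add nothing to the score.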
We're very close now. An astute observer might note that all of these words are plurals. Unfortunately, there's a lesser-known rule of Wordle: it will never choose a plural as the answer. Let's account for this simply by lowering the score for the letter S when it's in the fifth position of a word:
letterFrequencies.s[4] = 500;
Choosing 500 was a bit of trial and error. A better approach would be to go through all 15,918 5-letter words and eliminate the plural ones. This is an exercise left for the reader. 😉
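For readers who want a head start on that exercise, here is one naive heuristic, offered as a sketch rather than a real plural detector. It assumes a hypothetical `shorterWords` set of 4-letter words is available, and it will misjudge some cases, such as verbs ending in S and irregular plurals.

```javascript
// Treats a word as a likely plural if it ends in a single "s" and
// dropping that "s" leaves a known shorter word.
function isLikelyPlural(word, shorterWords) {
  if (!word.endsWith("s") || word.endsWith("ss")) return false;
  return shorterWords.has(word.slice(0, -1));
}

// Keeps only the words that do not look like plurals.
function removeLikelyPlurals(words, shorterWords) {
  return words.filter((word) => !isLikelyPlural(word, shorterWords));
}

// Hypothetical excerpt of a 4-letter word list.
const shorterWords = new Set(["sale", "care", "bare", "sore", "sane"]);
console.log(
  removeLikelyPlurals(["sales", "cares", "salet", "abyss", "focus"], shorterWords)
);
```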
Adjusted Top-10 Scoring Words
Word | Score |
---|---|
saree | 10610 |
soree | 10020 |
carey | 9805 |
sooey | 9442 |
siree | 9408 |
coree | 9403 |
boree | 9348 |
salet | 9343 |
saner | 9326 |
sayee | 9295 |
There's still a slight issue here. Some of these words are so uncommon that Wordle excludes them from its dictionary. In fact, the top 7 words are not guessable. That leaves us with SALET as our first guessable word, which means:
a light medieval helmet, usually with a vision slit or a movable visor.
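Picking the first guessable word from a ranked list is a one-line lookup. In this sketch, `allowedGuesses` is a hypothetical two-word excerpt standing in for Wordle's full list of accepted guesses.

```javascript
// Returns the highest-ranked word that the game accepts as a guess,
// or undefined if none of them are accepted.
function firstGuessable(rankedWords, allowedGuesses) {
  return rankedWords.find((word) => allowedGuesses.has(word));
}

// The adjusted top-10 list from above, best first.
const ranked = [
  "saree", "soree", "carey", "sooey", "siree",
  "coree", "boree", "salet", "saner", "sayee",
];
// Hypothetical excerpt of the accepted-guess list.
const allowedGuesses = new Set(["salet", "saner"]);
console.log(firstGuessable(ranked, allowedGuesses)); // salet
```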
How does this compare to our naive approach of "arose"?
Taking the difference for each letter position, we get an average improvement of 4.1% per letter. This translates to a slightly higher chance of getting both the letter and position correct with "salet" vs "arose".
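One possible reading of this comparison is sketched below: for each position, take the chance of each word's letter appearing there, subtract, and average the five differences. The function names are assumptions, and the toy frequencies are invented, so this does not reproduce the 4.1% figure.

```javascript
// Share of words that have `letter` at `position`.
function positionalChance(letterFrequencies, letter, position, wordCount) {
  const counts = letterFrequencies[letter];
  return counts ? counts[position] / wordCount : 0;
}

// Average per-position difference in chance between two words.
// A positive result favors `wordA`.
function averageImprovement(wordA, wordB, letterFrequencies, wordCount) {
  let total = 0;
  for (let i = 0; i < wordA.length; i++) {
    total +=
      positionalChance(letterFrequencies, wordA[i], i, wordCount) -
      positionalChance(letterFrequencies, wordB[i], i, wordCount);
  }
  return total / wordA.length;
}

// Invented two-letter example, only to show the arithmetic.
const toyPositions = { a: [10, 20], b: [30, 40] };
console.log(averageImprovement("bb", "aa", toyPositions, 100));
```

Here the result is a difference in percentage points between the two words, averaged over the letter positions.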
And just for fun, I calculated one of the worst starting words:
Conclusion and Potential Improvements
I hope you enjoyed reading about this challenge as much as I enjoyed writing about it. Using letter-frequency analysis, we figured out that SALET is the ideal starting word. What about future improvements?
Finding the Next Best Guess
What if we could feed the results of the first guess back into our program? One way to do this would be to use the matching and missed letters to reduce the number of available words, then recalculate the frequencies like we did for the first guess. Each subsequent guess narrows down the remaining words, and the new best guess is the remaining word with the highest combined positional letter frequency.
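A minimal sketch of that feedback loop might look like this. All of the names are hypothetical, the feedback is encoded as a string of `g` (green), `y` (yellow), and `x` (no match), and the repeated-letter handling is an assumption about the game's behavior.

```javascript
// Encodes the game's response to a guess as a string of g, y, and x.
function feedback(guess, answer) {
  const marks = Array(guess.length).fill("x");
  const remaining = {}; // answer letters not matched by a green
  for (let i = 0; i < guess.length; i++) {
    if (guess[i] === answer[i]) marks[i] = "g";
    else remaining[answer[i]] = (remaining[answer[i]] || 0) + 1;
  }
  for (let i = 0; i < guess.length; i++) {
    if (marks[i] === "x" && remaining[guess[i]] > 0) {
      marks[i] = "y";
      remaining[guess[i]] -= 1;
    }
  }
  return marks.join("");
}

// Keeps only the words that would have produced the observed feedback.
function filterCandidates(candidates, guess, observed) {
  return candidates.filter((word) => feedback(guess, word) === observed);
}

// Recounts positional frequencies over the remaining words only.
function countPositions(words) {
  const frequencies = {};
  for (const word of words) {
    for (let i = 0; i < word.length; i++) {
      if (!frequencies[word[i]]) frequencies[word[i]] = Array(word.length).fill(0);
      frequencies[word[i]][i] += 1;
    }
  }
  return frequencies;
}

// Narrows the candidates, then picks the highest-scoring remaining word.
function nextGuess(candidates, guess, observed) {
  const remaining = filterCandidates(candidates, guess, observed);
  const frequencies = countPositions(remaining);
  let best = null;
  let bestScore = -1;
  for (const word of remaining) {
    let score = 0;
    for (let i = 0; i < word.length; i++) score += frequencies[word[i]][i];
    if (score > bestScore) {
      best = word;
      bestScore = score;
    }
  }
  return { remaining, best };
}

// Hypothetical candidate list and a response to the guess "salet".
const candidates = ["saner", "safer", "later", "cater", "arose"];
console.log(nextGuess(candidates, "salet", "ggxgx"));
```

Filtering by simulated feedback keeps the logic in one place: a word survives only if it would have produced exactly the colors that were observed.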