Solving Wordle Using Monte-Carlo Tree Search, Reinforcement

Learning, and Custom AI Algorithm

Annie Pates (Elizabeth), Devin Quinn

Abstract

Wordle is a word guessing game where agents attempt to guess a predetermined five-letter word in six guesses using feedback from the game on the letters used in the previous guess and their placements. Many algorithms can be helpful in optimizing guesses and guessing the target word. We use Monte-Carlo Tree Search, Reinforcement Learning, and a custom Wordle algorithm explore possible strategies for solving Wordle puzzles. Reinforcement Learning, specifically Q-learning, performed best on our Wordle game, solving the puzzle successfully 84% of the time in an average of 4.9 guesses.

Introduction

Figure 1: Example of solved Wordle from the New York Times

Wordle is a simple word guessing game in which an agent has six attempts to guess a five-letter word. The environment consists of six rows of five blank tiles. As the agent submits guesses, feedback is provided in the form of colored tiles. A green tile indicates the correct letter in the correct position. A yellow tile indicates the correct letter but in the wrong position. A black tile indicates that the letter is not part of the correct word, as shown in Figure 1. Wordle guesses must be English words and cannot be a string of random letters.

Launched in Fall 2021, Wordle is now run through the New York Times’ popular gaming platform. Shortly before its peak popularity in February 2022, the game was played over 2 million times daily (The Guardian). According to analysis by the wordfinding site word.tips, the worldwide average number of guesses to solve Wordle is 3.919. Since this data was based on tweeted scores, it is likely to be weighted toward fewer guesses since people are more likely to post better scores.

The wordle dictionary is a finite list of 2,316 common 5-letter words. This makes wordle fully observable, deterministic, sequential, static, discrete and single agent, an excellent candidate for artificial intelligence-generated solution strategies. To determine an efficient solution algorithm, we first needed to design the Wordle game in Python. Then we developed several algorithms: a naïve approach, Monte-Carlo Tree Search, and Reinforcement Learning. Of our approaches, Q-learning performed the best with a success rate of 84% and an average of 4.9 guesses to solve the puzzle.

The paper is divided into the following sections. Section 2 provides background on the relevant algorithms while Section 3 outlines other methods that have been used to solve Wordle problems. Section 4 details our approach to solving the problem. Section 5 compares the effectiveness of the different algorithms and parameters used, and Section 6 summarizes our findings.

Background

Several approaches are possible for finding an optimal Wordle-solving strategy. We began with Monte-Carlo Tree Search in which each of the node states store win/loss statistics from the repeated rollouts. In the first step, Selection a path is chosen from the root to a leaf node with unexplored child nodes. The node is chosen based on a selection policy that balances exploration (finding new paths, in case there is a better one yet to be explored) and exploitation (digging further into the current best discovered path). In the next step, expansion, a random child node is chosen. In the Simulation step, a rollout (or multiple rollouts) is executed following the rollout policy and win/loss statistics are stored in the node. Then in the final stage, those win/loss statistics are passed back up to update the parent nodes.

Figure 2: Example of Monte-Carlo Tree Search steps (Image source Wikipedia — Monte-Carlo Tree Search)

In q-learning, the agent must find the optimal policy without having a model of transition probabilities and rewards for the environment. The algorithm works by playing the game a large number of times and then “learning to improve the strategy. At each step of each episode, the algorithm updates the optimal function from being in state S and taking action A. See Figure 3 for the update calculation that is performed at each step. ε-greedy Q-learning allows for exploration to discover better paths the algorithm hasn’t discovered yet with probability ε.

Figure 3: Update equation for Q-learning

The final algorithm our project will use is a Wordle specific (not a general purpose) algorithm that narrows down the space of possible actions based on feedback to the guess. This algorithm relies on the fact that the agent knows all of the possible words that could be the target. With that knowledge, the action space can be reduced each round, making it easier to correctly guess the next time.

Related Work

Many researchers have worked on deriving solutions to Wordle. Bertsimas and Paskov claim to have the first approach that optimally solves Wordle every time. Their approach uses Markov Decision Processes and Exact Dynamic Programming to produce an optimal starting word of “SALET” (MIT Sloan). On average, their algorithm finds the hidden word in 3.421 guesses. Anderson and Meyer use reinforcement learning to find an optimal starting word of “SOARE”. We did not follow their methods because they were attempting to create an approach that is possible for humans to follow, while we are taking advantage of the computer’s ability to recall all possible words. Anderson and Meyer’s approach had a total win rate of 64.8%, and the agent took 6 guesses most of the time. Tyler Glaiel programmed an attempt to guess the optimal first word and found that “ROATE” was best to find the solution fastest but since it is not a possible solution, he suggests “RAISE” for the potential of getting the answer on the first guess. This disparate recommendations of the “best” starting word show that there is still work to be done optimizing an algorithm that solves Wordle.

Approach

Creating the Environment

In order to test our approaches to solving Wordle, we first needed to create an environment in which manual or programmed agents could play the game multiple times based on their strategies. For this environment, we used the list of 2,316 common 5-letter words that is used by the Wordle game designers. This allowed us to keep the state space a manageable size, in addition to making the game more fair and fun for human players. The game was designed to play exactly like Wordle. When a user inputs a guess, our environment returns a sequence of five characters that are either g (green tile), y (yellow tile), or b (black tile). Agents can use this feedback to inform their next guess, leading to either a win or a loss at the end of six guesses. We also designed the option to play the game repeatedly, calculating a win rate and average number of guesses the agent made at the end of the session.

Figure 4: Our programmed Wordle environment

Naïve Approach

To set a performance benchmark, we developed a naïve approach that benefits from past research and the ability of our agent to know all of the possible words for the target. To begin, the agent selects from one of six words determined by past research to be optimal starting words: “slate”, “crane”, “slant”, “trace”, “crate”, and “carte”. After each guess, the agent tracks which new yellow and black letters were revealed and the locations of any green letters. Going through the list of potential words, it removes all words that contain black letters. Next, it selects a word from the remaining list that has “green” letters at the appropriate places as determined by past results. In this way, it is not always optimal to try to get as many letters correct as soon as possible. Finding out the target word does not include a common letter can reduce the state space by as much or more as finding out that it contains a common letter.

Monte Carlo Tree Search

For our MCTS approach to the Wordle game our selection process begins with feedback from the initial guess. This feedback allows for expansion of the green and yellow trees of word options by using letter value indicators provided in the feedback. Words with green letters are placed in the green tree while words with yellow letters are placed in the yellow tree while words containing black letters are removed from the list of remaining viable word options. In the simulation phase the words in the yellow and green trees are given weights based on how many green and yellow letters they contain respectively. A weight of 1 is given per green or yellow letter. The max weight possible is 5. A random number is assigned and based on this value either the green tree or the yellow tree is expanded further with more weight being given to the green tree, this is the seed value. The chosen tree is then expanded only to the depth of the current max weight. A new word at this level is chosen at random and then backpropagated to be the new guess where the process repeats itself until the correct word is found or the number of attempts is exhausted.

Reinforcement Learning

In order to apply Q-Learning to the Wordle environment, we needed to determine episode states and actions and to create an effective reward function. We defined a state to be a tuple of the current guess and the result of that guess (ex. “gbbyb” for green, black, black, green, and yellow letters). The initial state was set to a guess of None and a result of “bbbbb”. The action was defined as the next guess, as selected from the list of possible words. While at first, the initial guess each round will be a random selection from the list, as the agent explores the environment, it will choose based on the past calculated Q-values.

The reward function is based on the result of the guess, the outcome of the game, and how the guess reduces the state space. A win yields a reward of 1000. In the result, each green letter increases the reward by 2 and each yellow letter increases it by 1. The final piece is a filtering effect- comparing the size of the list at the previous state to the size of the list after the guess has been made. Just like in the naïve approach, potential actions are eliminated based on whether they contain “black” letters or don’t contain “green” letters in the correct place. The more actions a guess eliminates, the higher the reward.

At the beginning of training, Q-values for all state, action pairs are set to be zero. Guesses are chosen using an ε-greedy approach, where with probability = ε, a random word is chosen. Otherwise, the best word is chosen based on the highest Q-value (chosen randomly from among the words with the highest value). This allows for the agent balance between exploiting the guesses it knows yield good outcomes and exploring unknown guesses in case there is one that works better. The ε value can be tweaked to adjust the balance between exploration and exploitation. The agent plays the chosen guess and uses the result to update the Q-table based on the reward function and the update equation shown above. After each guess, the pool of possible remaining words is filtered during the reward calculation.

The Q agent uses many of the same criteria to filter words as the naïve approach. The primary difference comes in how it chooses the next word- opting for one that it knows has been effective in the past rather than choosing at random.

After several thousand rounds of training, the Q-table is set, and our agent embarks on its testing rounds based on the previously calculated Q-values, with learning and exploration factors set to zero.

Experiments and Results

Naïve Approach

The naïve approach was able to generate an algorithm that guessed the hidden word 49% of the time out of 1000 games run. The naïve agent took on average 5.45 guesses. While this is well below the results from human and other research, it serves as a good baseline.

MCTS

The Monte Carlo Tree Search agent was able to improve on our naïve agent. At its peak, MCTS yielded a success rate of 52.2% and averaged 5.04 guesses. Empirically, we were able to determine that the agent performed better when the “seed threshold” was lower. In effect, the agent is more successful when it is biased toward green letters rather than yellow ones. While the increase in win percentage is marginal compared to our naïve agent, the difference in guesses required is significant enough to make MCTS a strong improvement. MCTS can take several iterations to learn a space, so it is possible that the agent does not have enough experience in an environment before its six guesses are over. The guess limit restricts how successful the MCTS agent can be.

Figure 6: Results of Monte Carlo Tree Search Agent. Agent is more successful with lower seed threshold. With a seed threshold of 1, agent succeeds 52% of the time and takes 5.04 guesses.

Reinforcement Learning

The Q-learning agent was very successful in guessing the hidden word. With 10,000 training sessions and 1000 tests, it won 84% of games and averaged 4.88 guesses. This improvement in win rate is consistent across a range of variables.

Changing the weights assigned to green and yellow letters in the reward function has little effect on the outcome, indicating that a majority of the effectiveness comes from how much the state space is limited by the color of letters in the result. Success rates and average guesses were not significantly different whether weights were 200 and 100 or 2 and 1.

Results were also not significantly altered by changing epsilon values or running more rounds of training. This also indicates that much of the effectiveness of the algorithm comes from limiting the state space for subsequent guesses, since there appears to be little tradeoff between exploration and exploitation. Perhaps this is because the potential for state-action pairs is enormous, even with the limited dictionary of 2316 words. The state is a tuple of any word in the dictionary as a guess and its feedback compared to any other word as the target. When paired with the action which is yet another dictionary word, there are over 12 billion state-action pairs unless the state space is reduced. This makes it very hard to meaningfully explore the space no matter the exploration factor or how many rounds of training are completed.

Figure 7: Results of Q-Learning Agent. Agent is more successful with lower seed threshold. With a seed threshold of 1, agent succeeds 52% of the time and takes 5.04 guesses

Conclusion

We found that the best approach to solving Wordle is to use Q-Learning with a reward function based on win state, green and yellow letters returned, and ability of each guess to reduce the remaining potential words. Regardless of weights in the reward function and exploration factors, this approach consistently achieved a success rate of around 83% in less than 5 guesses. This result outperforms Anderson & Meyer’s attempts to design an agent with human restrictions. The Q agent appears to require more guesses than human performance on average, however that data is likely skewed to overreport high scores.

Figure 8: Comparison of our approaches with results of other research. Our Q-learning model doesn’t match the results of Bertsimas and Paskov but outperforms the human-mimicking model by Anderson and Meyers.

If the agent could visit enough state-action pairs, it should be possible for it to learn the best move to perform in any scenario. Since some scenarios are very unlikely to be explored given how much the list gets filtered with every guess, this would likely require millions of training episodes before real improvement could be made beyond the baseline of 84%. With more research, we would like to explore this possibility. Some approaches also took into account the probability of certain letters following others to determine potential useful guesses. This would also be an interesting avenue for future research.

Code

Please visit our GitHub repository to see our code: https://github.com/devin-p-quinn/wordle_solver/blob/main/wordle.py

References

Anderson, B.J., Meyer, J.G. 2022. Finding the optimal human strategy for wordle using maximum correct letter probabilities and reinforcement learning. http://dx.doi.org/10.48550/ ARXIV.2202.00557.

Bertsimas, D., Paskov, A. 2022, September 20. An Exact and Interpretable Solution to Wordle. MIT Sloan. https://auction-upload-files.s3.amazonaws.com/Wordle_Paper_Final.pdf.

Glaiel, T. 2021. The mathematically optimal first guess in wordle. https://medium.com/@tglaiel/the-mathematically-optimal-first-guess-in-wordle-cbcb03c19b0a.

Hall, R. 2022, January 11. Wordle Creator Overwhelmed by Global Success of Hit Puzzle. The Guardian. https://www.theguardian.com/games/2022/jan/11/wordle-creator-overwhelmed-by-global-success-of-hit-puzzle

Ho, A. 2022, March 6. Solving Wordle with Reinforcement Learning. Weights & Biases. https://wandb.ai/andrewkho/wordle-solver/reports/Solving-Wordle-with-Reinforcement-Learning–VmlldzoxNTUzOTc4.

Iancu, Mirela. 2022, January. Where in the World is the Best at Solving Wordle? Wordtips. https://word.tips/wordle-wizards.

Trencseni, M. 2022, January 19. Optimal Coverage for Wordle with Monte Carlo Methods. Bytepawn. https://bytepawn.com/optimal-coverage-for-wordle-with-monte-carlo-methods.html#optimal-coverage-for-wordle-with-monte-carlo-methods.