Can ChatGPT Solve NYT's Connections?

Solving Connections: ChatGPT's 60-Day Journey

ChatGPT has emerged as a remarkable language model capable of understanding and generating human-like text. So, we wondered how good it was at making logical connections, especially when it comes to tricky puzzles combining culture, trivia, and sometimes puns with incomplete words or phrases. To find out, we embarked on a 60-day experiment, delving into the new NYT Connections puzzle that premiered on June 12. We had one mission: to uncover the AI's puzzle-solving abilities and observe its performance and accuracy during NYT Connections' first 60-day beta period, from June 12 to August 10.

For those not yet familiar with NYT Connections, it's a daily word puzzle. Players must sort 16 words into four groups of four words each by discovering the common thread that links each group.

In this article, we present a comprehensive log of ChatGPT's performance during the 60-day challenge, highlighting its successes and occasional missteps. We invite you to join us as we explore the potential of AI language models and uncover fascinating insights from this unique experiment, all to answer the question: Can ChatGPT solve NYT's Connections?

Playing with Puzzles: Inside the 60-Day Methodology

Curious and determined, we embarked on this thrilling 60-day puzzle-solving adventure by crafting the perfect AI-friendly templated prompt for the ever-capable ChatGPT.

Step 1: Engineering The Daily Templated Prompt

It took us a few iterations to craft a reliable daily prompt that covered enough of the W questions (what, why, how, and where) to guide the language model.

Here is the daily prompt we used:

ChatGPT Prompt for solving daily Connections puzzles

Step 2: The Daily Routine

Every morning (the NYT Connections puzzle updates at midnight), we gave ChatGPT enough juice by adding a fresh set of 16 words to the templated prompt.
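
To make the daily routine concrete, here is a minimal Python sketch of how a templated prompt can be filled with each day's 16 words. The template wording, the `PROMPT_TEMPLATE` name, and the `build_daily_prompt` helper are all hypothetical stand-ins for illustration; they are not the prompt we actually used.

```python
# Hypothetical template text: a stand-in, not the prompt used in the experiment.
PROMPT_TEMPLATE = (
    "Here are 16 words from today's puzzle: {words}.\n"
    "Sort them into four groups of four words that share a common thread, "
    "and name the category for each group."
)


def build_daily_prompt(words):
    """Fill the template with the day's 16 words."""
    # A Connections board always has exactly 16 distinct entries.
    if len(words) != 16 or len(set(words)) != 16:
        raise ValueError("expected 16 distinct words")
    return PROMPT_TEMPLATE.format(words=", ".join(words))
```

Checking the word count before sending the prompt catches copy-and-paste slips, which would otherwise look like a mistake by the model.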

After ChatGPT attempted each puzzle, we meticulously recorded the accuracy of its responses.

Our Assessment of ChatGPT's Responses

We considered two factors when assessing whether ChatGPT solved the Connections puzzle that day: first, whether it grouped the words correctly according to their commonalities, and second, whether it guessed (more or less) the correct category name for each group.

On June 12th, for example, ChatGPT grouped all 16 words into the correct four categories, but one of its category names differed slightly from the official one: it labeled rain, hail, sleet, and snow as Weather Phenomena, whereas Connections called the group WET WEATHER. Our experiment still counted an answer as correct if the groupings were right, even if the category name was not.
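
The grading rule described here (matching word groups count, category names do not) can be sketched in a few lines of Python. This is only an illustration of the rule, not the tooling we used; the function names and the dictionary layout (category name mapped to its four words) are assumptions.

```python
def normalize(words):
    """Treat a group as a case-insensitive set, so word order does not matter."""
    return frozenset(w.strip().lower() for w in words)


def count_matching_groups(model_answer, official_answer):
    """Count groups whose words match exactly; category names (the keys) are ignored."""
    model_groups = {normalize(ws) for ws in model_answer.values()}
    official_groups = {normalize(ws) for ws in official_answer.values()}
    return len(model_groups & official_groups)


def is_solved(model_answer, official_answer):
    """A puzzle counts as solved only when every official group is matched."""
    return count_matching_groups(model_answer, official_answer) == len(official_answer)


# The one group quoted above from June 12th: the names differ, the words match.
model = {"Weather Phenomena": ["rain", "hail", "sleet", "snow"]}
official = {"WET WEATHER": ["RAIN", "HAIL", "SLEET", "SNOW"]}
print(count_matching_groups(model, official))  # 1
```

Comparing sets rather than lists means a group is correct regardless of the order in which ChatGPT listed its words.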

Step 3: Creating Interactive Visuals to Reveal Key Findings

In the spirit of sharing ChatGPT's performance, we created these interactive visuals; here, you'll see its triumphs and blunders.

Our findings are ordered by date, starting with the puzzle's debut on June 12th. We then documented how efficient the AI language model was at grouping the words into their respective categories according to the NYT Connections difficulty levels. In the last column, we gave ChatGPT a solving score out of 100%, depending on how many times it correctly grouped all 16 words daily - you can find the percentage of correct answers by hovering over the purple bar.

If you're curious, use the search box to find out how well ChatGPT did at solving a particular group, word, or puzzle date.

So, on how many days did ChatGPT manage to solve the puzzle? Here's a bar graph showing the number of days in the 60-day period on which ChatGPT found all four groups and the number of days on which it missed one or more.

39 Victories Vs. 21 Challenges
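
As a quick check on those totals, the short Python sketch below converts the two counts into an overall solve rate. It assumes, as in the bar graph, that a day counts as a victory only when all four groups were found; the variable names are illustrative.

```python
solved_days = 39   # days on which all four groups were found
missed_days = 21   # days on which one or more groups were missed

total_days = solved_days + missed_days
solve_rate = solved_days / total_days

print(total_days)           # 60
print(f"{solve_rate:.0%}")  # 65%
```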

Decoding Mistakes: Exploring ChatGPT's Missteps with an Interactive Treemap

So, where did ChatGPT falter, and on which colored categories? This interactive map shows ChatGPT's errors grouped by the four difficulty levels of NYT Connections. The purple and blue categories are the hardest, and the green and yellow ones are the easiest.

As you can see, the harder the difficulty level, the more ChatGPT struggled to find a common thread to solve the category. Hover over each category, and you'll see the date of the puzzle and the four words that make it up.

If you prefer to look at each color individually, use the filter in the left-hand corner: either pick a color from the dropdown menu or click directly on the color you want to see.

ChatGPT's Struggles with Connections

While the AI language model usually managed to find commonalities between the yellow and green groups, we noticed it struggled a lot more with the purple and blue categories on average. These were some of the trickiest categories for ChatGPT to solve:

Homophones
Categories with ____, like NAKED____ or _____STICK
Things with..., like THINGS WITH WINGS
Words with..., like WORDS WITH! or WORDS WITH TWO PRONUNCIATIONS
Hyper-specific trivia like MTV SHOWS, SPICE GIRLS, or SLANG FOR MONEY
Things that are, like THINGS THAT ARE RED
Groups that contained a proper noun like someone's name, a TV show, a brand, or a magazine

The majority of these were from the purple and blue groups; however, sometimes, a sneaky green or yellow tripped ChatGPT up, too, especially if it was trivia or a brand name.

Conclusion

Did ChatGPT showcase its linguistic brilliance in solving the NYT Connections puzzle? We will leave that for you to decide.

Regardless, we can all agree that it was an interesting experiment. From triumphant victories to intriguing challenges, this 60-day journey revealed the remarkable potential of AI-driven puzzle-solving and offered a glimpse of the possibilities that language and artificial intelligence hold for the future.

About the Authors

Sarah Perowne is a language and education specialist with over 10 years of experience in teaching and content creation. She has worked with students of all ages, including those with disabilities and ASD, using a variety of teaching methods. She has in-depth knowledge of and skills in teaching English as a second/foreign language (ESL) and English Language Arts, and in creating content for online teaching resources, articles, and podcasts.

Mirela Iancu is a Growth Marketer specializing in SEO, Content Strategy, and Product Marketing. A user-centered thinker, she loves numbers and data as much as words, a winning combo for SEO and word games marketing. She is also passionate about language education and the impact of tech on learning accessibility. Currently located in Barcelona, she previously founded a platform for learning Romanian online.