Mapping The Most Used Words On Every Country and U.S. State's Wikipedia Page

With 318 different language versions, Wikipedia is the place Earthlings go to tell and/or read the story of our world.

Just like the real world, there are moments of learning, moments of joy and moments of poignancy — such as when readers descended on Queen Elizabeth’s Wiki page, which the site’s “deaditors” updated with news of her passing a full hour before the BBC announced it.

But Wikipedia is as much a sum of its flaws as its assets. Wikipedia’s portrait of the world is distorted by vandalism, (male, Western) bias and plain wrongness. Just like the real world.

Still, the world continues to read and represent itself on Wikipedia. There are 27 countries with hundreds of millions of monthly page views each (the U.S. and Japan are top, with three and one billion, respectively). Even in China, where Chinese-language Wikipedia is banned, the website counts over three million hits with each flip of a calendar page. (This stunning interactive map reveals the different languages each country reads the most on Wikipedia.)

At Crossword-Solver, we wondered which words the world chooses to represent itself. So, we checked the page for every country and U.S. state and city to find the words and places most commonly used on each.

What We Did

We gathered the English Wikipedia pages for every country and U.S. state and city, then removed demonyms, place names, stop words (such as “a” or “the”), linking words, prepositions, surnames and company names. Next, we counted the remaining words and used an algorithm to identify the most uniquely relevant word on each page. Finally, we put the country and state names back in and checked which other country or state each page mentions most often.

Key Findings

  • The most commonly used word on the United States Wikipedia page is native, appearing 22 times.
  • The most common word on the UK’s page is devolve (13).
  • The United States is the most mentioned country on the highest number of other countries’ pages (16).
  • New York, Missouri and Virginia are each the most mentioned state on five other states’ pages.

U.S. and Mexico Wikipedia Pages Highlight Native/Indigenous Aspects

As we hypothesized, the most used word on any country’s Wikipedia page tends to reflect national culture, an element of nature or an aspect of local history. It’s an evocative map of words that veers between wonder and banality, tragedy and farce. Europe betrays a Kafkaesque obsession with bureaucracy: devolve (UK), insurance (Netherlands) and tax (Denmark) leave precious little time for saunas (Finland) or eruptions (Iceland). Africa’s top words are dominated by a mixed blessing of exploitable natural resources, such as cobalt (Congo), tobacco (Malawi) and diamond (Botswana, Lesotho and Namibia).

Most Uniquely Popular Word On Wikipedia Pages Infographic World Map


The most common word in the United States Wikipedia entry is native; in Mexico, the top word is indigenous, which appears 66 times. That is the second-highest count for any country’s top word, behind only Canada’s. Canada’s top word is percent, although that may have more to do with a stats-focused Wikipedia contributor than the country’s diverse cultures (only 32 percent identify as ethnically Canadian).

We also checked to see which other country each page mentions the most. For Canada, it’s the United States, which is also the most mentioned country on the pages of 15 other countries, more than any other nation. These are mostly Pacific and Caribbean islands but also the major powers of Australia, China and France, as well as Liberia, with which the U.S. has strong historical ties due to slavery and 20th-century military and political affairs.

Oil is the Most Commonly Used Word in Three U.S. State Wikipedia Pages

America’s states each have their (mostly) unique flavor, with the most common word on each state’s Wikipedia page tending to refer to local culture or industry. Often, these cultures and industries can overlap: Kentucky’s bluegrass is the name of a region, a type of grass that’s essential to the local thoroughbred horse industry and, of course, a genre of music that evolved here and became an industry of its own.

Most Uniquely Popular Word On Wikipedia Pages US States Map


Oil is the only word to stand out in multiple states (Alaska, North Dakota and Oklahoma), and a handful of states have a more wholesome natural phenomenon as their keyword. Montana is the flow state thanks to its abundance of waterways; Mississippi is all about the higher land — levees — around the rivers. Parts of Washington state lay claim to being the rainiest in the contiguous U.S., but the mountains create a ‘rain shadow’ that makes the area east of the Cascades one of the driest. Washington’s word is rain.

And which states can’t get which other states off their mind? New York, Missouri and Virginia turn up most often as the most frequently mentioned state on other states’ Wikipedia pages. These states get high mention counts from adjoining or nearby states. However, Missouri also appears more than a dozen times in the pages for Montana and North Dakota, thanks to the reaches of the eponymous river.

The Electronic Encyclopedia

The words we use matter. From pronouns to near-synonyms, the choice of one subject or word over another can speak volumes about the writer’s power and relationship to their subject, and can perpetuate ingrained biases or out-and-out inaccuracies. You can check our full data in the interactive below to see the words and places that Wikipedia’s writers most commonly connect to the world’s countries and American states and cities, for better or worse.

In Jorge Luis Borges’ short story “The Library of Babel,” the writer imagines a near-infinite library containing books written with every possible combination of letters. According to the parameters of the story, such a library would contain 25^1,312,000 books (25 raised to the power of 1,312,000). With 566 new entries appearing every day, it will take a while for Wikipedia to get there, which leaves us plenty of time to make sense of the sometimes baffling word combinations we can find there today.

METHODOLOGY AND SOURCES

The text of the Wikipedia page for each country, state and city was taken directly from Wikipedia in English.

Texts were cleaned to include only the main entry, excluding sections such as "See also," "References," "Further reading," etc. Further data cleaning included removing demonyms (e.g., "French" for France), names of major cities, names of countries themselves and all the usual stop words like articles ("a," "the"), linking words ("and," "or"), prepositions, etc.
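For readers who want to try something similar, the cleaning step might look roughly like the Python sketch below. It is only an illustration, not the code used for this study: the word lists are tiny placeholders, and the function names (main_entry, clean_tokens) and the sample text are made up for the example.

```python
import re

# Hypothetical, heavily abbreviated lists; a real run would need far larger ones.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "in", "on", "to", "is", "its"}
TRAILING_SECTIONS = ("See also", "References", "Further reading")


def main_entry(page_text):
    """Keep only the text that comes before the first trailing section heading."""
    kept = []
    for line in page_text.splitlines():
        if line.strip() in TRAILING_SECTIONS:
            break
        kept.append(line)
    return "\n".join(kept)


def clean_tokens(page_text, excluded):
    """Lowercase and tokenize the main entry, then drop stop words and exclusions."""
    tokens = re.findall(r"[a-z]+", main_entry(page_text).lower())
    blocked = STOP_WORDS | {word.lower() for word in excluded}
    return [token for token in tokens if token not in blocked]


# Invented sample text: the demonym, the country name and a major city are excluded.
sample = (
    "France is a country in Europe.\n"
    "The French love wine and Paris.\n"
    "See also\n"
    "Wine regions"
)
tokens = clean_tokens(sample, excluded={"France", "French", "Paris"})
print(tokens)
```

Everything after the "See also" heading is ignored, and the exclusion list is passed in per page so that each country can have its own demonyms and city names.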

Finally, after compiling all the words appearing in the Wiki entry for a given country, we grouped different forms of the same words together (e.g., large, larger, largest => large) so they could be analyzed as a single item.
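A minimal sketch of that grouping step is shown below. The article does not say which tool was used, so this example stands in with a small hand-written lookup table (BASE_FORMS, a hypothetical name); in practice a lemmatizer from a language-processing library would do this job.

```python
from collections import Counter

# Hypothetical lookup table from inflected forms to a base form.
BASE_FORMS = {
    "larger": "large",
    "largest": "large",
    "devolved": "devolve",
    "devolving": "devolve",
}


def count_base_forms(tokens):
    """Count tokens after mapping each one to its base form (unknown words pass through)."""
    return Counter(BASE_FORMS.get(token, token) for token in tokens)


counts = count_base_forms(["large", "larger", "largest", "devolved", "devolve", "tax"])
print(counts.most_common())
```

Here the three forms of "large" are counted as a single word, which is what lets a term such as devolve outrank words that appear in only one form.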

The most uniquely popular words for each country, state and city were determined using the TF-IDF (term frequency-inverse document frequency) algorithm, which is a measure that evaluates how relevant a word is to a particular text in a collection of texts. Using this algorithm, we were able to determine which word was most distinctly relevant to each Wikipedia entry. Names of geographic entities (rivers, seas, islands), people's last names, names of companies, political parties or organizations were excluded when choosing the most distinctly relevant word.
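To give a feel for how TF-IDF picks a word, here is a small Python sketch. It assumes one common variant of the formula (a word's share of the page multiplied by the natural log of total pages divided by pages containing the word); the article does not state which variant was used, and the three token lists are invented for the example.

```python
import math
from collections import Counter


def tf_idf_top_words(docs):
    """Return the highest-scoring TF-IDF word for each page.

    docs maps a page name to its non-empty list of cleaned tokens.
    """
    n_docs = len(docs)

    # Document frequency: in how many pages does each word appear at least once?
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))

    top = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        scores = {
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
        top[name] = max(scores, key=scores.get)
    return top


# Invented token lists, standing in for cleaned Wikipedia pages.
pages = {
    "Finland": ["sauna", "lake", "sauna", "forest", "population"],
    "Iceland": ["eruption", "glacier", "eruption", "population"],
    "Denmark": ["tax", "welfare", "tax", "population"],
}
top_words = tf_idf_top_words(pages)
print(top_words)
```

A word such as "population" appears on every page, so its score drops to zero, while a word that is frequent on one page and absent elsewhere rises to the top. That is why the top word can differ from the page's raw most frequent word.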

The most mentioned countries and states were taken as the country or state with the highest number of mentions in another country's or state's Wikipedia entry.
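Counting mentions can be sketched along these lines. This is an assumption-laden illustration rather than the study's code: the function name most_mentioned and the sample sentence are invented, and it only matches exact names, so aliases such as "U.S." or "America" would need extra handling.

```python
import re


def most_mentioned(page_name, page_text, place_names):
    """Return (place, count) for the other place named most often in page_text."""
    counts = {}
    for place in place_names:
        if place == page_name:
            continue  # a page's own name is not a candidate
        # Whole-phrase, case-sensitive match, so "Niger" does not match "Nigeria".
        pattern = r"\b" + re.escape(place) + r"\b"
        counts[place] = len(re.findall(pattern, page_text))
    best = max(counts, key=counts.get)
    return best, counts[best]


# Invented sample text standing in for a country's Wikipedia entry.
text = (
    "Canada shares a border with the United States. Trade with the "
    "United States and Mexico is extensive, and France has historical ties."
)
result = most_mentioned("Canada", text, ["Canada", "United States", "Mexico", "France"])
print(result)
```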

The data was collated and analyzed in August 2022.
