No. 2754:

Today, let's talk about how to model culture. The University of Houston Mathematics Department presents this program about the machines that make our civilization run, and the people whose ingenuity created them. 

Physics and biology are sometimes called the hard sciences. Here scientists use carefully controlled experiments and rigorous statistical methods to study nature. On the other hand, the social sciences are sometimes referred to as soft. Until recently it was difficult to rigorously examine how culture changes and evolves. 

(Image: stack of books)

Google's effort to digitize all books ever published has fundamentally changed the picture. Indeed, a team of researchers from Harvard has recently analyzed this mass of words that spans centuries. In these vast quantities of text they saw glimpses of how our language and culture have changed over time. 

The richness and variety of their insights are amazing. They tracked how frequently different verbs appear in books and newspapers. Over the last 400 years irregular verbs have become more regular. In 1800 we "chid" unruly children, but in the year 2000 we "chided" them. The more frequently a verb is used, the more it resists such regularization: "spoke" will not turn into "speaked" for a long time. However, the verb "sped" is giving way to "speeded," a change that started around 1920, and is still going on today. Linguists have known of such changes, but the Google data offered detailed insight into this process of transmutation.

The Harvard team also tracked how frequently famous people are mentioned in books. They saw that fame reaches a peak about 75 years after a person's birth, and declines thereafter. Today celebrities rise to fame much faster and become more famous than in the past. But they are also forgotten more quickly.

The data also shows how Soviet dissidents abruptly disappeared from the pages of books as they fell out of favor. The same happened to famous Jews when the Nazis came to power in Germany. The researchers were able to identify victims of Nazi repression by tracking how frequently they were mentioned in print during the years of Hitler's rule. 

The frequency at which the phrases "Great War" (blue) and "First World War" (red) appear in English books published since 1800 and digitized by the Google Books project.

The Harvard team analyzed culture as biologists analyze the genomes of animals, and named their approach "culturomics". Since then, statistical analysis has been used to predict the Arab Spring simply by measuring the negativity of newspaper articles in the region. Looking at a single newspaper would not tell us as much. However, an analysis of thousands of articles from hundreds of newspapers can detect even subtle shifts in public sentiment. 

This is a promising beginning. But we should be careful: published records are susceptible to a number of biases. Until recently books were largely written by the well-to-do, or those with rich patrons. Hence, books may not offer an accurate picture of the general culture of the time. Indeed, they may provide a different snapshot of culture than newspapers, or street conversations. 

We will always need expert historians and linguists to ask relevant questions and interpret the answers in their true historical and cultural context. These novel computational and statistical methods are powerful new tools in the social scientists' kit. 

This is Krešo Josić at the University of Houston, where we're interested in the way inventive minds work.

(Theme music)

I have a blog, and you can follow me on Twitter at kjosic.

Read more about culturomics in the original publication in Science and the accompanying comment.

To track n-grams, which are phrases consisting of n words, use Google's Ngram Viewer. The chart above was produced with the Google Ngram Viewer. 
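For readers curious about what tracking an n-gram involves, here is a minimal Python sketch. It is not the Harvard team's code or Google's implementation. The function names, the whitespace-only tokenization, and the tiny two-year corpus are all assumptions made up for illustration. It counts how often a phrase appears among all phrases of the same length in each year's text, which is one simple way to get a relative frequency of the kind plotted in the chart above.

```python
from collections import Counter


def ngrams(text, n):
    """Return the n-grams in text as tuples of lowercase words.

    Assumption: words are separated by whitespace and punctuation is ignored.
    """
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def relative_frequency(phrase, texts_by_year):
    """Map each year to the share of that year's n-grams equal to phrase."""
    target = tuple(phrase.lower().split())
    n = len(target)
    result = {}
    for year, text in texts_by_year.items():
        counts = Counter(ngrams(text, n))
        total = sum(counts.values())
        # Guard against texts too short to contain any n-gram of this length.
        result[year] = counts[target] / total if total else 0.0
    return result


# Hypothetical toy corpus, invented for this example (not Google Books data).
toy_corpus = {
    1920: "the great war changed europe and the great war changed us",
    1950: "the first world war came before the second world war",
}

freq = relative_frequency("great war", toy_corpus)
for year in sorted(freq):
    print(year, freq[year])
```

With a real corpus the same counts would be gathered over millions of books per year, and the tokenization would need to handle punctuation, capitalization, and scanning errors far more carefully than this sketch does.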

The book image is a stock photo from SXC.

This episode was first aired on November 16, 2011.