Surprisingly, smoke rising from a chimney is an apt metaphor for the way language grows, say physicists Alexander Petersen (GRS’11) and Joel Tenenbaum (GRS’12). They have learned that the growth of language functions a lot like a gas does, slowing down or “cooling” as it expands. “In Physics 101, you learn that when particles have more room to wander, there’s a lower temperature,” says Petersen. “In analyzing languages, we found a similar pattern.”
Petersen and Tenenbaum studied the evolution of languages over the last several centuries using the vast corpora drawn from the millions of books digitized by Google. The Google Ngram Viewer is a free graphing tool that lets users chart word and phrase usage in books published between 1500 and 2008. By searching for several synonyms at once, you can see roughly when, for example, usage of “electronic mail” declined and “email” began to take off. Tenenbaum and Petersen have been studying the behavior of words as part of “culturomics,” the use of quantitative methods to assess human behavior. “It’s this idea that we can use big data to analyze formerly unreachable aspects of our culture,” says Petersen, who has previously investigated trends in baseball statistics. Researchers have also used culturomics to predict economic trends and urban sprawl.
The duo’s study of the billions of words available in the Google Ngram data set, published in December 2012 in Scientific Reports, a journal from the publisher of Nature, found that languages become more efficient as they grow larger: synonyms like “radiogram” and “Roentgenogram” are gradually replaced by the ubiquitous “X-ray,” and the need for new words decreases. “It’s the literary survival of the fittest,” says Tenenbaum, whose PhD dissertation is based on the research. “What’s deemed to be irregular is eventually killed off.” In each of the languages he and Petersen assessed (English, Chinese, French, German, Hebrew, Russian, and Spanish), the vocabulary growth rate slows as the corpus grows larger. Petersen and Tenenbaum use smoke escaping from a chimney as a metaphor to illustrate this phenomenon: when constrained inside the flue, gases remain hot because particles are constantly interacting; but as the particles accumulate, they push to the top, expand into the air, and then rapidly cool. “We found a similar pattern while analyzing languages,” says Petersen, “that as they get larger, the annual fluctuations of words become smaller.”
Both Petersen and Tenenbaum concede that their method isn’t perfect. Copyright issues affect what is digitized, and scanning processes are imperfect. “There’s also a sample bias,” says Petersen. “There are more books now than previously, so that affects results, too.” Additionally, they say that some humanities scholars are suspicious of a method that assesses words only mathematically, without taking into account the meaning behind them. But Petersen and Tenenbaum say that’s the point of the study. “As physicists, we’re empirically driven,” says Petersen. “So when something like this database is available, it just makes us jump for joy. With so much information, you can really test theories; we were able to look at the evolution of entire languages. And that was really fun.”
Read Petersen and Tenenbaum’s complete study, “Languages cool as they expand: Allometric scaling and the decreasing need for new words.”