When the humanities meet big data
A "close reading" of canonical texts has long been a staple of the humanities. Now, technology is enabling a "distant reading" of everything else.
Being a voracious reader is a prerequisite for academics in the humanities, but even the most dedicated bookworm needs to eat, sleep, and socialize.
Not so for computers, which are known for being tireless, thorough, and very fast. And, when asked the right kinds of questions, these electronic speed-readers can grasp patterns that would otherwise lie beyond the reach of human scholars.
That’s exactly what happened when a team of researchers used machine-learning techniques to plow through transcripts of 40,000 speeches in a parliamentary assembly during the first two years of the French Revolution, according to a paper published in the Proceedings of the National Academy of Sciences last month. By quantifying the novelty of speech patterns and the extent to which those patterns were copied by subsequent speakers, the researchers illustrated how much of the important intellectual work of the revolution was initially carried out in committees, rather than in the whole assembly.
“We’re really getting a quantitative sense of large-scale patterns,” says co-author Simon DeDeo, a professor at Carnegie Mellon University and the Santa Fe Institute, a research center in New Mexico that specializes in complexity science. “There’s a lot of data here. You couldn’t have run this on a machine from 2000 or 2005.... Now you can do this on a desktop.”
Professor DeDeo received his doctorate from Princeton University in 2005 – not in European history, but in astrophysics. That was the tail of an inflationary period in DeDeo’s chosen field, and opportunities to tackle cosmology’s big questions were dwindling. “It was the end of the golden age,” he says. “I went off [and] I spent some time at the Santa Fe Institute, and that’s where I kind of converted into whatever I am now.”
The academy still hasn’t quite settled on a name for what DeDeo does, but the leading contender is “digital humanities,” a term that captures the field’s deeply interdisciplinary approach. Other digital humanities projects have brought together historians, librarians, literary critics, mathematicians, and computer scientists to analyze the complete works of Shakespeare, Time magazine covers, the ancient graffiti of Pompeii, and one million pages of Japanese manga.
“One of the exciting things is, can the humanities and the sciences team up?” DeDeo asks. “There’s a huge amount of knowledge and wisdom that the humanists have that the scientists don’t.”
Digital humanities can be traced to beginnings that are as diverse as the disciplines of its practitioners. One influential figure was Roberto Busa, an Italian Jesuit priest who in the 1940s began rendering the works of St. Thomas Aquinas into a machine-readable format. Another is Franco Moretti, a Marxist-trained Italian literary critic who argues that understanding literature comes not from a close reading of the literary canon – literature’s equivalent to the one percent – but from a “distant reading” of the entire corpus.
Whether inspired by Thomistic completism, Marxist inclusivity, or something else entirely, digital humanities holds the potential to shift the way we look at history. “There’s no way that a single academic could have read all 10,000 bad pulpy novels published in the 19th century,” says Indiana University historian Rebecca Spang, a co-author on the French Revolution paper. “So you could ask different kinds of questions because you get different kinds of information.”
In the case of the French parliamentary assembly analysis, researchers found that, unlike Democrats and Republicans today, the bourgeoisie and the aristocrats tended to use the same language patterns. “There isn’t a sort of discursive spectrum that we can identify,” Professor Spang says, “where you’ve got speakers on the right who use one vocabulary and the speakers on the left using another.”
Distant reading also results in a different understanding of the subject matter, one that is more holistic but also stands at a greater remove.
From the point of view of the computer, says Professor Spang, “it doesn’t matter what ‘ghijk’ means or says, just that it’s not ‘abcdef.’”
“This kind of work is not going to give us a kind of emotionally or narratively satisfying historical explanation,” says David Andress, a historian at the University of Portsmouth in Britain and an expert on the French Revolution, “but it’s certainly going to show us things that we then have to explain, that we then have to explore why we’ve got that result.”
This explanatory gap is why Dr. Andress doesn’t see digital humanities as a threat to traditional scholarship. “The readers of history and the general public are always going to want to have the story told to them in terms of people,” he says.
[Editor's note: An earlier version misstated the year DeDeo was awarded his doctorate.]