Making these United States safe for Ngrams

Last month's court decision upholding the legality of the Google Books search engine also kept a powerful research tool available to the public.

A side effect of Judge Denny Chin's dismissal of a long-running suit by the Authors Guild against Google Books – discussed in this space last week – was that he made it safe for the Ngram Viewer.

The Google Ngram Viewer is a powerful search tool. It appears, though, that not many people know about Ngrams; or so I infer from a Google News search that yielded only 65 results. The Ngram Viewer is an application that lets you trace how often particular search terms have been used over time, as they appear in the "corpus" of books Google has scanned into its database.

The sample terms that come up by default when you go to the Ngram Viewer are Albert Einstein, Sherlock Holmes, and Frankenstein. (Go figure.) The search produces a fever chart, as this kind of graph is known, with three colored lines, one for each name. The lines cross and recross. But they show that Frankenstein had an early lead and by 2000 was once again far ahead of the other two.

It all takes somewhat less effort than a quick check of an online movie schedule. Frankenstein, et al., aside, the poster child for the Ngram Viewer is "the United States" itself.

As Judge Chin wrote in his ruling in the Authors Guild suit: "Using Google Books ... researchers can track the frequency of references to the United States as a single entity ('the United States is') versus references to the United States in the plural ('the United States are') and how that usage has changed over time.

"The ability to determine how often different words or phrases appear in books at different times 'can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology,' " the judge added, quoting one of the scholars who has weighed in on this topic.

The commonplace observation among those who tune in to this sort of thing is that it wasn't until after the Civil War that the United States became truly singular in a grammatical sense. An amicus brief submitted on behalf of Google in the Authors Guild case included an Ngram that showed the red line ("United States are") trending downward over time, crossing the blue line ("United States is") in the late 1870s, and then dribbling away into statistical insignificance in the lower right-hand corner.

Ben Zimmer, in a post for "Language Log," included an Ngram built from slightly different search terms that shows the crossing point coming about 10 years later. Your mileage may vary, too.

I've just tried a search for "the United States is, the United States are" and got two lines that start out twisted together like strands of a rope but then separate around 1830, with "the United States is" taking the lead. Each line also shows two humps, one for each of the two world wars, when the name of the nation was presumably mentioned more often in books.

Ngrams are available to adjudicate less cosmic matters as well. I've just done a quick check of "e-mail" versus "email" to find out just how much of a fuddy-duddy I am to continue to prefer the hyphen, and I feel positively vindicated: The hyphenated version of the term has a commanding lead – for now, at least.

Ngrams are a useful tool I look forward to learning more about.

https://www.csmonitor.com/The-Culture/Verbal-Energy/2013/1205/Making-these-United-States-safe-for-Ngrams