New online dictionary redefines 'look it up'
Lexicographer Erin McKean’s interactive ‘Wordnik’ is projected to be the largest online dictionary ever.
Jasmine Scott/Special to The Christian Science Monitor
Erin McKean doesn’t look much like a revolutionary. She speaks softly. She sews her own skirts and writes a daily blog entry about vintage patterns. She does work out of a basement, but it’s got carpeting and good lighting and roughly 1,500 books, many of whose titles involve the word “words.” Her suburban Chicago home is not exactly the picture of subversion.
This week, though, she is slated to launch what may be the biggest revolution in the printed word since, well, printed words.
Ms. McKean’s brainchild is called Wordnik, and it combines the best practices of the old-fashioned desk reference with Internet innovations. Words can be tagged like a blog entry, their pronunciation recorded and replayed like streaming radio, their related words cataloged like a list of books customers also bought at an online book depot. When the paper page gives way to the Web page, everything about the way we think of words will change, McKean says. “This project,” she predicts in a quiet voice devoid of bravado, “is going to completely revolutionize all of dictionarymaking forever.”
Granted, a dictionary is closer to a database than a mystery thriller, its authors nothing like, say, John Grisham. But to McKean, nothing has ever seemed more fascinating than collecting and organizing American words.
McKean was 8 years old when she decided that when she grew up, she wanted to be a lexicographer – the technical term for a writer or editor of dictionaries. She first came across the word in her daily scouring of The Wall Street Journal. Her father was a Journal devotee, and McKean liked the human interest stories (but, she jokes, “even then, I knew enough not to read the editorial page”). A feature article celebrated Oxford University Press’s 1980 Word of the Year – ayatollah – and talked about preparing the newest edition of its most famous title, the Oxford English Dictionary.
“I think I was really attracted by the fact that it was taking 21 years to make the second edition of the Oxford English Dictionary,” she recalls. “I was 8. Twenty-one years was forever.”
The lexicography bug stuck, in part because McKean loved language. She was a voracious reader, plowing through her local libraries’ stacks and devouring anything she found at home, she says. “If it was lying around, I read it. If my parents didn’t want me to read it,” she says, “they had to hide it.”
As her classmates abandoned childhood dreams of firefighting or Broadway stardom for teaching or nursing, McKean stuck with words. “Nobody ever tried to talk me out of it. Nobody knew enough about it to know if it was easy or difficult,” she recalls. “Nobody had a brother who was a lexicographer the way they might have a brother who was a firefighter or an English teacher or a doctor or a lawyer. Nobody had ever met one.”
For good reason, she found out as she pursued joint bachelor’s and master’s degrees in linguistics at the University of Chicago: There aren’t a whole lot of jobs for lexicographers. McKean estimates there may be 200 working lexicographers in America today, and that the field sees about two full-time openings a year.
McKean got her start through a combination of luck and ingenuity: She called up the only dictionary publisher based in Chicago and asked for an internship. After graduation, the internship turned into a job, which eventually turned into a career at Oxford University Press, a move she likens to “being called up by the Yankees.” At age 29, McKean was the chief editor of the American dictionaries group. “If it had Oxford and American in the title,” she says, “it was my fault.”
She could dream up bestsellers, like the Oxford American Writer’s Thesaurus, but among her favorite books is the first one she acquired at her new home, a publishing house with a reputation for erudition. “It was called Slayer Slang.... [It] is a treatment of the slang of Buffy the Vampire Slayer,” the title character in a hit television drama from the late 1990s.
The purchase revealed as much about McKean’s sensibility as it did about her business sense. And when it comes to dictionaries, McKean says, sensibility is key. “People have this idea of the Platonic ideal of the dictionary. That’s why they call it ‘the dictionary’.... They think that all dictionaries are pretty much the same.” Not so, she says. There are five print dictionary publishers in the US, each choosing which of the billions of words they’ve collected will make it into print.
What gets left out depends on the personality of the publishing house. On the other hand, how to evaluate what gets in is a task beyond most people. “Most consumers don’t have a good metric for deciding on whether the dictionary they want to use is a good one … so they flip the book over, then go to the back, and it says, ‘over 250,000 entries.’ And they go, ‘Great, this dictionary must be awesome!’ ” she says. “Because if you don’t know a word, how do you judge the quality of the definition?”
Enter Wordnik, McKean’s newest project. In the infinite space of the Internet, she can define as many words as she wants.
“There are hundreds of thousands of words that aren’t in any print dictionary today ... because there’s no space for all of them.”
Wordnik has space for many of them, and for their bells and whistles. Her team of seven has analyzed what print and online dictionaries do and don’t do well. They’ve built a user-friendly resource that should be the best – and biggest – of both worlds. Wordnik generates its content from a database of 4 billion words, twice as many as her last employer’s database holds. “Four billion words,” she says with a shrug, “is what you can pick up lying around on the floor of the Internet.”
Want to evaluate a definition of a word you’ve never met? No problem; other users can tell you if they favor that definition. Want to know what other words often appear in the same sentence as what you’ve just looked up? There’s a section called “related” for words used in the same context as yours. Need to know what a farthingale, for instance, looks like? Images are imported to the page from photo-depot giant Flickr. Unsure if you really understood the definition? Every word has several example sentences, culled at random from that Internet floor and then sorted so the best rise to the top of your search page.
These example sentences, McKean says, are critical. They’ve been vanishing from print dictionaries as publishers try to cram in more words, but contextual sentences are what make people pick up reference books in the first place. “We think people go to a dictionary to find out what a word means,” she says. Not so. “Most people go to the dictionary because they don’t want to look stupid.”
They don’t want to sound stupid, either, which is why every word has an audio file of its pronunciation. Users can record their own pronunciations, too.
Print dictionaries do have one clear advantage, though: They show more than one word at a time. That makes skimming the print page fun, and McKean has tried to mimic that feeling with a “serendipity” feature, which generates words at random.
Perhaps the most surprising element of McKean’s new dictionary is a frequency graph, which shows how often the word you’ve looked up appeared in writing in a given year. That can tell you more about history than just the etymological: Take “chad,” for instance. The word’s frequency in 2000 is high – thanks, of course, to that year’s presidential election controversy. But there are signs of heavy usage much earlier. [Editor's Note: The original version of this story incorrectly used the word "entymological" instead of "etymological."]
“We have one text from 1870 that has the word ‘chad’ a lot, because it’s about Jacquard [weaving] looms, which used to be run on punch cards,” McKean explains. “They had the same chad problems as the Florida ballots.”
Ultimately, McKean’s goal is rather humble, when judged against the volume of words that have accumulated in the 400-year history of modern English.
“Ideally my goal is, before I die, to have some information about every word that’s ever been used in print.”
That may be the real revolution: digitizing a bit of data about every word we English speakers have ever put on the old-fashioned page. Byte by byte, the soft-spoken lexicographer will see her revolution through.