The little search engines that can't

As Web pages multiply like the Borg, new search theories hope to cut through the irrelevant morass.

By Tom Regan, Staff writer of The Christian Science Monitor

August 10, 2000

It's a sunny Saturday morning, 10 years in the future. You're in a sunny mood as well, because it's the time of the year when you plan your vacation.

In past years, you've successfully used the Internet to find and book your vacations. But this year, you don't need to do anything, because once you turn on your computer (or even your cellphone), you'll find that your personal search engine - which "lives" right on your computer - has done it all for you: It has found a perfect location, booked the airplane tickets, located a hotel room and a rental car at prices you're willing to pay, and gathered lots of information about what you can do while you're on vacation.

Sound farfetched? Well, it's not. In fact, the technology to do most of the above tasks already exists. Imagine the Internet search engine as a car.

When people first began looking for information on the Internet, they used tools like WAIS - Wide Area Information Servers. WAIS was, in essence, the Model T: dependable, but not all that interesting to look at and hard to operate. When the World Wide Web came along in the early '90s, and cyberspace began to blossom with Web pages on everything from doughnuts to Da Vinci, WebCrawler - the first true Web search engine - chugged along like a Hudson or a Packard.

Since those early days of the Web, we've gone through a series of new "cars." We've had our VW buses (Infoseek, Lycos, etc.), our ever dependable Chevys (Yahoo), and lately a sports car (Google).

And like real automakers, the designers and "manufacturers" of search engines have often made lots of money.

But what will we be driving in the future? After all, the highways of the Internet are a lot more congested these days: about 1 billion Web pages, 7 billion links, and a million new Web pages every day. That is a lot of information to navigate.

Well, there's bad news and then there's good news.

The reality is that current search-engine technology is not going to get much better. The science of information retrieval has been around for a long time, and the best we're probably ever going to get is 20 to 40 percent of what's available on any particular topic we type into a search-engine window.

The good news is that we're probably not going to be using these Web-based search engines 10 years from now - instead, we'll be using personalized "agents" that will reside on our own computers (or Palmlike devices or cellphones, etc.) which will know our wants and needs better than our own mothers, and will find things for us that we didn't even know we needed, at an appropriate time of day, without giving us a lot of garbage along with the good stuff.

In essence, the search engine of the future will be handmade and precision-engineered to our specific wishes.

But before we look too far ahead, let's look at today.

At first glance, it seems disappointing that we only find 20 to 40 percent of the information available on any given topic. But Victor Rosenburg of the School of Information at the University of Michigan says: Well, so what?

"The reason search engines work for people is that most people do not want everything on the planet," he says. "Most people find something useful in the first few links offered, and then go merrily on their way."

And while it might seem disappointing that we're missing 60 to 80 percent of what's available on the Web, Prof. Rosenburg says the good news is that "nobody knows what they are missing."

Why the holes in fact finding

So why is current search engine technology so spotty? It's all a matter of language and context, according to Rosenburg.

"Writers are targeted to use different phrases and different words. So people who create Web pages may use different words to describe the same thing, or use the same word, and have completely different meanings in mind."

When you type a phrase into a search engine, it has no idea of the context of your search - it just knows that you want to find out something about, say, cars. So you might get information on everything from old Boston-based rock bands to personalized home pages from people with the last name "Carr," along with lots and lots and lots of links to car sites.

The trick with current search engines, according to Rosenburg, is to ask better questions - ones that are more specific, but not so specific that they produce zero results - and to use more than one search engine for important searches.

Another useful trick is to use metacrawlers - sites that allow you to search several search engines at the same time. This method will certainly offer a wider variety of results, but is still limited by the parameters set up by the people who program each individual search engine.

This is especially true of sites where humans help create directory structures and decide where a particular link will reside.
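
The metacrawler idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three "engines" are hypothetical stand-in functions that return fixed lists (not real services), the example URLs are invented, and reciprocal-rank scoring is just one simple way to merge lists, not necessarily what any real metacrawler uses.

```python
from collections import defaultdict


def metasearch(query, engines, top_n=5):
    """Merge ranked result lists from several engines into one ranking."""
    scores = defaultdict(float)
    for engine in engines:
        # Each engine returns its own ranked list of URLs for the query.
        for rank, url in enumerate(engine(query), start=1):
            # Reciprocal-rank scoring: a higher placement contributes more.
            scores[url] += 1.0 / rank
    # Sort by descending score; break ties alphabetically for stable output.
    ranked = sorted(scores, key=lambda url: (-scores[url], url))
    return ranked[:top_n]


# Hypothetical stand-ins for real search engines (assumed for illustration).
def engine_a(query):
    return ["cars.example/reviews", "cars.example/dealers", "band.example/the-cars"]


def engine_b(query):
    return ["cars.example/dealers", "cars.example/reviews", "home.example/~carr"]


def engine_c(query):
    return ["cars.example/reviews", "home.example/~carr"]


merged = metasearch("cars", [engine_a, engine_b, engine_c])
print(merged)
```

Pages that several engines agree on rise to the top, but the merged list can still only contain what the individual engines chose to return, which is the limitation described above.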

As for the future, Rosenburg says the search engine Google (which ranks results based on how many other pages link to a page, giving extra weight to links from sites it considers authorities) is "a remarkable advance," but he doesn't see current search engines really getting much better.
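
For readers curious about how link-based ranking can work, here is a minimal sketch in the spirit of the approach described above. It is a simplified, textbook-style illustration, not Google's actual implementation; the function name `link_rank`, the damping value, and the tiny four-page link graph are all assumptions made for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links, weighting links from high-scoring pages more.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: which page links to which.
links = {
    "hub": ["review", "dealer"],
    "review": ["dealer"],
    "fan_page": ["review", "dealer"],
    "dealer": ["hub"],
}
ranks = link_rank(links)
best = max(ranks, key=ranks.get)
print(best)
```

In this toy graph the "dealer" page comes out on top because three pages link to it, while "fan_page", which nobody links to, ends up at the bottom.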

No, the search engine of the future will have to move off the assembly line and be hand-built in your own garage - but by the car itself, not by you.

In the future, we will have personalized search engines, or [software] agents incorporating artificial intelligence capabilities, that will "live" on our personal computers and other devices. These agents will learn from interacting with us. First, we'll start by telling them what we like and what we don't like.

But the agent will also watch what we do, and see how we use the information it provides us, and it will incorporate that information into future searches in order to give us better and better information, at appropriate times.
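
A rough sketch of that learning loop, in Python, might look like the following. Everything here is hypothetical: the `InterestProfile` class, its method names, and the simple add-and-subtract word weights are assumptions chosen to illustrate the idea of combining what we tell an agent with what it observes, not a description of any real agent software.

```python
from collections import defaultdict


class InterestProfile:
    """Toy user profile: a weight per word, adjusted by feedback."""

    def __init__(self, learning_rate=0.5):
        self.weights = defaultdict(float)
        self.learning_rate = learning_rate

    def tell(self, term, liked):
        """Explicit feedback: the user states a like or a dislike."""
        self.weights[term.lower()] += 1.0 if liked else -1.0

    def observe(self, text, used):
        """Implicit feedback: did the user actually use this result?"""
        delta = self.learning_rate if used else -self.learning_rate
        for term in set(text.lower().split()):
            self.weights[term] += delta

    def score(self, text):
        """Sum the learned weights of the distinct words in a document."""
        return sum(self.weights.get(term, 0.0) for term in set(text.lower().split()))

    def rank(self, documents):
        """Order documents from most to least relevant to this user."""
        return sorted(documents, key=self.score, reverse=True)


profile = InterestProfile()
profile.tell("beach", liked=True)      # what we say we like
profile.tell("skiing", liked=False)    # what we say we don't like
profile.observe("cheap beach hotel deals", used=True)    # a result we clicked
profile.observe("skiing resort packages", used=False)    # a result we ignored

results = profile.rank([
    "skiing holiday packages",
    "beach hotel with rental car",
    "city museum tours",
])
print(results)
```

After only a few interactions, the beach listing moves ahead of the neutral one, and the skiing listing sinks to the bottom.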

"People's interests are very dynamic," says Prof. John Yen of the University of Texas. "We do different things at different times of the day. We have different roles. During the day, we work. In the evening we may be parenting. Future search engines will recognize people's roles. For instance, it won't bother you with family-related information during the busiest part of your work day, but in the evening, it won't send you stock-market updates because it will know that you spend the evening hours with your family."

"Agents learn about the search space (or the information on the Web)," says Prof. Fillippo Menczer of the University of Iowa. "They are getting information from the user. They are remembering what we did in the past. Agents will learn from our own experiences, and give us results that are much more tailored, and therefore useful."

Both Professors Yen and Menczer point out that future search engines will do much more than just retrieve static documents. Intelligent agents will find our favorite music or movies, for instance, buy them for us, then download the music to our music system, or tape the movies on the VCR when they appear on the late-night show.
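
Yen's point about roles and time of day can also be sketched in Python. The schedule, the role names, and the function names below are all assumptions invented for illustration; a real agent would presumably learn the schedule rather than have it hard-coded.

```python
from datetime import time

# Hypothetical daily schedule: (start, end, role). A real agent would learn this.
ROLE_SCHEDULE = [
    (time(9, 0), time(17, 0), "work"),
    (time(17, 0), time(22, 0), "family"),
]


def current_role(now):
    """Return the role the user is assumed to be in at the given time."""
    for start, end, role in ROLE_SCHEDULE:
        if start <= now < end:
            return role
    return None  # outside scheduled hours: no active role


def deliverable(items, now):
    """Keep only the items tagged with the user's current role.

    `items` is a list of (text, role) pairs; everything else is held back.
    """
    role = current_role(now)
    return [text for text, item_role in items if item_role == role]


inbox = [
    ("Stock-market update", "work"),
    ("School play reminder", "family"),
]

print(deliverable(inbox, time(10, 30)))  # mid-morning, during work hours
print(deliverable(inbox, time(19, 0)))   # evening, family time
```

The same two items are available all day, but the agent surfaces the stock update only during working hours and the school reminder only in the evening.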

What happens to privacy?

Many people, however, may balk at the notion of a software program containing that much information about them. After all, people are concerned about privacy even when they surrender relatively small amounts of personal data.

Which is why future search engines will not wander far from home.

"I don't want to give my name and preferences to search engines," says Menczer. "But if it's on my own 'client,' my agent, [on my computer only] with all the information it holds about me, it is safer."

Yen agrees and says that while the technology will never be completely safe from "hostile users," security is improving and it will make having this kind of information on your own machine much more practical, and "keep us all safe from telemarketers."

So to return one final time to our car metaphor, 10 years from now, we may be driving - or be driven - in a very comfortable, very personal car, designed to suit our personal tastes and at the speeds we like, depending on where we are, the time of the day, and where we want to go.

Now if only a real car could do that.
