Wednesday, March 19, 2008

Google, ontology and magic phrases

Search engines are vital to the web. Search engines also suck. I was just forcibly reminded of those facts as I struggled to find a source through Google.

For my purposes Google is the best of a bad lot. It indexes a huge number of sites, adds material fairly quickly and tries to stay up to date. But its search mechanism is fundamentally broken, because every search engine's is fundamentally broken.

The immediate problem is trying to figure out what to search for. The larger problem is search taxonomy -- how to organize the information so the user can quickly find what he or she is looking for. The method used today is a string search. That essentially means guessing the magic phrase that refers to whatever you're looking for.

This is not only annoying as hell, it is a supremely complicated problem because not everyone uses the same words or phrases for a thing when they search for it.

I got considerable exposure to this a couple of years ago when I was acting as "Chief Staff Ontologist" (hey, I got to pick my own title) for an online yellow pages company. We wanted a way to classify the hundreds of thousands of listings in our database so customers could find the business they were looking for.

Developing a working, and workable, taxonomy just for businesses is a dauntingly complex task. Part of the problem is regionalisms. The same business is called different things in different parts of the country. You can have an "undertaker", a "funeral home", a "funeral parlor", a "mortuary", and several other terms, depending on where you are. And in some areas those terms have specific, distinct meanings. For example, in some places in the east, mostly in large urban areas, a "funeral parlor" is a place to hold funerals. It doesn't provide embalming or other related services.

Essentially, when you develop the taxonomy you have to try to read your searcher's mind, just as a searcher using a search engine has to read the minds of the people who put up the web site.

And since the searches are basically string-based, you've got no way to intelligently cross-reference topics. In fact, in a search engine there usually aren't any topics.
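To make the complaint concrete, here is a minimal sketch of what a purely string-based search amounts to. The listings and the `string_search` function are invented for illustration; no real search engine is this crude, but the failure mode is the same one described above.

```python
# Hypothetical listings, made up for this example.
LISTINGS = [
    "Smith & Sons Funeral Home",
    "Riverside Mortuary",
    "Oak Street Undertakers",
]


def string_search(query, listings):
    """Return the listings whose text contains the query, ignoring case."""
    needle = query.lower()
    return [item for item in listings if needle in item.lower()]


# Each phrase only finds the listing that happens to use that phrase.
home_hits = string_search("funeral home", LISTINGS)
mortuary_hits = string_search("mortuary", LISTINGS)
parlor_hits = string_search("funeral parlor", LISTINGS)
```

All three listings are the same kind of business, yet "funeral home" finds only the first, "mortuary" finds only the second, and "funeral parlor" finds nothing at all. The search has no notion of a topic that ties them together.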

I just spent a couple of days trying to find an appropriate "IT compliance consultant" in California for a story I am working on. It turned out the magic phrase was "security consultant" with a sub-specialty in compliance issues. Arrgh!

If you get the impression from this that I have a better solution, I hate to disabuse you, but I do not. The answer undoubtedly involves what is called the semantic web -- being able to search by meaning rather than a string of characters. However, the semantic web is mostly a pious wish that's struggling to achieve buzzword status.

As far as I know, no one can do a useful, generalized semantic search. The only way I know to do it is to have humans cross-reference terms. A whole lot of humans doing a whole lot of cross-references.

I suspect we're eventually going to get more useful search through a massive wiki-like project where people enter terms and, after flailing around and finding the magic phrase, provide a cross reference between terms. That's not an elegant solution, but given the power of the web -- and the need -- it's one that can work.
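For what it's worth, the core of such a project could be very small. The sketch below is only an illustration under my own assumptions: the `CrossReferences` class and its `link` and `expand` methods are hypothetical names, and it treats every human-entered link as symmetric and transitive, which would be too blunt wherever terms like "funeral parlor" carry a distinct local meaning.

```python
from collections import defaultdict


class CrossReferences:
    """Human-entered links between phrases that refer to the same thing."""

    def __init__(self):
        # Maps each phrase to the set of phrases someone has linked it to.
        self._links = defaultdict(set)

    def link(self, a, b):
        """Record that someone found phrase `a` and phrase `b` equivalent."""
        a, b = a.lower(), b.lower()
        self._links[a].add(b)
        self._links[b].add(a)

    def expand(self, phrase):
        """Return the phrase plus every phrase reachable through links."""
        start = phrase.lower()
        seen = {start}
        stack = [start]
        while stack:
            current = stack.pop()
            for neighbor in self._links.get(current, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        return seen


refs = CrossReferences()
refs.link("IT compliance consultant", "security consultant")
refs.link("undertaker", "funeral home")
refs.link("funeral home", "mortuary")

# A search front end could then try every phrase in the expanded set.
phrases = refs.expand("undertaker")
```

Here `phrases` ends up holding "undertaker", "funeral home" and "mortuary", because the second and third links chain together. The hard part is not this code; it is getting a whole lot of humans to enter the links.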
