
Diving Deep Into the Web

You think the Web is big? In truth, it’s far bigger than it appears.

The Web is made up of hundreds of billions of documents, far more than the 8 billion to 20 billion pages that Google and Yahoo claim to index. But most of these pages are unreachable by most search engines because they are stored in databases that Web crawlers cannot access.

Now a San Mateo start-up called Glenbrook Networks says it has devised a way to tunnel far into the “deep web” and extract this previously inaccessible information.

Glenbrook, run by a father-daughter team, demonstrated its technology by building a search engine that scoops up job listings from the databases of various Web sites, something the company claims most search engines cannot do. But there are myriad other applications as well, the founders say.

“Most of the information out there, people want you to see,” said Julia Komissarchik, Glenbrook Networks’ vice president of products. “But it’s not designed to be accessed by a machine like a search engine. It requires human intervention.”

This is particularly true of Web pages that are stored in databases. Many ordinary Web pages are static files that exist permanently on a server somewhere. But an untold number of pages do not exist until the very moment an individual fills out a form on a Web site and asks for the information. Online dictionaries, travel sites, library catalogs and medical databases are a few examples.
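
The distinction is easy to see in code. The minimal Python sketch below contrasts fetching a static page with submitting a search form to a hypothetical job-listings site; the URL and form fields are illustrative placeholders, not Glenbrook's actual system.

    # Contrast: a static page an ordinary crawler can reach vs. a "deep web"
    # page that is generated only when a form is submitted.
    # All URLs and field names here are hypothetical placeholders.
    import requests

    # Static: this HTML file exists permanently on the server, and a crawler
    # can reach it by following an ordinary hyperlink.
    static_page = requests.get("https://example.com/about.html")

    # Dynamic: the job listings live in a database. The results page does not
    # exist until this form is submitted, so a link-following crawler never
    # encounters it.
    dynamic_page = requests.post(
        "https://example.com/jobs/search",
        data={"keywords": "engineer", "location": "San Mateo"},
    )

    print(dynamic_page.text)  # HTML generated on demand, just for this request

A deep-web crawler in the spirit of Glenbrook's would have to discover such forms and generate plausible submissions automatically, supplying the human intervention Komissarchik describes.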

By Michael Bazeley

Mercury News

Full Story: http://www.mercurynews.com/mld/mercurynews/12403171.htm
