|Cloud, Digital, SaaS, Enterprise 2.0, Enterprise Software, CIO, Social Media, Mobility, Trends, Markets, Thoughts, Technologies, Outsourcing|
Saturday, September 03, 2005
Bob Wyman points to the recently published grabPERF report and claims that PubSub is the fastest search engine. Claims apart, I like Bob’s assessment of the nature of future search engines. Bob writes:

PubSub implements a completely different technology than the others on the list. PubSub implements a "prospective," forward-looking search, while the others on the list are primarily providers of "retrospective," past-oriented search. PubSub stores user subscriptions (i.e. "queries") and then checks each new incoming document against each subscription the instant that the document is discovered. The retrospective engines do the reverse: they store documents and then check queries against them as the queries arrive.

In the realm of blogs and syndication, the most common form of search is actually the prospective search. "Watch Lists" or "Alerts" built on what is referred to as "repeated retrospective search" are the simplest, yet most inefficient, way to provide prospective search. That approach works under light load but doesn't scale without tremendous hardware expense.

In a prospective system, users' results are built up incrementally as documents arrive rather than in response to ad hoc queries, very much like in a desktop aggregator. The cost of fulfilling a subscription query is spread across the day rather than concentrated at the time a query is received. Additionally, it means that we can totally decouple the "matching engine" and the process of new document "ingestion" from the process of serving results. This way the system response time is largely insensitive to the volume of user requests, since request processing is trivially simple. With a modular, decoupled design, the speed with which results are served is not impacted by the amount of work that the ingestion and matching processes are doing. If matching load or ingestion load gets heavy, users never feel it.
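To make the distinction concrete, here is a minimal sketch of the prospective idea in Python. It is only an illustration under simple assumptions (keyword queries where every word must appear, everything held in memory) and says nothing about how PubSub's real matching engine works; the class name `ProspectiveMatcher` and its methods are hypothetical names made up for this example.

```python
from collections import defaultdict


class ProspectiveMatcher:
    """Toy prospective search: store queries, match each new document once."""

    def __init__(self):
        # word -> ids of subscriptions that require that word
        self.index = defaultdict(set)
        # subscription id -> set of all words the subscription requires
        self.terms = {}
        # subscription id -> matched document ids, built up incrementally
        self.results = defaultdict(list)

    def subscribe(self, sub_id, query):
        """Register a standing query; all of its words must appear in a document."""
        words = set(query.lower().split())
        self.terms[sub_id] = words
        for word in words:
            self.index[word].add(sub_id)

    def ingest(self, doc_id, text):
        """Match one newly discovered document against the stored subscriptions."""
        doc_words = set(text.lower().split())
        # Only subscriptions sharing at least one word with the document
        # are candidates, so most subscriptions are never examined.
        candidates = set()
        for word in doc_words:
            candidates |= self.index.get(word, set())
        for sub_id in candidates:
            if self.terms[sub_id] <= doc_words:
                self.results[sub_id].append(doc_id)

    def fetch(self, sub_id):
        """Serving is a plain lookup; no matching happens at request time."""
        return list(self.results.get(sub_id, []))


matcher = ProspectiveMatcher()
matcher.subscribe("alice", "search engine")
matcher.subscribe("bob", "pubsub")
matcher.ingest("post-1", "PubSub is a prospective search engine")
matcher.ingest("post-2", "Notes on desktop aggregators")
print(matcher.fetch("alice"))  # ['post-1']
print(matcher.fetch("bob"))    # ['post-1']
```

Note that `fetch` does no matching at all. All of the work happens in `ingest`, once per document, which is the decoupling of ingestion and matching from serving that Bob describes. A repeated retrospective "alert" would instead re-run every stored query against the document store on a timer.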
Bob surprises by revealing that PubSub still uses just a single dual-processor Intel Pentium box to handle all matching, and he thinks that they may not need to swing towards the extreme of exotic and expensive hardware.
Category: Search Engines
|Sadagopan's Weblog on Emerging Technologies, Trends,Thoughts, Ideas & Cyberworld