Cloud, Digital, SaaS, Enterprise 2.0, Enterprise Software, CIO, Social Media, Mobility, Trends, Markets, Thoughts, Technologies, Outsourcing




Saturday, September 25, 2004

Has Google's index size plateaued? (via Sean)

Google's index should be growing exponentially to keep up with the growing size of the web. Google's main page indicates that it searches 4,285,199,774 web pages, and this number has remained the same for roughly the last seven months, since around February 14, 2004, according to the Internet Archive's Wayback Machine. Previously, it had been growing rapidly: 1,326,920,000 on Feb. 1, 2001; 2,073,418,204 on Feb. 6, 2002; 3,083,324,652 on Feb. 8, 2003; 4,285,199,774 on Feb. 14, 2004. The Google index had been growing at a rate of roughly 50% per year until it reached its present plateau. Extrapolating, by now it should be over 5 billion.

The plateau may come from the size of the URL ID numbers. With 4 bytes, which is the natural word size for the inexpensive IA-32/x86-compatible processors they are using, they can store 32 bits, and that means 2^32 different values, or 4,294,967,296. They may be using some of the values for special purposes, and so haven't reached the absolute maximum, yet they are within about 0.23% of the maximum.

Adding another bit or byte to store more URL ID numbers would probably slow things down because it would require their CPUs to do much more work when manipulating the IDs. I suspect they have decided that they are in an engineering sweet spot and 4.285 billion URLs are enough for a while. So they won't increase the index size until they switch to using 64-bit processors, which would provide enough bits to easily manipulate 2^64, or 18,446,744,073,709,551,616, URL IDs (that's over 18.4 quintillion).

As a result of the plateau, there is an exponentially growing number of "unimportant" (as measured by their PageRank) web sites that are not in Google's index. An increasing number of web site owners and web site searchers will be rather unhappy with Google because of this, and the situation might not improve for months, if not years.

While this scenario looks plausible, I think Google would have some workarounds.
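For anyone who wants to check the arithmetic behind this argument, here is a minimal Python sketch. It only uses the page counts quoted above. The function names (`yearly_growth`, `headroom_fraction`, `extrapolate`) are my own illustrative choices, and the 32-bit URL ID is the post's hypothesis, not a confirmed detail of how Google stores its index.

```python
# Page counts quoted in the post, keyed by the year of the (early February) snapshot.
COUNTS = {
    2001: 1_326_920_000,
    2002: 2_073_418_204,
    2003: 3_083_324_652,
    2004: 4_285_199_774,
}


def yearly_growth(counts):
    """Return the fractional growth from each snapshot to the next."""
    years = sorted(counts)
    return {
        later: counts[later] / counts[earlier] - 1
        for earlier, later in zip(years, years[1:])
    }


def headroom_fraction(used, bits=32):
    """Fraction of the ID space still unused, ASSUMING one ID per page
    and an unsigned integer ID of the given width (hypothetical)."""
    limit = 2 ** bits
    return (limit - used) / limit


def extrapolate(count, annual_rate, months):
    """Compound `count` forward by `months` at a fixed annual growth rate."""
    return count * (1 + annual_rate) ** (months / 12)


if __name__ == "__main__":
    for year, rate in yearly_growth(COUNTS).items():
        print(f"growth into {year}: {rate:.1%}")

    latest = COUNTS[2004]
    print(f"32-bit limit: {2 ** 32:,}")
    print(f"64-bit limit: {2 ** 64:,}")
    print(f"unused share of 32-bit space: {headroom_fraction(latest):.3%}")

    # Seven months from mid-February to late September 2004, at the rough
    # 50% per year figure used in the post.
    print(f"projected size today: {extrapolate(latest, 0.5, 7):,.0f}")
```

Running it shows year-over-year growth that slows a little with each snapshot but averages close to the 50% figure, and a projection comfortably above 5 billion pages, which is where the "should be over 5 billion" estimate comes from.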

Sadagopan's Weblog on Emerging Technologies, Trends, Thoughts, Ideas & Cyberworld
"All views expressed are my personal views and are not related in any way to my employer"