One of the lesser-discussed facets of Web searching is the spidering limits of search engines. Even a full-text search engine may not index the entirety of a given page if the page is too large. ResearchBuzz asks "which one of the two search giants spiders more of an individual HTML page" and gives the verdict: it's Yahoo. Google sets a limit of 101K for HTML pages (its spider will only index the first 101K of an HTML Web page), whereas Yahoo indexes the first 150K of a Web page and up to 500K of a PDF. We wrote earlier about a potential limit on the number of pages that Google can index. The conclusion: for searches that tend to focus on large pages (like word-list searches that might point you to dictionaries), try Yahoo first.
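To make the effect of these limits concrete, here is a minimal Python sketch of what "only the first N K is indexed" means for a search term that sits deep in a large page. The 101K and 150K figures are the ones quoted above; treating "K" as 1024 bytes, and the names `HTML_LIMITS`, `indexed_portion` and `is_term_indexed`, are assumptions made for illustration and do not describe how either engine is actually implemented.

```python
# Limits as reported in the post; treating "K" as 1024 bytes is an assumption.
KIB = 1024
HTML_LIMITS = {"google": 101 * KIB, "yahoo": 150 * KIB}


def indexed_portion(page: bytes, engine: str) -> bytes:
    """Return the leading bytes of a page that fall within the engine's limit."""
    return page[: HTML_LIMITS[engine]]


def is_term_indexed(page: bytes, term: bytes, engine: str) -> bool:
    """True if the term appears entirely within the indexed portion of the page."""
    return term in indexed_portion(page, engine)


if __name__ == "__main__":
    # Hypothetical page of about 140K with one word placed roughly 120K in.
    page = b"a" * (120 * KIB) + b" zymurgy " + b"a" * (20 * KIB)
    for engine in HTML_LIMITS:
        print(engine, is_term_indexed(page, b"zymurgy", engine))
```

Under these assumptions the word falls past the 101K cutoff but inside the 150K one, which is the situation the dictionary example describes: the term is on the page, yet only the engine with the larger limit could match it.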
Sadagopan's Weblog on Emerging Technologies, Trends, Thoughts, Ideas & Cyberworld "All views expressed are my personal views and are not related in any way to my employer"