



Sunday, September 19, 2004

Search engines are unreliable tools for data collection for research via FirstMonday

Internet search engines reconstruct the past by updating their indices. This reconstruction does not follow the historical axis of time. While historical analyses aim to reconstruct developments chronologically, search engines renew the time stamp of Web pages on the basis of the most recent update. In other words, search engines entertain a model of the Internet that evolves with the Internet. Under certain conditions such an evolution can become self-organizing. Unlike self-organization in biological systems, the historical traces of the development are overwritten by search engines to such an extent that they can only be retrieved artificially on the basis of a systematic research design.
The search engines continuously reconstruct competing presents that also extend to their perspectives on the past. This has major consequences for the use of search engine results in scholarly research, but it also gives us a view of the various presents and pasts living side by side on the Internet.
Search engines are an obligatory point of passage in Internet research, as there is no unmediated access to the Web. The central issue is the reliability of social science data produced by search engines. The difference between the evolutionary and historical dynamics can be measured in terms of the "short" and "long" term memories of the search engines. It is now clear that both the retrieval of information and the quality of the information retrieved erode over time.
The study of updating cycles is especially relevant to search engines. Some search engines (AltaVista and Google) can be used to search for information from certain periods of time. However, these "date stamps" are determined not by the first occurrence of these pages on the Web, but by the last date at which a page was updated. The "same" Web page may therefore belong to the year 1995 in a data set collected in 2003, while in a data set collected in 2004 it belongs to the year 2003.
When used to search by historical date, search engines return results that reflect the interacting frequencies at which Web pages are updated and at which crawlers revisit them, and not necessarily the dates of publication of the documents under study.
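The date-stamp effect can be made concrete with a minimal Python sketch. It assumes a simplified model in which a search engine stamps a page with its most recent update at the moment the data set is collected. The page history and the names `PAGE_HISTORY`, `date_stamp`, and `stamped_year` are hypothetical and chosen only to mirror the 1995/2003 example in the text; this is not how any particular engine is implemented.

```python
from datetime import date

# Hypothetical page: first published in 1995, then updated in mid-2003.
PAGE_HISTORY = [date(1995, 3, 1), date(2003, 6, 15)]


def date_stamp(history, collected_on):
    """Return the most recent update on or before the collection date.

    Returns None if the page did not yet exist at collection time.
    """
    seen = [d for d in history if d <= collected_on]
    return max(seen) if seen else None


def stamped_year(history, collected_on):
    """Year the page appears to belong to in a data set collected on a given day."""
    stamp = date_stamp(history, collected_on)
    return stamp.year if stamp else None


# Data set collected in early 2003, before the page was updated.
year_in_2003_set = stamped_year(PAGE_HISTORY, date(2003, 1, 10))
# Data set collected in 2004, after the update.
year_in_2004_set = stamped_year(PAGE_HISTORY, date(2004, 1, 10))
print(year_in_2003_set, year_in_2004_set)
```

Under these assumptions the same page is assigned to 1995 in the first data set and to 2003 in the second, even though its original publication date never changed.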
Internet search engines continuously reconstruct the past by updating their indices. While the development of the engines remains historical, their dynamics evolve in the present and reflexively with respect to the system to which they belong. Thus, these engines invert the time axis and enable the user to reconstruct a history by looking backwards.
Because of the updating effect, this reconstruction will tend to draw Web sites into the most recent past, thus possibly erasing the older representations of a particular Web site. This also means that search engines tend to lose their history while evolving in the present. Yet it remains possible to systematically archive the indices of the different search engines or to build up an independent Internet archive (e.g., the Internet Archive at http://web.archive.org).
The past on the Internet is constantly overwritten by search engines. This affects the number of results as well as the actual Web pages that the search engines retrieve. The present from which the data are collected affects search results considerably. Search engines not only lose information quantitatively, but they also erase the structure entailed in the relationships between words in the titles of the Web pages.
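The idea of systematically archiving indices can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "index" is modeled as a plain dictionary from URL to date stamp, and the class `IndexArchive`, its methods, and the example URL are hypothetical. It is not a description of how the Internet Archive or any search engine stores data.

```python
from datetime import date


class IndexArchive:
    """Append-only store of index snapshots, keyed by collection date."""

    def __init__(self):
        self._snapshots = {}

    def record(self, collected_on, index):
        """Store a copy of the index as it looked on the collection date."""
        if collected_on in self._snapshots:
            raise ValueError("a snapshot for this date already exists")
        # Copy so later changes to the live index cannot alter the archive.
        self._snapshots[collected_on] = dict(index)

    def earliest_stamp(self, url):
        """Earliest date stamp recorded for a URL across all snapshots."""
        stamps = [snap[url] for snap in self._snapshots.values() if url in snap]
        return min(stamps) if stamps else None


# Hypothetical live index that is overwritten in place, as the text describes.
live_index = {"http://example.org/": date(1995, 3, 1)}
archive = IndexArchive()
archive.record(date(2003, 1, 10), live_index)

live_index["http://example.org/"] = date(2003, 6, 15)  # the page is updated
archive.record(date(2004, 1, 10), live_index)

print(live_index["http://example.org/"])              # only the newest stamp survives
print(archive.earliest_stamp("http://example.org/"))  # the older stamp is recoverable
```

The live index keeps only the latest date stamp, while the archived snapshots still allow the earlier one to be retrieved, which is the kind of systematic research design the passage calls for.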

Sadagopan's Weblog on Emerging Technologies, Trends, Thoughts, Ideas & Cyberworld
"All views expressed are my personal views and are not related in any way to my employer"