The Search Engine Saga – Pt 1
“A record, if it is to be useful to science, must be continuously extended, it must be stored, and above all it must be consulted… Presumably, man’s spirit should be elevated if he can better review his own shady past and analyze more completely and objectively his present problems.” – Vannevar Bush (1945)
In addition to being an American engineer and a key organizer of the wartime atomic bomb effort, Vannevar Bush was one of the first to wonder if technology offered a better solution to data storage than dusty libraries and those oh-so-lightweight file cabinets. In a 1945 Atlantic Monthly article, “As We May Think,” he proposed the “memex,” a theoretical hypertext-like system in the form of a Go-Go-Gadget-style desk that could record, store, and retrieve data. Though the memex never made it past the design stage, Bush had unknowingly established the pre-Internet vision of a search engine.
The next step was SMART, officially the System for the Mechanical Analysis and Retrieval of Text and affectionately nicknamed “Salton’s Magic Automatic Retriever of Text,” developed by Gerard Salton and his collaborators at Cornell University in the 1960s. It was mainly used to search collections of scientific publications, but it had three innovative features that had not been seen before. First, SMART used a vector space model, which represented each document (and each query) as a vector of weighted terms so that documents could be compared and filtered automatically. Second, SMART provided relevance feedback, in which the user judged how relevant the query results actually were. Finally, the Rocchio algorithm took that relevance feedback and reweighted the query, nudging it toward the documents marked relevant and away from the rest, so that the results kept improving in later rounds. However, there were no images, maps, or “I’m feeling lucky” options just yet.
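For the curious, here is a minimal sketch of those three ideas in Python. It is not SMART's actual code: the sample documents are made up, the term weights are plain word counts rather than anything fancier, and the alpha, beta, and gamma defaults are just commonly quoted textbook values for the Rocchio update.

```python
import math
from collections import Counter

def to_vector(text):
    """Represent text as a bag-of-words vector: term -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query_vec, docs):
    """Return document names ordered from most to least similar to the query."""
    return sorted(docs, key=lambda name: cosine(query_vec, docs[name]), reverse=True)

def rocchio(query_vec, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Shift the query toward relevant documents and away from non-relevant ones."""
    new_query = Counter()
    for term, w in query_vec.items():
        new_query[term] += alpha * w
    for doc in relevant:
        for term, w in doc.items():
            new_query[term] += beta * w / len(relevant)
    for doc in non_relevant:
        for term, w in doc.items():
            new_query[term] -= gamma * w / len(non_relevant)
    # Drop terms whose weight fell to zero or below (a common simplification).
    return {term: w for term, w in new_query.items() if w > 0}

# Hypothetical three-document collection.
docs = {
    "physics": to_vector("atomic energy physics research"),
    "library": to_vector("library index card catalog research"),
    "cooking": to_vector("bread recipe oven"),
}
query = to_vector("research index")
first_pass = rank(query, docs)

# Pretend the user marked "physics" as relevant and "library" as not relevant.
updated = rocchio(query, [docs["physics"]], [docs["library"]])
second_pass = rank(updated, docs)
```

On the first pass the "library" document ranks highest because it shares both query words. After the feedback round the query has picked up terms from the "physics" document, so that one moves to the top.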
Instead, enter Archie, widely regarded as the first Internet search engine. Its name is “archive” without the “v.” The system gathered file listings from public FTP sites and matched the file names against expressions submitted in a user query. Since Archie arrived in 1990, before Al Gore invented the Internet that we know (yes, I’m just kidding), the bulk of files were retrieved and shared not through web pages but via File Transfer Protocol (FTP) servers.
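To make the matching step concrete, here is a rough sketch of an Archie-style lookup in Python. The host names, file paths, and the `search_listing` helper are all invented for illustration, and the real Archie had its own query options rather than Python's regular expressions.

```python
import re

# Hypothetical listing of (host, path) pairs; every entry here is made up.
LISTING = [
    ("ftp.example.edu", "/pub/gnu/emacs-18.55.tar.Z"),
    ("ftp.example.edu", "/pub/docs/README"),
    ("archive.example.org", "/mirrors/tex/latex.tar.Z"),
]

def search_listing(pattern, listing):
    """Return (host, path) pairs whose file name matches the regular expression."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [
        (host, path)
        for host, path in listing
        if regex.search(path.rsplit("/", 1)[-1])  # match the file name only
    ]

tarballs = search_listing(r"\.tar\.Z$", LISTING)
readmes = search_listing("readme", LISTING)
```

Note that only names are searched here, not file contents, which is the key difference from the full-text search engines that came later.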
You know what happened next. Call it the death of face-to-face communication, call it music stores’ worst nightmare, call it whatever you want: the Internet as we know it came along and rocked everyone’s world. Early search engines were the backbone of this new worldwide interconnectivity. There was no use having a website unless people could find it when they needed it – even if there were only around 150 websites available in 1993. But the basic design that managed those few is the same one used to handle the more than one trillion web pages in existence today.
The original search engine can be simplified into three basic parts. First are the “spiders” that crawl the web and request information from web pages. Second, the information they gather in their wanderings is compiled into an index, which is continuously updated as the spiders find new pages or changes to old ones. The algorithms pioneered by the likes of SMART and Archie determine how that index is sorted and ranked. And the third and final piece is the search engine interface software that matches what you type into the box with what’s cataloged in the index.
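Those three parts fit in a few lines of Python, at least as a toy. The sketch below assumes a tiny in-memory “web” (the `WEB` dictionary and its example URLs are made up) instead of real HTTP requests, and it skips ranking entirely: a page either contains every query word or it doesn't.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "a.example/home": ("welcome to the memex archive", ["a.example/bush", "b.example/smart"]),
    "a.example/bush": ("vannevar bush proposed the memex", ["a.example/home"]),
    "b.example/smart": ("smart retrieval system from cornell", []),
}

def crawl(start, web):
    """Spider: follow links breadth-first and yield (url, text) for each page found."""
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        text, links = web[url]
        yield url, text
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)

def build_index(pages):
    """Index: map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages:
        for word in text.lower().split():
            index[word].add(url)
    return index

def search_index(query, index):
    """Interface: return the URLs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)

index = build_index(crawl("a.example/home", WEB))
memex_pages = search_index("memex", index)
bush_pages = search_index("memex bush", index)
```

A real engine would re-crawl on a schedule to keep the index fresh and would sort the matches by some relevance score before showing them, but the division of labor is the same.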
These three key ingredients constructed the window Vannevar Bush had envisioned, through which the world and all its secrets, past, present, and future, could be viewed by anyone able to click the “Search” button. Granted, computers were the same size as microwaves in those days, but hey, we had to start somewhere.
In a later post, I’ll discuss what the search engine competition looks like today – the big shots and the wannabes – and how they are changing our world on a daily basis. For now, I’ll let Bruce wrap this up:
“From now on, the struggle will not be over mechanical control of the means of information, but over spin-control of the zeitgeist.” – Bruce Sterling (1994)