MIT’s OCW implementation uses MIT’s Google Search Appliance to support searching the OCW site. This Development Note discusses customizing this search mechanism to provide richer, more useful results.
MIT’s OCW site contains static content. Unlike dynamic web sites, where content can be assembled into a page on demand, each page at http://ocw.mit.edu exists fully formed, waiting to be served. While this approach has advantages, it limits users’ ability to query for specific results, because Google simply matches words without attributing any semantic meaning to them. For the query “Marx”, Google depends on the user to distinguish between Groucho and Karl.
Using MIT’s Google Search Appliance has several benefits. The primary benefit is that it builds on the search solution already in place to return a richer result set. MIT’s OCW has very high standards for stability, scalability, and conventionality, so a solution based on the existing infrastructure is more attractive for future adoption. Another benefit is the power of Google’s search algorithms; after all, “Google” is synonymous with online searching.
Customization is achieved by writing an XSL stylesheet that changes the results page from the familiar Google results page format to a layout of your choice.
The query string sent to the server when executing a search from the Greenfield site looks like this:
The query string breaks down this way:
|site||ocw||Limits search results to the contents of the specified collection.|
|client||mit||A string that indicates a valid front end and the policies defined for it, including KeyMatches, related queries, filters, remove URLs, and OneBox Modules. Notice that the rendering of the front end is determined by the
|getfields||*||Indicates that the names and values of the specified meta tags should be returned with each search result, when available.|
|output||xml_no_dtd||Selects the format of the search results.
|proxystylesheet||%2Foeit%2FOcwWeb%2Fsearch%2Fgoogle-ocw.xsl||If the value of the
|proxyreload||1||Instructs the Google Search Appliance when to refresh the XSL stylesheet cache. A value of 1 indicates that the Google Search Appliance should update the XSL stylesheet cache to refresh the stylesheet currently being requested.|
|as_dt||i||Include only results in the web directory specified by
|oe||utf-8||Sets the character encoding that is used to encode the results.|
|q||grandmother||The text of the query submitted by the user.|
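As a rough sketch of how the parameters in the table combine into a single request, the following Python snippet assembles them into a URL. The endpoint `http://search.example.edu/search` and the function name `build_search_url` are placeholders invented for illustration; the actual host and path of MIT’s Google Search Appliance are not given here. Only the parameter names and values come from the table.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical endpoint: the real appliance host and path are not stated in this note.
SEARCH_ENDPOINT = "http://search.example.edu/search"


def build_search_url(query, endpoint=SEARCH_ENDPOINT):
    """Assemble a search URL from the parameters listed in the table."""
    params = [
        ("site", "ocw"),              # collection to search
        ("client", "mit"),            # front end and its policies
        ("getfields", "*"),           # return meta tags with each result
        ("output", "xml_no_dtd"),     # format of the search results
        ("proxystylesheet", "/oeit/OcwWeb/search/google-ocw.xsl"),
        ("proxyreload", "1"),         # refresh the cached stylesheet
        ("as_dt", "i"),
        ("oe", "utf-8"),              # encoding of the results
        ("q", query),                 # text submitted by the user
    ]
    # urlencode percent-encodes reserved characters, so "/" becomes %2F
    # and "*" becomes %2A.
    return endpoint + "?" + urlencode(params)


url = build_search_url("grandmother")
print(url)

# Round-trip check: decoding the query string recovers the original values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["proxystylesheet"][0])
```

Building the query string from a list of pairs, rather than concatenating strings by hand, keeps the percent-encoding of the stylesheet path consistent with the encoded value shown in the table.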
Google’s Search Protocol Reference documents how queries are submitted to, and results returned from, the Google Search Appliance.
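To illustrate what a consumer of the XML output might do with the returned meta tags, here is a minimal Python sketch that reads result entries and collects their meta tag names and values. It assumes the element names `R`, `U`, `T`, and `MT` (with `N` and `V` attributes) for results, URLs, titles, and meta tags; verify these against the Search Protocol Reference before relying on them. The sample document, including its URL, title, and meta tag names, is made up for the example and is not real OCW output.

```python
import xml.etree.ElementTree as ET

# Made-up sample response for illustration only; real output has more elements.
SAMPLE_XML = """<GSP VER="3.2">
  <Q>grandmother</Q>
  <RES SN="1" EN="1">
    <R N="1">
      <U>http://ocw.mit.edu/example-course/</U>
      <T>Example Course Title</T>
      <MT N="department" V="Example Department"/>
      <MT N="level" V="Undergraduate"/>
    </R>
  </RES>
</GSP>"""


def parse_results(xml_text):
    """Return one dict per result with its URL, title, and meta tags."""
    root = ET.fromstring(xml_text)
    results = []
    for result in root.iter("R"):
        # Each MT element is assumed to carry a meta tag name (N) and value (V).
        meta = {mt.get("N"): mt.get("V") for mt in result.findall("MT")}
        results.append({
            "url": result.findtext("U"),
            "title": result.findtext("T"),
            "meta": meta,
        })
    return results


results = parse_results(SAMPLE_XML)
for entry in results:
    print(entry["title"], entry["url"], entry["meta"])
```

An XSL stylesheet would perform the equivalent selection on the appliance side; the Python version is only meant to show the shape of the data that such a stylesheet has to work with.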