Dev Note: Google SA custom search

Summary

MIT’s OCW implementation uses MIT’s Google Search Appliance to support searching the OCW site.  This Development Note discusses customizing this search mechanism to provide richer, more useful results.

Background

MIT’s OCW site contains static content.  Unlike dynamic web sites, where content is assembled into a page on demand, each page at http://ocw.mit.edu exists fully formed, waiting to be served.  While this approach has advantages, it limits users’ ability to narrow a query to specific results.  The reason is that Google simply matches words; it does not attach any semantic meaning to them.  For the query “Marx”, Google depends on the user to add terms that distinguish Groucho from Karl.
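
The limitation can be illustrated with a small sketch in Python. The page titles and the "department" field below are invented for illustration and are not taken from the OCW site; the point is only that a plain word match returns both kinds of "Marx" page, while a match narrowed by a metadata field does not.

```python
# Hypothetical in-memory "pages"; titles and metadata are invented for illustration.
PAGES = [
    {"title": "Marx on capital and labor", "department": "Economics"},
    {"title": "The Marx Brothers and film comedy", "department": "Comparative Media Studies"},
    {"title": "Introduction to Algorithms", "department": "Computer Science"},
]


def keyword_search(pages, word):
    """Return pages whose title contains the word, ignoring case."""
    word = word.lower()
    return [p for p in pages if word in p["title"].lower()]


def fielded_search(pages, word, department):
    """Keyword match narrowed by a metadata field (here, a department name)."""
    return [p for p in keyword_search(pages, word) if p["department"] == department]


hits = keyword_search(PAGES, "Marx")                   # matches both Karl and Groucho
narrowed = fielded_search(PAGES, "Marx", "Economics")  # metadata picks one of them
print(len(hits), len(narrowed))
```
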

Objective

Using MIT’s Google Search Appliance yields multiple benefits.  The primary benefit is that it builds on the search solution already in place to return a richer query result set.  MIT’s OCW has very high standards for stability, scalability, and conventionality, so an approach based on its current search infrastructure is more attractive for future adoption.  Another benefit is the power of the Google search algorithms.  After all, “Google” is synonymous with online searching.

Technique

Customization is achieved by writing an XSL stylesheet that changes the results page from the familiar Google results page format to a layout of your choice.
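
The actual customization is an XSL stylesheet that runs on the appliance. As a rough analogue only, the Python sketch below shows the same idea: take the XML result set and emit a layout of your own. The sample XML is invented, and its element names (GSP, RES, R, U, T, S, MT) are an assumption about the appliance's XML results format that should be checked against the Search Protocol Reference; render_results is a hypothetical helper, not part of any MIT or Google code.

```python
import xml.etree.ElementTree as ET
from html import escape

# Invented sample result set. The element names are an assumption about the
# appliance's XML output; the URLs and titles are placeholders.
SAMPLE_XML = """<GSP VER="3.2">
  <Q>grandmother</Q>
  <RES SN="1" EN="2">
    <R N="1">
      <U>http://ocw.mit.edu/example-course-a/</U>
      <T>Example Course A</T>
      <S>A snippet mentioning grandmother.</S>
      <MT N="departmentName" V="Example Department"/>
    </R>
    <R N="2">
      <U>http://ocw.mit.edu/example-course-b/</U>
      <T>Example Course B</T>
      <S>Another snippet.</S>
    </R>
  </RES>
</GSP>"""


def render_results(xml_text):
    """Turn each result into a list item showing title, link, and department."""
    root = ET.fromstring(xml_text)
    items = []
    for result in root.iter("R"):
        url = result.findtext("U", default="")
        title = result.findtext("T", default=url)
        # Meta tags returned with the result (see the getfields parameter).
        meta = {mt.get("N"): mt.get("V") for mt in result.findall("MT")}
        dept = meta.get("departmentName", "unknown department")
        items.append(
            '<li><a href="%s">%s</a> (%s)</li>'
            % (escape(url, quote=True), escape(title), escape(dept))
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


html_page = render_results(SAMPLE_XML)
print(html_page)
```

An XSL stylesheet expresses the same mapping declaratively, with a template that matches each result element, rather than with an explicit loop.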

The query string sent to the server when executing a search from the Greenfield site looks like this:

http://search.mit.edu/search?__EVENTTARGET=&__EVENTARGUMENT=&
site=ocw&client=mit&getfields=*&output=xml_no_dtd&
proxystylesheet=%2Foeit%2FOcwWeb%2Fsearch%2Fgoogle-ocw.xsl&
proxyreload=1&as_dt=i&oe=utf8&departmentName=web&
courseName=&q=grandmother&btnG.x=11&btnG.y=7

It breaks down this way (parameter = value: description):

http://search.mit.edu/search?
__EVENTTARGET = (empty)
__EVENTARGUMENT = (empty)
site = ocw: Limits search results to the contents of the specified collection.
client = mit: A string that indicates a valid front end and the policies defined for it, including KeyMatches, related queries, filters, remove URLs, and OneBox Modules.  Notice that the rendering of the front end is determined by the proxystylesheet parameter.
getfields = *: Indicates that the names and values of the specified meta tags should be returned with each search result, when available.
output = xml_no_dtd: Selects the format of the search results.  xml_no_dtd returns XML results without a DTD reference, or custom HTML when a stylesheet is applied.  (See the proxystylesheet parameter for details.)
proxystylesheet = %2Foeit%2FOcwWeb%2Fsearch%2Fgoogle-ocw.xsl: The URL-encoded form of /oeit/OcwWeb/search/google-ocw.xsl.  If the value of the output parameter is xml_no_dtd, the output format is modified by the proxystylesheet value as follows: results are in custom HTML format, and the XSL stylesheet associated with the specified front end is used to transform the output.
proxyreload = 1: Instructs the Google Search Appliance when to refresh the XSL stylesheet cache.  A value of 1 indicates that the Google Search Appliance should update the XSL stylesheet cache to refresh the stylesheet currently being requested.
as_dt = i: Include only results in the web directory specified by as_sitesearch.
oe = utf8: Sets the character encoding that is used to encode the results.
departmentName = web: Custom field.
courseName = (empty): Custom field.
q = grandmother: The text of the query submitted by the user.
btnG.x = 11: Horizontal click coordinate sent by the form's image submit button.
btnG.y = 7: Vertical click coordinate sent by the form's image submit button.
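
The same breakdown can be reproduced programmatically. The sketch below uses Python's standard urllib.parse module to split the query string shown above into its parameters and then to rebuild a URL for a different query. The helper build_search_url is hypothetical and named only for this illustration; whether the appliance needs every one of these parameters on each request is not something this note establishes.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# The query URL from this note, joined into a single string.
QUERY_URL = (
    "http://search.mit.edu/search?__EVENTTARGET=&__EVENTARGUMENT=&"
    "site=ocw&client=mit&getfields=*&output=xml_no_dtd&"
    "proxystylesheet=%2Foeit%2FOcwWeb%2Fsearch%2Fgoogle-ocw.xsl&"
    "proxyreload=1&as_dt=i&oe=utf8&departmentName=web&"
    "courseName=&q=grandmother&btnG.x=11&btnG.y=7"
)

# keep_blank_values=True preserves empty parameters such as courseName.
params = dict(parse_qsl(urlsplit(QUERY_URL).query, keep_blank_values=True))
for name, value in params.items():
    print(f"{name:16} {value!r}")


def build_search_url(query, department=""):
    """Hypothetical helper: reuse the captured parameters with a new query."""
    new_params = dict(params, q=query, departmentName=department)
    return "http://search.mit.edu/search?" + urlencode(new_params)


url = build_search_url("linear algebra", department="web")
print(url)
```

Note that parse_qsl decodes percent-encoded values, so the stylesheet path appears as /oeit/OcwWeb/search/google-ocw.xsl in params, and urlencode encodes it again when the URL is rebuilt.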

Resources

[1] Google’s Search Protocol Reference documents how queries are submitted and results returned from the Google Search Appliance.

[2] MIT’s guide to Google Stylesheets

Unless otherwise specified, the Greenfield Website by the MIT Office of Digital Learning, Strategic Education Initiatives is licensed under a Creative Commons Attribution 4.0 International License.
Portions subject to the MIT OpenCourseWare Creative Commons License and Terms of Use.