Search – Greenfield
http://greenfield.mit.edu
A New Way of Thinking about OpenCourseWare and Open Educational Resources for MIT
Tue, 07 Jun 2016 15:39:45 +0000

Turning metadata, usage data and stuff into something interesting for OCW
http://greenfield.mit.edu/2010/10/24/turning-metadata-usage-data-and-stuff-into-something-interesting-for-ocw/
Mon, 25 Oct 2010 01:40:53 +0000

I ran across this summary of a meeting on “What metadata is really useful?” hosted by JISC CETIS. While structured (administrative) metadata is less useful to me now than it was when I was working on IEEE LOM, the concept of data about the use of resources like MIT OpenCourseWare is of interest to me.

Phil Baker summarizes the meeting outcomes as:

Here’s a sampler of the ideas turned up during the day:
* continue to build the resources with background information that I gathered for the meeting.
* promote the use of common survey tools, for example the online tool used by David Davies for the MeDeV subject centre (results here).
* textual analysis of metadata records to show what is being described in what terms.
* sharing search logs in a common format so that they can be analysed by others (echoes here of Dave Pattern’s sharing of library usage data and subsequent work on business intelligence that can be extracted from it).
* analysis of search logs to show which queries yield zero hits, which would identify topics on which there was unmet demand.

Baker, P. (2010, October 19). “CETIS “What metadata…?” meeting summary.” Retrieved October 24, 2010 from Phil’s JISC CETIS blog website: http://blogs.cetis.ac.uk/philb/2010/10/19/cetiswmd-summary/
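
The last idea in the list, finding queries that return zero hits, could be sketched roughly as follows. This is only an illustration: the log layout (pairs of query text and hit count), the function name `zero_hit_queries`, and the sample rows are all hypothetical, since the meeting summary does not define a log format.

```python
from collections import Counter


def zero_hit_queries(log_rows):
    """Count queries that returned no results, most frequent first.

    log_rows is assumed to be an iterable of (query, hits) pairs;
    this layout is hypothetical, not a real OCW log format.
    """
    counts = Counter(
        query.strip().lower()          # fold case so repeats are grouped
        for query, hits in log_rows
        if hits == 0 and query.strip()  # keep only non-empty zero-hit queries
    )
    return counts.most_common()


# Hypothetical sample data: (query text, number of hits returned)
sample_log = [
    ("thermodynamics", 42),
    ("Underwater Basket Weaving", 0),
    ("underwater basket weaving", 0),
    ("marx", 17),
    ("quantum knitting", 0),
]

print(zero_hit_queries(sample_log))
```

Queries near the top of the resulting list would be candidates for topics with unmet demand.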

Phil has another post on “Analysing OCWSearch logs” that is also interesting.

Dev Note: Google SA custom search
http://greenfield.mit.edu/2010/06/28/dev-note-google-sa-custom-search/
Mon, 28 Jun 2010 16:43:51 +0000

Summary

MIT’s OCW implementation uses MIT’s Google Search Appliance to support searching the OCW site.  This Development Note discusses customizing this search mechanism to provide richer, more useful results.

Background

MIT’s OCW site contains static content.  Unlike dynamic web sites where content can be assembled into a page on demand, each page at http://ocw.mit.edu exists fully formed waiting to be served.  While this approach has advantages, it limits the ability of users to query for specific results, because Google simply matches words; it doesn’t attribute any semantic meaning to them.  For the query “Marx”, Google depends on the user submitting the query to distinguish between Groucho and Karl.

Objective

Building on MIT’s Google Search Appliance yields multiple benefits.  The primary benefit is that a richer query result set can be returned using the search solution OCW already has in place.  MIT’s OCW has very high standards for stability, scalability, and conventionality, so a customization built on its current solution is more attractive for future adoption.  Another benefit is the power of the Google search algorithms.  After all, “Google” is synonymous with online searching.

Technique

Customization is achieved by writing an XSL stylesheet that changes the results page from the familiar Google results page format to a layout of your choice.

The query string sent to the server when executing a search from the Greenfield site looks like this:

http://search.mit.edu/search?__EVENTTARGET=&__EVENTARGUMENT=&
site=ocw&client=mit&getfields=*&output=xml_no_dtd&
proxystylesheet=%2Foeit%2FOcwWeb%2Fsearch%2Fgoogle-ocw.xsl&
proxyreload=1&as_dt=i&oe=utf8&departmentName=web&
courseName=&q=grandmother&btnG.x=11&btnG.y=7

Which breaks down this way:

http://search.mit.edu/search?
| Parameter | Value | Description |
|---|---|---|
| __EVENTTARGET | ocw | |
| __EVENTARGUMENT | | |
| site | ocw | Limits search results to the contents of the specified collection. |
| client | mit | A string that indicates a valid front end and the policies defined for it, including KeyMatches, related queries, filters, remove URLs, and OneBox Modules. Notice that the rendering of the front end is determined by the proxystylesheet parameter. |
| getfields | * | Indicates that the names and values of the specified meta tags should be returned with each search result, when available. |
| output | xml_no_dtd | Selects the format of the search results.  xml_no_dtd specifies custom HTML.  (See proxystylesheet parameter for details.) |
| proxystylesheet | %2Foeit%2FOcwWeb%2Fsearch%2Fgoogle-ocw.xsl | If the value of the output parameter is xml_no_dtd, the output format is modified by the proxystylesheet value as follows:  Results are in Custom HTML format. The XSL stylesheet associated with the specified Front End is used to transform the output. |
| proxyreload | 1 | Instructs the Google Search Appliance when to refresh the XSL stylesheet cache.  A value of 1 indicates that the Google Search Appliance should update the XSL stylesheet cache to refresh the stylesheet currently being requested. |
| as_dt | i | Include only results in the web directory specified by as_sitesearch. |
| oe | utf8 | Sets the character encoding that is used to encode the results. |
| departmentName | web | Custom field. |
| courseName | | Custom field. |
| q | grandmother | The text of the query submitted by the user. |
| btnG.x | 11 | |
| btnG.y | 7 | |
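
As a rough illustration of how these parameters fit together, the sketch below assembles a similar query string with Python's standard library and then splits it back into name/value pairs. The function name `build_search_url` is hypothetical, and the sketch assumes (this is not confirmed by the note) that the `__EVENTTARGET`, `__EVENTARGUMENT`, and `btnG.x`/`btnG.y` form fields can be left out.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

BASE_URL = "http://search.mit.edu/search"


def build_search_url(query, department="web", course=""):
    """Assemble a search URL from the parameters described above.

    Hypothetical helper; form-residue fields (__EVENTTARGET, btnG.x, ...)
    are omitted on the assumption that the appliance does not need them.
    """
    params = [
        ("site", "ocw"),
        ("client", "mit"),
        ("getfields", "*"),
        ("output", "xml_no_dtd"),
        # urlencode percent-encodes "/" as %2F, as in the original query
        ("proxystylesheet", "/oeit/OcwWeb/search/google-ocw.xsl"),
        ("proxyreload", "1"),
        ("as_dt", "i"),
        ("oe", "utf8"),
        ("departmentName", department),  # custom field
        ("courseName", course),          # custom field, may be empty
        ("q", query),
    ]
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("grandmother")
print(url)

# Split the query string back into parameters, keeping empty values
for name, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
    print(f"{name:16} {value}")
```

Keeping the parameters in an ordered list of pairs, rather than hand-concatenating the string, leaves the percent-encoding to `urlencode`.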

Resources

[1] Google’s Search Protocol Reference documents how queries are submitted and results returned from the Google Search Appliance.

[2] MIT’s guide to Google Stylesheets
