Developer – Greenfield
http://greenfield.mit.edu
A New Way of Thinking about OpenCourseWare and Open Educational Resources for MIT

How did we prepare the i.Experience mirror?
http://greenfield.mit.edu/2010/07/20/how-did-we-prepare-the-i-experience-mirror/
Tue, 20 Jul 2010 21:17:46 +0000

We recently updated the Greenfield site in preparation for:

  • An internal OEIT project to replace the video players in selected courses with SpokenMedia project player.
  • An internal OEIT project to demonstrate search through SpokenMedia transcript files for delivery via MIT’s Google Search Appliance integrated with Greenfield OCW courses.
  • A joint project with MIT OpenCourseWare to test OER Recommender/Folksemantic.com recommendations with select OCW courses.

Here are the steps we used to prepare the i.Experience mirror of MIT OCW. This is our second implementation of i.Experience.

Notes:

  • I originally started writing this post in April, but I’ve updated it to include the most recent changes to the mirror site.
  • I expect to produce a new version of this guide in mid-/late-August as OCW issues a new version of their mirror.

Steps to prepare MIT OCW mirror for hosting as the i.Experience copy.

  • To reduce disk storage needs and speed up the text processing described below, we removed all of the metadata files (corresponding to the IMS Content Packaging specification). This saves approximately 0.7GB of disk space (from 1.28GB to 0.58GB).
    • In the root of the OCW mirror copy, we executed this shell script:
      find . -type f -name "*.xml" -exec rm -f {} \;
      echo -e '\a'; sleep 0.5; echo -e '\a'; echo -e '\a'; sleep 0.5; echo -e '\a';
  • The MIT OCW Mirror has hard-coded links (i.e., anchor links 'a href="/OcwWeb/...') that don’t match with how we wish to serve the site. Since we wanted to have the capability to host multiple mirror sites from the Greenfield server, we needed to address these links. There are a number of mechanisms to do it–the one we chose was to just do a search and replace for the path names:
    • We tried using shell scripting, and also BBEdit, to handle the bulk search and replaces. Both had their limitations: I could not get some of the more complex search and replaces to work with sed, and I had to split the files into 3 groups to use BBEdit (I suspect BBEdit was running out of memory).
    • For all of the intra-site links and image references in text files, we replaced "/OcwWeb with "/oeit/OcwWeb
    • Similarly, we replaced links for a specific JavaScript file, /OcwWeb/js/Rotatingprofiles.js, with /oeit/OcwWeb/js/Rotatingprofiles.js. Originally, this replacement of .js files was done as a separate step, but the search and replace above also covers the links to the JavaScript files.
  • We commented out the references to the advanced search page. We replaced <a href="/oeit/OcwWeb/search/AdvancedSearch.htm">Advanced Search</a> with <!--<a href="/oeit/OcwWeb/search/AdvancedSearch.htm">Advanced Search</a>-->. We will ultimately be adding the Greenfield site to MIT’s Google Search Appliance and restricting the search results to only queries originating from greenfield.mit.edu, but we needed to get a copy of the site online for the appliance to index first.
  • We alerted users that the download links are not working (due to the restructuring of the ocw.mit.edu website). We replaced:
    <h1>Download Course Materials</h1>
    <p>

    With:
    <h1>Download Course Materials</h1>
    <p><strong>Note: Download is non-functional</strong></p><p>
  • (This change should not be necessary in the future) We reinstated the RSS functionality–or to be more precise, we updated the RSS links so they provide the MIT OCW feeds. At some point in the future we will create a set of RSS feeds from the Greenfield site. We replaced: http://feeds.pheedo.com/oeit/OcwWeb/rss with http://feeds.pheedo.com/OcwWeb/rss/
  • To further reduce the size of the mirror, we originally redirected all of the links to individual resources (PowerPoint slides, PDF documents, etc.) to the live OCW website, which would have saved approximately 22GB of disk space (not including videos). We were hoping to save space, but it turns out OCW changed their website structure in late May 2010 and did not put in place server redirects/rewrites for these NR links.
    • Originally, we replaced /NR/ with http://ocw.mit.edu/NR/. We now replace /NR/ with /oeit/NR/.
  • We added a header at the top of each page to reduce the likelihood of visitors confusing the mirror site with the live MIT OCW site. We replaced (there were 4 separate body tags we had to replace):
    <body id="global" onunload="OCWCustomResearch();">
    <body onunload="OCWCustomResearch();">
    <body onunload="OCWCustomResearch();" id="global">
    <body id="home" onunload="OCWCustomResearch();" onload=" changeQuote();">
    With:
    <body id="global" onunload="OCWCustomResearch();">
    <span align="center"><div style="text-align:center;font-weight:bold;color:#ffffff;font-size:8pt;background-color:#808285;width:100%;padding:2px;padding-bottom:5px;position:fixed;top:0;left:0;">This EXPERIMENTAL mirror of MIT OCW brought to you by the MIT Office of Educational Innovation and Technology :: <a href="http://greenfield.mit.edu/" style="color:#ffffff;">About Project Greenfield</a></div></span>
  • There are a few .jsp links in the left navigation and top navigation that we did not attempt to get working. We commented out the links, but left the text to preserve the look and feel.
    • We replaced <li id="lftNavActnsFeedback" class="courses"><a href="/oeit/OcwWeb/jsp/feedback.jsp?Referer=">Send us your feedback</a></li> with <li id="lftNavActnsFeedback" class="courses"><!--<a href="/oeit/OcwWeb/jsp/feedback.jsp?Referer=">-->Send us your feedback</a></li>.
    • We replaced <li id="lftNavActnsEmail"><a href="javascript:emailPopUp()">Email this page</a></li> with <li id="lftNavActnsEmail"><!--<a href="javascript:emailPopUp()">-->Email this page</a></li>.
    • We replaced <li id="lftNavActnsNewsletter"><a href="/oeit/OcwWeb/jsp/subscribe.jsp">Newsletter sign-up</a></li> with <li id="lftNavActnsNewsletter"><!--<a href="/oeit/OcwWeb/jsp/subscribe.jsp">-->Newsletter sign-up</a></li>.
    • We replaced <li id="lftNavActnsCite"><a href="/oeit with <li id="lftNavActnsCite"><!--<a href="/oeit.
    • We replaced <a href="/oeit/OcwWeb/jsp/newsletter.jsp"><img src="/oeit/OcwWeb/images/newsletter_signup_trans.gif" alt="OCW Newsletter Signup" width="128" /></a> with <!--<a href="/oeit/OcwWeb/jsp/newsletter.jsp"><img src="/oeit/OcwWeb/images/newsletter_signup_trans.gif" alt="OCW Newsletter Signup" width="128" /></a>-->. (There were two versions of this code we replaced.)
  • To prevent users from emailing questions about Greenfield to MIT OCW, we commented out the email hyperlinks.
    • There were occurrences in two JavaScript files: /js/styleswitch_search.js and /js/styleswitch.js. We replaced <ul><li class="email"><a href="javascript:emailPopUp()">Email this page</a></li></ul> with <!--<ul><li class="email"><a href="javascript:emailPopUp()">Email this page</a></li></ul>-->.
    • And, there were Contact Us links to disable (there were 4 separate versions we had to replace). We replaced <a href="/oeit/OcwWeb/jsp/feedback.jsp?Referer=">Contact
      Us</a>
      with <!--<a href="/oeit/OcwWeb/jsp/feedback.jsp?Referer=">-->Contact
      Us<!--</a>-->
  • We added a Google Analytics link to each page. We replaced </body> with <script type="text/javascript">
    var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
    document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
    </script>
    <script type="text/javascript">
    try {
    var pageTracker = _gat._getTracker("UA-9353349-4");
    pageTracker._setDomainName("none");
    pageTracker._setAllowLinker(true);
    pageTracker._trackPageview();
    } catch(err) {}</script>
    </body>
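
Most of the steps above are plain string replacements applied to every page in the mirror. As a rough illustration only (this is not the shell/BBEdit workflow we actually used), the Python sketch below shows how a few of them could be scripted in one pass. The function names, the simplified BANNER markup, and the assumption that every link to rewrite appears as a double-quoted attribute value are ours for the purpose of the example.

```python
import re
from pathlib import Path

MIRROR_PREFIX = "/oeit"  # path prefix the mirror is served under, as in the post
ADVANCED_SEARCH = (
    '<a href="/oeit/OcwWeb/search/AdvancedSearch.htm">Advanced Search</a>'
)
# Hypothetical, simplified stand-in for the banner markup quoted in the post.
BANNER = '<div id="mirror-banner">This EXPERIMENTAL mirror of MIT OCW</div>'

BODY_TAG = re.compile(r"<body\b[^>]*>", re.IGNORECASE)


def remove_metadata(root: Path) -> int:
    """Delete every *.xml file under root and return how many were removed."""
    removed = 0
    for path in list(root.rglob("*.xml")):
        if path.is_file():
            path.unlink()
            removed += 1
    return removed


def rewrite_html(text: str) -> str:
    """Apply the path, search-link, and banner edits to one page."""
    # Prefix site-absolute links. Matching the opening quote means a second
    # run finds nothing left to change (assumes double-quoted attributes).
    text = text.replace('"/OcwWeb', f'"{MIRROR_PREFIX}/OcwWeb')
    text = text.replace('"/NR/', f'"{MIRROR_PREFIX}/NR/')
    # Comment out the Advanced Search link unless that was already done.
    if "<!--" + ADVANCED_SEARCH not in text:
        text = text.replace(ADVANCED_SEARCH, "<!--" + ADVANCED_SEARCH + "-->")
    # Add the banner right after the first <body ...> tag, only once.
    if BANNER not in text:
        text = BODY_TAG.sub(lambda m: m.group(0) + BANNER, text, count=1)
    return text


def process_mirror(root: Path) -> int:
    """Rewrite every .htm/.html page under root; return the number changed."""
    changed = 0
    for path in list(root.rglob("*.htm*")):
        if not path.is_file():
            continue
        # latin-1 maps bytes 1:1, so pages in any encoding round-trip intact.
        original = path.read_bytes().decode("latin-1")
        updated = rewrite_html(original)
        if updated != original:
            path.write_bytes(updated.encode("latin-1"))
            changed += 1
    return changed
```

Unlike the replacement described above, which swaps all four body tag variants for a single one, this sketch keeps whatever body tag the page already has and appends the banner after it. Run it against a copy of the mirror first, for example process_mirror(Path("mirror-copy")), where the directory name is a placeholder.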
Dev Note: Google SA custom search
http://greenfield.mit.edu/2010/06/28/dev-note-google-sa-custom-search/
Mon, 28 Jun 2010 16:43:51 +0000

Summary

MIT’s OCW implementation uses MIT’s Google Search Appliance to support searching the OCW site.  This Development Note discusses customizing this search mechanism to provide richer, more useful results.

Background

MIT’s OCW site contains static content.  Unlike dynamic web sites where content can be assembled into a page on demand, each page at http://ocw.mit.edu exists fully formed waiting to be served.  While this approach has advantages, it limits the ability of users to query specific results.  The reason is that Google is simply looking for words; it doesn’t attribute any semantic meaning to these words.  For the query “Marx”, Google depends on the user submitting the query to distinguish between Groucho and Karl.

Objective

Use of MIT’s Google Search Appliance yields multiple benefits. The primary benefit is that it leverages the search solution OCW already has in place to return a richer query result set. MIT’s OCW has very high standards for stability, scalability, and conventionality, so a solution built on its current search infrastructure would be more attractive for future adoption. Another benefit is the power of the Google search algorithms. After all, “Google” is synonymous with online searching.

Technique

Customization is achieved by writing an XSL stylesheet that changes the results page from the familiar Google results page format to a layout of your choice.
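
To give a feel for what such a stylesheet does, here is a small Python sketch that performs the same kind of transformation on a hand-written sample: it reads result XML and emits a custom HTML list. It is not the stylesheet we use. The sample element names (GSP, RES, R, U, T, MT) reflect our understanding of the appliance's XML output and should be checked against the Search Protocol Reference; the URL, title, and the render_results function are made up for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hand-written sample, NOT real appliance output. Element names are an
# assumption based on the appliance's documented XML result format.
SAMPLE_RESULTS = """\
<GSP>
  <RES>
    <R N="1">
      <U>http://example.edu/courses/history/</U>
      <T>Sample course page</T>
      <MT N="departmentName" V="History"/>
    </R>
  </RES>
</GSP>
"""


def render_results(xml_text: str) -> str:
    """Turn each <R> result into an HTML list item showing its department."""
    root = ET.fromstring(xml_text)
    items = []
    for result in root.iter("R"):
        url = result.findtext("U", default="")
        title = result.findtext("T", default=url)
        # Meta tags requested via getfields arrive as name/value pairs.
        meta = {mt.get("N"): mt.get("V") for mt in result.findall("MT")}
        dept = meta.get("departmentName", "")
        items.append(
            f'<li><a href="{escape(url)}">{escape(title)}</a>'
            f' <span class="dept">{escape(dept)}</span></li>'
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(render_results(SAMPLE_RESULTS))
```

An XSL stylesheet expresses the same mapping declaratively and runs on the appliance itself, which is why the results come back as finished HTML.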

The query string sent to the server when executing a search from the Greenfield site looks like this:

http://search.mit.edu/search?__EVENTTARGET=&__EVENTARGUMENT=&
site=ocw&client=mit&getfields=*&output=xml_no_dtd&
proxystylesheet=%2Foeit%2FOcwWeb%2Fsearch%2Fgoogle-ocw.xsl&
proxyreload=1&as_dt=i&oe=utf8&departmentName=web&
courseName=&q=grandmother&btnG.x=11&btnG.y=7

Which breaks down this way:

http://search.mit.edu/search?

Parameter | Value | Description
__EVENTTARGET | (empty) |
__EVENTARGUMENT | (empty) |
site | ocw | Limits search results to the contents of the specified collection.
client | mit | A string that indicates a valid front end and the policies defined for it, including KeyMatches, related queries, filters, remove URLs, and OneBox Modules. Notice that the rendering of the front end is determined by the proxystylesheet parameter.
getfields | * | Indicates that the names and values of the specified meta tags should be returned with each search result, when available.
output | xml_no_dtd | Selects the format of the search results. xml_no_dtd returns XML results without a DTD reference; combined with proxystylesheet, this yields custom HTML. (See the proxystylesheet parameter for details.)
proxystylesheet | %2Foeit%2FOcwWeb%2Fsearch%2Fgoogle-ocw.xsl | If the value of the output parameter is xml_no_dtd, the output format is modified by the proxystylesheet value as follows: results are in custom HTML format, and the XSL stylesheet associated with the specified front end is used to transform the output.
proxyreload | 1 | Instructs the Google Search Appliance when to refresh the XSL stylesheet cache. A value of 1 indicates that the Google Search Appliance should update the XSL stylesheet cache to refresh the stylesheet currently being requested.
as_dt | i | Include only results in the web directory specified by as_sitesearch.
oe | utf8 | Sets the character encoding that is used to encode the results.
departmentName | web | Custom field.
courseName | (empty) | Custom field.
q | grandmother | The text of the query submitted by the user.
btnG.x | 11 |
btnG.y | 7 |
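
The same breakdown can be reproduced programmatically, which is handy when checking what a search form actually sends. This is a minimal sketch using Python's standard urllib.parse. The URL is the query shown above, written with the ? separator; the function names are ours. Whether the appliance accepts a request without the __EVENTTARGET, __EVENTARGUMENT, and btnG fields is an assumption we have not tested.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

SEARCH_URL = (
    "http://search.mit.edu/search?__EVENTTARGET=&__EVENTARGUMENT=&"
    "site=ocw&client=mit&getfields=*&output=xml_no_dtd&"
    "proxystylesheet=%2Foeit%2FOcwWeb%2Fsearch%2Fgoogle-ocw.xsl&"
    "proxyreload=1&as_dt=i&oe=utf8&departmentName=web&"
    "courseName=&q=grandmother&btnG.x=11&btnG.y=7"
)


def breakdown(url: str) -> dict:
    """Return the query parameters in order, keeping empty values."""
    query = urlsplit(url).query
    return dict(parse_qsl(query, keep_blank_values=True))


def build_search_url(term: str, base: str = "http://search.mit.edu/search") -> str:
    """Build a search URL from the documented parameters (untested subset)."""
    params = {
        "site": "ocw",
        "client": "mit",
        "getfields": "*",
        "output": "xml_no_dtd",
        "proxystylesheet": "/oeit/OcwWeb/search/google-ocw.xsl",
        "proxyreload": "1",
        "as_dt": "i",
        "oe": "utf8",
        "q": term,
    }
    # urlencode percent-encodes the stylesheet path, as in the original query.
    return base + "?" + urlencode(params)


for name, value in breakdown(SEARCH_URL).items():
    print(f"{name} = {value!r}")
```

Note that parsing decodes the percent-encoded stylesheet path back to /oeit/OcwWeb/search/google-ocw.xsl, and keep_blank_values=True is what preserves the empty fields such as courseName.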

Resources

[1] Google’s Search Protocol Reference documents how queries are submitted and results returned from the Google Search Appliance.

[2] MIT’s guide to Google Stylesheets
