University of Michigan Web Archives
The Board of Regents established the Bentley Historical Library in 1935 to serve as the official archives of the University of Michigan. The University Archives and Records Program (UARP) has long worked to preserve university records of enduring historical and administrative value, and today's electronic records are just as important as the ledgers, catalogs, bulletins, and papers of earlier eras.
Bentley archivists have taken a proactive role in the preservation of digital materials by issuing recommendations for the design and maintenance of web-based content and by capturing select University of Michigan websites since 2000. The increasingly complex nature of online content and the proliferation of academic, administrative, and organizational websites have led UARP to explore new tools to capture a more extensive range of University of Michigan websites and document their changes over time.
After evaluating several subscription services, UARP has entered into a partnership with the California Digital Library's Web Archiving Service (WAS) as of July 1, 2010. WAS distinguished itself by providing essential infrastructure and data management services in addition to ongoing resource development. Please see the WAS homepage for an overview of the service as well as specific information for researchers and webmasters of harvested sites.
This arrangement will allow University Archives staff to focus on the identification, description, and capture of the ever-growing number of online university records. As a result, departments and units may feel confident that UARP will preserve unique websites on a regular basis and provide access to multiple versions of content across time.
Please see below for the respective responsibilities of UARP and the CDL in this initiative as well as opportunities afforded university units.
The University Archives and Records Program will...
- Appraise and select University of Michigan websites that reflect the academic, administrative, research, athletic, and public service endeavors of the institution.
- Contact units when robots.txt files or other technical problems prevent the capture of important content.
- Organize collections of archived web content that reflect and complement current archival holdings in the Bentley Historical Library.
- Provide descriptions and contextual information for collections, including links to relevant UARP finding aids (documents that detail a unit's administrative history, describe collected primary source materials, and specify where additional content may be found).
- Exclude external (i.e., non-U of M) websites from the collections unless that content is essential to the display or understanding of a university site.
- Establish a user interface on the Bentley Historical Library website so researchers may search for and retrieve archived U of M web content.
- Distinguish 'archived' sites from 'live' content with a prominent banner and statement at the top of each preserved web page.
- Adhere to the recommendations of the Section 108 Study Group (a committee of copyright experts that examined copyright exceptions for libraries and archives in the digital environment) for the preservation of publicly available websites. Accordingly, all content will be embargoed for three months before its public release, and content owners may request the suppression of archived material at any time.
The California Digital Library will...
- Maintain the Heritrix web crawler, a computer program (or 'robot') that browses websites and saves a copy of all the content and hypertext links it encounters. By default, Heritrix is configured to avoid degrading website performance, and WAS will suspend harvesting if technical difficulties are detected on a target server.
- Provide secure storage of captured web content in a digital preservation repository at the San Diego Supercomputer Center.
- Host web-ready content from servers in the University of California Office of the President Data Center in Oakland, CA.
- Offer general technical assistance and customer support.
University of Michigan units will be able to...
- Rely upon UARP to identify, preserve, and provide access to various versions of unit websites over time.
- Allow the WAS web crawler to harvest your website by including the following exception in the site's robots.txt file:
User-Agent: cdlwas_bot
Disallow:
- Contact UARP if a website is scheduled to go online, change, or be taken down.
- Request that UARP suppress archived content from public view after captures have been completed.
- Follow the recommendations for website design and maintenance set forth in UARP's Guidelines for Web-Disseminated Records and Google's Webmaster Guidelines to ensure optimal web crawling.
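Before publishing a revised robots.txt file, a webmaster might check that the cdlwas_bot exception actually takes effect. The sketch below does this with Python's standard-library urllib.robotparser. It is illustrative only: the robots.txt contents, the /private/ rule, and the example.umich.edu URL are hypothetical, and robotparser only approximates how the WAS crawler itself interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a unit site that blocks every crawler
# from /private/ (not an actual U of M file).
WITHOUT_EXCEPTION = """\
User-agent: *
Disallow: /private/
"""

# The same hypothetical file with the cdlwas_bot exception added.
# An empty "Disallow:" line means the named crawler may fetch everything.
WITH_EXCEPTION = """\
User-Agent: cdlwas_bot
Disallow:

User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical page on a unit website.
page = "https://www.example.umich.edu/private/annual-report.html"

print(can_crawl(WITHOUT_EXCEPTION, "cdlwas_bot", page))  # False: blocked by the * rule
print(can_crawl(WITH_EXCEPTION, "cdlwas_bot", page))     # True: the exception applies
print(can_crawl(WITH_EXCEPTION, "SomeOtherBot", page))   # False: other crawlers still blocked
```

Because the cdlwas_bot group matches the archive crawler by name, it takes precedence over the catch-all `*` group for that crawler only, so the site's existing restrictions on other robots stay in place.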
Note that UARP may not be able to preserve the form and/or functionality of all U of M websites as they appear on the live web. The following types of content are particularly difficult to capture and/or display:
- Dynamic scripts or applications such as JavaScript or Adobe Flash
- Streaming media with video or audio content
- Password-protected material
- Forms or database-driven content that requires interaction with the site
- Exclusions specified in robots.txt files
Access to the University of Michigan Web Archives Collection
Please contact UARP web archive staff by phone (734.764.3482) or email (bhlwebarchive@umich.edu) for more information on the University of Michigan Web Archives.