Archive-It at the Tennessee State Library and Archives
Todd Wallwork, Digital Materials Librarian
Stephanie Sutton, Director of Library Technical Services
Ann Whitver, Government Documents Librarian
Brian Fussell, State Publications Coordinator
Tennessee State Library and Archives
Originally presented at the 2012 Tennessee Library Association conference.
In accordance with Tennessee Code, the Tennessee State Library and Archives (TSLA) serves as a depository of state documents such as annual and biennial reports, statistical reports, and budget documents. In keeping with its mission, TSLA’s State Publications Committee strives to preserve all Tennessee state publications in a manner that provides free and easy access, regardless of format. Tracking these publications to ensure their acquisition and retention is difficult even in paper format. Because many state entities now publish their documents only on their websites, the task has become more challenging still.
Around 2004, it became evident that the practice of publishing state documents on the web was growing at an alarming rate. As more state publications appeared in electronic format, TSLA found itself in a predicament. In some cases, TSLA was not receiving paper publications in a timely manner, and it was not receiving electronically published materials at all. Moreover, material published only on state websites would disappear from those websites, leaving no continued access to past materials.
With these issues at the forefront, the TSLA State Publications Committee began searching for a solution that would preserve online electronic materials while also keeping them easily accessible to the public over the long term. The solution it found was Archive-It, a subscription-based web archiving service of the Internet Archive, a 501(c)(3) nonprofit corporation. Since Archive-It first launched in 2006, it has grown to over 190 “partner” institutions in 44 states and 16 countries worldwide.
To capture web content, Archive-It uses Heritrix, an open-source, archival-quality web crawler. Heritrix maps a website by following the links it contains, and the captured content is then preserved and displayed in the Internet Archive’s Wayback Machine for public access. The Wayback Machine is software, authored by the Internet Archive, that makes archived websites searchable, browsable, and viewable.
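To make "mapped through the links they contain" concrete, here is a minimal Python sketch of one crawl step: parse a fetched page, resolve its links to absolute URLs, and keep only those on the same host as the page so they can be queued for capture. This is not Heritrix code (Heritrix is a Java application with its own configurable scoping rules); the same-host rule, the function names, and the example.gov addresses are assumptions chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def in_scope_links(page_url, html):
    """Return absolute, fragment-free links that stay on page_url's host.

    Assumed scope rule (hypothetical, for illustration): same host, http(s) only.
    """
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    host = urlparse(page_url).netloc
    found = []
    for href in parser.hrefs:
        # Resolve relative links against the page, then drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            if absolute not in found:  # keep first-seen order, skip duplicates
                found.append(absolute)
    return found
```

Given a page at `http://www.example.gov/reports/index.html` linking to `2011.pdf`, `/about#staff`, an outside site, and a `mailto:` address, the function keeps only `http://www.example.gov/reports/2011.pdf` and `http://www.example.gov/about`. A real archival crawler repeats this step for every queued URL and also fetches embedded resources (images, stylesheets, scripts) so the captured page can be replayed faithfully.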
Typically, web pages within the Wayback Machine look and behave identically to live pages. There are a few telltale signs, however, that indicate a page is archived. First, the URL begins with the Archive-It Wayback host prefix (http://wayback.archive-it.org). It then contains the collection number, the date and time the content was captured (a 14-digit stamp in year, month, day, hour, minute, and second order), and finally the URL that was crawled (see Figure 1).
Figure 1. Breakdown of an Archive-It URL
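The URL breakdown in Figure 1 can also be read programmatically. The Python sketch below splits an Archive-It Wayback URL into its three parts, assuming the plain layout described above (host, collection number, 14-digit timestamp, crawled URL); variant forms are rejected rather than guessed at. The function name and the collection number 1234 in the example are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout, following the description of Figure 1:
#   http://wayback.archive-it.org/<collection>/<YYYYMMDDhhmmss>/<crawled URL>
ARCHIVE_IT_URL = re.compile(
    r"^https?://wayback\.archive-it\.org/"
    r"(?P<collection>\d+)/(?P<timestamp>\d{14})/(?P<original>.+)$"
)


def parse_archive_it_url(url):
    """Return (collection number, capture datetime, crawled URL) for an Archive-It URL."""
    match = ARCHIVE_IT_URL.match(url)
    if match is None:
        raise ValueError(f"not an Archive-It Wayback URL: {url}")
    # The 14-digit stamp is year, month, day, hour, minute, second.
    captured = datetime.strptime(match.group("timestamp"), "%Y%m%d%H%M%S")
    return int(match.group("collection")), captured, match.group("original")
```

For example, `parse_archive_it_url("http://wayback.archive-it.org/1234/20120315093000/http://www.tn.gov/")` returns collection `1234`, a capture time of 15 March 2012 at 09:30:00, and the crawled URL `http://www.tn.gov/`.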
Another indicator that one is viewing an archived page is the Archive-It banner at the top of each page viewed in the Wayback Machine. This pale yellow banner not only shows that the currently viewed web page is not live, but also provides the name of the institution that archived the page, the service used to archive the page, and the date and time that the page was captured (see Figure 2).
Figure 2. Breakdown of the Archive-It page banner.
Since the inception of its partnership with Archive-It in 2006, TSLA has used the service to create 132 collections. Of these, two are archived daily and 62 are archived quarterly. Others are crawled as needed or are dormant collections of sites that TSLA no longer collects. In addition to web-published material, TSLA takes a wide-net approach to archiving Tennessee-related content that is not considered “published” but is posted on Tennessee government and regional organization sites.
Beyond acquiring web content, TSLA must also maintain its collections. To this end, TSLA goes to great lengths to ensure quality control and to provide easy access to the archived collections. One significant responsibility is monitoring the collections and scheduled crawls for URL changes or crawl malfunctions, which occasionally require changing existing site URLs or adding new ones. To provide easy access to the archived material, TSLA’s website contains a Tennessee Government Web Archive page, which provides links to preserved government web pages and data (see Figure 3).
Figure 3. The Tennessee Government Web Archive
In addition to the Tennessee Government Web Archive page, TSLA provides links in its integrated library system catalog records to archived electronic publications, Tennessee agency collections, and government websites (see Figure 4). These links lead either to a captured web page or directly to an electronic publication.
Figure 4. The Tennessee Government Web Archive also links to publications indexed in the State Library catalog.
Other Tennessee libraries and archival institutions can also make direct use of these two access points and of the archived sites and documents they lead to. TSLA creates and maintains these collections expressly to provide wide and permanent public access, and it strongly encourages any institution that finds them useful for its patrons to deep link to these materials, down to the level of individual documents. The TSLA State Publications Committee is also happy to assist any institution looking to give its patrons access to these archived resources.
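For institutions that want to deep link to a specific archived document, the URL layout shown in Figure 1 can be assembled from its parts. The Python sketch below does this, assuming the plain host/collection/timestamp/URL pattern; the function name, the collection number 1234, and the report path are hypothetical placeholders. In practice, the most reliable way to get a working link is to locate the capture in the Wayback Machine and copy the URL from the browser's address bar.

```python
from datetime import datetime

# Archive-It Wayback host prefix, as given in the article.
WAYBACK_HOST = "http://wayback.archive-it.org"


def archive_it_link(collection, captured, original_url):
    """Compose a deep link to one capture of original_url (assumed URL layout)."""
    if collection <= 0:
        raise ValueError("collection number must be positive")
    # Capture time becomes the 14-digit YYYYMMDDhhmmss stamp.
    timestamp = captured.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_HOST}/{collection}/{timestamp}/{original_url}"
```

For example, `archive_it_link(1234, datetime(2012, 3, 15, 9, 30), "http://www.tn.gov/example/report.pdf")` produces `http://wayback.archive-it.org/1234/20120315093000/http://www.tn.gov/example/report.pdf`. Linking to a fixed capture rather than the live site means the link keeps pointing at the same version of the document even after the original is removed.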
Providing the public with easy, permanent access to state-produced web content is a priority for TSLA. The TSLA State Publications Committee accomplishes this in partnership with Archive-It and the Internet Archive, which allows TSLA both to fulfill its legislative mandate as a state depository and to ensure the public’s continued access to essential Tennessee publications and information. These materials provide valuable insight into how Tennessee’s state government functions, information about the citizens of Tennessee, and how they serve and support one another. As more government information is published online, TSLA has found Archive-It a valuable tool for making sure this information remains available to future generations of Tennesseans.