HIST 2041: Historiography

Methods for the researching and writing of history from textual sources.


Historiography and the Internet

By Paul Logsdon, Archivist and Past Director, Heterick Memorial Library

In an ideal world, historians would have ready access to original sources. Since this is often not feasible, it is best to work with materials that are as close to the original documents as possible. Today this often means working with various reproductions. Reproductions of documents can take two forms: analog or digital. Traditionally, historians have worked with physical or analog reproductions.

Analog reproductions can be printed facsimiles or copies on microforms. With facsimiles or microforms you are working with exact duplicates of the original work, complete with misspellings and any annotations added by earlier readers. In these cases, you are working with the unmediated content, unaltered by others. The arrival of computers and the Internet has opened a second alternative: digital reproduction. Digital reproductions can be created using several processes, each with its own characteristics. One early effort to reproduce complete texts of books was Project Gutenberg. This effort, still in existence, initially used volunteers to re-key books and other materials. This approach can digitize materials at a reasonable cost, but because it requires human workers, it can be slow, and it will only be accurate if there is adequate quality control.

Currently, Project Gutenberg, and many other digitization efforts, use another technology, Optical Character Recognition (OCR). In this process, pages are scanned and software converts the scanned images into text with a fair degree of accuracy. The conversion is not perfect, and considerable post-scanning editing must still be done.

In terms of fidelity, perhaps the best solution is to reproduce a document as an image file. This shows the document largely as it appeared. Because it involves a minimum of human intervention, it is the approach least prone to error. It is also amenable to a fair degree of automation. However, image files occupy more storage space than comparable text files and so may not be feasible for all applications. Another factor to consider is that a supplemental text or HTML file may be required for indexing if the digital reproduction is to be made searchable.

Both types of reproductions raise concerns about their durability. Low-acid paper reproductions, if they are stored in a controlled climate, are extremely stable. Even when they are not, they tend to be fairly “forgiving” of abuse. Archival standards have been established for microfilm, and silver halide film that meets them can be expected to last for 500 years. The longevity of other types of microfilm is not settled, but when stored under appropriate conditions, it should prove a stable medium. Digital reproductions do an excellent job of disseminating information, especially if it is necessary to incorporate the ability to search them. Since this is a new approach, however, questions of preservation are still being resolved.