Electronic Texts, File Formats, and
Copyright: The Christian Classics Ethereal
Library
Review by,
Perry Willett
Indiana University
pwillett@indiana.edu
Willett, Perry. "Electronic Texts, File Formats, and
Copyright: The Christian Classics Ethereal Library"
Early Modern Literary Studies 1.2 (1995): 12.1-27
<URL: http://purl.oclc.org/emls/01-2/rev_wil2.html>.
Contents
- Introduction: Electronic Texts, File Formats, and Copyright
- Christian Classics Ethereal Library. Harry Plantinga, general editor.
- Afterword
- Bibliography
Introduction
- Two factors, file format and copyright, invisibly shape the creation of electronic texts. Electronic files can be created and stored in a variety of formats, with each format having features that allow or limit possible uses of the text. Copyright, and its interpretation, has tremendous influence over the texts that are chosen for transferral to electronic formats. These two issues underlie many or most editorial decisions made when creating an electronic text, and determine the eventual uses for which a text is most appropriate. Moreover, both of these factors, in very different ways, influence the text collections that are available for use over the World Wide Web.
- Before reviewing electronic texts, one must first consider their scholarly uses. The ease of duplication and transmission of electronic files makes them ideal for retrieval. Those who wish to find a copy of the Bible, or Augustine's Confessions, may now download the texts through the World Wide Web. They may wish simply to read the text as they would a printed edition, but with the electronic version, they never have to worry about it being checked out from the library or missing from the shelf. As long as the workstation and network connections are running, these texts will be available for reading, saving to disk, or printing.
- Electronic texts can also be used as concordances, for finding specific passages. This is something the most basic word processor or text editor can perform in a rudimentary way, but more sophisticated searches, such as those involving word proximity, require more substantial search engines than those found in word processors. Examples of tools available for analysis of electronic texts include WordCruncher, a commercial software package that creates concordances and allows for complex searching and analysis of electronic texts, and TACT (Textual Analysis Computing Tools), software for searching and textual analysis, freely available from the University of Toronto.
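The difference between a simple string search and a proximity search can be made concrete with a short sketch. The Python below is only an illustration under stated assumptions: the function names, the tokenizing rule, and the sample sentence are invented for this example, and it does not reproduce how WordCruncher or TACT work internally.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def proximity_matches(text, first, second, window=5):
    """Return (i, j) token positions where `first` and `second`
    occur within `window` words of each other."""
    tokens = tokenize(text)
    first_pos = [i for i, t in enumerate(tokens) if t == first]
    second_pos = [i for i, t in enumerate(tokens) if t == second]
    return [(i, j) for i in first_pos for j in second_pos
            if i != j and abs(i - j) <= window]

def kwic(text, keyword, width=3):
    """Keyword-in-context lines: a rudimentary concordance."""
    tokens = tokenize(text)
    lines = []
    for i, t in enumerate(tokens):
        if t == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{t}] {right}".strip())
    return lines

# Invented sample sentence, used only to exercise the functions.
sample = "the heart is restless until the heart finds rest"
print(proximity_matches(sample, "heart", "restless", window=2))
print(kwic(sample, "restless", width=2))
```

A word processor's "find" command corresponds roughly to locating a single token; the proximity function adds the constraint on distance between two tokens, which is the step that requires a dedicated tool.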
- These two functions mirror current uses of books, perhaps improving on delivery or searching (especially if no printed concordance is available for a particular text). If the ultimate use of an electronic text is as an electronic reproduction of a printed book, then the only formatting required will be to make the typeface and page layout attractive and easy to read. Other types of research are possible using electronic editions. Linguistic, semantic, or syntactic features of texts could be tagged and used in researching a particular text, or across a collection of texts.
- The World Wide Web, in implementing a notion of hypertext, has capabilities very different from those of printed books, and allows for other, broader uses of electronic texts. In this medium, texts can be linked to notes, to variants, to other texts, even to graphics or sound or movies, or to themselves. Creating electronic editions designed to allow these kinds of research requires planning and considerably more work than scanning or typing in a transcription of a printed version, but some texts would benefit greatly from the availability of such tagging or links, making this effort worthwhile.
- Scholars who wish to take advantage of these features of electronic texts in their research will notice the paucity of texts available in the public domain. Even as the number of electronic texts grows rapidly, there are generally few choices of editions of any particular text. One major impediment to the wholesale creation of electronic editions is copyright. Anyone who creates an electronic text and makes it available for use by others must be concerned with the copyright status of the edition that is chosen. Only texts that are in the public domain can be made freely available over the Internet, and this fact restricts greatly the choices of editors.
- Copyright remains a vexing question when applied to electronic texts. The ease of reproduction and delivery of electronic texts brings into clear conflict the rights of authors and publishers on the one hand, and the desire of researchers for electronic versions of important works or particular editions, on the other. Copyright will continue to cause problems for creators of electronic versions of printed works due to its complexity.
- U.S. copyright laws have changed over the years, with the most important change occurring in 1978. As a minimal outline, works created before 1978 could obtain an initial copyright of 28 years, with a renewal available that in some cases extended copyright to 75 years. For works created from 1978 onward, copyright lasts until 50 years after the author's death. And, as stated in the Frequently Asked Questions About Copyright (v.1.1.2), International Aspects, under international copyright law, generally speaking, "an author's rights are respected in another country as though the author were a citizen of that country," creating even larger complications for international distribution of electronic texts over the Internet. Copyright has many more complexities than can be explained in this brief overview, and a general lack of familiarity with them may keep more scholars from creating electronic texts. There is a WWW page designed to assist in understanding copyright, with a helpful list of "frequently asked questions." In addition, the Harry Ransom Humanities Research Center at the University of Texas and the University of Reading Library have embarked on a joint project called WATCH (Writers And Their Copyright Holders) to collect and provide access "to the names of addresses of copyright holders for English-language authors whose papers are housed, in whole or in part, in libraries and archives in North America and the United Kingdom." Even with these aids, the question of whether a particular book is in the public domain will rarely be free of ambiguity. The choice for scholars is either to create new editions, such as the Renaissance Electronic Text series (reviewed in the previous issue of EMLS), or, as is much more common, to make do with older editions until this issue can be resolved (if, indeed, it can ever be resolved to satisfy all parties).
Christian Classics Ethereal Library. Harry Plantinga, general editor. Pittsburgh: University of Pittsburgh, 1994-.
- The WWW provides a particularly appropriate environment for biblical editions and studies because of its potential to link the extensive intra- and intertextual references, allusions and glosses. The Bible itself can be thought of as a kind of proto-hypertext as discussed by Delany and Landow in their introduction to Hypermedia and Literary Studies. One could imagine an ideal hypertextual Bible that would allow the reader to follow all the typologies and allusions through a series of links. Instead of having to flip manually, one could simply click on a verse or phrase to follow these links through the rest of the text. One could also, in this ideal edition, easily compare different versions or translations at a touch of a button, or link to an encyclopedia, dictionary, commentary or image file. (As Steven DeRose points out, such a Bible could easily have over one million explicit links.)
- Biblical scholars embraced the computer for their research long before other scholars in the humanities. They have sophisticated electronic resources, such as Bible Windows and CDWord for collations of multiple biblical versions, or CATSS-Base for aligned Greek and Hebrew bibles and their variants. These resources combine text with software, and were designed before the World Wide Web existed, to run on single workstations. Robert Kraft began an online journal for biblical studies, Offline, over 10 years ago. There are a number of guides to electronic religious studies materials, such as those by Gresham or Strangelove, that point to the large number of electronic discussion groups, journals, and software libraries available over the Internet. As expertise grows, one expects to see a rapid growth in the number of resources available through the WWW for biblical studies.
- The Christian Classics Ethereal Library is such a resource, and provides an example of how the two factors of copyright and file format influence the availability and possible uses of electronic texts. Plantinga is a professor of computer science at the University of Pittsburgh, and is creating a growing library of works of interest to literary and religious scholars. Works by Thomas à Kempis, Milton, Bunyan, Calvin, Jonathan Edwards, St. John of the Cross and others are available. The works of some of these authors, such as Milton and Bunyan, can be found in different versions at several other WWW sites. However, the importance of this collection is that most of the works are not available elsewhere. Plantinga's work in creating this library is extremely important for biblical scholars, for he has created versions of texts otherwise not available in electronic form.
- Unlike the electronic texts reviewed in the previous issue of EMLS, most of these works are not encoded using SGML (Standard Generalized Markup Language), but instead use other formats. Plantinga has made most of the texts available in RTF, or Rich Text Format, developed by Microsoft, which can be used with word processors such as Microsoft Word, WordPerfect, FrameMaker, or others. Other electronic texts are formatted as PDF or Portable Document Format, developed by Adobe. PDF files can only be viewed or printed with software called Acrobat, freely available from Adobe. (Acrobat works on a limited number of platforms, including DOS, Windows, and Macintosh.) There are also texts available in Hypercard editions, for use with the Macintosh Hypercard program, or in plain ASCII format. Some of the texts are available in multiple formats, providing an opportunity to compare the relative strengths of each format.
- RTF and PDF are page description languages and are largely concerned with how the text looks, either on the screen or on the printed page. Clearly, Plantinga intends his collection solely for reading after the files have been printed--he states in an explanation of file formats that "reading these books is easier from a printed version." He has therefore chosen file formats that will allow for attractively printed documents with little effort.
- One may look at his versions of St. Augustine's Confessions as an example of the collection available. All of the versions of the Confessions in their various formats seem to be accurate transcriptions of the original editions. No errors or omissions were evident, and thus each of these versions passes the first critical test of any electronic text: that of accuracy.
- Two translations are available, one by the Rev. E.B. Pusey and the other by Albert Outler. Plantinga has chosen editions in the public domain, as he clearly states at the top of each edition. Neither translation is considered the most important for scholarly uses. James O'Donnell, in an introduction to a plain text version of the Pusey translation, calls it "not the best," listing several other more modern editions, but notes that it is "safely out of copyright."
- As with the other texts in this library, Plantinga asserts that these two editions are in the public domain. In the case of Outler's translation, published in 1955, this is true only if the copyright was not renewed. Under U.S. copyright law, works published before 1978 went into the public domain after 28 years unless the copyright was renewed. One would have to review publications from the U.S. Copyright Office to determine whether the copyright on this edition was renewed in 1983, the year in which the work would otherwise have passed into the public domain. To make matters more complicated, the series published by the U.S. Copyright Office listing copyright renewals each year has been published only through 1982. Researchers would have to contact the Copyright Office directly to be certain of the copyright status of Outler's translation. Most editors and researchers, faced with such complexities, make choices in good faith without knowing in all certainty the exact status of particular texts.
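The arithmetic behind the 1983 date can be written out as a small sketch. It encodes only the simplified outline given in this review (a 28-year initial term for pre-1978 works, with a renewal that in some cases extended protection to 75 years); the function names are hypothetical, and actual copyright status depends on details this sketch ignores.

```python
FIRST_TERM_YEARS = 28    # initial term for pre-1978 works, per the outline above
RENEWED_TOTAL_YEARS = 75 # total term "in some cases" after renewal

def first_term_expiry(publication_year):
    """Year in which the initial 28-year term of a pre-1978 work ran out."""
    if publication_year >= 1978:
        raise ValueError("the 28-year initial term applies only to pre-1978 works")
    return publication_year + FIRST_TERM_YEARS

def status_sketch(publication_year, renewed):
    """Rough status under the simplified rules in this review.
    `renewed` is True, False, or None when the renewal record is unknown."""
    expiry = first_term_expiry(publication_year)
    if renewed is None:
        return f"unknown: check renewal records for {expiry}"
    if renewed:
        latest = publication_year + RENEWED_TOTAL_YEARS
        return f"renewed: possibly protected until {latest}"
    return f"public domain since {expiry} (not renewed)"

# Outler's translation was published in 1955; its renewal status is the open question.
print(first_term_expiry(1955))
print(status_sketch(1955, None))
```

The `None` case is the one that matters here: because the printed renewal listings stop at 1982, the record for a 1983 renewal cannot be checked from that series alone.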
- In turning to considerations of file formats, Outler's translation is available in a variety of electronic formats, including plain text, RTF, and PDF. The Pusey translation is formatted in RTF and as a Macintosh Hypercard stack. Plantinga generally gives some bibliographic information about the editions in the library, including author, title, translator and date of publication (and this is much more than given in some electronic texts found at other sites). However, in a serious omission, he neglects to include the publisher. This slip would not be accepted in printed editions, and one hopes that future editions in the Library will include this information.
- The limitations of the plain ASCII text will be familiar to almost anyone who has used a computer to create texts. The limited character set requires that one either omit foreign characters or change them to their unaccented equivalents. Outler includes Greek words and phrases in his notes, for instance; these words have been omitted in the ASCII version. He also includes references to French editions and translations; any accented character has been changed. The notes are at the end of this fairly large file, making following references rather cumbersome.
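Both effects described above, accented letters reduced to their base letters and Greek dropped entirely, follow directly from forcing a text into the ASCII character set. The sketch below shows one common way of doing this in Python; it is an assumption for illustration, since the review does not say how the ASCII version was actually produced, and the function name and sample strings are invented.

```python
import unicodedata

def to_plain_ascii(text):
    """Reduce a string to plain ASCII.

    Decomposing first (NFKD) separates a letter from its accent, so the
    base letter survives; characters with no ASCII base, such as Greek
    letters, are simply dropped."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_plain_ascii("édition française"))  # accents are stripped
print(repr(to_plain_ascii("λόγος")))        # Greek disappears entirely
```

The loss is one-way: nothing in the resulting file records that an accent or a Greek word was ever present.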
- The RTF and PDF versions, as mentioned above, were meant to look as much like printed editions as possible. Indeed, when viewing the files with a word processor (in the case of the RTF file) or Adobe Acrobat (for the PDF file), they do look very much like a printed edition, with nice fonts and carefully designed layout. There is no problem in displaying foreign characters, and footnotes are placed at the bottom of each page. RTF files are similar to SGML-encoded files in that features such as paragraphs and quotations are marked using plain text codes. However, RTF is solely concerned with appearance, and most of the encoding concerns the font size and type. RTF does not allow for tagging linguistic, syntactic, or semantic features of the text as recommended by the Text Encoding Initiative (TEI) Guidelines, and therefore the kind of research requiring these elements is simply not possible in this format. One advantage to RTF, however, is that most current versions of word processors can import texts encoded in this format. PDF files have the same limitations as files encoded in RTF, with the added restriction that they can only be used with Adobe Acrobat.
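The contrast between appearance-oriented and descriptive markup can be illustrated with two invented fragments. The RTF string uses the standard `\i` control word for italics; the second fragment is a simplified, TEI-like piece of XML made up for this example rather than a conforming TEI document, and the sentence itself is not a quotation from either translation.

```python
import xml.etree.ElementTree as ET

# Appearance only: the phrase is italic, but nothing says why.
rtf_fragment = r"{\rtf1\ansi He heard {\i tolle lege} and opened the book.}"

# Descriptive: the phrase is identified as foreign, and as Latin.
tei_like_fragment = (
    "<p>He heard <foreign lang='la'>tolle lege</foreign> "
    "and opened the book.</p>"
)

def foreign_phrases(xml_text, language):
    """Collect the text of every <foreign> element in the given language."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("foreign")
            if el.get("lang") == language]

print(foreign_phrases(tei_like_fragment, "la"))
```

A query such as "find every Latin phrase" can be answered from the second fragment but not from the first, where the same italics might equally mark a title or an emphasis.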
- In addition, one notices immediately that there are no hypertext links. Neither PDF nor RTF allows for links to either other sections within the text, or to external texts as is common in HTML documents. Both Pusey and Outler note Augustine's biblical quotations and allusions, for instance; the citations to chapter and verse are merely noted in the electronic versions, just as in the printed editions. Of course, hypertext links are not needed to read the text; the printed editions of Outler and Pusey were quite adequate without them. However, it would be of great benefit to be able to link directly to a cited verse and its fuller context. With the texts already scanned and proofed, there is no reason that Plantinga (or someone else with his permission) could not create editions that take advantage of the hypertextual features of HTML.
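Because the citations already appear in the text in a regular chapter-and-verse form, turning them into HTML links is largely mechanical. The following is a minimal sketch, assuming citations shaped like "Ps. 4:4" or "1 Cor. 13:12"; the base address, the URL layout, and the function name are hypothetical and do not describe any existing site.

```python
import re
from html import escape

# Matches citations such as "Ps. 4:4" or "1 Cor. 13:12": an optional book
# number, an abbreviated book name, then chapter:verse.
CITATION = re.compile(r"\b((?:[1-3] )?[A-Z][a-z]+)\. (\d+):(\d+)")

BASE_URL = "http://example.org/bible"  # hypothetical address

def link_citations(text):
    """Wrap each chapter-and-verse citation in an HTML anchor."""
    def to_anchor(match):
        book, chapter, verse = match.groups()
        slug = book.lower().replace(" ", "")
        href = f"{BASE_URL}/{slug}/{chapter}.html#v{verse}"
        return f'<a href="{href}">{escape(match.group(0))}</a>'
    return CITATION.sub(to_anchor, text)

print(link_citations("See Ps. 4:4 and 1 Cor. 13:12."))
```

A real edition would also need a table mapping each translator's abbreviations to book names, and would have to handle verse ranges and citations that omit the verse.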
- The Hypercard formatted file of the Confessions (the Pusey translation) presents an opportunity to achieve the kind of hypertextual links to biblical verse and other commentary imagined above. Unfortunately, these links do not exist in this edition either. Pusey's introduction and notes have been omitted, leaving the Hypercard shell as a kind of page turner. There are advantages to this format over the others under review, in that one may navigate quickly through chapters. The search feature is superior to that of any word processor, for the software shows how many matches are found in a separate window, and allows for quick movement from one match to another. The disadvantage is, of course, that this version only runs on a Macintosh, using Hypercard software.
- Plantinga has also created a World Wide Study Bible that more closely resembles the hypertextual Bible envisioned by DeRose and others. He plans to link various Bible versions with commentaries, sermons, images, and even musical scores, and encourages world-wide cooperation in this effort (hence the name). He has four versions of the Bible, including the King James and RSV, and has linked them to two commentaries, the Concise Matthew Henry Commentary and Aaron's Bible Commentary. He notes in his introduction to this project the copyright restrictions on the NIV Bible, and sagely cautions other participants in the project to abide by these restrictions. He has left room for links to the other types of materials listed above, but currently only the Bible versions and commentaries are operable.
- In a chapter of any of the versions available, one may easily link to the same chapter in the other three versions. Unfortunately, the commentaries are not linked by chapter, but instead by book, so that one must first exit the chapter to find the relevant commentary. The verses are linked to the commentary, so the commentaries are perhaps a better starting place. Also, intratextual allusions and typologies are not linked, but all of these together would probably approach the million links estimated by DeRose.
- One also notices the limitations of the HTML environment as currently realized by viewers such as Netscape, Mosaic or Cello. It is possible to view only one source at a time, so a comparison of different versions of a chapter, or reading a commentary in conjunction with a chapter, is not possible unless one is running multiple simultaneous WWW sessions. This is of course possible with an Ethernet connection, but would be very unwieldy. The kind of true hypertextual study possible with commercial publications such as Bible Windows or CDWord is not possible with current WWW viewers, and is a major impediment to serious textual research. This is of course not a fault of the World Wide Study Bible but of the WWW environment in which it runs.
- The World Wide Study Bible is an ambitious project and has exciting possibilities. One hopes that others contribute to the wide range of links as envisioned by Plantinga, and that he continues to create electronic editions for other works in religious studies. Such a project incorporates the two current strengths of the WWW, namely its ability to link widely dispersed materials, and the opportunity for collaborative projects among widely dispersed contributors.
Afterword
- The issues of file format and copyright inform and shape the electronic collections that are available through the WWW. As with the Christian Classics Ethereal Library, editions available through the WWW are limited to those in the public domain, in order to comply with copyright laws. In this medium, where scholars create and publish electronic editions outside of traditional publishers, the opportunity for misinterpretations and misunderstandings regarding copyright restrictions is very great. Scholars must either trust the editor's interpretation of copyright in regard to a particular text, or else be prepared to double-check.
- In addition, the multiple uses of electronic editions have spawned, in some cases, multiple versions of the same text in various file formats, requiring a level of technical understanding rather burdensome for an editor. Researchers must consider and be conversant with these file formats also, in order to choose the most appropriate version for their purposes. Many of the limitations on the uses of electronic texts arise when their editors create them solely for current software and technologies, particularly proprietary software and technologies, leaving an unclear future as technologies change or software companies upgrade formats. For instance, Adobe Acrobat runs only on a limited number of platforms, making it useless to those with only mainframe or UNIX accounts. As discussed in the previous review, SGML and its various instances, such as the Text Encoding Initiative Guidelines, seem to present the best format for electronic texts because they are then not linked to any particular software or operating system, and have the greatest flexibility for storage, interchange, enhancement and reuse. However, while providing richer possibilities for analysis, TEI-encoded files generally remain more difficult to handle than those files using other formats, and require both the editor and ultimate user to have specialized software and/or skills in order to realize the full research potential of electronic texts. There is some hope that popular word processors will accept TEI-encoded files as another routine format, but this hope has not yet been realized.
Works Cited
- Augustine. Augustine: Confessions and Enchiridion. Trans. Albert Outler. The Library of Christian Classics, Vol. 7. Philadelphia: Westminster Press, 1955.
- ---. The Confessions of S. Augustine. Trans. Rev. E.B. Pusey. Oxford: John Henry Parker, 1840.
- Bible Windows. Ver. 3.0. Cedar Hill, TX: Silver Mountain Software, 1994.
- CDWord. Ver. 1.0. Dallas, TX: CDWord Library, 1989.
- Carroll, Terry. Frequently Asked Questions About Copyright. ver.1.1.2. Columbus, OH: Ohio State University, 1993.
- Delany, Paul and George P. Landow, eds. Hypermedia and Literary Studies. Cambridge, MA: MIT Press, 1991.
- DeRose, Steven. "Biblical Studies and Hypertext." In Hypermedia and Literary Studies. Delany, Paul and George P. Landow, eds. Cambridge, MA: MIT Press, 1991. p. 186-202.
- Gresham, John. Finding God in Cyberspace: A Guide to Religious Studies Resources on the Internet. Sterling, KS: Sterling College, 1994.
- Henderson, Cathy and David Sutton. Writers and Their Copyright Holders. Austin, TX: The Harry Ransom Humanities Research Center, University of Texas, 1994-.
- The ILTguide to Copyright. New York: Institute for Learning Technologies, Columbia University, 1993-.
- Kraft, Robert, ed. Offline Review. Philadelphia: Center for Computer Analysis of Texts, University of Pennsylvania, 1984-.
- Landow, George. Hypertext: the Convergence of Contemporary Critical Theory and Technology. Baltimore: The Johns Hopkins U P, 1992.
- ---, ed. Hyper/Text/Theory. Baltimore: The Johns Hopkins U P, 1994.
- Liu, Alan, ed. The Voice of the Shuttle: Religious Studies. Santa Barbara, CA: University of California, Santa Barbara, 1995-.
- Sperberg-McQueen, Michael and Lou Burnard, eds. TEI Guidelines for Text Encoding and Interchange (TEI P3). Chicago, Oxford: ACH, ACL, ALLC, 1994.
- Stover, Mark. "Religious Studies and Electronic Information: a Librarian's Perspective." Library Trends, Spring 1992, 40(4) p. 687-703.
- Strangelove, Michael. The Electric Mystic's Guide to the Internet: a Complete Bibliography of Networked Electronic Documents, Online Conferences, Serials, Software and Archives Relevant to Religious Studies. Volume 1 (version 2) and Volume 3 (version 1.3). Ottawa: University of Ottawa, Dept. of Religious Studies, 1992-1993.
- United States Library of Congress, Copyright Office. Catalog of Copyright Entries, Fourth Series. Part 8, Renewals. Washington, D.C. : Copyright Office, U.S. Library of Congress, 1979-.
- TACT: Textual Analysis Computing Tools. Toronto: U of Toronto.
- WordCruncher software. Version 4.50. American Fork, UT: Distributed by Johnston & Company, 1992.
(RGS, rev. 14 February 1998)