"Look What Thy Memory Cannot Contain": The Shakespeare Electronic Text Archive

By Kenneth B. Steele

[Published in _Shakespeare Bulletin_ 7:5 (September/October 1989): 25-8]

Twenty-five years ago, when T.H. Howard-Hill performed his first test concording of Shakespeare's _Measure for Measure_ at the Oxford University Computing Laboratory, literary computing required custom software, mainframe facilities, and thousands of keypunched cards.[1] Even as late as 1973, Howard-Hill quite rightly cautioned that it was still "too difficult, time-consuming, and expensive for all but the most determined scholar to work with the computer."[2] As most academics know, however, in the 1980s the tables have turned: to "all but the most determined scholar," word processing alone has made the computer indispensable. For the more computer-literate humanist, dissertation abstracts and the _MLA Bibliography_ are available on-line, the _Oxford English Dictionary_ has been published on CD-ROM, and the Oxford and Riverside editions of Shakespeare are available commercially in electronic form. Text retrieval software has become increasingly "user-friendly": anyone who can manage WordPerfect or Nota Bene can master WordCruncher or TACT in minutes.[3] Literary computing need no longer reduce volumes of poetry to fanfold pages of numerals and z-scores; current computer software moves us, not further away from the text, but closer to it than ever before, with a level of precision and exhaustiveness seldom practical without prolonged research.

Interactive text retrieval is above all else _interactive_: many of its primary benefits are immediately evident on-screen, but difficult to convey in print. It is remarkably convenient, often a mere keystroke away from your wordprocessor, able to perform complex operations instantly, and to insert the resulting quotations, with citations, directly into your work-in-progress.
The computer becomes an invaluable tool to accelerate and refine traditional approaches, such as close reading, imagery analysis, or source study. It renders printed concordances virtually obsolete, and removes much of the drudgery from more technical textual or orthographic research, increasing rather than diminishing scholarly creativity. The computer user is not limited to a mere index of references for each keyword, but instead has instantaneous access to the complete context of every occurrence of every word, every partial word (e.g. all words ending in "-ick," all words with the prefix "pro-," all words containing hyphens, etc.), every complete or partial phrase, every punctuation mark, and every textual code. Most software can generate charts or graphs of distributions by play, scene, character, compositor, genre, or even chronological period, as an aid to detecting overall patterns and relating them to the specific occurrences. The software can search an electronic text for co-occurrences of words or phrases, such as x with y, x without y, x and y without z, or x or y with z, and so on, all within a user-specified range (any number of characters, lines, scenes, etc). Furthermore, lists of thematically-related words can be created, such as Petrarchanisms or metatheatrical references, and can then be examined or manipulated with immediate results, enabling a novice to chart Shakespeare's imagery in a fraction of the time it took Caroline Spurgeon.[4] Obviously, it would be virtually impossible for printed concordances to document the infinite number of possible co-occurrences or thematic distributions. Electronic text retrieval can also revolutionize one's conception of the text. 
Re-reading the works of Shakespeare via WordCruncher or TACT is a new stimulus to the critical imagination: a diachronic corpus suddenly becomes synchronic, multiple plays interpenetrate on a single word, and one is reading vertically, by cross-section across various plays, rather than horizontally, from beginning to end of each. This is, obviously, no way to be _introduced_ to Shakespeare, and is certainly not to be taken as an invitation to "drown [our] book," like Prospero, or deflect attention from theatrical performance, where the plays ultimately come alive. Text retrieval software does, however, turn a linear text into an imaginative "doodle pad," a testing-ground on which casual browsing and random experiment can lead to new and suggestive ideas. Just as computerized spreadsheets have revolutionized accounting, making possible a rapid and flexible "what if" approach to financial variables, text retrieval software supplies immediate answers to critical experiment: literary scholars can now ask questions which were previously unthinkable, or unthought-of, accomplishing in moments analyses which might have been lifetime occupations in previous decades. Electronic _editions_ of Shakespeare are now commercially available for use with such software, but recent scholarship, particularly that by Steven Urkowitz and Gary Taylor,[5] has brought about a new textual awareness throughout the scholarly community. The intriguing distinctions between Q1 and F1 _King Lear_ reveal consistent and deliberate differences in characterization, plot, and poetry, but this play is not a special case: seventeen of Shakespeare's thirty-eight plays survive in two variant forms, and two more, _Romeo and Juliet_ and _Hamlet_, survive in three. Scholars cannot afford to discard texts as "Bad Quartos" simply because they are awkward to collate, nor should inconvenient variants be dismissed as "indifferent"; as E.A.J. 
Honigmann remarks, the word "can as aptly describe the beholder as the thing observed."[6] Although most scholars remain reluctant to "unedit" Shakespeare entirely,[7] the challenging indeterminacy of the original Quarto and Folio texts certainly rewards investigation. In early 1988, the Centre for Computing in the Humanities at the University of Toronto initiated the Shakespeare Text Archive project, to produce an accessible and convenient textbase of the original Quarto and Folio texts for use on an IBM microcomputer. The Oxford University Computing Service Text Archive generously supplied the Howard-Hill texts of the First Folio and many important Quartos, and CCH computerized a number of so-called "bad" Quartos which had been previously ignored. The encoding in all the files was then standardized for use with WordCruncher software. The Archive is still incomplete, and the process of verification continues, but currently 55 early texts of Shakespeare's 38 plays occupy a mere 10.2 megabytes of hard disk space on an IBM AT (18.8 mb with the WordCruncher indexes). Unlike commercially-available electronic _editions_ of Shakespeare, these texts endeavour to be exact "electronic facsimiles" of the original Quarto and Folio texts. Current technology cannot yet replace photographic facsimiles, such as the Norton facsimile,[8] but unlike more traditional reproductions the text archive can be used in conjunction with any concording, text retrieval, or collation software. These texts are quickly proving themselves a significant new tool for Shakespearean textual scholarship, offering instantaneous access to all of the authoritative texts in minute detail with unfailing precision. The electronic texts encode such typographical details as pagination, signatures, italicization, justification, and turned-over lines, and meticulously retain typographical errors. 
For convenience of reference, codes have been added to identify act, scene, and line divisions, stage directions, speech prefixes, and title pages. Hypothetical differentiations of compositor stints and homographs are encoded but can be disregarded. The text archive contains an embarrassment of textual riches, and for less technical research an electronic edition may prove both simpler and more productive. For literary rather than textual purposes, the user must account for a bewildering array of duplicate texts, variant spellings, typographical eccentricities, textual cruxes, and apparent compositorial "errors" familiar to anyone who has tried to read Shakespeare in facsimile. The user cannot merely search for all occurrences of the word "aye," but must also consider "ay," "aie," "ay," and "I" (and less obvious misprints). Ideally, the Quarto and Folio texts will be electronically linked to an edition, permitting searches of either textbase in correlation with the other (a process which will ultimately require hypertext technology). The potential of text-retrieval technology can perhaps be best illustrated by a number of brief examples. These sample queries and results are certainly not the most intricate or intriguing possible, but will hopefully be suggestive. First, some more general literary applications, for which an electronic edition might serve equally well. If we were developing a theory about "The Murder of Gonzago" in _Hamlet_, for example, text retrieval would allow immediate investigation of the 603 references to the ear in Shakespeare. Twenty-five of these references seem to carry distinctly sexual undertones (specific evidence which seems more solid than vague Freudian generalizations). In several cases, moreover, pouring poison or infection in the ear is closely associated with some form of male sexual jealousy (see Fig. 
1): the ghost of Hamlet Senior reports the "iuyce of cursed Hebona in a viall" which Claudius poured "in the porches of my eares" (_Hamlet_ Q2 1.5.66-7); which is of course echoed in the dumbshow stage direction, in which we are told that the murderer "pours poyson in the sleepers eares" (_Hamlet_ Q2 3.2.120sd). In later plays, Iago vows to "poure this pestilence into [Othello's] eare" (_Othello_ Q1 2.3.331), and Pisanio exclaims of his master Posthumus, "what a strange infection / Is falne into thy eare?" (_Cymbeline_ F1 3.2.3-4). This recurrent association in Shakespeare's work, initiated, we may note, in _Hamlet_, may help to explain the symbolism of _Hamlet_ as it formed in Shakespeare's mind, and as it reverberated throughout his career.

More complex research is made possible by creating lists of related words, such as animal references, Petrarchan images, or Italian proper names. Obviously the approach is somewhat mechanical and limited to explicit references rather than more subtle allusions (unless the investigator is already aware of these subtleties), but it does provide a rapid survey to help direct one's future efforts. For example, a preliminary study of theatrical metaphors (or metatheatricality) across the corpus could begin with a list of theatrical terms (see Fig. 2). My word list matches 3384 occurrences,[9] largely in the tragedies and good Quartos, and concentrated in the Elizabethan plays. The histories and the First Folio seem comparatively lean in metatheatrical references. _Hamlet_ and _A Midsummer Night's Dream_, understandably, have by far the greatest concentration of theatrical language, owing to their internal performances, and their overall thematic concerns. The software will supply the specific quotations which cannot be ignored in formulating a theory, regardless of their location in the canon.
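The kind of distribution report described above (a count for each text, set against what one would expect from that text's size) can be imitated in a few lines of Python. This is only a sketch under stated assumptions: the article does not say how WordCruncher derives its "Expect" column, so the code assumes it is each text's share of all the words in the corpus; the function names, the two miniature "plays," and the word list are invented for illustration and are not drawn from the Archive.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens, keeping internal apostrophes and hyphens."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())

def distribution(texts, word_list):
    """Return {name: (count, actual_pct, expected_pct, difference)}.

    actual_pct   = share of all word-list hits that fall in this text
    expected_pct = share of all corpus tokens that fall in this text
                   (an ASSUMED definition, not WordCruncher's documented one)
    """
    wanted = {w.lower() for w in word_list}
    hits, sizes = {}, {}
    for name, text in texts.items():
        tokens = tokenize(text)
        sizes[name] = len(tokens)
        hits[name] = sum(1 for t in tokens if t in wanted)
    total_hits = sum(hits.values())
    total_size = sum(sizes.values())
    report = {}
    for name in texts:
        actual = 100.0 * hits[name] / total_hits if total_hits else 0.0
        expected = 100.0 * sizes[name] / total_size if total_size else 0.0
        report[name] = (hits[name], actual, expected, actual - expected)
    return report

# Hypothetical miniature corpus and word list, for illustration only.
TEXTS = {
    "Play A": "the stage is set and the audience applaud the play",
    "Play B": "the king rides to war and the army follows the king home again",
}
WORDS = ["stage", "audience", "applaud", "play"]

if __name__ == "__main__":
    for name, (count, actual, expected, diff) in distribution(TEXTS, WORDS).items():
        print(f"{name:<10}{count:>6}{actual:>8.0f}%{expected:>8.0f}%{diff:>+8.0f}%")
```

A positive difference marks a text richer in the listed words than its length alone would predict. With old-spelling texts the word list would also have to enumerate every variant spelling, as the published report does.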
Any electronic edition would permit the preceding investigations, and would simplify matters by modernizing orthography and emending textual difficulties, but would obscure the distinctions between the variant texts. For textually-sensitive research into matters of bibliography or revision, the Text Archive becomes indispensable.

Modern editions generally insert stage directions as they are deemed necessary; the Quarto and Folio texts contain only those 6276 lines of stage directions which could possibly be authorial. It is immediately striking that no comedy has more than about one hundred lines of directions, while history plays, and especially tragedies, almost never have fewer than one hundred. Stage directions are heaviest in _Antony and Cleopatra_ F1 (202), _Coriolanus_ F1 (176), _The Two Noble Kinsmen_ Q1 (194), and _Richard III_ F1 (191).

Permissive stage directions, once considered evidence of authorial manuscript or "foul papers," can be quickly isolated and analyzed. Shakespeare's infamous negative capability found frequent expression in stage directions which include words such as "others" or "other lords": 96 such stage directions are present in the early texts. Surprisingly, however, these seem to be concentrated in the so-called "bad" Quartos and the Folio, not the "good" Quartos usually called foul paper texts. The overall concentration of "other" characters seems to follow the concentration of stage directions themselves, in the tragedies and histories -- where processions, spectacle, and crowd scenes are prevalent. Permissive stage directions also include numerical ambiguities, such as "one or two" (7 occurrences), "two or three" (22), "three or four" (16), "four or five" (3), "five or six" (4), and even "seven or eight" (1).
These stage directions are most prominent in _2 Henry VI_ F1, _Romeo and Juliet_ Q2/F1 (Q1 has none), and _Coriolanus_ F1 (with 4 each), _2 Henry IV_ Q1 (F1 has none), _Hamlet_ Q2/F1 (Q1 has none), _All's Well that Ends Well_ F1, _Antony and Cleopatra_ F1, and _Pericles_ Q1 (with 3 each). This form of permissiveness does indeed seem concentrated in authorial copy, and takes even more interesting forms: "Bullingbrooke or Southwell reades" (_2 Henry 6_ (F1) 1.4:21); "Enter Menenius to the Watch or Guard" (_Coriolanus_ (F1) 5.2:0); and "...Branches of Bayes or Palme in their hands" (_Henry 8_ (F1) 4.2:90). _Coriolanus_ F1 has more such stage directions, including "or" or "other", than any other play text (12), followed by _Antony and Cleopatra_ F1 (9), _2 Henry VI_ F1 (8), and with six each, _Richard III_ F1 (Q1 has 5), _Romeo and Juliet_ Q2/F1 (Q1 has 2), _Hamlet_ Q2 (F1 has 5, Q1 has 2), and _Timon of Athens_ F1.

Verbal evidence such as spelling or hyphenation can also be examined electronically. The fifteen occurrences of the obsolescent plural "eyen" (in its various spellings) reveal that in all cases but one (the questionable text of Q1 _Pericles_), the word is used primarily for purposes of rhyme (and eight times to rhyme with "mine"). Furthermore, it seems clear from Shakespeare's use of the word in _A Midsummer Night's Dream_ and _As You Like It_ that he considered it preposterous, appropriate in circumstances such as Bottom's performance as Pyramus, or Phebe's love poem to Rosalind. The 8561 hyphenated words in the Quarto and Folio texts can immediately be summoned up by the retrieval software, and it can quickly be determined that the concentration is heavier in the Folio (probably because of narrower column widths). _The Merry Wives of Windsor_ F1, in particular, has 532 hyphenations, exactly ten times the number of Q1, and they are concentrated in scenes 1.3-4.2 and 5.5.
Strangely, most of the hyphenations seem not to be required by lineation, but by compositorial spelling-habits (although this is a very preliminary observation).

Punctuation study is quick and painless on the archive texts. The 138,198 commas can be viewed immediately (although you would have to page through more than 23,000 WordCruncher screens to see them all), as can the 104,928 periods, 26,974 colons, 5820 semi-colons, 15,785 question marks, and 1162 exclamation marks. Distributions can be examined to identify compositorial habits, or more generally to observe that the Folio seems light in periods, but heavy in semi-colons and colons. Such study has already been done manually, of course, but never has so much raw material been so accessible: rather than hunting through decades of printed criticism to discover what Charlton Hinman concluded, one can instantly examine the texts themselves in any way desired.

The Shakespeare Text Archive project is currently dependent upon the labour of a single volunteer, and as a result it is still some distance from completion. A number of texts have yet to be obtained from Oxford, and a number of texts will have to be entered manually. Once the Archive contains electronic texts of every significant early edition of the poems and plays, and has been carefully proofread yet again, it will be made available to the scholarly community as a whole through the Oxford University Computing Services Text Archive for a nominal fee. Ultimately, it will also incorporate non-copyright texts of Shakespearean source materials, and either an encoding of important emendations or a parallel edited text.
Many of the Text Archive's ultimate aims will be realized only when a more advanced software engine is developed: currently WordCruncher is unable to index non-sequential hierarchies (i.e., it can index Play/Act/Scene/Line but not speaker or compositor stints, which are scattered throughout the texts), and TACT, which would solve this difficulty, is unable to manage the size of the complete textbase.[10] Ideally, software will be developed to perform on-screen collation of the variant texts, automated comparisons of multiple inquiries, and the analysis of repetition itself, as an abstraction: for example, the distribution of rhetorical repetitions such as anaphora, epistrophe, epanalepsis, anadiplosis, and perhaps even alliteration and rhyme. The text files, in standard ASCII format, have weathered twenty-five years of technological revolution, which have seen ten times the computing power of Howard-Hill's first mainframe appear on the average academic desktop, and in all likelihood their usefulness will continue despite the unimagined technological developments to come. The "electronic facsimiles" of the Quarto and Folio texts are a powerful and flexible approach to the textual indeterminacy of Shakespeare's works. Random Cloud has insisted that critics must look "to process in creation rather than to hypostatized artefacts," and "away from the editor's ideal single version -- the so-called 'definitive text' -- to the author's actual multiple versions: an infinitive text,"[11] which he elsewhere defines as "a polymorphous set of all versions."[12] The Shakespeare Text Archive, which may ultimately become an electronic variorum, is perhaps a first step toward realizing the goal of a fluid "infinitive text" in electronic form. 
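Of the rhetorical repetitions named above, anaphora is perhaps the easiest to approximate mechanically: one looks for runs of consecutive lines that open with the same word. The Python sketch below is an illustration only, not a description of any existing or planned Archive software. The function names are hypothetical, the sample lines are given in modern spelling rather than taken from the archive files, and exact matching of this kind would miss old-spelling variants unless the text were normalized first.

```python
import re

def first_word(line):
    """Return the lowercased first word of a line, or None if the line has no word."""
    match = re.search(r"[A-Za-z']+", line)
    return match.group(0).lower() if match else None

def find_anaphora(lines, min_run=2):
    """Return a list of (start_index, run_length, word) for each run of
    consecutive lines that open with the same word."""
    runs = []
    start = 0
    while start < len(lines):
        word = first_word(lines[start])
        end = start + 1
        # Extend the run while the following lines open with the same word.
        while word is not None and end < len(lines) and first_word(lines[end]) == word:
            end += 1
        if word is not None and end - start >= min_run:
            runs.append((start, end - start, word))
        start = end
    return runs

# Modern-spelling lines from Richard II, 4.1, used only as sample input.
SAMPLE = [
    "Now mark me how I will undo myself:",
    "With mine own tears I wash away my balm,",
    "With mine own hands I give away my crown,",
    "With mine own tongue deny my sacred state,",
]

if __name__ == "__main__":
    for start, length, word in find_anaphora(SAMPLE):
        print(f"{length} consecutive lines beginning '{word}', from line index {start}")
```

Epistrophe could be handled the same way by comparing last words instead of first, while anadiplosis would compare the end of one line with the start of the next.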
___________________________________________________________
Computer Book: d:\etc\SHAKESPE.BYB
Reference List: ear,eare,eare-,eares,ears,ears'
___________________________________________________________
|l65 Vpon my secure houre, thy Vncle stole
|l66 With iuyce of cursed Hebona in a viall,
|l67 And in the porches of my eares did poure
|l68 The leaprous distilment, whose effect
|l69 Holds such an enmitie with blood of man,
|l70 That swift as quicksiluer it courses through
                                       (Hamlet (Q2) 1.5:67)
___________________________________________________________
him: anon come in an}
*{#other man, takes off his crowne, kisses it, pours poyson in the sleepers eares},
*{and leaues him: the Queene returnes, finds the King dead, makes passionate}
*{action, the poysner with some three or foure come in
                                    (Hamlet (Q2) 3.2:120sd)
___________________________________________________________
|l330 And she for him, pleades strongly to the Moore:
|l331 I'le poure this pestilence into his eare,
|l332 That she repeales him for her bodyes lust;
|l333 And by how much she striues to doe him good,
|l334 She shall vndoe her credit with the Moore,
                                     (Othello (Q1) 2.3:331)
___________________________________________________________
|l3 Oh Master, what a strange infection
|l4 Is falne into thy eare? What false Italian,
|l5 (As poysonous tongu'd, as handed) hath preuail'd
|l6 On thy too ready hearing? Disloyall? No.
                                     (Cymbeline (F1) 3.2:3)
___________________________________________________________

Fig. 1: WordCruncher report showing occurrences of ears in association with poison imagery.

Report for: theater,thea-tre,stag'd,perfourme,costume,revells,
            apeer's,plat-formes,platforme,gloabe,page-ant,part,
            parte,partes,parts,applau'd,applaud,applaude,
            applauded,applauding,applause,applauses,audience,
            auditor,auditorie,auditors

Total References in List: 3481

                  Frequency     -- Percentages --
Range Names           Count  Actual  Expect Difference
-----------------------------------------------------
First Folio            2098     60%     66%     -6%
Good Quartos           1014     29%     26%      3%
Bad Quartos             266      8%      7%      1%
Minor Poems             103      3%      2%      1%
Comedies                942     27%     27%      0%
Histories               702     20%     27%     -7%
Tragedies              1221     35%     31%      4%
Romances                318      9%      9%      0%
Authorial              2326     67%     70%     -3%
Elizabethan            2502     72%     70%      2%
Jacobean                935     27%     30%     -3%
Prefatory                44      1%      0%      1%

                  Frequency     -- Percentages --
Play                  Count  Actual  Expect Difference
-----------------------------------------------------
Folio                    44      1%      0%      1%
1 Henry 6 (F1)           30      1%      2%     -1%
2 Henry 6 (F1)           37      1%      2%     -1%
3 Henry 6 (F1)           35      1%      2%     -1%
Richard 3 (Q1)           42      1%      2%     -1%
Richard 3 (F1)           40      1%      2%     -1%
Venus&Adonis (Mod)       20      1%      1%      0%
Comedy of Errors(F1)     19      1%      1%      0%
Sonnets                  83      2%      1%      1%
Titus Andronicus(Q1)     37      1%      2%     -1%
Titus Andronicus(F1)     31      1%      2%     -1%
Taming the Shrew (F1     46      1%      2%     -1%
Two Gentlemen (F1)       27      1%      1%      0%
Love's Labours (Q1)      69      2%      2%      0%
Love's Labours (F1)      70      2%      2%      0%
King John (F1)           55      2%      2%      0%
Richard 2 (Q1)           37      1%      2%     -1%
Richard 2 (F1)           37      1%      2%     -1%
Romeo & Juliet (Q1)      41      1%      1%      0%
Romeo & Juliet (Q2)      69      2%      2%      0%
Romeo & Juliet (F1)      65      2%      2%      0%
Midsummer (Q1)          109      3%      1%      2%
Midsummer (F1)          107      3%      1%      2%
Merchant of Ven (Q1)     79      2%      2%      0%
Merchant of Ven (F1)     72      2%      2%      0%
1 Henry 4 (Q1)           51      1%      2%     -1%
1 Henry 4 (F1)           51      1%      2%     -1%
Merry Wives (Q1)         14      0%      1%     -1%
Merry Wives (F1)         30      1%      2%     -1%
2 Henry 4 (Q1)           66      2%      2%      0%
2 Henry 4 (F1)           74      2%      2%      0%
Much Ado (Q1)            48      1%      2%     -1%
Much Ado (F1)            48      1%      2%     -1%
Henry 5 (F1)             77      2%      2%      0%
Julius Caesar (F1)       52      1%      2%     -1%
As You Like It (F1)      65      2%      2%      0%
Hamlet (Q1)             107      3%      1%      2%
Hamlet (Q2)             194      6%      2%      4%
Hamlet (F1)             168      5%      2%      3%
Twelfth Night (F1)       44      1%      2%     -1%
Troilus & Cress (Q1)     76      2%      2%      0%
Troilus & Cress (F1)     80      2%      2%      0%
All's Well (F1)          36      1%      2%     -1%
Measure (F1)             54      2%      2%      0%
Othello (Q1)             59      2%      2%      0%
Othello (F1)             63      2%      2%      0%
King Lear (Q1)           47      1%      2%     -1%
King Lear (F1)           43      1%      2%     -1%
Macbeth (F1)             41      1%      1%      0%
Antony (F1)              93      3%      2%      1%
Coriolanus (F1)          73      2%      2%      0%
Timon (F)                38      1%      2%     -1%
Pericles(Q1)             62      2%      1%      1%
Cymbeline (F1)           63      2%      2%      0%
Winter's Tale (F1)       79      2%      2%      0%
Tempest (F1)             41      1%      1%      0%
Henry 8 (F1)             70      2%      2%      0%
2 Noble Kinsmen (Q1)     73      2%      2%      0%
___________________________________________________________

Fig. 2: WordCruncher distribution report for theatrical metaphors across the Shakespeare canon.

N O T E S

1. Howard-Hill describes the process in detail in his article, "The Oxford Old Spelling Concordances," _Studies in Bibliography_ 22 (1969): 143-64. The later history of the texts is described in an anonymous note, "Shakespeare and the Computer," _ALLC Bulletin_ 8:1 (1980): 72.

2. T.H. Howard-Hill, "A Common Shakespeare Text File for Computer-Aided Research: A Proposal," _Computer Studies in the Humanities and Verbal Behaviour_ 4:1 (1973): 54.

3. Electronic Text Corporation's WordCruncher has been thoroughly reviewed by John J. Hughes in his article, "WordCruncher: High Powered Text-Retrieval Software," in _Bits and Bytes Review_ 1:3 (February 1987): 1-8. TACT is a new public domain program distributed by the Centre for Computing in the Humanities at the University of Toronto.

4. With apologies to Caroline F.E. Spurgeon, _Shakespeare's Imagery and What It Tells Us_ (Cambridge: Cambridge University Press, 1935).

5. See, in particular, Steven Urkowitz, _Shakespeare's Revision of King Lear_ (Princeton: Princeton University Press, 1980), and Gary Taylor and Michael Warren, eds. _The Division of the Kingdoms: Shakespeare's Two Versions of King Lear_ (Oxford: Clarendon Press, 1983).

6. E.A.J. Honigmann, _The Stability of Shakespeare's Text_ (London: Edward Arnold, 1965), p. 167.

7. As proposed by Randall McLeod, "Unediting Shak-speare," _Sub-Stance_ 33 (1982): 26-55.

8. Charlton Hinman, ed. _The Norton Facsimile: The First Folio of Shakespeare_ (New York: Norton, 1968).

9. In this case, my choices were the various spellings of act, appear, comedy, costume, globe, illusion, mask, pageant, part, perform, platform, play, revel, scaffold, scene, show, stage, theater, and tragedy.

10. A "beta-test" version of TACT will shortly be tested with the Shakespeare Text Archive, and may be available by the time of printing.

11. Random Cloud, "The Psychopathology of Everyday Art," 100-168 in G.R. Hibbard, ed. _The Elizabethan Theatre IX_ (Waterloo: P.D. Meany, 1986), p. 111.

12. Random Cloud, "Commentary: The Marriage of Good and Bad Quartos," _Shakespeare Quarterly_ 33 (1982): 421-31, p. 422.

___________________________________________________________

The contents of this electronic file are copyright (c)1990 Kenneth B. Steele, University of Toronto. Quotation for scholarly (non-commercial) purposes is permitted, but please contact the author to verify the material in question and advise him of your intention. Please do NOT distribute.