Stephen N. Matsuba
Department of English
Stong College
York University
4700 Keele Street
North York, Ontario
CANADA M3J 1P3
(416) 736-5166
MATSUBA@WRITER.YORKU.CA (Bitnet)

The "Cunning Pattern of Excelling Nature": Literary Computing and Shakespeare's Sonnets

**NOTE: The original version of this paper included a number of diagraphs illustrating some of its material. Because of the format used to store these papers, I am not able to include them here. I have, however, kept the references to them in this version. Anyone interested in obtaining a copy of the diagraphs and Table 1 can request them from me at the above address. --SNM

In his summary of the proceedings of the Literary Data Processing Conference held in 1964, S. M. Parrish declares:

    The thing we may not understand, though we ought to soon enough, is that in a revolution of this sort there is no holding back, and no turning back. The movement of events becomes compelling--inevitable. The successful completion of a computer concordance makes the making of concordances by hand old-fashioned, expensive . . . and obsolete. The making of dictionaries or of large bibliographies by hand will soon enough in the same way become obsolete. And not only the making but the using of them is involved. As Professor [Alan] Markman has observed, when all the libraries or at least all pertinent bibliographical references are readily available on tape or in core memory, there will be no excuse for ignorance. But the real force of the revolution has not even yet been intimated. More ominous, some of us may think, but surely just as inevitable, the perfection of attribution study or source study or influence study by computer techniques will make obsolete the studies that rely on the judgment and the memory of one poor fallible human scholar. (5)

Parrish's prediction has not only come true, but has been surpassed.
The application of the computer in literary research has gone beyond the types of studies that he outlines into areas involving critical inquiry and theory.

Computer-generated studies of Shakespeare's work are not new. In 1973, Dolores M. Burton published a study of grammatical style in *Richard II* and *Anthony and Cleopatra*, and Walter A. Sedelow edited a series of papers on the application of the computer in Shakespeare Studies in *Computer Studies in the Humanities*. Stanley Wells and Gary Taylor's *William Shakespeare: A Textual Companion* (1987) provides tables of stylometric data to support claims about authorship, Shakespeare's style, and the chronology of the plays. The sonnets have also been the subject of this kind of research. M. G. Tarlinskaja and L. K. Coachman pursued a statistically based study correlating text and theme in seven sonnets by Shakespeare. Employing an algorithm to set an "objective" parsing of semantic elements, they identified "thematically relevant semantic components of the content" (339) while outlining a method of comparing texts containing similar themes. The computer-assisted study of Shakespeare's sonnets outlined in this paper was first conceived in 1988 as a test of the application of DiscAn, and the preliminary results were presented with Ira Nadel at the Dynamic Text conference held in Toronto in 1989.

DiscAn is an IBM-PC compatible version of a mainframe computer package for content and discourse analysis designed by Pierre Maranda, Professor of Anthropology at Laval University. The program can process a single machine-readable text or an entire canon, with the only limitation being the space available on one's disk drive. Originally designed to assist in the analysis of myths and folktales, following the work of Propp and Levi-Strauss, it has two main components.
The content analysis section includes word-frequency generators, contingency searchers, and programs that assist the user in creating a library of codes and tagging a database. These codes, or tags, can be whatever elements interest the user: rhetorical devices, sound patterns, imagery, et cetera. Once the text has been processed, it can be run through the frequency generators to determine the paradigmatic weights of each tag. The discourse analysis section calculates the probabilities of incidence linking the various tags to each other using Markovian analysis. The output can then be converted into diagraphs, thereby providing a visual presentation, or map, of not only the patterns of co-relatives within the corpus being analysed, but also the strength of the connection between them.

Our study was limited to a rudimentary stylistic analysis of 15 sonnets chosen at random: 4, 26, 34, 57, 68, 82, 116, 122, 137, and 149. Shakespeare's sonnets provided the ideal test: they are known by most people, but are reasonably unencumbered by a vast amount of criticism. Moreover, they allowed us to look at a relatively short body of work with diverse themes and structures written by a single author.

The tags we designed followed, to some degree, the syntactic units described by John Porter Houston in *Shakespearean Sentences* (see table 1). After processing the tags, we used DiscAn's content analysis component to determine their frequencies. The tags with the highest occurrence were transitive verbs and conjunctions (each making up 8.18 percent of the total). Pronouns involving the speaker as the subject and the direct object made up 2.97 percent and 0.19 percent respectively. Pronouns involving the addressee as the subject occurred only 0.37 percent of the time.

The discourse analysis output allowed us to examine common patterns within Shakespeare's syntax (figure 1).
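
The two kinds of output described above, tag frequencies and tag-to-tag probabilities, can be illustrated with a minimal Python sketch. It is not DiscAn's own procedure, which this paper does not document; it assumes a simple first-order Markov estimate in which the probability of moving from one tag to the next is the share of adjacent tag pairs that begin with the first tag. The tags SU, WQ, VA, PV, and PW are those named in this paper, "VT" is a hypothetical label for transitive verbs, and the three tagged clauses are invented for illustration rather than taken from the study's data.

```python
from collections import Counter, defaultdict


def tag_frequencies(sequences):
    """Return each tag's share of all tag tokens, as a percentage."""
    counts = Counter(tag for seq in sequences for tag in seq)
    total = sum(counts.values())
    return {tag: 100.0 * n / total for tag, n in counts.items()}


def transition_probabilities(sequences):
    """Estimate P(next tag | current tag) from adjacent tag pairs."""
    pair_counts = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            pair_counts[current][following] += 1
    return {
        current: {
            nxt: n / sum(followers.values())
            for nxt, n in followers.items()
        }
        for current, followers in pair_counts.items()
    }


# Invented toy clauses (assumption: not the study's data).
# "VT" is a hypothetical tag name for a transitive verb.
clauses = [
    ["SU", "WQ", "VA", "PW", "VT"],  # e.g. noun phrase, "why", "dost", "thou", "spend"
    ["SU", "WQ", "VA", "PV", "VT"],
    ["PV", "VA", "VT"],
]

freqs = tag_frequencies(clauses)
probs = transition_probabilities(clauses)

# List each tag alphabetically with the tags that follow it.
for current in sorted(probs):
    for nxt, p in sorted(probs[current].items()):
        print(f"{current} -> {nxt}: {p:.2%}")
```

Each inner dictionary sums to one, so its entries can be read as the weights on the arrows leaving a tag in a diagraph. Tags that only ever end a clause, such as "VT" in this toy data, have no outgoing entries.
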
DiscAn lists each tag in alphabetical order, indicates the tags that precede and follow it, and notes the probabilities of moving from one tag to another. As well, measures of each tag's frequency and dynamics are given at the end of the output. The most interesting patterns that emerged involved pronouns denoting the speaker and the addressee (PV and PW). The strength of the connections between auxiliary verbs (VA) and these pronouns is the same (24.14 percent). This pattern is the only one in which the latter appear. Following the diagraph, the most likely syntactic order is: subject (SU), followed by a *wq*-question word (WQ), then an auxiliary verb, a pronoun involving either the speaker or the addressee as subject, and finally a transitive verb. We discovered that this syntactic pattern denoted a stylistic pattern in Shakespeare's sonnets in which a noun (or noun phrase) is placed at the beginning of the clause to act as a description of either the speaker or the addressee, as is the case in Sonnet 4: "Unthrifty loveliness, why dost thou spend/ Upon thyself thy beauty's legacy?"

When we compared this result to an analysis of five sonnets from John Donne's *Divine Poems* (see figure 3), we noted a significant difference between the two corpora. The reflexive element that appears in Shakespeare's sonnets does not appear in Donne's. In fact, *wq*-question words play no significant role in the latter. And while the Shakespeare-corpus showed a significant co-relation between the verb *to be* and negation (NG), the Donne-corpus was much more prone to one between *to be* and adjectives (AJ).

We also examined the effect that eliminating specific syntactical tags would have on the structure of the sonnets (see figure 2). Removing conjunctions from Shakespeare's sonnets affected only two tags: intransitive verbs (VI) and coordinate nominal *that*-clauses (CT). Conjunctions, therefore, have a primarily coordinating role in this corpus.
But in Donne's sonnets, a large number of tags were affected when conjunctions were removed, indicating a subordinating role.

The larger conclusions drawn from the original study focussed on the application of DiscAn in traditional methods of literary study. Professor Nadel and I noted the program's value in traditional literary studies, and felt that it could be used as a supplement to the critical positions outlined by critics like Houston to verify, or refute, their hypotheses and conclusions. Is there proof, as he asserts, that Shakespeare's sentence development advances, and then remains stable for the rest of his career? Can one, as Joel Fineman does, assert a linguistic disjunction in the treatment of imagery in Shakespeare's sonnets which, in turn, creates a new poetics of subjectivity? DiscAn can easily generate evidence that could support or refute such claims.

The program could also be used in matters concerning disputed authorship. Rather than relying on the paradigmatic weights within a corpus, DiscAn allows the further step of using syntagmatic dynamics. Profiles of the patterns of language and style in the known plays could be statistically compared to the same type of profile in, say, *The Two Noble Kinsmen*. Significance tests such as the chi-square or the Mann-Whitney test, as well as factor and cluster analyses, would point to whether variations between the two corpora indicated a real difference, and allow the critic to determine which passages were written by Shakespeare and which by Fletcher with a greater degree of surety.

The present study includes all 154 of Shakespeare's sonnets, and incorporates the tagging of not only syntax, but also semantic elements and imagery. In this kind of study, a more thorough process of conceptualization is necessary in which a "coding manual"--consisting of a structured list of null words, a structured list of tags, and the operational principles to define both these lists--will emerge.
This "filter" rests on reduction formulas that the analyst must make explicit and must ensure are replicable.

I am currently experimenting with a more complex set of syntactic tags based on the three-digit York Syntactic Code developed by Robert Cluett (see *Prose Style and Critical Reading*, 20-21), and developing a set denoting meaning. My semantic tagging system focusses on the relation of meaning to the speaker and addressee in the poem. Thus I would mark the first three lines of Sonnet 142:

    Love is my sin, and thy dear virtue hate,
    Hate of my sin, grounded on sinful loving.
    O but with mine compare thou thine own state,

as follows:

    love/be/speaker/transgression/addressee/purity/hate
    hate/speaker/transgression/build/transgression/love
    speaker/compare/addressee/addressee/addressee/condition

This set of tags is still in the developmental stage. Note that some words, like conjunctions, were not included in this group of codes. However, I now feel that they should not only be left in this coding scheme, but that there should also be a differentiation made between coordinating and subordinating conjunctions.

A test run of the Markovian analyser on these semantic tags revealed some interesting clusters. For example, the tag indicating "culpability" is used only in relation to the tag indicating "hate". Moreover, in those words associated with "hate", one finds a higher incidence of clustering involving the speaker than is the case with the addressee. I plan to analyse individual sonnets to see if some have patterns that are statistically close enough to say that they can be grouped together. I also plan to compare the corpus as a whole to the sonnets of other writers, and to those that appear in Shakespeare's plays (most notably *Romeo and Juliet*).

Some have questioned the validity of statistical analysis in literary study. Houston criticizes its use in stylistic studies:

    Some statistics are almost inevitable in a stylistic study.
    I do not consider them admirable in themselves; nor do I like tables of them, since it is easy to miss the really important figure buried among the trivia. It is quite possible, despite my rechecking, that some of the figures I give here and there are not absolutely accurate. However, I do not regard the difference between, say, nineteen and twenty-two occurrences of a stylistic device to be significant, and I base no argument on such slight variations. (ix)

Houston supports his declarative statements by a traditional method: with referential evidence. But his claim that no argument will be based on "slight variations" may actually bury "the really important figure." A thorough study of the stylistic patterns within a body of work cannot present *all* the relevant passages that support a claim. Something must be left out. However, statistical studies condense the same material into a form that can reveal both large patterns and minute variants within a text or an entire canon. In this way, one can more effectively support claims involving large-scale studies involving areas like character-types, the literature of specific periods, and even entire genres. And the computer is the most logical tool for this kind of inquiry.

But when the computer is used in analyses involving meaning and connotation, the researcher using this tool must consider matters involving critical theory. Creating an encoded text invites a deconstructive critique. The need to produce replicable results raises issues concerning reader response and intentionality. In fact, as computer-assisted research becomes more complex, the raising of theoretical questions becomes inevitable. So far, many computer applications in literary study have not dealt with the implications of intertextuality, semiotics, or deconstruction. But ignoring these issues will become more and more problematic as computer-aided studies advance. By way of example, I point to my projected doctoral thesis.
The larger study of Shakespeare's sonnets is a part of an inquiry into the development of expert systems for literary study. My interest focusses on the nature of allusions, particularly those identified in Shakespeare's works by various critics. Despite their importance in literary criticism, and in Shakespearian criticism in particular, little in the way of a detailed, systematic study has been done to determine how allusions work, or even to define what they are. Carmela Perri notes that "allusion remains a notion inadequately defined as 'indirect or tacit reference', and is used with no further agreement concerning its characteristics and theoretical status" (289).

My thesis will examine allusions in three main stages, each involving a computer-analysis of the texts. The first will examine the positioning of Shakespeare's allusions to determine if they are clustered together, taking into account different character types, acts and scenes, and genres. In the second stage, I will analyse the syntactic, semantic, and connotative patterns of Shakespeare's allusions. These results will be compared to critical material on the subject, and, based on this analysis, I will design a computer program that searches for allusions in different texts. The third section of my thesis will compare the allusions identified by the computer to those identified by different scholars. And while I am limiting my main research to Shakespeare's works, I hope to draw some conclusions about a more general theory of allusion.

But in defining allusions and their structure, one runs into the question of whether or not a single system exists. Reader response, intertextuality, and all the theoretical implications they entail further complicate matters. For example, can one measure how a Victorian critic identifies an allusion?
Will a single parser be able to determine the relation between signifiers and signifieds over different periods and genres, or must a separate one be created for each? I will have to deal with all of these issues in my dissertation, and other researchers will have to do the same if they wish to pursue similar work.

But computer-assisted research will not stop here. Fuzzy logic and chaos theory present interesting connections between discourse analysis, theories of the brain, and artificial intelligence. Parallel processing brings the computer closer to the structure of the human brain, and the development of optical computer processors may eventually eliminate existing limitations on processing times. There will be a day, not very long in the future, when the computer will be as much a part of literary study as the book.

This is not to say that the student or scholar will be able to rely on the computer for critical output. Parrish noted that the computer will not replace us as critics, but will make us better ones (7-8). The computer can only act as a tool, albeit an extremely powerful one, that manipulates and condenses masses of data for us. The coding of texts requires an understanding of those texts, and the determination of what may or may not be significant will always be our own. It is as Shakespeare's supposed double, Christopher Marlowe, observed:

    If all the pens that ever poets held
    Had fed the feeling of their master's thoughts
    And every sweetness that inspired their hearts,
    Their minds and muses on admired themes;
    If all the heavenly quintessence they still
    From their immortal flowers of poesy,
    Wherein as in a mirror we perceive
    The highest reaches of human wit--
    If these had made one poem's period
    And all combined in beauty's worthiness,
    Yet should there hover in their restless heads
    One thought, one grace, one wonder at the least,
    Which into words no virtue can digest.
    (*Tamburlaine* 5.1.163-73)

Works Cited

Bessinger, Jess B., Jr., Stephen M. Parrish, and Harry F. Arader, eds. *Literary Data Processing Conference Proceedings*. New York: MLA, 1964.

Burton, Dolores M. *Shakespeare's Grammatical Style: A Computer-Assisted Analysis of* Richard II *and* Anthony and Cleopatra. Austin: U of Texas P, 1973.

Cluett, Robert. *Prose Style and Critical Reading*. Pref. John Stedmond. New York: Teachers College, 1976.

Fineman, Joel. *Shakespeare's Perjured Eye: The Invention of Subjectivity in the Sonnets*. Berkeley: U of California P, 1986.

Houston, John Porter. *Shakespearean Sentences: A Study in Style and Syntax*. Baton Rouge: Louisiana State UP, 1988.

Nadel, Ira B., and Stephen N. Matsuba. "Literary Applications of DISCAN: A Content and Discourse Analysis Program." Literary Computing sess. 1. 9th International Conference on Computers and the Humanities and 16th International Association for Literary and Linguistic Computing--"The Dynamic Text," Toronto, 6 Jun. 1989.

Perri, Carmela. "On Alluding." *Poetics* 7 (1978): 289-307.

Tarlinskaja, M. G., and L. K. Coachman. "Text-Theme-Text: Semantic Correlation Between Thematically Linked Poems (Seven Sonnets by Shakespeare)." *Language and Style* 19.4 (Fall 1986): 338-67.

Wells, Stanley, and Gary Taylor. *William Shakespeare: A Textual Companion*. Oxford: Clarendon, 1987.