The Theory and Practice of Lexicons of Early Modern English

Ian Lancashire
University of Toronto

  1. The University of Toronto Press and the University of Toronto Library jointly launched my Lexicons of Early Modern English (LEME) on April 12, 2006. LEME offers both free, public access to and a licensed scholarly site for half a million word-entries in more than 155 lexical works, print and manuscript, written by English speakers from about 1480 to 1702. Growing year by year, it documents, analytically, what those alive in this long period thought that words meant. Because we can only understand a language partially and locally (its words tend to be the product of a place, a date, a social community, and even a person), LEME builds on whatever surviving documents describe words. LEME does not make its own definitions. In deciding word-meaning, contemporary authors have an authority above those of us who live centuries later. People do not have to be great writers or lexicographers to share this authority. It is enough that they spoke the language and made observations about the English words they used.

  2. There are three ways to find words in LEME: typing in a query string, browsing a comprehensive alphabetical word-list, or browsing a table of editorially normalized headwords from LEME word-entries. LEME word-or-string searches retrieve individual word-entries, sometimes with a link to the EEBO image (which usefully grounds LEME transcriptions in facsimiles of its single documents). LEME can also restrict queries by language, date, title, author, genre, and subject. Each set of results graphs the distribution of the word over two centuries. Alternatively, the researcher can approach the collected word-horde through the full LEME alphabetical word-list: it enables a researcher to browse every word-form in the transcribed lexical works and call up each word-entry in which the queried word-form is found. This word-list parallels the Early English Books Online/Text Creation Partnership (EEBO/TCP) master word-list. A third option, intended for researchers who want old-spelling occurrences of words collected under one modern-spelling headword, is to use the modernized, editorially-lemmatized headwords list. A lemma (on which the name LEME quibbles) is an inflectionally-related group of words. For example, the lemma for spellings like "gives", "giues", and "giueth", and inflectional variants like "giue", "giuing", "gaue", and "given", is the general infinitive form, "give." Normally, LEME lemmas are editorially standardized on Oxford English Dictionary headwords.

    Revising the Early Modern English Lexicon

  3. LEME indexes bibliographically some 1,200 lexical works from about 1480 to 1702. My major sources for printed texts are Robin Alston's in-progress bibliography, EEBO, and pioneering scholarship by Gabriele Stein and others. The comprehensiveness of LEME's index of lexicons distinguishes it from its sources: it lists all kinds of lexical works, whether bilingual or monolingual, by author, date, title, subject, and genre. It includes manuscript lexical works as well as small glossaries within treatises. The large secondary bibliography that annotates this index—again, a resource not available elsewhere—adds other overlooked primary works to this bibliography. For example, last fall, with the help of a British Library reference librarian I found—and notified EEBO, which now includes it—what seems to be the first Renaissance English bilingual glossary to be printed, the anonymous Floures of Ouid in 1513. This gives balanced English-Latin and Latin-English glossaries for grammar-school students and serves their study of sentences from Ovid's Art of Love.

  4. University of Toronto membership in EEBO/TCP helped me to locate texts that explain Early Modern English terminology. Frederick Furnivall founded the Early English Text Society (EETS) to assist Oxford English Dictionary lexicographers to find words and citations for them in hitherto unedited texts. EEBO/TCP is both the new OED's and LEME's digital EETS. The EEBO/TCP master word-list, when searched for terms like "glossary," "vocabulary," "definition," "dictionary," and "lexicon," identifies otherwise elusive lexical commentaries, tables, and pockets of word-explanations that translators and treatise-writers bundled with their works. EEBO/TCP also gives LEME valuable digital transcriptions for several lexicons, especially Thomas Wilson's Christian Dictionary (1612), Edward Phillips' World of Words (1658), William Lloyd's dictionary of the semantic analyses of words in Bishop Wilkins' Essay (1668), and Elisha Coles' English Dictionary (1676). These added significantly to the total word-entries in LEME. Perhaps there are more helpful projects in research infrastructure for the Early Modern English period than EEBO/TCP, but I do not know what they are.

  5. How, then, does LEME differ from a collection like EEBO/TCP? LEME has proofread texts and emended erroneous readings. Its database implementation allows for selective retrieval of words by language, position (headword, explanation), and lemma, gives a detailed profile of the vocabulary of the period, decade by decade, offers ready access to all period lexical works at once, and represents a start on the yet unrealized period dictionary of Early Modern English (Bailey 1985), the successor to the Middle English Dictionary. A comparison of the original image of Sir Thomas Elyot's Bibliotheca Eliotae (1542), an EEBO/TCP transcription of part of its text, and the LEME parallel transcription will show some of these differences.

    <p><em>Auersus.</em> straunge, vnacquaynted: sometyme backeward,
    or on the backe halfe. also angry <em>Aduersus, &amp;
    auersus,</em> forwarde and backewarde.</p>
    <p><em>Auersa pars,</em> the backesyde of a thynge.</p>
    <p><em>Auersa pecunia publica,</em> the co~mon treasure to a
    particuler aduauntage.</p>
    <p><em>Auersis post crura planus,</em> the feete tourned
    backewarde.</p>

  6. EEBO/TCP distinguishes Latin headwords from English translations or equivalents by the use of emphasis tags, but it also removes the distinction between a main word-entry such as "Auersus" and its sub-entries (e.g., "auersa pars"). Lost is the indenting by which Elyot subordinates sub-entries under main entries. LEME adds the encoding of these semantic hierarchies (in which sub-entries fall under main entries) manually.

    <wordentry><form>Auersus.</form> <xpln lexeme="strange(a)" lexeme="unacquainted(a)" lexeme="backward(a)" lexeme="back half, on the(adv)" lexeme="angry(a)" lexeme="forward and backward(adv)" lexeme="backside(n)"  lexeme="treasure, common(n)" lexeme="feet, turned backward(a)">straunge, vnacquaynted : sometyme backeward, or on the backe halfe. also angry
    <term lang="la">Aduersus, &amp; auersus,</term> forwarde and backe&shy;warde.
    <subform>Auersa pars,</subform> <subxpln>the backesyde of a thynge.</subxpln>
    <subform>Auersa pecunia publica,</subform> <subxpln>the c<expan type="+_">om</expan>mon treasure
    to a particuler aduauntage.</subxpln>
    <subform>Auersis post crura planus,</subform> <subxpln>the feete tourned backewarde.</subxpln></xpln></wordentry>

  7.  LEME also adds lemmatized word-forms for all English equivalents within "lexeme" attributes in the so-called <xpln> (explanation) tag. At present our semi-automatic lemmatization software functions reasonably well for between 84% and 98% of English terms, depending on the lexicon. The remainder of lemmatizations have to be done manually.
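
  The division of labour described above, automatic where a rule or lookup succeeds and manual otherwise, can be sketched as follows. This is a minimal illustration only: LEME's actual lemmatization software is not described here, and the spelling rules, the `LEMMA_TABLE` lookup, and every function name are assumptions made for the example.

```python
import re

# Hypothetical lookup from regularized spellings to OED-style lemmas.
# A working table would be far larger; these rows are illustrative only.
LEMMA_TABLE = {
    "strange": "strange(a)",
    "unacquainted": "unacquainted(a)",
    "backward": "backward(a)",
    "angry": "angry(a)",
    "give": "give(v)",
    "gives": "give(v)",
    "giveth": "give(v)",
    "giving": "give(v)",
    "gave": "give(v)",
    "given": "give(v)",
}


def normalize_spelling(form):
    """Apply a few assumed Early Modern spelling regularizations."""
    w = form.lower()
    w = re.sub(r"^v(?=[^aeiou])", "u", w)             # vnacquaynted -> unacquaynted
    w = re.sub(r"(?<=[aeiou])u(?=[aeiou])", "v", w)   # giue -> give
    w = w.replace("aun", "an")                        # straunge -> strange
    w = re.sub(r"ay(?=[^aeiou])", "ai", w)            # acquaynted -> acquainted
    w = w.replace("cke", "ck")                        # backeward -> backward
    return w


def lemmatize(form):
    """Return a lemma for the form, or None when no rule or lookup applies."""
    w = normalize_spelling(form)
    candidates = [w]
    if w.endswith("e"):
        candidates.append(w[:-1])                     # backwarde -> backward
    for candidate in candidates:
        if candidate in LEMMA_TABLE:
            return LEMMA_TABLE[candidate]
    return None


def triage(forms):
    """Split forms into automatically lemmatized ones and a manual queue."""
    auto, manual = {}, []
    for form in forms:
        lemma = lemmatize(form)
        if lemma is None:
            manual.append(form)
        else:
            auto[form] = lemma
    return auto, manual
```

  With this toy table, `triage(["straunge", "vnacquaynted", "backewarde", "tourned"])` resolves the first three forms and leaves "tourned" for an editor, which mirrors the split between automatic and manual lemmatization.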

  8. This extra encoding is labour-intensive but, in the long run, valuable. Manual grouping of sub-entries under main entries offers a wider context for understanding why, for instance, a Latin phrase for a form of torture (turning the feet around backwards) belongs under "Auersus": one of the translations of the headword is "angry." Lemmatization has other advantages. It enables users to retrieve all forms of the same word, whether different spellings (like "backeward" and "backewarde") or different inflections of verbs (like "tourned" and, say, "tourne"), at the same time. As important, once all orthographic forms of a word can be associated with its OED headword, LEME can measure, over this period, when words first were recognized as English. In this way we can determine the rate at which the national lexicon expanded.
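
  Once every word-form carries a lemma, dating first recognition reduces to taking the earliest year per lemma and counting new lemmas by decade. The sketch below assumes a simple list of (lemma, year) pairs; the function names are hypothetical, and the sample rows merely reuse a few lexicon dates mentioned in this article rather than reproducing LEME output.

```python
# Illustrative (lemma, year) pairs; dates echo lexicons cited in this article.
SAMPLE = [
    ("dodkin", 1574),
    ("dodkin", 1570),
    ("dodkin", 1611),
    ("personate", 1616),
    ("personate", 1604),
]


def first_attestations(pairs):
    """Map each lemma to the earliest year in which a lexicon records it."""
    earliest = {}
    for lemma, year in pairs:
        if lemma not in earliest or year < earliest[lemma]:
            earliest[lemma] = year
    return earliest


def new_lemmas_per_decade(pairs):
    """Count how many lemmas are first recorded in each decade."""
    counts = {}
    for year in first_attestations(pairs).values():
        decade = (year // 10) * 10
        counts[decade] = counts.get(decade, 0) + 1
    return dict(sorted(counts.items()))
```

  The per-decade counts are only as good as the lemmatization behind them: a spelling left unlinked to its headword would register as a spurious new word.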

    Supplementing the Oxford English Dictionary

  9.  The OED takes 25,324 quotations from the works of Shakespeare, whose authoritative concordance has just over 29,000 headwords (which include plenty of duplicate entries, because alternate spellings, inflectional variants, and emended word-forms are given separate headwords). It would probably be fair to say that OED quotes almost every word-type that Shakespeare used. Jürgen Schäfer (1980) shows how this over-represents Shakespeare's lexical inventiveness. Compare the above numbers with the OED coverage of Renaissance lexicons that serve English. The OED takes 17,624 quotations from fifty-two sizable British dictionaries found in LEME, most of them printed, from 1499 to 1623 (see Table 1), but these lexical works have a total of 412,847 word-entries, the majority of which illustrate more than one English word. The OED draws citations from only 4.3 percent of word-entries in these lexical works.[1]
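
  The 4.3 percent figure follows directly from the two totals just given. The short check below assumes that each OED quotation comes from a different word-entry, so the result is best read as an upper bound on the share of entries cited; the function name is hypothetical.

```python
def coverage_share(quotations, word_entries):
    """Return quotations as a fraction of word-entries."""
    return quotations / word_entries


# Totals reported in the paragraph above.
share = coverage_share(17_624, 412_847)
print(f"{share:.1%}")
```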

  10. LEME significantly supplements the OED, documenting new words and senses, antedating first-usage information, and delineating Latin, French, Italian, and Spanish words that were thought to correspond to English words. The sheer scale of the data that a LEME query retrieves shows minute changes in language usage over time. For example, LEME can sometimes tell us when a word drops out of favour, and when a new loan-word fails to establish itself. Although OED remains an unrivaled authority for etymology, inflectional history, and language usage by non-lexicographer authors—it truly describes "the meaning of everything"—in matters Early Modern its selection of quotations is biased towards one playwright, and against harder-to-locate works that express how the period documented its own tongue. The OED also occasionally observes a theory of language that reflects late Victorian and early twentieth-century thinking—for instance, that words signify mental ideas—rather than Early Modern beliefs, which take nouns as names for things.

  11. Fifteen years ago, when developing the Early Modern English Dictionaries Database (EMEDD) that preceded LEME, I used these lexical works to re-annotate passages from Shakespeare in the hope of showing how revisionist the use of contemporary lexical sources was in scholarly editing and close reading. My finding that Shakespeare's first villain, Aron in Titus Andronicus, took his name from a common English weed, priest's pintle, aron, wake robin, or ramp, rather than from Moses' brother Aaron, was persuasive for some (1997). No matter the Early Modern text, however, using LEME brings out surprising senses and nuances (Lancashire 1993, 2003).

  12.  Let me illustrate this again with two words, "dodkin" and "personate", from a sentence chosen entirely at random from the EEBO/TCP transcript of Thomas Nashe's Strange News (1592):

    Trust mee not for a dodkin, if there bee not all the Doctourship hee hath, yet will the insolent inke-horne worme write himselfe Right worshipfull of the Lawes, and personate this man and that man, calling him my good friend Maister Doctour at euery word. (d4r)

  13. A LEME search for "dodkin" delivers nineteen word-entries from 1570 to 1676 devised by ten glossographers from Peter Levins (1570) to Elisha Coles (1676). They give somewhat different information to what appears in the OED entry for "dodkin", which has two senses, the small-value Dutch coin named a doit, and a bud or pistil. John Cowell (1607), Thomas Blount (who copies him; 1656), and Edward Phillips (who copies Blount; 1658) all support the OED reference to a coin, associating it with the French shilling. In two earlier word-entries, however, John Baret (1574) has a different gloss: first, "a Dandeprat: or dodkin" as "Hilum, li, Teruncius, tij m.g. A knaue scante worth a dandeprat. Trioboli, vel triobolaris homo. Plaut"; and, secondly, "a Dodkin" as "of small value: a thing of naught. Hilum, li, n.g. Cic. Teruntius, tij, m.g. Cic. Vng quadrin, vn liard. He has not sent a Dodkin, or one farding. Ne teruntium quidem misit Eras." Randle Cotgrave (1611) echoes Baret's gloss in illustrating the French word "Zest": "Il ne vaut pas vn Zest. He is not worth a dodkin, straw, rush, a pinnes head, the taking vp." Nashe thus uses the senses that Baret and Cotgrave document rather than ones found in the OED.

  14. The LEME modern headwords word-list gives the word "personate" in six lexicons, five hard-word tables by Robert Cawdrey (1604), John Bullokar (1616), Henry Cockeram (1623), Thomas Blount (1656), and Edward Phillips (1658), and a general dictionary of the entire language by John Kersey (1702). Their dominant sense is "to counterfeit, represent, resemble, or act" another person, but Blount and Phillips add another meaning, "to sound out … or make a great noise." Again, the OED and these lexicons have somewhat different senses. Both document the main sense, "to represent," but LEME also has evidence for a minor one, "sound out."

  15. This little exercise shows that Early Modern English vocabulary splits into two, mother-tongue words on the one hand, and loan-words or terms of art, on the other hand. The word "dodkin" appears in bilingual lexicons (Baret, Cotgrave) as an insult, a sign of belonging to the mother tongue, as well as in hard-word tables (Blount, Phillips, and Coles), a sign of its foreign extraction from Dutch. In contrast, "personate" occurs uniformly in hard-word lexicons: it is a neologism, a loan-word from Latin. LEME results also supplement the OED with novel information about senses. Only Blount independently documents the sense "sound out" (for "personate"): Phillips plagiarized it from him, and then the sense exited the language.

    Revising Language History

  16. A creature of postmodernism, LEME affects our understanding of the history of the English language. Using it introduces interpretive doubts by giving readers so much information that their assured certainties about what even common words meant fall away. Yet it is also faithful to its period, when language grew without controls, giving linguistic opportunities to the adventurous, confusing readers and listeners who were lobbied to learn languages like Latin, French, and Italian and who had few resources to learn or maintain their own mother tongue, English. Until 1623, the English language had no general dictionary of English hard words, and until much later, with the work of John Kersey (1706), no comprehensive lexicon of both hard words and the mother tongue. LEME supplies what would have been useful in Shakespeare's London: a guide to how the mother tongue squared itself against a polyglot insurgence from within.

  17. R. F. Jones believed that the late sixteenth century marked "the triumph of English," but if there was any victory, it took place much later. It is well known how uneven was the teaching of English in the early Tudor period. People educated their male children to speak that great Renaissance interlingua, Latin, in grammar schools like Magdalen College, Oxford, that refused students permission to use their mother tongue: Latin was the rule for them at all times. One student says, in a late fifteenth-century collection of passages for Latin translation: "Iff I hade not usede my englysh tongue so greatly, the which the maistre hath rebukede me ofte tymes, I shulde have ben fare more lighter (or, conyng) in grammer. wis men saye that nothyng may be more profitable to them that lurns grammer than to speke latyn" (Nelson 1956: 22). To foreigners like Erasmus, Tudor English sounded like dogs barking, so many were its monosyllabic Saxon words (Giese 1937). If Erasmus did not bother to learn English when he lived in the country for some years, what could be expected of foreigners? John Florio's first dialogues for the teaching of Italian, in 1578, confirmed that little had changed since Erasmus' time. Visiting foreigners thought English "woorth nothing" beyond Dover:

    it is a language confused, bepeesed with many tongues: it taketh many words of the latine, & mo from the French, & mo from the Italian, & many mo from the Duitch, some also from the Greeke, & from the Britaine, so that if euery language had his owne wordes againe, there woulde but a fewe remaine for English men, and yet euery day they adde. (n2v)

  To judge from the anonymous English-Latin Promptorium Parvulorum (1499) and John Palsgrave's English-French Lesclarcissement (1530), two large bilingual works, and from William Tyndale's popular translation of the New Testament (1525), which he wrote with "a boye that dryueth the plough" in mind, English had only about 20,000 words in circulation in the reigns of the early Tudors.

  18. As John Florio notes, Early Modern English lexicons tell us that native speakers in Renaissance England believed that their vocabulary extended along a lexical spectrum from common, spoken words (the mother tongue) to technical trade words (terms of art) and words borrowed from other languages (so-called hard words). Over two hundred years, as new knowledge and technologies introduced concepts for which no ready vernacular term existed, English became engorged with new words (McConchie 1997; Nevalainen 1999; McDermott 2002), especially from Latin and French, even as basic everyday English, until the late sixteenth century, appears to have been untaught except for its spelling. By about 1525, publishers had introduced the first stand-alone encyclopedic lexicons to explain professional vocabularies (law and herbs) and the first in-text hard-word glossaries (Hüllen 1999). These explicated neologisms that were usually coined, on-the-spot, by translators who either could not find native equivalents or wanted their writing to sound authoritative and impressive. Jürgen Schäfer first showed (1989: I, 8) how, after three-quarters of a century, the quantity of hard words introduced by individual translators increased to the point that publishers could sell stand-alone lexicons of hard words for all subjects.

  19. These hard-word lexicons grew in size from about 1,700 words in Edmund Coote's glossary (1596) to 28,000 words in late-seventeenth-century "English" dictionaries by Thomas Blount, Edward Phillips, and Elisha Coles. Robert Cawdrey, author of the first stand-alone English hard-word lexicon, Table Alphabeticall (1604), relied heavily on Thomas Thomas's Latin-English dictionary (1587) but only glossed "Hard Vsual English Words", a limitation that excluded objectionable inkhorn terms. Other larger hard-word dictionaries followed: John Bullokar (1616), Henry Cockeram (1623), Thomas Blount (1656), Edward Phillips (1658), and Elisha Coles (1676). By the 1640s, even a man as well-educated as Sir Thomas Blount confessed himself "gravell'd" in his reading, that is, stuck when he came upon a novel English word he did not know and could not interpret. Anyone who has suffered from kidney stones will know how Blount felt. The Renaissance transformed English gradually into a multilingual Babel, the ancient language of monosyllabic and disyllabic words taught by mothers, and a new hard-word, term-of-art, often polysyllabic vocabulary, sufficiently omnipresent to remind one of a third linguistic invasion, not by Saxons after the sixth century or by Normans after 1066 but by publishers and their hungry, neologizing and translating authors. In the early eighteenth century, Nathan Bailey gradually came to dominate the English dictionary market by collecting more hard words than anyone else, up to 60,000. However, it was Samuel Johnson who secured Jones' "triumph of English" when he met Robert Dodsley's challenge to produce the first great dictionary of core English, in 1755. His work offered a modest 40,000 word-entries.

  20. LEME shows how those living in the Renaissance regarded English and its two tongues, basic and hard-word, as maturing, intertwined quasi-polyglot languages. Lexicon entries for words such as "dictionary" reveal how self-conscious were lexicographers in this. William Mulcaster (1582) says that hard words earned their place in mother English by being enfranchised, that is, given citizenship. He recognizes two word-streams, one adopted by its native land, and the other countryless and unauthorized. Randle Cotgrave in his French-English dictionary (1611) iterates this in translating the French word "Espave" as "Maisterlesse; without author, or owner; also, forreine, farre-borne; of vnknowne birth, or beginning. Mots espaves. Strange, new-forged, vnaccustomed, words." John Florio lacked even the concept of a "monolingual dictionary" in 1598, titling his Italian-English lexicon a "world of words." John Kersey's surprising word entry in 1702, "A Glossary, or dictionary, explaining divers languages" (my italics; 1702), confirms Mulcaster and Cotgrave. Renaissance dictionaries in England served two or more tongues.

  21. Paradoxically, LEME shows that Early Modern English-first bilingual dictionaries, glossaries, and vulgaria are, outside Bible translations (which offer only small glossaries, mostly of proper names), the best guides to the mother tongue. Bilingual lexicons had to employ common English words as equivalents if they were to make foreign terminology understandable. Those with a native-to-foreign directionality, that is, those with English headwords, and foreign-language equivalents in the post-lemmatic position, served to translate their common words into other languages. When no English term was available, the lexicographer would use a phrasal headword. John Withals' popular English-Latin dictionary (1556) shows that English did not yet possess the words "posthumous" and "abortion": he had to use the roundabout phrases, "He that is borne after that his father is deade" and "childe borne afore his time", to translate the Latin equivalents. English-first bilingual lexicons by such as Withals slowed the inroads made by loan-words, as did hard-word and terms-of-art lexicographers. A mid-century spate of English-Latin dictionaries by Richard Huloet (1552), Withals, Peter Levins (1570), John Baret (1574), and John Rider (1589) put English first. They were followed, however, by huge bilingual dictionaries with a foreign-to-English directionality, that is, those with foreign-language headwords and English equivalents in the post-lemmatic position, which promoted neologizing. Great, often reprinted bilingual lexicons by Sir Thomas Elyot (1538-48), Thomas Cooper (1565-84), and Thomas Thomas (Latin, 1538-87), Richard Percival and John Minsheu (Spanish, 1591, 1599, 1617), John Florio (Italian, 1598, 1611), and Randle Cotgrave (French, 1611) conferred authority on the headword's foreign language, not on English, a directionality that accelerated borrowing.

  22. These varied lexicons balanced two needs, the expansion of English into a world language, and its standardization as a native language. Lexicographical self-regulation simultaneously accelerated and braked the growth of English. A LEME query can determine into which language, the mother tongue or the new hard-word vocabulary, a term falls by searching for it separately in multilingual texts first, and second in monolingual English texts. If a term occurs most frequently in multilingual lexicons, it very likely belongs to the mother tongue, no matter its etymology.
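
  The two-step query just described amounts to comparing two counts. The sketch below assumes the researcher has already tallied how many multilingual and how many monolingual word-entries contain the term; the function name, the labels it returns, and the treatment of ties are illustrative choices, not part of LEME.

```python
def classify_term(multilingual_hits, monolingual_hits):
    """Assign a term to the mother tongue or to the hard-word vocabulary.

    Follows the rule of thumb stated above: a term found most often in
    multilingual lexicons very likely belongs to the mother tongue.
    """
    if multilingual_hits == 0 and monolingual_hits == 0:
        return "unattested"
    if multilingual_hits > monolingual_hits:
        return "mother tongue"
    if monolingual_hits > multilingual_hits:
        return "hard word"
    return "undecided"
```

  A term found only in monolingual hard-word tables, as "personate" is, would be labelled a hard word; equal counts are left undecided rather than forced into either stream.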


  23. LEME first came about because I realized that the directionality of bilingual lexicons that placed their all-but-inaccessible (to a facsimile reader) English equivalents in post-lemmatic explanations could be reversed once these lexicons were digitized. A lexicon alphabetized by Latin headwords could be read as if it were alphabetized by English headwords. Once all lexicons of whatever directionality had been digitized and conflated, it seemed to me, we would have a very large virtual Early Modern English dictionary, one written entirely by lexicographers of the period. This mega-lexicon would even look like a historical period dictionary insofar as it consisted of quotations from works of the time. Because only four percent of these citations are found in the OED, we stand to gain a new understanding of much Early Modern English vocabulary if this mega-dictionary is made accessible to everyone online. Gradually, in collecting lexicons for data-entry, I saw that they suggested a stranger-than-expected contemporary understanding of Early Modern English (cf. Anderson 1996). First, Early Modern English lexicographers interpreted words as signs more for things (Lancashire 2003) than for ideas, and the boundary between a lexicon and an encyclopedia broke down. Second, the different distributions of vocabulary into bilingual and monolingual hard-word lexicons suggested that English had divided into two tongues. Third, the high demand for bilingual dictionaries, and the low demand for English-only lexicons, hinted that English had become, in effect, polyglot.
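
  Reversing a lexicon's directionality is, in data terms, building an inverted index from the English words in the explanations back to the foreign headwords. The sketch below assumes each entry has already been reduced to a headword and a list of English lemmas; the sample rows are adapted from the Elyot entry shown earlier, and the function name is hypothetical.

```python
from collections import defaultdict

# (Latin headword or sub-entry, English lemmas in its explanation)
ENTRIES = [
    ("Auersus", ["strange", "unacquainted", "backward", "angry"]),
    ("Auersa pars", ["backside"]),
    ("Auersa pecunia publica", ["treasure, common"]),
]


def reverse_directionality(entries):
    """Index foreign headwords under the English lemmas that explain them."""
    index = defaultdict(list)
    for headword, lemmas in entries:
        for lemma in lemmas:
            index[lemma].append(headword)
    # Sort so the result reads like an English-first lexicon.
    return dict(sorted(index.items()))
```

  Read in key order, the result begins with "angry" and "backside", so a lexicon alphabetized by Latin headwords becomes browsable by English headwords.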

  24. James Howell, in his Tetraglotton (1660), asserted the emergence of English as a European tongue by producing a polyglot lexicon of four languages, English, French, Italian, and Spanish. His frontispiece engraving, "Associatio Linguarum" (prominently displayed on the main LEME page), emblematizes the inextricability of English and its sister languages. Three of the four mother tongues take the arm of English in a picture that supports Howell's main contentions: that English is a major European language and that it owes many of its words to these three sisters and their mother, Latin. This frontispiece symbolizes the findings of LEME.

  25. How are we to disseminate this lexical information to researchers whose use of dictionaries is so infrequent? Convenience is an important factor: anyone, anywhere, whether or not they have licensed the full version of LEME, should be able to cite and call up any of its word-entries. For that reason, LEME gives every analyzed word-entry its own URL. An online scholarly edition of an Early Modern English work can thus freely cite, in its annotations, any analyzed LEME word-entry. And any reader can freely call up and verify that entry. Early in the year before I first gave a paper on the EMEDD—at the third joint conference of the ACH and ALLC at Tempe, Arizona, in March 1991—Tim Berners-Lee invented the Uniform Resource Locator. It was thus the most natural thing in the online world, by 2006, to give each LEME word-entry its unique URL.

[1] The online second edition of the OED was searched March 18-19, 2007.


