“Corpus Electronicum Cano”: Some Implications of Very Large Electronic Emblem Corpora

David Graham
Concordia University, Montreal

David Graham, “‘Corpus Electronicum Cano’: Some Implications of Very Large Electronic Emblem Corpora,” Emblem Digitization: Conducting Digital Research with Renaissance Texts and Images, ed. Mara R. Wade. Early Modern Literary Studies Special Issue 20 (2012): 2. <URL: http://purl.oclc.org/emls/si-20/WADE_Graham_EMLS.htm>.


1.      The title of this paper borrows simultaneously and unapologetically from the opening lines of Virgil’s Aeneid and Whitman’s “I Sing the Body Electric.” One can only hope that readers will not scorn such deliberately anachronistic mixing of the ancient and modern, of the epic and lyrical, of the heroic and the intensely personal, of the old world and the new. The paper excavates and builds on what now seems like some truly ancient work to take stock of current trends in “digital emblematica” and to cast a look forward; it concludes with an exhortation to think in bold terms of digitizing an emblem corpus far larger than any yet contemplated.

2.      In 1999, at the Fifth International Emblem Conference held in Munich, I presented a paper that, like this one, attempted a kind of conspectual overview of the field of digital emblem studies. I wrote that paper intending to work it up into a longer study, but in the months and years that followed I continued to find, as I had found before, that the field was evolving so rapidly that whatever I might write would be rendered obsolete almost before it found its way into print. Undeterred by such obstacles, Peter Daly has of course since provided the kind of overview I had once thought to write (Daly 2002). His study, though enriched by his characteristic flair and encyclopedic knowledge of the field, has inevitably been overtaken by the rapid advances of the many digital emblematic projects now underway and by the highly fluid evolution of digital technology, so that many of the assessments it contains have already become invalid or moot.[1]

3.      A quick count of the early modern emblem books already digitized, or in the process of being digitized, bears out this assertion. The University of Glasgow site now holds 27 volumes of French emblems and 22 editions of Alciato, with seven volumes of Italian material also made available thanks to the recent efforts of Donato Mansueto.[2] The Emblem Project Utrecht site lists 28 volumes of mainly Dutch emblems as well as a special section on Hugo’s Pia Desideria, while the early Mnemosyne project, which at one point placed twelve volumes online, has now been incorporated into the commercial venture Arkyves, which offers very considerable digital resources. The most ambitious effort to date in terms of the sheer number of books digitized is the “Emblematica Online” project jointly undertaken by the Herzog August Bibliothek, Wolfenbüttel, and the University of Illinois at Urbana-Champaign, home of the OpenEmblem Portal.[3] By the end of 2012, this project should provide access to some 700 digitized emblem books. Unlike the editions housed at Glasgow and Utrecht, these do not yet offer full-text access, but in addition to high-quality page images, all the mottos are transcribed and the editions are fully indexed using Iconclass. Other projects, including major efforts hosted by the Bayerische Staatsbibliothek in Munich, the Universidade da Coruña and the University of Pisa, together with the commercially available digitized emblem books offered by Studiolum,[4] should make several dozen additional titles available before long. Large-scale digitization efforts such as Google Books and the enormous Internet Archive are also making large numbers of emblem books available.[5] In other words, it now seems that the bulk of the basic early modern emblem corpus either is or soon will be online.

4.      It is nonetheless worth taking stock, I think, of some of the advances that have substantially altered the “big picture” of digital emblem studies in the last dozen years. Such is the pace of this evolution that while a number of the forecasts I made in Munich have proved essentially accurate, my overall assessment of the feasibility of much of what I examined at that time turns out to have been very wide of the mark indeed. At the same time, the many digitization efforts have proceeded on very different timetables, toward very different ends and with greater or lesser financial resources. As a result, the final products are very dissimilar, despite the existence of the OpenEmblem consortium, which is working largely to a common framework. In my view, a number of core elements are desirable in a fully developed emblematic digitization project. The digitized material should include full-page images of sufficiently high resolution to permit the examination of details in the emblem picture. It should incorporate a thumbnail page to enable simultaneous display of a large number of emblem pictures. It should include the full text of the emblem books in the original language, and a vernacular translation for emblems originally composed in classical languages. The emblem pictures and text should be thoroughly tagged using a standardized vocabulary, including a standardized name list for personages from mythology, literature and religious lore. The user should be able to search the database by keyword or for words in the full text; this in turn requires the standardization of idiosyncratic or archaic spellings. Finally, the books should be made publicly available free of charge and accessible via the Internet. Very few of the digitization projects now available meet all of these requirements, though all meet at least some, usually by providing full-page scans or digital photographs.
In other words, there is still work to be done; at the same time, it is clearly opportune to envision what our next steps should be.
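The requirement that full-text search cope with idiosyncratic or archaic spellings can be illustrated with a minimal normalization step applied to both the transcribed text and the user's query. The sketch below assumes a small, hypothetical rule set (long s, the æ and œ ligatures, u/v and i/j interchange, accents); a real project would need much richer, language-specific tables.

```python
import re
import unicodedata

# Hypothetical, deliberately small rule set for early modern spellings.
CHAR_RULES = {
    "ſ": "s",   # long s
    "æ": "ae",  # ligatures
    "œ": "oe",
    "v": "u",   # treat u/v as one letter for matching purposes
    "j": "i",   # treat i/j as one letter for matching purposes
}

def normalize(text: str) -> str:
    """Return a search key that ignores case, accents and common spelling variants."""
    text = text.lower()
    for old, new in CHAR_RULES.items():
        text = text.replace(old, new)
    # Decompose accented letters and drop the combining marks (e.g. "à" -> "a").
    decomposed = unicodedata.normalize("NFD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse punctuation and whitespace runs to single spaces.
    return re.sub(r"[^\w]+", " ", text).strip()

def matches(query: str, transcription: str) -> bool:
    """True if every normalized query word occurs in the normalized transcription."""
    words = set(normalize(transcription).split())
    return all(w in words for w in normalize(query).split())

print(normalize("Quæ ſunt Cæſaris, Cæſari"))           # -> quae sunt caesaris caesari
print(matches("vivit", "VIVIT POST FVNERA VIRTVS"))    # -> True
```

Because the same function is applied to query and text alike, collapsing u/v and i/j costs some precision but never loses a match.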

5.     Without going into all the details of what I wrote in 1999, I do think it worthwhile to recall one or two aspects of that paper that seem to have been borne out by subsequent technical developments. The first of these concerns XML. In 1999 XML was at best, from a user’s point of view, an exciting prospect still in its infancy. Nonetheless, I wrote, “[I]t now seems very likely that XML will in the next few years increasingly replace HTML as the standard for encoding web pages. This has potentially very exciting implications for computer-assisted emblem research” (Graham 1999). The nature of those implications seemed clear to me from the point of view of emblem scholars anxious to preserve the greatest possible amount of information in the transfer of emblem books from print to digital media:

Because XML allows document structure and not just presentation to be encoded, it would be possible for electronic emblem books to contain information such as the following: bibliographic data, including author, publisher, date and place of publication, book format, number of signatures, and so forth; identification of all parts of a given emblem together with a specification of the language each is written in and whether they are in verse or in prose…. (Graham 1999)
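To make the idea concrete, here is a minimal sketch of how one emblem might carry the kind of structural information listed above, built with Python's standard `xml.etree.ElementTree`. The element and attribute names (`emblemBook`, `bibl`, `emblem`, `motto`, `pictura`, `subscriptio`, `lang`, `form`) are hypothetical and chosen for readability; they are not the schema later developed at the Herzog August Bibliothek, and a production schema would normally use `xml:lang` rather than a bare `lang` attribute. The epigram text and image file name are placeholders.

```python
import xml.etree.ElementTree as ET

def build_emblem_book() -> ET.Element:
    # Bibliographic data for the book as a whole (illustrative values).
    book = ET.Element("emblemBook")
    bib = ET.SubElement(book, "bibl")
    ET.SubElement(bib, "author").text = "Andrea Alciato"
    ET.SubElement(bib, "title").text = "Emblematum liber"
    ET.SubElement(bib, "pubPlace").text = "Augsburg"
    ET.SubElement(bib, "date").text = "1531"

    # One emblem, each part labelled for language and, where relevant, verse/prose.
    emblem = ET.SubElement(book, "emblem", id="e1")
    ET.SubElement(emblem, "motto", lang="la").text = "In silentium"
    ET.SubElement(emblem, "pictura", href="page-001.jpg")  # hypothetical image file
    ET.SubElement(emblem, "subscriptio", lang="la", form="verse").text = "[Latin epigram]"
    return book

book = build_emblem_book()
print(ET.tostring(book, encoding="unicode"))

# Because the structure is explicit, parts can be queried rather than scraped.
verse_parts = [el.tag for el in book.iter() if el.get("form") == "verse"]
print(verse_parts)  # -> ['subscriptio']
```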

6.     It was heartening when, three years later, at the sixth conference of the Society for Emblem Studies held in A Coruña, Peter Boot (then of the Emblem Project Utrecht) gave a highly lucid talk showing precisely how this could be accomplished (Boot 2004). Thomas Stäcker of the Herzog August Bibliothek (HAB) has since made essential contributions through the development of a powerful schema,[6] and XML has now achieved very widespread acceptance, not only among developers of digital emblem projects but in the mainstream of online applications. It seems safe to say that it has firmly established itself as a common open standard whose advantages are far too significant to ignore.

7.   As everyone who has investigated the possibilities inherent in “digital emblematica” knows, however, it is not enough to develop the kind of rigorous description of document structure required by XML. For an electronic corpus to be genuinely useful, some form of descriptive markup is also required. This need is felt with particular sharpness in the searching of visual images, and in 1999 I wrote that what was probably required was:

Iconclass or keyword description of the image, together with the image itself; identification of potentially interesting features of the text, such as classical or Biblical allusions, references to historical persons, real or imaginary places, current events; source information on the emblem; and links to other emblems derived from each emblem. All this information would then be available to a browser for display as needed. (Graham 1999)

8.      In 2002, Hans Brandhorst and Peter van Huisstede presented a talk at the sixth international emblem conference that addressed some of these issues; Brandhorst and Stephen Rawles have continued to play vital roles in the evolution of digital standards for document structure and for textual and visual markup, most notably through their contributions to the Digicult volume edited by Mara Wade (Brandhorst 2004; Rawles 2004). Iconclass has been adopted by the Glasgow Emblem Digitisation Project (GEDP) and the “Emblematica Online” project as the standard controlled vocabulary for both visual and textual motif tagging. It has been less widely adopted than XML, and Peter Daly in particular has drawn attention to the fact that its development for use in art history may limit its applicability to emblem books; nonetheless, the advantages of a standardized controlled vocabulary are so compelling that the use of Iconclass seems likely to continue to grow. Iconclass was acquired in 2006 by the Netherlands Institute for Art History, which immediately committed itself to making Iconclass more accessible; a first step in that direction was the early Libertas Browser, which has since been followed by the Iconclass 2100 Browser, available through the Iconclass web site.[7]
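Part of the appeal of a controlled vocabulary such as Iconclass is that its notations are hierarchical, so a search for a broad concept can retrieve everything tagged with more specific concepts beneath it. The sketch below models that under a simplifying assumption: that a notation's descendants are exactly the notations beginning with it. This holds for the basic alphanumeric codes but ignores keys, bracketed text and other Iconclass refinements. The emblem identifiers are hypothetical, and the codes (25F for animals, 25F23(LION), 25F33(EAGLE)) are illustrative and should be checked against the Iconclass browser.

```python
# Hypothetical tagging of three emblem pictures with Iconclass-style notations.
TAGS = {
    "emblem-001": ["25F23(LION)"],
    "emblem-002": ["25F33(EAGLE)"],
    "emblem-003": ["25F23(LION)", "25F33(EAGLE)"],
}

def search(prefix: str, tags: dict[str, list[str]]) -> list[str]:
    """Return emblems carrying the given notation or any notation beneath it.

    Simplification: a notation's descendants are taken to be the codes that
    start with it, which ignores Iconclass keys and other refinements.
    """
    return sorted(
        emblem for emblem, codes in tags.items()
        if any(code.startswith(prefix) for code in codes)
    )

print(search("25F", TAGS))          # broad animal category: all three emblems
print(search("25F23(LION)", TAGS))  # lions only: emblem-001 and emblem-003
```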

9.      With the increased use of emerging standards like XML and Iconclass, I argued in 1999, would come significantly improved display capabilities as emblem databases became more platform- and vendor-independent and as web browsers continued to acquire new abilities enabling far more flexible display of digital emblem books than had previously been the case:

In theory, a web page could easily display the information in any number of ways. For example, creating a title list would be relatively simple, as would creation of a ‘thumbnail’ page showing only the images, or a selected subset of them. Search engines could be asked to locate sharply defined subsets of emblem books. For example, one could choose to browse only French emblem books with Latin mottoes, or German Jesuit emblem books first published before a given date. (Graham 1999)
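Once each book record carries structured fields, the two example queries in this passage reduce to simple filters. The record fields (`lang`, `motto_lang`, `jesuit`, `first_published`) and the four sample records are hypothetical placeholders, not real bibliographic data.

```python
from dataclasses import dataclass

@dataclass
class EmblemBook:
    title: str
    lang: str             # main language of the book (ISO 639-1 code)
    motto_lang: str       # language of the mottoes
    jesuit: bool          # produced in a Jesuit context
    first_published: int  # year of first publication

# Hypothetical records for illustration only.
CATALOGUE = [
    EmblemBook("Book A", "fr", "la", False, 1540),
    EmblemBook("Book B", "fr", "fr", False, 1560),
    EmblemBook("Book C", "de", "la", True, 1615),
    EmblemBook("Book D", "de", "de", True, 1680),
]

def french_with_latin_mottoes(books: list[EmblemBook]) -> list[str]:
    return [b.title for b in books if b.lang == "fr" and b.motto_lang == "la"]

def german_jesuit_before(books: list[EmblemBook], year: int) -> list[str]:
    return [b.title for b in books
            if b.lang == "de" and b.jesuit and b.first_published < year]

print(french_with_latin_mottoes(CATALOGUE))   # -> ['Book A']
print(german_jesuit_before(CATALOGUE, 1650))  # -> ['Book C']
```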

10.  Once again, events have overtaken such predictions: the Emblem Project Utrecht and Mnemosyne/Arkyves, together with the Glasgow Emblem Digitisation Project and others, have convincingly demonstrated the usefulness of the searching and filtering capabilities enabled by tools such as XML, PHP and Iconclass. This is in fact where the reality of the early twenty-first century decisively parts company with my view in 1999, when I had reluctantly concluded that everything I had described was simply “pie in the sky.” I reached that conclusion because of several obstacles that had not yet been overcome. In addition to some technical barriers standing in the way of the emerging technologies, these included the lack of buy-in from the libraries holding the most substantial collections of emblem books, the lack of a common approach to the technical, logistical and intellectual property issues raised by emblematic digitization projects, and some serious financial requirements that would have to be met. It is heartening to be able to report that most of these issues have now been dealt with satisfactorily: the major library collections are now fully supportive of the various emblematic digitization projects underway in Europe and North America, and very substantial financial backing has been obtained for major projects involving the Stirling Maxwell Collection, the Herzog August Bibliothek and the Bayerische Staatsbibliothek as well as the Emblem Project Utrecht. Where a common approach to technical standards is concerned, it would be premature and incorrect to speak of unanimity, but a consensus formed some time ago around the use of XML for online emblem databases, and Iconclass is now taken very seriously by several teams as a potentially powerful, though somewhat quirky and at times frustrating, tool for the tagging of both visual and textual motifs.

11.  It is important to stress at this juncture how much the emblem digitization movement has benefited not just from the adoption of externally accepted standards such as XML but also from the move towards common internal standards for data exchange. Here the coordinating work of colleagues such as Stephen Rawles of the University of Glasgow, the University of Illinois team (in particular Mara Wade, Nuala Koetter, Beth Sandore and Thomas Kilton, with Tim Cole and Myung-Ja Han later replacing the others alongside Wade and Kilton), and Thomas Stäcker and Andrea Opitz of the Herzog August Bibliothek cannot be overemphasized. The descriptive “Spine” developed by Rawles as the framework for a common database format, though unlikely ever to be adopted in its entirety, has provided an invaluable benchmark for a truly comprehensive outline of the information structure encoded in the emblem corpus, and has shaped our thinking in fundamental ways since its creation nearly a decade ago. The University of Illinois team, in emphasizing the benefits of data harvesting using OAI standards and of library tools for metadata management such as Dublin Core, has brought a degree of rigor to our thinking that has proven truly invaluable, while Stäcker’s work on developing an XML schema for data transformation, already mentioned above, is likely to prove of fundamental importance for years to come.
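For readers unfamiliar with the library tools mentioned above, the sketch below shows roughly what an unqualified Dublin Core record looks like in the `oai_dc` format exchanged under the OAI Protocol for Metadata Harvesting, and how a harvester might pull fields out of it with the standard library. The namespace URIs are the standard `oai_dc` and Dublin Core ones; the record's content is a hypothetical example. A real harvester would fetch such records over HTTP (for instance with the `ListRecords` verb and `metadataPrefix=oai_dc`) and follow resumption tokens, which is omitted here.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core element namespace

# A hypothetical oai_dc metadata block for one digitized emblem book.
RECORD = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Emblematum liber</dc:title>
  <dc:creator>Alciato, Andrea</dc:creator>
  <dc:date>1531</dc:date>
  <dc:language>lat</dc:language>
  <dc:subject>Emblem books</dc:subject>
  <dc:subject>Latin poetry</dc:subject>
</oai_dc:dc>
"""

def dc_fields(xml_text: str) -> dict[str, list[str]]:
    """Collect every Dublin Core element into a field -> values mapping.

    Dublin Core elements are repeatable, so each field maps to a list.
    """
    root = ET.fromstring(xml_text.strip())
    fields: dict[str, list[str]] = {}
    for el in root:
        if el.tag.startswith(DC):
            name = el.tag[len(DC):]  # strip the namespace URI
            fields.setdefault(name, []).append((el.text or "").strip())
    return fields

fields = dc_fields(RECORD)
print(fields["title"])    # -> ['Emblematum liber']
print(fields["subject"])  # -> ['Emblem books', 'Latin poetry']
```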

12.  The fact that we have spent so much time discussing standards in the last decade and so little complaining about the more basic technical stumbling blocks that so plagued us in the “early days” fifteen years ago suggests to me that those basic issues have now been satisfactorily dealt with, at least for the moment. It is no longer necessary or productive to spend time worrying about lack of processor speed, lack of network bandwidth, the difficulty of digitizing images, and the high cost of storage capacity: advances in all these areas have quite simply eliminated whole classes of problems that in the beginning loomed as major technical issues for which solutions were required before it would be possible to proceed with any satisfactory digitization project, let alone a globally comprehensive and pervasively connected one.

13.  Two issues that have emerged for emblematic digitization projects since 2005, both of them direct results of the kinds of technical advances just described, can be characterized as debates about accessibility and about corpus selection. Now that it seems clear that very few technical barriers stand in the way of giving essentially open access to large portions of the emblematic corpus, it is increasingly urgent that we turn our attention to how we intend to ensure that the material we place in the public domain is made intellectually, linguistically and culturally accessible to an audience that potentially extends far beyond the very small group of scholars who have the intellectual, cultural and linguistic tools to make sense – more or less! – of a five-hundred-year-old corpus of texts and images.[8] As well, the lifting of the technical obstacles, together with widespread consensus on common open standards, has for the first time made it possible for us to contemplate digitizing not only portions of the emblematic corpus but the entire body of work. It is this question that forms the subject of the remainder of this paper. One key point should be underscored immediately: the problems posed by such a venture seem much more likely to be political and financial than technical or intellectual.

Some implications of “doing it all”

14.  A number of key questions arise in thinking about the “big picture” of digitizing the emblem corpus in its entirety. It is essential to consider these questions seriously and realistically, because the very feasibility of digitizing the complete corpus has already been called into question, for example by Peter Daly.[9] Defining what we mean by “the complete corpus” is therefore the first order of business. Only once we have a useful working definition of that most basic of all variables can we attempt to answer, if only provisionally, the numerous secondary questions that follow from it: what resources are needed (technical, financial and human), and the all-important question of what we stand to gain by digitizing the complete corpus and whether the likely benefits are worth the probable cost. Only a realistic analysis of this sort can give us any hope of securing the necessary resources to begin work on what is certain to be a monumental endeavor.

15.  Monumental it seems certain to be, given Peter Daly’s suggestion that “the complete corpus, say, to the year 1700 may well number over 3,000 titles, [and] no one library has or will ever have anything approaching the complete corpus….” (Daly 2002). If Daly’s estimate is correct, we will surely have no option but to be selective (by author, by language, by period, by importance). Verifying Daly’s estimate may prove to be a non-trivial problem, and even an accurate verification might not give us the most useful count of the number of items to be digitized, because the number of titles may not be the key variable. For a given title, the “theoretically digitizable subcorpus” could be variously defined to include one copy of every edition in a given language, of every edition published in all languages, of every state of every edition, of every extant copy of every edition, or perhaps of every “interesting” copy, that is to say every copy that differs in some respect from all the copies already selected. Since most variants (e.g. press variants correcting an error noticed during printing) can be satisfactorily recorded through textual markup, the “interesting” copies requiring individual digitization can in all probability be restricted to those having textual or typographical features worth reproducing (e.g. substantive changes in the text, a change of typeface) or bearing manuscript marginalia.
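The criterion for “interesting” copies can be expressed as a small selection procedure: keep one representative of each edition, plus any further copy whose substantive features differ from those already kept or which bears manuscript marginalia, leaving ordinary press variants to be recorded in the markup. The copy records, edition labels and feature labels below are hypothetical, and how such features would actually be recorded remains an open question.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Copy:
    shelfmark: str
    edition: str
    # Substantive textual or typographical features only; press variants are
    # assumed to be recorded in the textual markup instead.
    features: frozenset[str] = frozenset()
    marginalia: bool = False

def copies_to_digitize(copies: list[Copy]) -> list[str]:
    """Select every copy that adds something not already captured."""
    seen: set[tuple[str, frozenset[str]]] = set()  # (edition, features) already kept
    selected = []
    for c in copies:
        key = (c.edition, c.features)
        if key not in seen or c.marginalia:
            selected.append(c.shelfmark)
            seen.add(key)
    return selected

# Hypothetical copies for illustration only.
COPIES = [
    Copy("copy-A", "ed-1"),
    Copy("copy-B", "ed-1"),                                    # same as copy-A: skipped
    Copy("copy-C", "ed-1", frozenset({"change of typeface"})),
    Copy("copy-D", "ed-1", marginalia=True),                   # kept for its annotations
    Copy("copy-E", "ed-2"),
]

print(copies_to_digitize(COPIES))  # -> ['copy-A', 'copy-C', 'copy-D', 'copy-E']
```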

16.  By way of a working hypothesis, let me suggest immediately that it will soon be reasonable and feasible to consider digitizing at least one copy of every edition, in every language, of every emblem book from 1531 to the present day. To appreciate fully the implications of such a hypothesis, we need to understand how large the corpus so defined might actually be, what it would cost to store the final digitized materials, and what demands in human time and other resources would be involved. Table 1 shows an estimate of the total cost of storage capacity for the entire digitized corpus, derived from assumed lower and upper bounds for the number of titles in the complete corpus as defined above,[10] the average number of editions per title and the average number of pages per edition.


Table 1: Basic Assumptions and Cost of Storage

Data element
Number of discrete titles
Number of editions
Total number of pages
Image size/page (MB)
Total storage capacity required (MB)
Total storage capacity required (TB)
Cost/TB (USD)
Total cost to store digitized corpus
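The chain of assumptions behind Table 1 can be written out as a short calculation: titles times average editions per title times average pages per edition gives the number of page images; multiplied by the image size per page, this gives the storage required and, at a given price per terabyte, its cost. The sketch reproduces only that structure. Every input value is a hypothetical placeholder, apart from the 3,000 titles that echo Daly's estimate, and none is the figure used in the table.

```python
def storage_cost(titles: int, editions_per_title: int, pages_per_edition: int,
                 mb_per_page: int, usd_per_tb: float):
    """Return (pages, total MB, total TB, cost in USD) for the whole corpus.

    Assumes decimal units (1 TB = 1,000,000 MB); a binary convention would
    lower the TB figure by roughly 5 per cent.
    """
    editions = titles * editions_per_title
    pages = editions * pages_per_edition
    total_mb = pages * mb_per_page
    total_tb = total_mb / 1_000_000
    return pages, total_mb, total_tb, total_tb * usd_per_tb

# Hypothetical placeholder inputs, not the values used in Table 1.
pages, mb, tb, cost = storage_cost(titles=3_000, editions_per_title=3,
                                   pages_per_edition=200, mb_per_page=30,
                                   usd_per_tb=100)
print(f"{pages:,} pages, {tb:.0f} TB, about ${cost:,.0f}")
# -> 1,800,000 pages, 54 TB, about $5,400
```

Because the result is a simple product, it is easy to rerun as the price per terabyte falls.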




17.  Table 2 shows a similar calculation for the cost of employing people to check and annotate every page in the corpus, again using “best guess” estimates for the lower and upper bounds of the average time needed to process a single page.[11]


Table 2: Human Resources Cost

Data element
Total hours
Total cost
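Table 2 follows the same pattern for labor: total pages times the average time needed to check and annotate one page gives total hours, and total hours times an hourly rate gives total cost. The page count, minutes per page and hourly rate below are hypothetical placeholders rather than the table's own figures; the lower and upper bounds are simply run through the same function.

```python
def labor_cost(pages: int, minutes_per_page: float, usd_per_hour: float):
    """Return (total hours, total cost in USD) for checking and annotating."""
    hours = pages * minutes_per_page / 60
    return hours, hours * usd_per_hour

PAGES = 1_800_000  # hypothetical corpus size in pages

# Hypothetical lower and upper bounds on the average minutes per page,
# and a hypothetical hourly rate.
for label, minutes in (("lower bound", 2), ("upper bound", 20)):
    hours, cost = labor_cost(PAGES, minutes, usd_per_hour=25)
    print(f"{label}: {hours:,.0f} hours, about ${cost:,.0f}")
# -> lower bound: 60,000 hours, about $1,500,000
# -> upper bound: 600,000 hours, about $15,000,000
```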




18.  It is thus apparent that the cost of the human resources needed to conduct a full-scale digitization effort vastly exceeds that of the hardware involved. It should also be emphasized that this is to a large extent an intractable factor: while the cost of hardware continues to drop steeply (my estimate of the cost of mass storage is now one fifth of what it was five years ago), the cost of labor continues to rise in real terms. The total cost to digitize every edition, in every language, of every emblem book is therefore unlikely to fall substantially regardless of technological progress, and could ultimately stabilize somewhere between $400,000 and $40,000,000; in all probability, however, the best estimate lies about half way (logarithmically) between these two bounds, in the area of $5,000,000 to $10,000,000. This intuition can be tested against the actual cost per volume of the original Glasgow Emblem Digitisation Project, which produced 27 digital volumes at a total cost of £160,000, or about £6,000 ($12,000) per volume. Such a cost clearly lies at the high end of my range of calculations, since digitizing 3,000 volumes at this rate would cost $36,000,000. It seems certain, however, that much of the initial investment can be leveraged to reduce the cost of future digital initiatives, since a great deal of time was spent developing software, making decisions about the many issues surrounding the standardization of tagging methods, incorporating Iconclass into the conceptual and technical framework, and so forth. The time required to tag the images is not negligible, but the pace should increase as the database grows, since the individual elements of emblematic images tend to recur with some frequency.
It must be said, nonetheless, that attempting to predict the final cost of digitizing the entire corpus remains something of a “mug’s game”; this difficulty is to some extent moot, since a given digitization project typically proposes to work on only a small portion of the corpus, which keeps the cost manageable. The scale of the human cost does, however, make it imperative for us to collaborate as much as possible on standardizing our digitization and markup protocols, thereby simplifying the task and gaining as much leverage as possible from the efforts of all project teams.
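The figures in paragraph 18 can be checked with a few lines of arithmetic: the Glasgow cost per volume, its extrapolation to 3,000 volumes, and the logarithmic midpoint of the $400,000 to $40,000,000 range (the geometric mean of the two bounds). The exchange rate of two dollars to the pound is an assumption inferred from the text's own conversion of £6,000 to $12,000.

```python
import math

GBP_TOTAL = 160_000   # cost of the original Glasgow project (from the text)
VOLUMES = 27          # volumes produced (from the text)
USD_PER_GBP = 2.0     # assumption implied by the text's £6,000 ≈ $12,000

gbp_per_volume = GBP_TOTAL / VOLUMES
usd_per_volume = round(gbp_per_volume, -3) * USD_PER_GBP  # round to nearest £1,000 first
print(f"£{gbp_per_volume:,.0f} per volume, about ${usd_per_volume:,.0f}")
# -> £5,926 per volume, about $12,000

extrapolated = 3_000 * usd_per_volume
print(f"3,000 volumes: ${extrapolated:,.0f}")  # -> 3,000 volumes: $36,000,000

low, high = 400_000, 40_000_000
log_midpoint = math.sqrt(low * high)  # equals 10 ** ((log10(low) + log10(high)) / 2)
print(f"logarithmic midpoint: ${log_midpoint:,.0f}")  # -> logarithmic midpoint: $4,000,000
```

The strict logarithmic midpoint thus comes out at $4,000,000, slightly below the $5,000,000 to $10,000,000 range suggested in the text, while the $36,000,000 extrapolation sits near the top of the range, as the paragraph says.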

19.  Before turning to the benefits that a fully digitized emblematic corpus could deliver to scholars and to the general educated public, it may be prudent to examine some of the potential drawbacks – other than financial ones – to launching such an initiative. It is possible, for example, that scholars could become “blinkered” with a full digital corpus at their disposal: a reduced need to travel could mean less exploration of libraries, with correspondingly fewer new editions unearthed. As well, an incomplete corpus could be mistaken for a complete one, while the lessened visibility of originals could weaken the case for library funding. Finally, diversion of funding to digitization projects could conceivably have a substantial impact on funding for other, potentially more worthwhile efforts of greater benefit to the scholarly community.

20.  What would scholars gain from making the investment needed to digitize the full emblem corpus? The first benefit would quite simply be convenience: the ability to consult any edition of any emblem book, wherever and whenever one wishes.[12] The proliferation of portable wireless computing devices seems certain to continue, and wireless network access is already available throughout major cities, so that truly pervasive, “always on” network access is increasingly available. One can thus readily imagine a heated discussion about early editions of Alciato being decisively settled in a café rather than in a library, simply through immediate consultation of the relevant volumes in our comprehensive online corpus!

21.  A second benefit would be standardization. By this I mean that all scholars would have access to a growing corpus of emblematic material that would be the same for everyone who consulted it. The advantages of this are by no means negligible. At present, there is no guarantee that the copy consulted by a scholar wishing to check another’s research data or results will be identical to the copy used in the original research, unless, that is, the second scholar is able to travel to the library where the original consultation took place. A standard corpus would mean that research results obtained by one scholar could be reproduced by others, a test long considered normal and essential in the natural sciences.

22.  Increased richness of consultation is sure to result from this effort as well. Bringing together multiple copies of a given emblem book in a single (virtual) location would enable scholars to compare editions, states and copies in a way that is simply not feasible at present unless one obtains photographic or other reproductions of all the copies of potential interest. Even then, comparing editions side by side on screen is far easier than comparing microfilms or even photographs, though libraries are unlikely to make available online copies of sufficiently high resolution to allow the finest details to be compared in the way now possible with two originals open side by side, or even with two high-quality photographs and a good magnifying glass.

23.  The combination of standardization and richness makes possible a third benefit of a comprehensive online corpus: a greatly increased potential for data mining, that is, the use of specialized software to discover previously hidden patterns in data. As software for textual and visual analysis becomes more sophisticated, it seems likely that new genetic or semiotic relationships will come to light in the lineage of the European emblem, which in turn will help scholars to clarify the sequence of editions and the chronology that governed the spread of the emblem as it radiated outward from its earliest days to become a truly pan-European phenomenon.

24.  The environmental advantages of digitization should not be minimized. The first of these is local: as editions are made available online, the need to consult the originals is greatly reduced, since few scholars need to have the actual book in hand in order to study the emblematic material in it, and a high-quality page image coupled with full-text searchability can readily substitute for the physical artifact in nearly all cases. This in turn means less handling of, and consequently less damage to, these comparatively fragile and rare artifacts. On a larger scale, if the corpus grows to sufficient size, the reduced need to travel to consult original materials may help to reduce the environmentally harmful effects of air travel. In the long run, reducing the need to be physically co-located with a given copy of a book may well be considered the single greatest benefit of whole-corpus digitization.

25.  A final potential benefit concerns broader public access to the early modern emblem corpus. Properly annotated, many emblem books could conceivably be of considerable interest to a segment of the educated public. Chris Anderson’s recent study of Internet marketing and sales, The Long Tail (Anderson 2006), suggests that a viable public exists on a global scale even for material occupying an extremely narrow niche. It seems likely that a comprehensive digital emblem corpus could well attract members of the public with an interest in symbolism, analogy, Aesopic fable, classical mythology, the Bible and so forth. This in turn reinforces Rawles’ contention that making emblematic material intellectually as well as technically accessible should be a high priority for scholars promoting digital emblematica, since public support will be needed if public funds are to be devoted to these projects.

26.  The remaining question, then, is whether these benefits are sufficient to justify the cost of undertaking the Herculean task of digitizing the entire emblem corpus. In my admittedly biased view, there is no doubt that they are. While the total cost of carrying out the full digitization may appear daunting, we must not lose sight of the fact that the project is admirably suited to being broken into a large number of much smaller components. It will not be sufficient to persuade ourselves that such an effort is worthwhile, however; given the amount of financial support needed to undertake even the separate components of a long-term holistic digitization project, a high degree of government support will almost certainly be required. What levers, then, can we use to move the academic colleagues, bureaucrats and politicians without whose support we will go nowhere?

27.  In many ways, the project we propose to undertake is similar in its implications to the “big science” projects that have been so frequently debated by our colleagues in the natural sciences. Very large scale infrastructure proposals can win the support of the academic colleagues who make up peer review committees if we can show that they will not take funding in the long term from other equally worthwhile projects. We should be able to make persuasive arguments here, since once installed, the projects should entail relatively low operating costs, unlike many projects in the science sector.

28.  The strongest political arguments in favor of large-scale emblem digitization are almost certainly those that take advantage of the fact that early modern illustrated books are a vitally important component of the European cultural heritage. This argument has already been used to good effect by many in the field, and it seems to resonate well with elected officials, civil servants and granting agencies. A promise simply to preserve the full corpus of this fascinating material, however, will not be enough to win their support, since it is likely to be seen as purely self-serving. What will be needed is a firm commitment on the part of scholars to go beyond “technical access” to true accessibility, making clear our intention to come down with our emblem books from the ivory library tower into the crowded and grimy streets of today’s increasingly commercial and populist Internet. Only when we are prepared to tackle the task of providing full vernacular translations and readable interpretations of the corpus, together with explanations of how and why it remains connected to the everyday interests of today’s public – in other words, of how the emblem corpus is fundamental to the history of visual rhetoric and persuasive communication, including cartoons, comics, advertising and propaganda – will we have a chance of gaining broad public agreement to provide the funding necessary to realize this grand and worthwhile vision.


[1] As have those in Graham 2004; the field continues to evolve with great rapidity.

[2] See http://www.emblems.arts.gla.ac.uk/.

[3] See http://www.hab.de/forschung/projekte/emblematica-e.htm.

[4] See http://www.studiolum.com.

[5] See http://www.archive.org/.

[6] See http://www.hab.de/bibliothek/wdb/emblematica/regelwerk.htm.

[7] See http://www.iconclass.nl/ (main site) and http://www.iconclass.org/ (browser).

[8] This topic formed the subject of a panel organized by Stephen Rawles for the 2006 meetings of the Renaissance Society of America, and will consequently not be treated here.

[9] “[D]igitizing and analysing the complete corpus in all languages from 1531 to the present is probably a pipe dream….” (Daly, Digitizing the European Emblem, 2002, p. 36.)

[10] In what follows, the lower bound can be taken to represent a “best guess” at the number of emblem books defined stricto sensu, that is to say actual collections of emblems, as opposed to books containing a few scattered emblems, books of emblem or symbolic theory, books of fables or natural history resembling emblem books in their presentation, and other paraemblematic material; the higher bound is intended to include a much larger number of emblem books, more loosely defined to include much if not all of the foregoing.

[11] It should be noted that while the time needed per page varies enormously, a mean time per page can be stated with some degree of confidence.

[12] Recent developments in “cloud storage” and mobile application technology are helping to ensure that this desideratum becomes a reality.


