Well, this is my first time at CNI. I'm terribly excited to be here, and more than a little bit fearful. I have only been a resident of the library for about a year and a half, having begun my career, and had what I hope is just the first part of it, as a professor in the Department of Classical Studies, with only recently a joint appointment to the library as well. So I'm here today to give you a bit of an update on an effort that's underway at the Duke Collaboratory for Classics Computing, which is a three-person team in Duke University Libraries. It consists of myself, a professor of Classical Studies and History and now jointly appointed in the library, and my two colleagues, a couple of brilliant software developers: Hugh Cayless, who is a Classics PhD and a librarian and a software developer, and Ryan Baumann. Our mandate is essentially research and development of tools and services that furnish core infrastructure to the various fields that fall under the general domain of Classical Studies, but whose impact we hope will reach well beyond the walls of that rather narrowly construed discipline. And we're able to do this exciting new thing in no small part thanks to the generosity of the Mellon Foundation, which I am absolutely thrilled to acknowledge. The project I'm going to talk about today is called IDEs, Integrating Digital Epigraphies, and builds on work that we've been doing for about a decade now in epigraphy's much younger and much kinder sister discipline, papyrology. Attending an epigraphic conference was once described to me by my advisor as like walking into a room full of velociraptors wearing a suit made of pork chops, which I assume is why the audience is the size it is today. So first, I want to give you a little bit of an introduction to a project that we spent a good bit of the last decade working on.
That work, again with the generous support of the Mellon Foundation, serves as a way of furnishing a bit of context and introduction to this other project that's underway, called IDEs. So in the late 19th century, documents written on papyrus, antiquity's paper (this is our word "paper"), started coming out of the dry sands of Egypt. We discovered letters, tax returns, love mail, hate mail, marriage contracts, divorce agreements, sales, leases, receipts, magical potions, petitions, administrative correspondence, minutes of court proceedings. The list could go on and on. These things, as you are seeing up here, are pretty fantastic. This is a letter from a husband to his expecting wife, with particularly awful instructions. These things, in other words, are fascinating. And they can be hard to deal with as well. They're often damaged. The Greek is non-standard. These documents cover an array of highly technical subjects. Their provenance is tricky. Their chronologies are tricky. The barrier to entry is just very, very high. Well, one result of this is that from the very beginning of the field, the curation of the physical object and the scholarly interpretation of the text borne on it have always been intertwined. Collaboration has always been necessary from the very beginning of the discipline. Moreover, since the papyri offer almost our only clear view on the kinds of things that appeared in ephemeral documentation, the only part of classical antiquity from which women's voices are directly heard, for example, their contents are of really very wide interest, not just to technically minded specialists, but to normal people, with the result that accessibility has also been an imperative from the very beginning of the discipline.
And finally, libraries have historically been the owners, conservators, and catalogers of these objects, so that a culture of sharing and of work based on clearly developed standards has also been a key feature of development in this space since the very beginning. Now, as early as 1982, papyrology started to develop a number of digital tools to support research in this very technical space, with Duke in the lead. I was 10 years old and not yet there. And in the early 2000s, we at Duke started working to integrate these various resources under a common standard in a shared and open environment called papyri.info. Very quickly, we bring together: the so-called Duke Databank of Documentary Papyri, a repository of some 60,000 Latin, Greek, and Coptic Egyptian documents written on papyrus and similar surfaces; the Heidelberger Gesamtverzeichnis, which is a database of scholarly metadata covering more or less the same set of texts (date, provenance, document genre, controlled keywords); the Advanced Papyrological Information System, or APIS, a set of institutional curatorial catalog records for many of these same papyri and many thousands of others besides, and, very importantly, images of the same; the Bibliographie Papyrologique, a quarterly scientific bibliography of the discipline; and last, a project called Trismegistos out of Leuven, which is a crazy, varied, and rich set of scholarly metadata on a wide array of subjects, from prosopography to onomastics to institutional inventory numbers. They keep better track of inventory numbers than the institutions that hold the objects do. And we brought them together under a site called papyri.info.
Now, we didn't just bring them together. In fact, we allow anyone to create an account, to log in, to add and emend existing texts, to translate them, to redate them, or otherwise change the metadata, and not just the text and translation, but also these institutional metadata records, which in the case of APIS means people in the crowd are in effect authoring or emending the catalog records of institutions like Duke, which is a pretty interesting phenomenon. We then pass all of these through a rigorous and completely transparent, non-anonymous peer review process, and eventually push the results back to the canonical repository. This is all citable, this is all permanent, this is all transparent, this is all Git on the back end. Now, since going live about five years or so ago, thousands of already print-published papyri have been entered into the system. Hundreds of emendations have been proposed, and many of them accepted, by junior and senior scholars alike. I can't tell you how many translations have been added, some of them by undergrads. There is a team of undergrads at BYU that is systematically translating the largest archive of Greek documentary papyri from antiquity and passing it through peer review by internationally recognized scholars. There have even been documents that have been published directly to the database, in effect skipping over traditional print publication. This place has in less than five years become in many ways the center of a discipline. And in that place we hear, I think, more loudly and clearly than we did before, the voices of graduate students, the voices of our junior colleagues, and the voices of women, who now constitute 50% of the editorial board of the project. Among cognate journals in this field, I don't think there is a single one that can boast those kinds of numbers. So far as I can tell, there's been no blowback about the non-anonymous peer review.
We gently told one colleague who tends to be a little mean that his meanness would be a possession for all time and exposed to everyone. And rather than quit, he promptly started being nice. Who'd have thought? There's, so far as I can tell, no reluctance on the part of junior scholars who are afraid that if they do their work here, they won't get tenure. And as far as I can tell, no reluctance on the part of senior scholars who are afraid that this is a piece of junk. It is great and it works. So, fresh from this experience, which took about a decade and wasn't terribly easy, we decided we'd try to create something similar for Greek epigraphy. This is what I'm going to talk about today. Now, epigraphy is the study of public documents carved on stone and erected for all to see. And antiquity was just carpeted in these things. Maybe something like 500,000 of them in Greek survive, and somewhere around 300,000 are published in some form, countless thousands of those republished many times over in many print editions. In this space, we have laws and decrees, oracles and sacred regulations, letters from kings and emperors and letters back to them, records of interstate arbitration, funerary epitaphs by the thousands, hymns, inventories, financial documents, building contracts. Again, a huge variety of document types, in very, very large number, from across the footprint of Greco-Roman antiquity, spanning in effect from Ireland to Afghanistan. Again, this is a situation in which there is a very high return on scholarly investigation into this domain, and also one with a very high barrier to entry. This isn't very clear on my screen, but I hope what you can see is not much of an inscription. Now, all of the complexities that inhere in the study of papyri belong in this space as well, and plenty of others besides.
The texts are more numerous by an order of magnitude and come from a wide range of ancient places and, no less significant, perhaps even more so, modern nations, which tend to have potentially different dispositions with regard to how nicely they play with others when it comes to their cultural heritage. The date and provenance of these documents are often much harder to control than is the case with the papyri, and many, many stones are now lost or destroyed owing to war or other misadventure. And the objects tend to be housed in museums rather than in libraries, with the result that the field's disposition to openness is often quite different from that of papyrology, which tends to be driven by libraries. So here too, the work is hard and just too vast to do without collaboration. There aren't enough epigraphists to do this work. There aren't enough papyrologists in the world to do this work. The very best papyrologist ever to live, Herbert Youtie, who did nothing else, had no teaching responsibilities, and whose office window had a direct line of sight to the window in his house, so that his wife would wave when it was time to do things like eat, said that working full time he could edit about one papyrus every two weeks. That's 26 a year. There are 180,000 in Vienna, another 150,000 in Berlin. This is not work that even a handful of the Michael Jordans of papyrology and epigraphy can just do. But on the bright side, in the case of epigraphy as in the case of papyrology, we are fortunate to be graced by a number of very useful tools. So part of what we're doing with this project that we call IDEs is to bring together a handful of tools that already exist, clean up the data in certain ways, and start to build capacities on top of that. Before I describe what's available here and the work we've started to do, I'll offer a word on what we hope eventually to help contribute to. That is the kind of thing we fantasize about in our joint space in the library.
And this is a world in which, say, an undergraduate studying abroad, or a tourist, or a citizen who's concerned about risks to cultural heritage sites can download a simple smartphone app, snap a photo of an ancient inscription, and upload the image to their Flickr account or whatever, along with the metadata, including who they are and the geo-coordinates, under an open license to our institutional repository. That triggers a set of processes that verify whether we have other images of the same stone. We're pretty good at that right now. And we ask the user: is this the stone you're looking at? If so, we think we have another photo of it here in the archive. The student casts the vote, yes or no. We record and attribute that vote. We align the image with existing transcriptions if possible, again asking the user: we think this text has been published already. Does this look like the Greek text you see on the stone? The user casts a vote, yes or no. We record that. We alert the reader to relevant secondary literature on the object if we know of any. Now, we're not yet there, but we've made a start, again working with others as we did with the papyri. As I mentioned, Greek epigraphy is fortunate to have several substantial online tools at its disposal. All of them, though, are derived from a prior and, in one particular case, a concurrent print resource, which turns out to have really considerable impact on the development of linked approaches to collaboration across the multiple resources, which I'll get to in a minute. So first, the Packard Humanities Institute, or PHI, has been for, I don't know, 30 years now digitizing something like 220,000 transcriptions of ancient Greek texts from inscriptions. Thousands of these texts exist in duplicate editions. Each edition carries a unique identifier, but there is no such thing in this project or any other as an identifier that corresponds to the object from which these multiple editions derive.
We control only bibliography, not the objects, in other words, as is done in print. And texts are named and discoverable by a system of abbreviated bibliographic reference that is intended to be read by a human rather than by a machine, some of which is in reasonably wide use, much of which is not. That is, the names by which they call things are not the names by which most other people call things, but that's okay. Second, the Supplementum Epigraphicum Graecum, or SEG, is an annual publication that summarizes books and articles that concern Greek epigraphy in some way. It's arranged and cited by volumes and sequentially numbered entries within those volumes. Its tens of thousands of entries feature hundreds of thousands of citations to published inscriptions. This digital project is the successor to a print project that began about a century ago. SEG began its life that long ago on paper and continues on paper as well, concurrent with its digital instance. All past volumes have been retro-digitized and, as of a few volumes ago, each new volume appears both in print and online at more or less the same time. Most but not all of these hundreds of thousands of references have been extracted from the digital versions of the records by a combination of machine and hand processes and reside in a series of XML files, which the director of the digital effort very generously shared with us. Now, these citations also use a system of human-readable abbreviated bibliographic references, which only partly overlaps with that of PHI. Also like PHI, SEG has no control over objects, only over these semi-standardized bibliographical references to the objects. Now, I mentioned 28 minutes ago that thousands of these texts have been published many, many times over, repeatedly in journals and books, as is the standard practice of the field, each new publication getting its own newly minted abbreviated bibliographic reference.
Well, the Diccionario Griego-Español years ago decided that if they wanted to cite Greek inscriptions, they had better keep track of republications. Nothing would be more embarrassing than citing the 10 instances of a word as it appears in Greek epigraphy only to have someone point out that you just cited the same inscription, republished 10 different times. This has happened; you don't want it to happen to you. So they invented yet a third system of abbreviation and started entering the indexes of journals, books, and articles that refer to inscriptions into a database that they've called Claros. And there exist here more than a million and a half pairs that in effect take the form of an index: for publication X, see publication Y, all right? These doublets contain no more semantics than that. No verbs; these are not triples. They just entered thousands of indexes, A, B, two columns, and they have 1.6 million of those. They do not distinguish, for example, "publication X is a republication of the text found at publication Y" from "publication X contains a translation of the text found in publication Y" or from, for example, "publication X contains a totally unrelated inscription which nevertheless was carved on the back of the stone that contains the text published in publication Y." Just A, B, all right? They just entered indices. These are really profound differences, and like the others, Claros also can't control objects but only references to them. So all three of these projects feature data the extent of whose semantics is essentially "for this, see that," and we have those by the millions, all right? All using only partly overlapping naming systems. In effect, all three projects have operationalized in the digital space the logic of a print object, and especially the index. The impact of this is really profound if you want to actually build something atop that where the semantics are of the utmost importance.
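To make the doublet-versus-triple distinction concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the Claros schema or anything the project is known to run: the class names, the `Relation` vocabulary, and the placeholder publication labels are all hypothetical. It shows a bare "for X, see Y" pair being lifted into a triple whose verb starts out unspecified and is only upgraded once someone supplies the missing semantics.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Relation(Enum):
    """Hypothetical vocabulary of verbs that a bare index entry lacks."""
    UNSPECIFIED = "see_also"          # all that "for X, see Y" tells us
    REPUBLISHES = "republishes"
    TRANSLATES = "translates"
    SAME_STONE = "carved_on_same_stone_as"


@dataclass(frozen=True)
class Doublet:
    """An index row as harvested: two columns, no verb."""
    source: str
    target: str


@dataclass(frozen=True)
class Triple:
    """The same row with an explicit predicate."""
    subject: str
    predicate: Relation
    obj: str


def lift(pair: Doublet, predicate: Relation = Relation.UNSPECIFIED) -> Triple:
    """Turn a doublet into a triple; the verb stays UNSPECIFIED until known."""
    return Triple(pair.source, predicate, pair.target)


def republications_of(target: str, triples: List[Triple]) -> List[str]:
    """Only triples with an explicit REPUBLISHES verb can answer this question."""
    return [t.subject for t in triples
            if t.obj == target and t.predicate is Relation.REPUBLISHES]


# Placeholder labels, not real citations.
pairs = [Doublet("Publication X", "Publication Y"),
         Doublet("Publication Z", "Publication Y")]
triples = [lift(p) for p in pairs]

# A person (or a heuristic) later decides that X is a republication of Y.
triples[0] = lift(pairs[0], Relation.REPUBLISHES)
```

With only doublets, `republications_of` has nothing to go on; once the first pair carries a verb, it can separate a true republication from a row that merely says "see also."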
Okay, so at the risk of overwhelming you with details, I'll mention just a couple more, and I'll try to wrap it up at around the 23-minute mark so that we can have some discussion. Now of course, thousands upon thousands of inscriptions are referred to in published secondary literature, in almost all cases using yet different systems of bibliographic abbreviation. JSTOR has very kindly shared both page images and OCR for classics journals with us, so that we've been able to start extracting some of these bibliographic references. This of course is not easy, because a standard epigraphic bibliographic reference looks like this, which is totally unambiguous to a person, or at least to an epigraphist pretending to be one, and is also very hard on a machine when it comes to getting good OCR recognition or machine parsing of what is here both a bibliographic citation and a citation to the document structure, collapsed into one. This is a real bear to OCR, right, and then for us to pull out of the data and align with the other resources that we have. And finally, each year countless students and tourists photograph inscriptions, in some cases inscriptions for which there is no previously published photograph anywhere, in other cases inscriptions that are thought to have been lost, and put them up somewhere online, on places like Flickr. We've started harvesting open-licensed images of inscriptions from Flickr, and based on a training set of a couple thousand hand-verified text-image mappings, we're now able, with a reasonable degree of accuracy, to group multiple images of the same inscription, which is a non-trivial task. And much more interestingly, we're now on track to being able to map computationally images of inscriptions to existing transcriptions, where they exist, using humans then only to accept or reject alignment candidates. That's going to be spectacular when it's in a place where I can show you, and we can talk more about that if you like.
The first point I want to make here is that a number of projects have collected a mountain of data, painstakingly, in some cases over a century, for the most part exacting and reliable, but done under the reigning logic of the day, that is, the print book, not anything that we can use out of the box now. Second, our partners have been extremely generous about sharing. There has been no barrier to cooperation here, and these are projects that have quite different funding models. SEG is a publication of Brill, which is in the business of trying to make an awful lot of money, and it's to their incredible credit that they've been so generous in sharing data with us. Third, the semantics of the data, however, are so lean that it's nearly impossible to leverage the work of one of the projects to the benefit of the other without doing a lot of heavy lifting, and the heavy lifting is what we've been about. We want, for example, users to be able to assert in a stable and citable and vetted way that the text in one database has been translated in another, and in yet another has a corresponding image that shows that the text should be reconstructed in some different fashion, which it turns out runs contrary to an argument advanced in an article that happens to be in JSTOR. Nothing like this is even almost achievable now in this space, despite the wealth of information that looks like it ought to be relatable. To be able to support this, we'd have to, in the first place, be able to mint unique IDs for the epigraphic objects, not the derivative editions of them, which is the box within which all epigraphists think today, and to relate publications of the same to those objects and to each other in controlled and semantically rich ways. We need to be able to say that X translates edition Y, which is an edition of inscription A. We need to be able to attribute complex assertions about these relationships.
Mary says that translation X has to be changed in accordance with Fred's new proposed correction, which itself was offered in the light of a newly identified fragment of that same stone. This kind of thing happens all the time in this trade, and we have no way of controlling and provenancing those kinds of assertions. So the first step here is to parse and align the millions of semi-normalized, only partly overlapping bibliographic references to these publications that we've harvested from PHI, from SEG, from Claros, and from JSTOR, doing our best where we can to infer some semantic significance from the structure of the data. In SEG, the entries are written in prose, and they have developed a kind of structure to them, so that we can tell, when the first string that appears in an entry is a non-word and seems to be an epigraphic citation, that the entry is in fact about that text principally and does not just mention that text. And if it is followed by another reference that is in parentheses but not in square brackets, then this was the convention for indicating that this was an edition that superseded a prior edition, right? So there are some semantics that we can squeeze out of the free text that comes in to us. Not a whole lot, but some. Our first pass at this lives at ides.io. So I'll just walk you real quickly through what the human interface looks like right now. This is meant for machine consumption, and right now we just have the human interface for debugging. So let's say I want to go visit an inscription that humans call IG I³ 40. I drill down to what we think we know about IG, and from there to what we think we know about IG I³, and from there to what we think we know about the document called number 40 within IG I³.
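The parsing step and the SEG heuristic just described can be sketched in Python as follows. This is a toy under loud assumptions, not the code behind ides.io: the regular expressions cover only a simple "series volume item" shape such as IG I³ 40, the `Citation` fields and function names are hypothetical, and the sample entry string in the usage note is invented to mimic the convention (leading citation, prior edition in parentheses) rather than copied from SEG.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Map superscript edition markers (as in "I³") to plain digits.
SUPERSCRIPT_DIGITS = str.maketrans("¹²³", "123")


class Citation(NamedTuple):
    series: str              # corpus abbreviation, e.g. "IG"
    volume: str              # roman or arabic volume number
    edition: Optional[int]   # edition marker, if any
    item: str                # document number within the volume


CITATION = re.compile(
    r"""^\s*
    (?P<series>[A-Z][A-Za-z]*)   # corpus abbreviation
    \s+
    (?P<volume>[IVXLC]+|\d+)     # volume
    (?P<edition>[¹²³])?          # optional superscript edition
    [\s,]+
    (?P<item>\d+[a-z]?)          # item number
    \s*$""",
    re.VERBOSE,
)

# Leading string of an entry, optionally followed by "( ... )", up to a
# full stop, a square bracket, or the end of the string.
ENTRY_HEAD = re.compile(r"\s*([^(\[.]+?)\s*(?:\(([^)]*)\))?\s*(?=[.\[]|$)")


def parse(ref: str) -> Optional[Citation]:
    """Parse a human-readable reference; return None if it does not fit."""
    m = CITATION.match(ref)
    if m is None:
        return None
    ed = m.group("edition")
    edition = int(ed.translate(SUPERSCRIPT_DIGITS)) if ed else None
    return Citation(m.group("series"), m.group("volume"), edition, m.group("item"))


def drill_down(c: Citation) -> List[str]:
    """Series, then volume, then document: the path walked in the interface."""
    volume = f"{c.series} {c.volume}" + (f"^{c.edition}" if c.edition else "")
    return [c.series, volume, f"{volume} {c.item}"]


def read_entry(entry: str) -> Optional[Tuple[Citation, Optional[Citation]]]:
    """Return (main topic, superseded edition) if the entry opens with a citation."""
    m = ENTRY_HEAD.match(entry)
    if m is None:
        return None
    main = parse(m.group(1))
    if main is None:          # the entry opens with an ordinary word
        return None
    prior = parse(m.group(2)) if m.group(2) else None
    return main, prior
```

For example, `drill_down(parse("IG I³ 40"))` gives the three levels `["IG", "IG I^3", "IG I^3 40"]`, and `read_entry("IG I³ 40 (IG I² 39). Athens.")` reports the first citation as the entry's main topic and the parenthesized one as the edition it supersedes. Real references are far messier than this pattern allows, which is the point the talk is making.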
You can see here that we have minted a unique identifier for the object, not for the bibliographic reference to it, and we will later support annotation that allows users to lump and split things against the identifier, so that we can start to clean up the gigantic harvest of publications. You can tell that we've been able to infer from the data that this edition supersedes two others. Where it says "cites," these are prior editions of the same text, and we just had to infer that, with some gymnastics, from the structure of a variety of records. We've also been able to infer that this text is the main topic of a bunch of SEG entries and that it is cited by a number of other resources, and, by hitting the JSTOR API just for these three strings that we know of as common references to this object, not for their other cognate bibliographical representations, we're able to pull in articles that cite them. There are no open-licensed photos of this stone on Flickr; if there were, you'd see them. So the next step, which we've only just begun, is a user interface that allows users to create new relationships, to modify or deprecate existing ones, to mint new identifiers where, for example, a composite text has to be split up, and to deprecate old ones where, for example, previously discrete fragments are conjoined. Again, this happens all the time in this field. And, in general, to allow users to add the kind of rich semantics that the data streams that we have by and large lack, that is, to add in a general way to the vast property graph that stores these provenanced and contingent relationships behind all of this that I'm showing you now. Now, this is only a part of the effort.
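As a rough picture of what attributed assertions and lumping could look like in such a graph, here is a small in-memory Python sketch. Everything in it is an assumption made for illustration: the `Graph` class, the `obj:N` identifier scheme, and the predicate names are hypothetical and are not the project's actual data model or API. It mints object identifiers, records who asserted each relationship, and conjoins two fragments by minting a new identifier and deprecating the old ones rather than deleting them.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Assertion:
    """One attributed claim: who says that subject --predicate--> obj."""
    subject: str
    predicate: str
    obj: str
    asserted_by: str
    basis: Optional[str] = None   # free-text note on the evidence


class Graph:
    """Toy store for object identifiers and attributed relationships."""

    def __init__(self) -> None:
        self._counter = count(1)
        self.assertions: List[Assertion] = []
        self.deprecated: Dict[str, str] = {}   # old id -> replacement id

    def mint(self) -> str:
        """Mint an identifier for an object, not for a publication of it."""
        return f"obj:{next(self._counter)}"

    def assert_(self, subject: str, predicate: str, obj: str,
                asserted_by: str, basis: Optional[str] = None) -> Assertion:
        a = Assertion(subject, predicate, obj, asserted_by, basis)
        self.assertions.append(a)
        return a

    def resolve(self, ident: str) -> str:
        """Follow deprecations to the identifier currently in force."""
        while ident in self.deprecated:
            ident = self.deprecated[ident]
        return ident

    def lump(self, idents: List[str], asserted_by: str,
             basis: Optional[str] = None) -> str:
        """Conjoin fragments: mint a new id and deprecate the old ones."""
        new_id = self.mint()
        for old in idents:
            self.deprecated[old] = new_id
            self.assert_(old, "deprecated_in_favour_of", new_id, asserted_by, basis)
        return new_id

    def editions_of(self, ident: str) -> List[str]:
        """Editions attached to an object, including via deprecated ids."""
        target = self.resolve(ident)
        return [a.subject for a in self.assertions
                if a.predicate == "is_edition_of" and self.resolve(a.obj) == target]


g = Graph()
fragment_a, fragment_b = g.mint(), g.mint()
g.assert_("Edition Y", "is_edition_of", fragment_a, asserted_by="editor")
g.assert_("Translation X", "translates", "Edition Y", asserted_by="Mary")
stone = g.lump([fragment_a, fragment_b], asserted_by="Fred",
               basis="newly identified join between the fragments")
```

After the lump, the edition originally attached to one fragment is still reachable from the new identifier, and the record of who proposed the join, and why, stays in the graph alongside it.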
We've also started a similarly harebrained and massive effort to align geospatial data associated with these inscriptions that we pulled out of PHI and SEG, aligning the very loosely standardized representation of provenance in those resources with the much more closely vetted geodata in Pleiades, a vetted gazetteer of ancient place names, and with the crowdsourced data from GeoNames. And we've just started to align that set with the recently released Getty Thesaurus of Geographic Names. If I start in on this we'll never get out of here, but the basic point is that the geodata is as gnarly and horrendous and semantics-free as the publication data are. Okay, so what I'm showing you up here is just that we have a variety of very preliminary APIs that allow you to get different representations of the data out of the graph. So eventually IDEs means to be at least two things. First of all, the future basis, via a set of services, tools, and trusted repositories, for curating the relationships across digital epigraphy's multiple resources. And in the meantime, the place and mechanism for in effect redoing, to a modern machine-actionable spec, not just the 19th century's enormous effort at an initial pass at data creation but also the late 20th century's effort at migrating all of this to digital, both of which need to be redone from the very beginning, or else we'll build nothing on top of it. Now, neither of these endeavors, I fear, sounds very sexy or perhaps even very fundable, but they are, I think, the sine qua non for meaningful, collaborative, network-based progress in this field. There is no other way forward than one that begins with redoing two centuries' worth of work. It's worth it. That's where we've been concentrating our efforts with IDEs, and we'll either crack it or it'll crack us, and we'll find out which of those is the case soon enough. So I thank you for your patience, and let's get into the details if you like.