This is my first time presenting on this project without my project co-director, whom I'll introduce in a second. And so the first thing I'll say is that I have no technical background. Well, maybe in philology, but not in computing. So partly in answer to Tom's question about digital humanities, I'm going to be quite evasive and say that I work on a project that's computational, and I believe it falls under the purview of the digital humanities. But I'm not actually going to talk about the digital humanities in any capacious sense in the next 40 minutes. This is my project co-director. As you can see from his PhD, he's a biologist. And you might be wondering, what on earth does a Latinist have to do with a biologist? And I'm actually not going to answer that question now. I'm going to come back to it later. The story of our collaboration and its origin is also interesting from a pedagogical perspective and is relevant to some of the things I'm going to talk about. And don't worry, I won't keep you hanging on that either. I'll tell you about that too. If you do have questions about the technical side of the project, the code underlying the tools, the biological models that we employ, please don't feel that you can't ask. I'll just forward the questions to Joseph, and he will give you an answer, rather than me garbling what is no doubt a complicated and important question. So intertextuality is probably a familiar topic to many of you, but I did want to provide you with a definition from the person who's most closely associated with the term and who invented it, Julia Kristeva, the Bulgarian-French critic and semiotician. And I picked this quote in particular, which in translation reads "any text is constructed as a mosaic of quotations; any text is the absorption and transformation of another," primarily because of this word mosaic, actually. And again, you'll see why that word mosaic is important.
And intertextuality has really been the bedrock of criticism in Latin literature, I would say principally from about the mid-1980s onwards, associated in particular in the US with the work of people like Richard Thomas and Stephen Hinds, and in Italy with Gian Biagio Conte. But really the kind of work that involves systematic comparison of two texts goes back to canonical works like Knauer's in the 1960s and much before that too. Now that definition of intertextuality will seem particularly narrow to some of you. It's a world of allusion and reference. And obviously, intertextuality is more capacious than that. It refers to relationships of the broadest kind, reception histories, and also to microscopic elements of meter and language. So I wanted to give you some examples of the narrow kind because that's where I think the computational tools that we've developed are most instructive and helpful. But while I'm not going to talk about them today, we are working on other tools that approach questions of intertextuality from a larger reception history perspective, looking at the evolutionary histories of texts. So we have here a very simple case of an ancient critic, Lactantius. That's probably not his name, we don't know when he lived, we don't really know who he is, but this is the name attached to the commentary. And his commentary is on an ancient epic poem written in the first century AD, Statius's Thebaid. Lactantius, whoever he is, is writing several centuries later. Lactantius quotes a bit of Statius at the top. I'm going to give you the translation in a second. I just want you to focus on the Latin for a second since it shows what our tools do. So Lactantius quotes a little bit of Statius. And then he quotes the comparandum. He quotes Virgil. And we know Statius is thoroughly intertextual with Virgil. And we have a nice illustration here, and I've just bolded the relevant phrase.
Now what's interesting about these two phrases is they're obviously very similar. You don't need to know Latin to see that they're very similar. But you can also see slight differences, changes in word order, and also changes in spelling. Okay, and immediately, you're probably thinking to yourself, well, that's going to pose a problem for a search method, right? You can't just search for an exact string and find it because, well, those two words aren't spelled the same. That already poses a certain problem for detecting similarities among texts. And here's the translation. This is obviously an example drawn from the book I wrote a couple of years ago. And then here is a modern equivalent of precisely what Lactantius was getting at. This is a commentary written in 1991. "His" (and this is the character Capaneus speaking in Thebaid 9) "words here recall a speech of Mezentius," a character in Virgil. Okay, so good fodder for the literary critic to work on the ways in which these two characters, these two works, might interrelate. Even though the particular example in question might look quite trivial, it's obviously part of a larger network of associations that are the nourishment for any kind of literary critical interpretation that we might engage in. So, why does intertextuality matter? I'm going to create an artificial division between language and philology on the one side and hermeneutics and interpretation on the other. It's an artificial division, but I think it's quite helpful. So, I think intertextuality matters, and again, this will be familiar to many of you, because from the perspective of a linguist or a philologist, we want to know whether there are other examples of this phrase or similar examples, whether there are any patterns in their appearance or usage, and what bearing meter or genre might have on usage. That's the kind of thing that a philologist wants to know in the regular course of events.
It's the kind of thing any instructor of language wants to impart to a student of language. That's how we want students thinking about language in a technical way. But as literary critics, we also think intertextuality matters because we want to know in this local case what the effect is of Statius using Virgilian phrasing. Why is it that he's using this phrase, if there is any specific reason at all? And in what ways do these two characters resemble or differ from one another? Is there a way in which intertextuality can help us think about a particular text as a locus for similarity or difference? And what literary reasons might account for differences in phrasing? Is there some kind of significance that goes beyond just fitting a formula into a particular metrical rhythm, for instance? So I'm going to introduce you to three tools. Some of you, and I know there are a couple of classicists here, are going to be very familiar with these. The three tools are not in each case designed to detect intertextuality, but that's how they're used. Two of them are designed to detect intertextuality. One of them, the first one, is a browsing and searching tool, which is instrumental for most classicists in finding intertexts. I'm going to talk about the three of them briefly, focus obviously more closely on my own project, and then I want to talk about some other things like pedagogical uses towards the end of the presentation. So this is Diogenes. It's a browsing and searching tool. Now the key thing to know about this tool: it's freely accessible, you can download it, and it is completely useless on its own. The way in which this tool works is to use texts provided by the Packard Humanities Institute or the Thesaurus Linguae Graecae, texts of Latin and texts of Greek respectively, huge high-quality corpora of those texts. It uses those texts as the basis for its browsing and searching, so you need access to those texts.
This has changed recently, so the Thesaurus Linguae Graecae has a kind of abbreviated version that you can search online freely, and it has a more expanded version that you probably have access to through your institutions. The Latin equivalent is now available online. The reason why I'm still presenting on Diogenes is because it is an exquisite tool for searching those corpora. It remains the best tool out there. Nothing I'm going to say about these three tools suggests that one replaces the other. Our hope is that they work complementarily and are useful to all literary critics and to teachers of language. You can still get a free CD-ROM of the Packard Humanities Institute Latin texts, by the way, if any of you still have a CD-ROM drive. I mail-ordered a couple the other year. So Diogenes is designed by Peter Heslin, a classicist at Durham. It was originally designed in 1999, but the latest version is from 2007. It's a tool for searching and browsing, as I said, and not just Greek and Latin literary texts, but also papyri. The method is exact matching. Again, this is key, because we're talking about ways of detecting similarities between texts, and already we've seen that sometimes these similarities are inexact or imprecise. Any time you look for something using Ctrl+F, you know there are some restrictions on what you can find. Here your options for flexibility include wildcards or partial words, and I'll show you how you might get at the result that I showed you earlier using Diogenes, even though those two words are spelled differently.
It has numerous other useful features. When you go to a bit of text and you click on a word, it gives you the definition drawn from the dictionaries provided by the Perseus project, for instance, and there are various other features that are really very useful. It's routinely used by classicists, many of whom, like me, have no particular technical background, and I cannot emphasise enough how important that is. Any kind of intimidation in the technical ability required to use a tool renders it completely useless, basically, even if the tool is quite useful in itself. This is routinely used by classicists, which is great. So I put in some search parameters. You can see here that I put in the first four letters, d-e-x-t, from that phrase we looked at. I put e and r in square brackets, so it would search for d-e-x-t-e or d-e-x-t-r to encompass both possibilities. To be honest, I could have just left that out and written d-e-x-t, and it would have been fine. And I wrote the word for "me", mihi, and I said, find these two words within a phrase in a pre-selected corpus of five epics and ten tragedies. So that's pretty sizable. I'm talking about many thousands of lines of Latin. Sure, it's not like searching Google Books or something like that, but we're talking about really targeted searches here. And I picked this set of texts because it forms the test corpus for our own project as well, so I wanted to use the same search corpus. The results come almost immediately. One of the great things about Diogenes, because it's doing exact searching, is that the results appear very fast. There were 20 results, and you can immediately see here the result that we were interested in, from book nine. And you can see that it captures both forms. So it's a really great tool. You can just click on the context and get access to the passage in which the verse, or prose, appears. It gives you the line reference very clearly.
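The search just described (a stem with a bracketed alternative, plus a second word anywhere in the same phrase) can be approximated with an ordinary regular expression. This is a minimal sketch in Python, not Diogenes's own code or query syntax; the function name `matches_query` and the three sample strings are illustrative assumptions, not lines taken from the test corpus.

```python
import re

# Stem "dext" followed by either "e" or "r", then the rest of the word.
DEXT = re.compile(r"\bdext[er]\w*", re.IGNORECASE)
# The second search term, matched as a whole word.
MIHI = re.compile(r"\bmihi\b", re.IGNORECASE)


def matches_query(phrase: str) -> bool:
    """Return True if both terms occur somewhere in the phrase, in any order."""
    return bool(DEXT.search(phrase)) and bool(MIHI.search(phrase))


# Illustrative strings only (hypothetical, not the actual corpus).
samples = ["dextra mihi", "mihi dextera", "arma virumque cano"]
hits = [s for s in samples if matches_query(s)]
print(hits)
```

Because the pattern is still an exact match on the stem, a phrase built on a different pronoun would be missed, which is the limitation the later part of the talk addresses.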
And as I said, it's based on these databases of high-quality text, and that's kind of a big deal in classics. Our texts aren't all of the same quality. We have access to a lot of text through things like the Latin Library, for instance, but a lot of those are free texts, right? Out of copyright. Sometimes there are typographical errors. And the quality of that text matters hugely, both for just getting the text right and not having garbled information as a researcher. But it's also difficult for students if they're confronted with a text where there's a mistake or there's something funny about it that hasn't been corrected by an editor. So it's great that it's able to use these high-quality texts. There are some developments in that area that I can talk about in the Q&A that probably aren't going to come online for a few years. But there are some developments in the area to make higher-quality texts freely accessible to the general public. So that's tool number one, Diogenes, which has been around, as I said, since 1999, although probably it's only been used a good deal since the early 2000s. In 2008, a project began at SUNY Buffalo, led by Neil Coffee, again a classicist. Tesserae means mosaic, or mosaic tiles, and you can see the logo is made up of mosaic tiles, hence why I chose that particular quotation from Kristeva, right? They're drawing on Kristeva's image of the mosaic, and this tool is designed precisely to detect intertextuality. Now, I talked about intimidation, right? Looking at an interface for a tool and how it can be off-putting. And actually, I deliberately put this up because it is a little off-putting. This is not their launch page. What I've done is I've shown you the advanced features, which are fantastic.
It's really good to have these features, but I just wanted to put in your mind the thought of yourself as a user. And it doesn't really matter whether you work in Latin or Arabic or whatever, because part of the point of this presentation is to show what we do in Latin or in Greek, right? And for you to think about how you might incorporate similar tools in your own instruction in your own languages. I hope that's one of the outcomes of today, that we think about the different tools that are employed in different languages and how we might collaborate and export them. Well, if you faced something like this, and you weren't very technically or linguistically minded, you might actually just be put off. So they've done a very clever thing, which is that their launch page actually just causes all of this to disappear, or just doesn't show it to begin with. And you have to actively reveal it. So what you have is just these six boxes at the top. And here's where Tesserae's insight is fantastic. Instead of targeted queries, where you input a particular phrase that you're looking for, Tesserae decided, why don't we just have a global comparison of two texts? So what you're interested in is not this particular phrase in Statius or in Virgil. You're just interested in Statius and Virgil and their interrelationship generally. So how am I going to find the intertexts between these two works? What you put in is a target text and a source text, and the program will focus on two-word phrases. So the intuition is that two-word phrases, whether relatively banal ones like mihi dextra, my right hand, or less banal ones, are the core of intertextual, allusive or referential language. And so the program will show you all common two-word phrases. And this is the important next step.
It has a scoring system to identify intertexts of greatest interest, and it does that by ranking the phrases according to the relative infrequency of terms, because you're going to be more interested in two-word phrases where the terms are rarer, rather than having words like "me" or, you know, verbs like "be" in them. You're going to be interested in more substantive and rarer language or diction. And there are several experimental tools in addition to the core intralingual search. So there are some cross-linguistic search tools, which are experimental. And we can talk a little bit about that at the end of the presentation as well. And the great thing about Tesserae is that they have a very quantitative, systematic approach to evaluating their tools. They have very impressive benchmarks for the discovery of meaningful intertexts. The work was published in the Transactions of the American Philological Association in 2012, with a partner piece in what was then Literary and Linguistic Computing, also in 2012, on the underlying technical aspects. And what they showed was that you could recover a lot of the references that we see in scholarly commentaries. You could recover them using this tool. Now, this is what Tesserae output looks like. I don't want you to read the individual words. I know it's very small. I just want you to get a quick visual picture of what the output looks like. There are numbers that I took off the side, starting with one and just going down. But the important thing is you have the reference in your target text, the reference in your source text, and you can clearly see the line in question with the etymological or linguistic similarities highlighted in red. The particular words in the two-word phrase are set aside here. And then there's the score based on the metric of the relative infrequency of the words. It's also based on a distance metric: words that are closer together in a phrase score more highly.
So a phrase that's treated more as a tidy unit scores more highly than cases where the two words are further apart. Now, there's an obvious problem with this, which they very much take account of, which is that you generate an awful lot of results. If you look at the top there, you probably can't make it out at the back, but there are 828 results just for a comparison of, and this isn't the whole work, just book nine of the Thebaid and book 10 of the Aeneid. That's a huge amount of data to plow through. Now, the scoring method certainly helps in that you can presumably just not look at anything below a six. And in the advanced search feature there's an option to just not look at anything below six, seven, eight or nine. But nevertheless, you're going to get a lot of information. And plowing through that is going to require some expertise in the language yourself. This is not a tool for amateurs. Now, because of the nature of the intertexts that are identified, you're also going to get things that aren't very significant. One of the things that Tesserae has done very well is to explain the difference between, and I know that literary theorists have been doing this for a long time, but it's very useful to see this in the context of a digital humanities discussion, the difference between meaningfulness and significance, literary significance. So what we get in a lot of these cases are meaningful comparanda, right? Phrases that are similar in some way, which is what we want as philologists or linguists, or as people who are interested in commentary, or in teaching students about similarities: what have been referred to as code models. So language that's generically similar, but not necessarily significantly similar for a literary critic. Tesserae has been very good at explaining how this tool is very effective at giving you those code models in a systematic way.
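The two ranking intuitions described above (rarer words score higher, and words that sit closer together score higher) can be illustrated with a toy function. This is a sketch under loud assumptions: it is not Tesserae's actual scoring formula, `toy_score` is a hypothetical name, and the frequency values in the usage lines are invented purely to show the direction of the effect.

```python
import math


def toy_score(frequencies: list[float], span: int) -> float:
    """Toy ranking score: higher for rarer words and for tighter phrases.

    frequencies: relative corpus frequency of each matched word (assumed input).
    span: number of words from the first matched word to the last, at least 1.
    """
    if span < 1 or any(f <= 0 for f in frequencies):
        raise ValueError("span must be >= 1 and frequencies must be positive")
    # Rare words contribute large inverse frequencies.
    rarity = sum(1.0 / f for f in frequencies)
    # Dividing by the span penalises words that are far apart.
    return math.log(rarity / span)


# Hypothetical frequencies, for illustration only.
rare_pair = toy_score([0.0001, 0.0002], span=2)
common_pair = toy_score([0.01, 0.02], span=2)
spread_out = toy_score([0.0001, 0.0002], span=5)
print(rare_pair > common_pair, rare_pair > spread_out)
```

A cutoff such as "ignore anything below six" then amounts to filtering the ranked list by this kind of score.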
I mean, for a long time literary critics have had intuitions about generic language, but this is a fantastic way of establishing quantitatively and demonstrably the basis for similarities in generic language between authors. And in fact, there's a project currently led by Neil Bernstein at Ohio, Kyle Gervais at Western Ontario and Wei Lin, also at Ohio, where they use Tesserae to ascertain which epic poets in Latin are following which particular epic models and to what extent, using these kinds of large-scale systematic tests as the basis for their analysis. And that's coming out in Digital Humanities Quarterly this year. I think it's forthcoming in the next month or so. So this is a very brief account of Tesserae's strengths. It takes account of semantics, by the way, as well as morphology, although that's not immediately clear from these examples. That was a recent addition in 2015. So it looks at the definitions of the words in English. And so even when there isn't necessarily a direct Latin etymological equivalence between two words, the semantic equivalence between them will cause them to score more highly. And the processing time is fast. It's not as fast as Diogenes, right? I mean, for obvious reasons, because of the number of calculations required, but it's still pretty fast. And again, that's important. It sounds trivial, but all of you have sat in front of a computer and attempted to load something or use a tool, and it's very off-putting if something isn't rapid. Okay. Here's where the biology comes in. This is the slide that I'm most nervous about. So it's kind of shock and awe. And that shock and awe will disappear if you ask me questions about it. Okay. So the motivation for our own tools. I've told you about two tools now, Diogenes and Tesserae, Tesserae relatively new. This is brand new. In fact, our first interface is not yet publicly available, although it is available by arrangement.
If you want to play around with this tool, then just ask me and I'll give you the password, and you're welcome to use the website and play around with it. It's so new that we face the typical problems that, you know, people who create new tools do. The server crashes, I have to phone system admins and say, please help, someone's trying to access the tool, and they've been extremely helpful in addressing any problems. Our method is based on something called sequence alignment. And sequence alignment is a technique used in biology, computational biology and systems biology for highlighting regions of conservation and dissimilarity in a protein. And that's extremely helpful for understanding what the functional, structural or evolutionary role of a protein is, by looking at the presence or absence of certain amino acid sequences in various organisms. And this is just a random figure from a paper by someone in Joseph's group, my co-director's group at Harvard. But you can see any number of sequence alignment figures like this in science journals. BLAST, the Basic Local Alignment Search Tool, is an algorithm for more efficient aligning of sequences, and it's really standard in bioinformatics. The paper presenting it came out in 1990, and you can go and have a look at it. I mean, I can't really make head or tail of the detail. But if you look at it just visually, if you look at the tool, you'll see the intuitive connection to the kind of thing we're doing. Now sequence alignment has been used for language. It has been used in natural language processing for text reuse and plagiarism detection. There's a really great paper by Olson. Carl, I don't know whether you know this, because it's actually based on a French corpus. And it looks for sequences, obviously, sequences of words that match from one text to another. I mean, one could say, well, you know, our motive, our inspiration comes from the natural language processing origin.
But I'd say that's just not the case, because I just happen to work with a biologist. So it actually does, in fact, come from the original biological instantiation of the technique. So here's the core difference in what we're doing. We're not looking at sequence alignment of words. We're looking at sequence alignment of characters. And as far as we're aware, that hasn't been done before. And that's for calculating the difference between two strings. Remember, we're looking for inexact matches, right? Exact matches are easy to find. What we're interested in is sequences of characters, i.e. words, that differ in some respect or other, but are not radically different, right? They're similar enough that they may be of literary significance. So edit distance is a measure of character-by-character similarity. Here's a really simple example just drawn from the Wikipedia article on edit distance, I think. And it shows you how we score a change from one word to another word, here with three additions, substitutions or deletions. In this case, one substitution, a second substitution, and then a third edit, an addition. So this is our interface. And as you can see, it's very rudimentary. You know, the final version will not look like this, but this is very functional. And it's quite clean, which is helpful for the people testing it, including me. So the way this works is by you inputting a query. So in that respect, it's closer to Diogenes than it is to Tesserae, although again, I'll come back to that at the end. That's not the last word on it. But you input a phrase of interest, and you specify the maximum edit distance that you're willing to tolerate. Now intuitively, if you specify a very large edit distance for a very short phrase, you're going to get all the results. You're just going to get the entirety of the text. If you specify a very small edit distance for a very long sequence of words or characters, you will get zero results, except for the identical phrase itself.
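Edit distance as described here can be computed with the standard dynamic-programming (Levenshtein) recurrence. The following is a minimal sketch, not the project's own code; the function name `edit_distance` is an assumption, and "kitten" and "sitting" are the familiar textbook pair, assumed here to be the kind of example shown on the slide (two substitutions and one addition).

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the fewest single-character additions,
    deletions or substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,                      # delete char_a
                curr[j - 1] + 1,                  # add char_b
                prev[j - 1] + substitution_cost,  # substitute or keep
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
print(edit_distance("dextera", "dextra"))  # 1
```

A maximum edit distance then acts as a tolerance: a candidate string counts as a match when `edit_distance(query, candidate)` does not exceed it.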
So that's just an intuitive way of thinking about how one uses edit distance. We've recently added some new functionality. So you can align either the whole phrase continuously, starting from the first letter and looking for all matches for every single letter in a line, or you can align each word in the query separately. So for a phrase like dextera mihi, the program will look for every match starting with dextera and every match starting with mihi, including any gaps between the words. So that's the number of words and range function. That allows you to align your search query with non-adjacent words. And remember, for some languages, especially a language like Latin, there's huge flexibility in word order. So it's not going to be useful if you're restricted to just looking for matches of phrases that are considered as single sequences. Latin poets all the time will move phrases around so that a word is at the beginning of the line and its counterpart is at the end of the line. So you do need that option, especially for studying a language like Latin. And then there are various options for corpus selection here. I won't go into the details, but I just wanted to acknowledge that Tesserae helped us out a lot with making the texts accessible more rapidly. We had a parser for making a lot of the free texts usable. But theirs was simply better. They had some experience doing this from further back. And actually that's not an insignificant point. I was saying last night over dinner that one of the nice things about the digital humanities community, even though we're talking about two projects that are very much based in classics rather than the digital humanities, is that it has been, at least in my experience, a very welcoming community. And even though we are in some sense in competition, it has never felt that way. And I hope we've helped each other a lot.
And certainly I'm very grateful to Tesserae for the help they've given us. So, some sample output from our tool. I put in the phrase dextera mihi, and you can see here no wildcards. I just put in the phrase that I was interested in. I imposed a very low cutoff just for the purposes of this exercise. I said, you know, give me any order. A range of two means that in this case I wasn't looking for the individual words separated by any intervening words. I was actually just looking for the words in either order. And here we can see the capturing of the two relevant intertexts, from Aeneid 10 up at the top and Thebaid 9 down at the bottom. But we can immediately see that one of the advantages of edit distance is that it is blind to diction, semantics, etc. All it's doing is calculating character-by-character similarity. It actually is good at including spelling variations like dextera and dextra, because when you have a difference like that, all you're doing is adding a cost of an edit distance of one, which is relatively insignificant. It's not non-significant, but it's relatively insignificant. So it's a nice way of capturing those small variations, which could be spelling variations or orthographic variations. But, and this is probably most interesting, if you look, and I know it's going to appear small to the people at the back, under Ovid's Metamorphoses, the entry at 13.361, you can see the phrase tibi dextra. Tibi here means "to you" or "for you", or "your" in this case. Well, that's in no way like mihi, "me". It's a pronoun, sure. But, right, it's a different word. The way that sequence alignment can capture this is purely through the similarity in characters between tibi and mihi, right, the substitution of t for m and of b for h.
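The word-by-word, any-order search described above can be sketched by pairing each query word with a word in a sliding window of text and summing the per-word edit distances. This is a simplified illustration, not the project's sequence-alignment implementation; `find_matches`, its `span` parameter (standing in for the "range" setting) and the short input strings are assumptions made for the example.

```python
from itertools import permutations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def find_matches(query: str, text: str, max_distance: int, span: int):
    """Return (total_distance, window) for each window of `span` consecutive
    words in which the query words can be paired with distinct window words,
    in any order, at a summed edit distance of at most max_distance."""
    query_words = query.lower().split()
    words = text.lower().split()
    results = []
    for start in range(len(words) - span + 1):
        window = words[start:start + span]
        # Try every ordered choice of window words for the query words.
        totals = [
            sum(edit_distance(q, w) for q, w in zip(query_words, choice))
            for choice in permutations(window, len(query_words))
        ]
        if totals and min(totals) <= max_distance:
            results.append((min(totals), " ".join(window)))
    return results


print(find_matches("dextera mihi", "dextra mihi", max_distance=3, span=2))
print(find_matches("dextera mihi", "tibi dextra", max_distance=3, span=2))
```

With a tolerance of three, the second call still returns the phrase with a different pronoun and the opposite word order, at a cost of one for dextera/dextra plus two for mihi/tibi; widening `span` beyond the number of query words allows intervening words.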
And so you're able to catch phrases that are related to your search query, but are crucially different in a way that would be challenging for Diogenes and for Tesserae. OK, and again, I say that not at all as a criticism. It's just that you want to use these tools in a complementary way to capture the broadest range of intertexts. So I want to give you two more examples. I work a lot on Flavian epic. This is the epic poetry written in the late first century AD. And one of the advantages that I have in working on that poetry is that there's been a lot of interest in it over the last 20 years. And that's fantastic because it means there are a lot of scholarly resources to work with. And they're recent. And so they employ the full range of intertextual techniques, not the search techniques, but the interpretive techniques that have been built up over the last 20 years. And also, and this is key, those texts of the late first century AD have models that are extant. We actually can read Virgil and Ovid, who are the models for these late first century AD texts. The problem with doing intertextual search for Virgil and Ovid is we don't have their predecessors. We have fragments of Ennius. We have very, very few fragments of other Republican epic poets. So it's pretty hard to do any kind of intertextual search. You have no corpus to work with, if what you're looking for is prior reference and allusion. The great thing about working with late first century AD texts is you do have their principal models. So especially if you're testing a method, you actually have scholarship that tells you what kinds of references and allusions there are, and you can measure your results against that scholarship. You just can't do that if you don't have an extant corpus to work with. So it's a huge advantage in working with late first century AD epic. So I put in a phrase from this poem.
It's Silius Italicus's Punica, an epic on the Second Punic War, the Hannibalic War. I put in this phrase, and it's the same target corpus, the 15 works: five epics, ten tragedies. And we can see here this phrase in Virgil's Aeneid, which is a really famous phrase where the goddess Juno says, if I can't turn the gods, then I will move hell. And this occurs in the middle of the Aeneid. And the phrase is Acheronta movebo. The phrase that I put in is Acheronta videbo. I will see Acheron, or I will see hell. Well, "I will see" and "I will move" have no kind of linguistic relationship to each other, except in their morphology, the first person singular future active indicative. That's the only thing that's similar about them. OK, so again, most computational search techniques are not going to be able to give you that parallel. And it's a clear parallel for reasons that I won't go into. But this technique will give that to you because the edit distance cost of getting from videbo to movebo is very small. It's only three. All you need to do is change those first three letters. And you're starting to get a sense of the compositional technique of a poet here, right? So the poet has a phrase in mind. There's a desire to alter the phrase, right, to generate new meaning. You alter the phrase in some way where it's recognizably allusive to a prior model, right? Enough so the reader gets some satisfaction from identifying it, gets some meaning from the relationship between these two texts. But nevertheless, there is this key difference there. And sequence alignment provides you a way of detecting these intertexts. Now, you could say, well, you could find this intertext just by looking up the word Acheron, because Acheron can't be that frequent. And that's true, right? It appears 27 times in this target corpus.
You could look through 27 examples and you would find your parallel. Okay, so let me give you a slightly different example. So yes, movebo and videbo are aurally similar, but semantically different. I couldn't give you the search parameters for this, but I looked for a phrase, commune nefas, a collective evil (these are texts that really dwell on collective evil, I mean, and hell), which is a really important programmatic phrase at the beginning of Lucan's poem on the Civil War, written just before Silius Italicus and Statius. So this is in the middle of the first century AD. Again, hugely intertextual with Virgil. And I wanted to show you two interesting comparanda. So the phrase commune nefas that I looked for occurs, as you can see, just under Lucan, Bellum Civile, commune nefas, 1.6. And the two comparanda that I'm most interested in are, one, the direct identical phrase in Seneca's Thyestes, that's a tragedy by Lucan's uncle. So he's clearly getting this language from his uncle's writing. But here's another interesting comparandum: Virgil's Aeneid, in a passage located in the underworld in book six, a famous passage of the Aeneid, immane nefas, which refers to the worst possible crimes committed by the paradigmatic sinners in the underworld, so Tityos and Ixion, figures like Tantalus, figures like that. And what's interesting about commune nefas and immane nefas is precisely again that they are aurally similar, commune and immane, but in fact they bear no etymological relationship at all. I think there's literary significance there. Lucan's very interested in putting hell very much on earth in the Civil War fought between Caesar and Pompey. And Lucan is thoroughly intertextual with Virgil systematically over the course of his poem. But again, you're not going to find that intertext through conventional means. Sequence alignment allows you to detect it at a relatively low edit distance, showing you the aural similarity. 
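To make the edit distance in these two examples concrete, here is a minimal sketch in Python. It assumes plain Levenshtein distance with unit costs for insertion, deletion, and substitution; the talk does not say which scoring scheme the project's tools actually use, and the function name `edit_distance` is purely illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions needed to turn a into b.
    Unit costs are an assumption, not the project's documented scoring."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


print(edit_distance("videbo", "movebo"))   # 3: substitute the first three letters
print(edit_distance("commune", "immane"))  # 3 under the same unit costs
```

The shared word in each pair (Acheronta, nefas) contributes nothing to the distance, so the whole two-word phrases are also three edits apart.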
Now, nefas, unlike Acheron, occurs very, very frequently in these works because they're so concerned with wrongdoing and evil. And so you're going to have to go through 256 instances if you want to find these parallels. So now we're starting to see the benefit of sequence alignment as an efficient method of detecting intertextuality. We, like Tesserae, have ongoing work to systematically validate the method. We're using another Flavian poet, another first century AD poet called Valerius Flaccus, because there are three recent commentaries, two from the late 2000s, one from mid to late 2000s, one from the 90s or early 2000s, which gives us a lot of conventional scholarship against which to measure the success rate of sequence alignment at recovering intertexts. There again are 15 texts in the comparison corpus. You don't need to look at the details on this chart since it's going to appear quite small. The key thing is this is a snapshot of the kind of validation data that we have. You have the Valerius Flaccus reference, the intertext reference, the commentary in which the intertext is noted, and then a note on whether the intertext has been recovered or not. We have been very impressed with the success rate so far. We've managed to recover 385 out of 451 parallels for about 400 lines of the text, which is about half the book. It's not a faultless method. In some cases, the number of results in which the key result appears is reasonably large. So this is obviously going to need to be refined over time, but it's extremely promising as an automated method of detecting intertextual parallels. But it isn't just about detecting intertextual parallels in texts that we know to be intertextually connected. These tools have to be helpful with new applications as well. And this is in some ways the thing that I'm most excited about and also the area in which I'm most ignorant. You'll probably have some sense that classicists work on classical poetry or prose. 
We work on the period of classical antiquity, broadly conceived. But there are good arguments that we should be working on Latin and Greek wherever they're found, including in early modernity. But actually, Neo-Latin is relatively poorly studied. I mean, there are obviously great specialists in Neo-Latin, but as a proportion of the greater community that works on either early modernity or Latin, it's actually a very, very small number. One of the great things we can do is aid people who want to work on Neo-Latin texts, right, but don't have the time or the energy to read thoroughly in Neo-Latin. Apart from anything else, the employability level if you work on Neo-Latin is not exactly great. So what we want to do is provide an option for uploading. And this is all very temporary. I mean, this is, you know, brand new, providing an option for uploading a text file that has a Neo-Latin text. Actually, Neo-Latin has a lot of online texts available. It's reasonably well supported in that. And so you can search for intertexts using this target file that you've uploaded. So I took an example of a 15th century play written by this guy who's very young at the time, he's 18. He then has this religious conversion where he says, I'm never going to write any Latin poetry again. He doesn't explicitly say it, but it's very clear that he has a turning away from it as being somewhat corrupting. So when he's very young, he's a great Latinist and he composes this Senecan-style tragedy. So I searched for this phrase in a passage that I know is intertextual with Correr's Progne. And we can see, so I searched for a two-word phrase, umbrarum arbiter. You can see there and then a three-word phrase here. And we can see this result, Furiarum agmina, or dira Furiarum agmina. And I put it in bold here right at the bottom. What's interesting about this comparandum is that none of those words are the same. Not a single one of those words is the same. But hear the sound of it. 
Durus umbrarum arbiter. Dira Furiarum agmina. The rhythm is the same. Some of the key sounds are the same. D, R, arum, R, right? If that wasn't enough to convince you, look at the rest of the language, right? Addite, addi, right? Poenas appears there and is also a synonym for supplicia, right? Si quid, si quid. So these passages are, I mean, incontrovertibly intertextual, right? Correr is using Seneca very, very explicitly here. But sequence alignment allows you to see the close similarity in sound, right? And sound is crucial to poetry. If we can impart that to students by using this method and showing how poets compose using sound just as they do meaning, right? That's a huge benefit. So just a couple more slides and then I'll turn it over to questions. So the strengths and weaknesses of sequence alignment. It's really good at finding parallel phrases where one word is in common, but the other is or are merely similar in sound, although I've given you a blockbuster case where none of the words is the same. You can have a lot of fine tuning in controlling the cutoff, i.e. the maximum edit distance that you want to accommodate. The disadvantages: interesting results can easily be lost amongst copious, largely uninteresting output, which is actually worse than Tesserae. The nice thing about Tesserae is even results that aren't hugely interesting often are somewhat interesting, whereas in a lot of cases with sequence alignment, you're getting garbage. Processing time is slow for large corpora. This is a real problem. Apparently, and again, please don't ask me about this, there was a paper published this past summer that showed provably that sequence alignment for extremely large datasets was impossible, if you wanted global comparison. If you're searching for an individual phrase, that's fine. If you're searching a small passage against a large text, that's fine. 
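The cutoff described above can be sketched as a phrase search that slides a window over a target text and keeps only the windows within a maximum edit distance of the query. This is a minimal illustration in Python, not the project's implementation: the names `search` and `edit_distance`, the word-window approach, the punctuation stripping, and the unit-cost Levenshtein scoring are all assumptions made for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance (an assumed scoring scheme)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def search(query: str, text: str, cutoff: int) -> list[tuple[int, str]]:
    """Return (distance, phrase) pairs for every window of the target text
    that has as many words as the query and lies within `cutoff` edits."""
    q_words = query.lower().split()
    # Lowercase and strip common punctuation so only letters are compared.
    words = [w.strip(".,;:!?") for w in text.lower().split()]
    n = len(q_words)
    q = " ".join(q_words)
    hits = []
    for i in range(len(words) - n + 1):
        window = " ".join(words[i:i + n])
        d = edit_distance(q, window)
        if d <= cutoff:
            hits.append((d, window))
    return sorted(hits)  # closest matches first


line = "Flectere si nequeo superos, Acheronta movebo"
print(search("Acheronta videbo", line, cutoff=3))
# [(3, 'acheronta movebo')]
```

Raising the cutoff admits more distant phrases, and with them more noise. Each comparison in this sketch fills a table whose size is the product of the two phrase lengths, so a single query against a corpus is cheap, while comparing every phrase of one large corpus against every phrase of another multiplies that cost by the number of phrase pairs.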
If you wanted a global comparison of the form that Tesserae does, you couldn't use sequence alignment unless you were to apply some kind of efficiency heuristics. And we're exploring that possibility, but there are some computational obstacles there. So future development: we're going to try parallelized search. We just need to find, as I say, some heuristics. We would like to have even finer tuning and we would like to incorporate some significance weighting. And this is hopefully going to be interesting to a number of people. We would like to extend it to other languages, beginning with ancient Greek, obviously, since we're familiar with that. But we think that this could be useful for many other languages as well. I want to cite the contributions, or at least describe the contributions, of undergraduate and other research assistants because I think that's really crucial for understanding how these projects develop and what the possibilities are for the future, for pedagogy as well. So this lists some of the things that undergraduates and high school students have done. In particular, I want to acknowledge the contribution of someone in this room, Adriana. So Adriana Casares is a UT alum. Actually, I poached her from my wife, who teaches in the Classics Department here. And Adriana is now a high school Latin teacher in Austin and has been hugely instrumental in enabling us to do the work that we do. And this kind of partnership with teachers, with high school students and with undergraduates is crucial to the success of this kind of enterprise. There's a lot of desire out there and there are a lot of skills out there that can be leveraged. It takes time and it takes energy to coordinate all of this, but if you're able to do it, there are a lot of benefits to be seen there. Very briefly, I want to talk about computation and intertextuality in the classroom. 
So in addition to independent research and collaboration on research projects, I did teach a class in the winter called Virgil in the Digital Age, which was an advanced Latin course, where we used these three search tools to read passages of Latin and students were asked what is and what is not an interesting intertext and why. There's a crucial difference between doing this on the basis of commentaries, traditional research just as I would, and doing it using computational techniques. It really puts the onus on the student to process a large amount of data and really think through the pertinent literary critical questions, the philological questions. Are there frustrations? Absolutely. There are a lot of technical difficulties. Some students are intimidated by the use of some of the tools. But you can overcome them. I certainly think the benefits outweigh the disadvantages. And I'm sure part of that was also down to my relative inexperience in teaching a course for the first time that incorporates technical matter, especially when I have no particular technical background. I do think it was worth it nevertheless. What are the pros? Enhanced reading, naturally incorporating criticism and research just in the process of reading a foreign language. Final thoughts. I think classics is in an unusually privileged position because of the amount of work that's been done in computation and digitization, and because our corpora, relatively speaking, are quite small compared to English. And so it allows us a good deal of expertise in a narrow area, whereas obviously with English it's a little bit challenging to know an entire field. With classics, I don't think that you know an entire field, but you know a good proportion of it. Sharing and extension of resources with and to other languages I think is going to be crucial for the development not only of our project, but of the field as a whole. 
I am very pleased by the collaboration between me and my co-director, Joseph, because it's a substantive collaboration between humanists and scientists, not only to answer historical or material questions, this is crucial, but to support criticism, which I see as fundamental to what the humanities is. I think there are opportunities that I've kind of alluded to for the use of these tools for digital pedagogy in the classroom and for showcasing novel opportunities for collaborative research that are attractive to people who want to try things that they would normally do in a lab maybe, or they would do in bioinformatics, but they could also now do in a humanistic environment. So I think there's a lot for us to talk about here over the course of today. These are some of the people that I work with, some of the offices that have supported us. I'm happy to tell you about the origin of my collaboration with Joseph at some later point, but for now, thank you very much.