Go ahead and get started. I know people will file in, that's fine, but you should probably get started. I'm Sayeed Choudhury from the libraries at Johns Hopkins. I'm joined by my colleague, Jaap Geraerts — I hope I got that right. Absolutely. — from the Centre for Editing Lives and Letters at University College London. This presentation is actually part of a series that we've been putting forward at CNI, and there are a few things that we're trying to explore. There are some very interesting scholarly questions that you'll hear from Jaap. But we're also trying to integrate content and services from lots of other types of activities we've had in the Sheridan Libraries around digital humanities. It's an infrastructure-building effort — infrastructure with a small i. But we do think it's a model, perhaps, that could be used or adopted by other institutions, and also an approach for working with scholars. The first time Jaap and I spoke together, we emphasized the point of shared understanding rather than use cases. I'm very impressed at the way Jaap and others have just basically dived into the technology. They've done a much better job than I have in terms of learning the scholarship. But the software developers and I are trying to understand the scholarly aspects of the project as well. So it's not two groups working independently, coming together every so often; it really is a very close, binding intersection. We've had presentations that are deep dives into technologies, but this one is of the category of the scholars explaining what they're trying to do and how it has influenced the approach that we've taken and the approach we wish to take. So with that, I'll turn it over to Jaap. Excellent. Thanks, Sayeed.
So just very briefly: the Archaeology of Reading, our project, is a four-year project consisting of two phases, both of which are funded by the Mellon Foundation, and it includes us at the Centre for Editing Lives and Letters, the Digital Research and Curation Center at Johns Hopkins, and Princeton University Libraries. The main scholarly goal is to understand the history of reading — historical reading practices and strategies. And we do that through the study of two distinct and well-known annotators, two Englishmen, both educated at Cambridge, who lived in the 16th century: Gabriel Harvey and John Dee. What I'll now explain are basically two of the newer directions in which the project is moving, now that we have started the second phase of the project. The first is the movements, or the pathways, early modern readers took through their books. And the second is the fact that we encounter what we would now call structured data in the annotations these readers made in their books. To start with the pathways: early modern readers didn't necessarily read their books from cover to cover. Rather, they effectively mined their books for the information they were seeking, based on their interests and based on their pragmatic goals and aims. So we see readers marking up books, underlining text, commenting on the printed text, indexing books — really a form of early modern information management, an aspect of a larger information culture these readers were participating in. So they did not read books in some sort of vacuum; they read books in conjunction with one another, based again on their interests. What we see emerging are clusters of books which were, as it were, in lively conversation with one another, showing a larger intellectual cosmos of early modern readers and their books. So based on their interests, they moved from one book to another. And these kinds of movements they noted down in their annotations.
So through their annotations, we are basically able to recapture the pathways these early modern readers took through their books, and indeed through their libraries. What I'm going to do now is show you various sorts of links, of movements, these readers made. This, for example, is a printed geniture, as it's called — a horoscope — in a book by the scholar Firmicus Maternus, the Astronomicon. And you see in the left margin that John Dee — you see his initials — wrote a marginal note saying "Cardano." Cardano was a famous 16th-century Italian astronomer. Cardano corrected this error, with Firmicus's honor unharmed. And then he refers to folio 181. Luckily, we also have John Dee's copy of Cardano's book, the Libelli quinque. And indeed, we see a similar geniture, a similar horoscope, appearing on page 180. Dee actually made a mistake: he should have been referring to page 180, but he didn't do so. But according to Dee, this was the horoscope with the correct data. A similar link is made by the other reader on whom we focused in the first phase of the project, Gabriel Harvey. Harvey was interested in warfare, and according to him, every war could or should be won by an ambush. So here he is reading Livy's history of Rome. And the printed text talks about the fact that the Gauls — the French — blocked the road and hence were able to completely eliminate a couple of Roman legions. Harvey is interested in this passage. He underscores some printed text, and he also refers to a book written by the Roman senator Frontinus, who wrote a book about strategy, the Stratagems. And he refers specifically to a chapter in this book which deals with ambushes, or "ambushments," as Harvey calls them. Again, because of Harvey's particular interest, we see a link appearing between two books. Links which we see a bit more often are links within a book, from page to page. Here you can see John Dee annotating a book, again interested in a particular topic. And he says, "see below, folio 276."
And when you go to that folio, he refers back to the earlier folio on which he wrote that marginal note: "see above, folio 42." And this happens all the time, throughout all of the 34 books in our corpus. Then, probably even more interesting, is when our readers start to connect individual marginal notes. This is quite a lengthy marginal note, again made by Gabriel Harvey in his copy of Livy's history of Rome. And here Harvey talks about a debate which was raging in the second half of the 16th century, namely the question of whether it was legitimate to get rid of a tyrant. There were a couple of Protestant authors who suggested that it was. But Harvey also cites a number of authors who argued this shouldn't happen. And then in the last line of this marginal note, you see that he notes that there is also great disagreement between imperial jurists and pontifical ones, about which elsewhere in more detail. Now interestingly, in the same book, Harvey again addresses this same topic, mentions a couple of the same authors — a couple of these Protestant authors — and says that the stance of these people is "flat opposite to the imperial civil law of the prudent, valorous, unreputed, just Romans." So this is just to signify the various sorts of links — between books, between pages, between individual notes — that we encounter in our corpus of annotated books. Both readers were not only interested in links; they were also very careful to capture the relationships between the component parts of the data that they were recording in their annotations. Often they worked with dense information, and in order to make sense of the information, and indeed in order to visualize it, they worked with structured data. One example of structured data are these Ramus trees. They are named after the French Protestant Petrus Ramus, who advocated a logical division, a classification, of knowledge.
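To make the notion of pathways concrete before turning to the trees themselves: the links just described — book to book, page to page, note to note — are naturally modeled as a directed graph whose nodes are locations in books. This is a minimal sketch, not the project's actual data model; the book and folio identifiers are invented, loosely echoing the Dee examples above.

```python
# Sketch: marginal cross-references as a directed graph, so a reader's
# "pathway" through a corpus can be recovered by following the links.
# All identifiers below are illustrative, not real project data.

from collections import defaultdict

class PathwayGraph:
    """Directed graph of annotation cross-references."""

    def __init__(self):
        self.edges = defaultdict(list)  # source location -> target locations

    def add_link(self, source, target):
        """Record that a marginal note at `source` refers to `target`."""
        self.edges[source].append(target)

    def follow(self, start):
        """Follow links depth-first from `start`, yielding each location once."""
        seen, stack = set(), [start]
        while stack:
            loc = stack.pop()
            if loc in seen:
                continue
            seen.add(loc)
            yield loc
            stack.extend(self.edges.get(loc, []))

g = PathwayGraph()
# "see below folio 276" and its reciprocal "see above folio 42":
g.add_link(("Dee-Book-A", "folio 42"), ("Dee-Book-A", "folio 276"))
g.add_link(("Dee-Book-A", "folio 276"), ("Dee-Book-A", "folio 42"))
# A link out to another book, like Dee's note pointing to Cardano:
g.add_link(("Firmicus", "geniture"), ("Cardano-Libelli", "page 180"))

pathway = list(g.follow(("Dee-Book-A", "folio 42")))
```

Cycles (the reciprocal "see above"/"see below" pair) are handled by the `seen` set, so a pathway terminates even when two notes point at each other.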
So on the left, you see a wonderful Ramus tree made by Gabriel Harvey in one of his companions to civil law. You see the Ramus tree on top, and then below, interestingly, he has made several notes which qualify, which provide extra information about, several components within this Ramus tree. On the right, you have a Ramus tree drawn by John Dee, which is a bit simpler: it just starts at the top and can then be read from top to bottom. And this is literally "everything which is in a demonstration." So we have the principal parts, which can be divided into the common parts and the particular parts, divided into complex and incomplex parts, and so forth — a classification of knowledge. Another example of structured data — and these are really quite interesting — are genealogical trees. John Dee especially was interested in medieval history, and these medieval works often dealt with noble and royal families. Now, the genealogies of these families could be incredibly complex. Often these families had difficult, complex marriage strategies; they intermarried; they often gave the same name to their children. Sometimes genealogies, or parts of genealogies, were invented for particular political aims, so conflicting genealogies existed. So here we see John Dee really working out the genealogy of a family. What is happening? What does the printed text say? Is it correct? And can I figure out how these family relationships work? The one on top is fairly straightforward. The one on the bottom is a different matter. It's a huge genealogical tree spread across two pages, with some accompanying notes and, as you can see, also with some pen lines that connect several family members to one another. And this is also a real challenge for us: how in heaven's name do we capture this in XML? And how can we reconstruct these genealogical trees and make them more accessible for our users? Yet another example is a manuscript — a drawn horoscope, or geniture.
There were different ways of laying out these horoscopes; this particular layout was probably the most common one in the 16th and 17th centuries. So you have the division of the zodiac into 12 houses, you have the zodiac signs with their positions, and then the positions of the celestial bodies, such as the planets. Now, one way of going about this is to capture this data in a tabular form, as we have done. But as Mark Patton, a programmer at Johns Hopkins, has suggested, we should also think about capturing this kind of historical astronomical data in such a way that it can talk to other astronomical data sets which are out there. So the idea is not that we create some sort of black box, a tool which is only inward-looking and focused on its own data, but one that ideally also links out to other data sets. Then lastly, before I hand over to Sayeed: we have now been talking about early modern readers and their movements and their pathways, but something interesting happens as well when we, as scholars, start to make use of our own tool. What we've been saying — what really happens — is that doing research on annotated books is going down the rabbit hole. You start somewhere, you start with a search, you find something interesting, you modify the search, you jump from one book to another book, you look at one type of marginal annotation and then another, and at the end of the day you think: okay, I got in, but how in heaven's name do I get out? How can I make sure that I can somehow retrieve the movements I've made, so that at some point in time I can go back to them and continue? It's a bit similar to how these early modern readers operated, and this is, I think, a really interesting aspect that will be addressed by Sayeed in his part. Thanks, Jaap. So the person who in many ways is driving a lot of the work we're doing now — actually, there are two people.
There's Lisa Jardine from UCL, who unfortunately passed away during the first phase of the project, but Tony Grafton at Princeton, a professor of history there, is really an inspiration for this project. He gave a talk to our library's advisory council not that long ago, and he described the project with this quote: it's as if we arrived at the ancient ruins, saw the ivy, and chose to study the ivy. Now, the annotations in these books are an incredibly rich resource in themselves, and as you've heard from Jaap, there's a tremendous amount of interesting scholarly exploration and technology exploration that we can do just with the ivy. But we are interested in the so-called ruins as well — in the content of these books themselves, their connections to other types of books, to other types of data — and that's a key aspect of what we're trying to do over the course of these projects and this program. This is a report that was written at this point over a decade ago, or about a decade ago, and it was actually an inspiration for a lot of the data work that we're doing at Hopkins. I still think it's a very important read. I actually looked through it not that long ago, and it's still very timely, very useful information. There are a few key things to take away from this report — that I took away from this report. The main point being: any attempt to build infrastructure that is rigid — that has a very clear sense that "this is how the systems come together and work, and we understand the requirements" — is almost doomed to fail, and will often fail catastrophically. Because what tends to happen over time is that you take this beautiful thing that you have initially, and you keep extending it and tuning it and accreting onto it, and eventually the requirements change, the data changes, the technology changes, and it just breaks.
So a much better approach is to think of something that's more flexible — almost scaffolding. You work from principles of navigation rather than a specific roadmap. So what I'm really hoping to talk about, in many ways, are these principles of navigation. Now, one big difference between what's discussed in that report and what I think we've discovered with our work through projects like the Archaeology of Reading is that that report talks about physical infrastructure, or financial infrastructure, and things of that nature. A big difference with what we're doing is the role of interpretation. If you think about railroads or highways or roads, there's no interpretation going on: those wheels aren't being interpreted by the highway or the train. Financial systems coming together — you don't want interpretation. If I'm moving my money around, I don't want the infrastructure to make inferences about what I'm trying to do; I want it to do what I tell it to do. But that's not true when it comes to this kind of digital scholarship, or digital humanities. The interpretation is critical, and in fact we have to celebrate that interpretation. So a key question for us is: how do we continue to foster that kind of interpretation — which is tacit knowledge, trapped in people's heads, if you will — yet build infrastructure that can support and embrace it in a sustainable, robust manner? So there are three key aspects that we've been looking at. In some ways, we tend to start with: what does it look like? What is the metadata around what we're trying to describe? And so on. We're proposing, and have adopted, an approach that adds a few nuances to that. If you think about a stack, you basically say: go write a schema that captures all that interpretive information you want. How long is the schema right now? Oh, hundreds and hundreds of lines. Hundreds and hundreds of lines.
That's true of any digital humanities project — any rich digital humanities project. But rather than saying "that's too much," or "that's too complicated," or "how is that going to map to this other project?", we've decided to say: go for it. Go to town. Make your schema as rich and as complicated as you want. The harmonization is going to happen at the data model level. So rather than trying to say this element in this schema maps to that one, or trying to make some sort of meta-schema or ontology or whatever you want to call it, we're in essence saying: let's map it to a data model. And in our case, let's develop the schema with a data model in mind, so that we can actually go from the schema to the data model easily. And IIIF, the International Image Interoperability Framework — I sometimes mess up those letters, but I think that's right — is, in essence, the data model for the annotations. It's a very good data model if you're trying to look at the ivy, if you will; that's the data model we're using in that case. Now, I think we can make this demonstration work — I hope the internet connectivity is good. I have to drag this onto the screen; sorry, it looks small there. What you see here is one of the first projects we worked on, called the Roman de la Rose. It's a medieval French manuscript work; there are roughly 300 extant copies in the world, and we have about 135 available through this digital library now. It's organized in various ways, including by the locations where the manuscripts exist. So if you look at something like this, it probably looks like the kind of interface you get when you don't have a Flash player installed. But if you did, you could imagine some beautiful folios, and you'd be able to flip through the folios and so on. But this is actually instructive: if this looks like something from 15 years ago, it is. It's something that we did 15 years ago.
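The harmonization idea above — keep each project's rich schema, and map it into a shared data model — can be sketched as a set of small per-project mapping functions. The shared fields and the per-project field names here are illustrative assumptions, not the actual AOR or Rose schemas.

```python
# Sketch: "harmonize at the data model level." Each collection keeps its own
# rich schema; a small mapper projects records into one shared, minimal model
# that cross-collection services (search, display) can rely on.
# Field names below are invented for illustration.

def from_aor(record):
    # hypothetical Archaeology of Reading fields
    return {"collection": "aor",
            "text": record["marginalia_transcription"],
            "locus": record["folio"]}

def from_rose(record):
    # hypothetical Roman de la Rose fields
    return {"collection": "rose",
            "text": record["line_text"],
            "locus": record["folio_side"]}

MAPPERS = {"aor": from_aor, "rose": from_rose}

def harmonize(collection, record):
    """Project a schema-specific record into the shared data model."""
    return MAPPERS[collection](record)

shared = harmonize("aor", {"marginalia_transcription": "vide Cardanum",
                           "folio": "181r"})
```

The point is that nothing in the rich schemas is thrown away; only the mapper, not the schema, has to agree on the common fields.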
We are in the process of refreshing it, upgrading it, and bringing it into a much more modern context. This is the Archaeology of Reading, the project that Jaap has just described to you. And for that, we're using a particular viewer called Mirador; it's a community-based, IIIF-compliant viewer. There are others, but this is the one we've chosen to work with. And we've made some extensions, because fundamentally IIIF is about the presentation of images: a series of canvases that have some sort of information attached to them, and then you present those images. So the key here was for us to build the capability to support the text around the annotations, the text around the symbols, and so on. So this is a leap forward from what you just didn't see with the Roman de la Rose. And now what we're also proposing to do is move this into a different kind of context, where you'll be able to bring together collections from all of the Archaeology of Reading, the Roman de la Rose — we also have Christine de Pizan manuscripts from the Digital Scriptorium — and basically have it all in one framework. And the key here is not to have different interfaces, not to try to shoehorn the interfaces or map the schemas together, but in essence to treat the data consistently according to these data models and then use a common kind of interface. So what you're seeing here in this demo is that we're now able to take subsets of this content, put them all into the same framework — basically the same archive we're using — they get converted to IIIF automatically, and then you can search across all of them. So I did a search for the term "engineer" in this particular subset of the data, because we know that term exists in all three of those collections. Not only are we not user interface experts, as you could see from Rose; I am not a scholar either, so I don't know of a term other than "engineer" — that's the one we found — that goes across all three of those collections.
But the point is that this is how you can search across all those collections — and you can search by the schema of the individual collections. You don't have to compromise that level of interoperability, because you can now basically say, at the data model level, I can express the schema of Rose, the schema of Christine de Pizan, and the schema of the Archaeology of Reading. What else we're doing is making these linked data connections. You can't quite see it, but this is the word "Milan," and I just scrolled over that word, and this is now linking to a service called Pleiades that the Perseus Digital Library is using. It's about place names and identifiers for them and so on. And we know that Milan is on that page. I don't know where, but I'll tell you. And we're now able to basically say, here's a link to something in Perseus that's relevant in terms of Milan. And down here on the bottom is the name of someone — Polyaenus, I think — and this is reaching into a text-based service that Perseus uses called the Canonical Text Services. In essence it goes and says: okay, wherever you have that name, can we please find references to it? And this is a dictionary entry, so it basically tells you who this person is. If you were at the opening keynote, I think this is the kind of scaffolding that would help people, particularly undergrads, understand the context of what they're doing and what they're trying to do, without having to jump back and forth between all these different fragmented silos. We want to integrate all of this into the same kind of framework. So let me drag this back and go back to the slides. That was just the demonstration; I'm going to scroll through a bunch of slides, just in case the internet connection was not robust. So the other data model that we are now exploring with this particular project is something from the RMap project.
This is a project that was funded by the Sloan Foundation, where we worked with IEEE and Portico to develop a linked data service that basically brings together scholarly objects — articles, data sets, software products, authors, collaborators, and so on — in essence creating linked data graphs around the scholarly network, the publication network, that exists today. We've successfully used it with IEEE's entire article database: 4.1 million articles. We ran RMap against that and generated these linked data graphs of their entire scholarly network, based on DOIs. It is identifier-agnostic, so you don't have to use DOIs — or ARKs, as the case may be. You just need to be able to map what you're trying to describe to the RMap data model. And that's what we did with IEEE. We also did that with the SHARE project — all the SHARE content that's being housed or managed through the Open Science Framework. We went to their hackathon, looked at the schema that was being used for SHARE with OSF, inferred that that was the schema, mapped it to the RMap data model, and asked: does this look right? Does this look okay? Is it a perfect transformation? No, but it worked, and you're now able to look at the SHARE network as a series of linked data graphs. It is based on the ORE protocol — Herbert is here in the room, so thank you for that. We did take the ORE protocol and flatten it in some sense, simplified it, if you will. You lose some of the richness by doing that, but we think it lowers the barrier to entry. And it is in fact possible to go back to using full-fledged ORE if that is your purpose or your intent. So just as an example — a very simple one, which actually comes out of the SHARE project — this is the kind of thing you see with RMap once you generate these graphs, and they can obviously get more complex if you have more kinds of content, more kinds of information. But these are the kinds of pathways that Jaap was describing.
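At its simplest, a linked data graph of this kind is a set of subject-predicate-object triples connecting scholarly objects. The sketch below uses invented identifiers and predicate names, not RMap's actual vocabulary; as noted above, the approach is identifier-agnostic, so the node labels could be DOIs, ARKs, or anything else.

```python
# Sketch: a scholarly network as a set of triples, queried by walking the
# graph around a node. Identifiers and predicates are illustrative only;
# the "..." in the ORCID is a deliberate placeholder, not a real ID.

triples = set()

def relate(subject, predicate, obj):
    """Assert one subject-predicate-object statement."""
    triples.add((subject, predicate, obj))

relate("doi:10.x/article", "cites", "doi:10.x/dataset")
relate("doi:10.x/article", "hasAuthor", "orcid:0000-0001-...")
relate("doi:10.x/dataset", "hasAuthor", "orcid:0000-0001-...")

def neighbors(node):
    """Everything directly connected to `node`, in either direction."""
    outgoing = {o for s, p, o in triples if s == node}
    incoming = {s for s, p, o in triples if o == node}
    return outgoing | incoming

graph_around_article = neighbors("doi:10.x/article")
```

Because the graph is just statements, adding a new kind of object (a software product, a collaborator) means adding triples, not changing a schema.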
So if you think about IIIF and the ivy, the annotations — that's great; that's one expression of what we're trying to do here. RMap would be the data model for thinking about all those connections between the books, all the connections between the references to outside materials, from the astronomy data within the tables that you saw to modern astronomy tables. And if you think that kind of thing doesn't matter: I've been working for many years with someone named Alex Szalay at Johns Hopkins, in the Physics and Astronomy Department. He's been instrumental, quite frankly, in a lot of the data-intensive work that we do. And he told me not that long ago that he's been lobbying Harvard to digitize a bunch of glass plates from around 1910, the early 1900s. I said, why do you want access to glass plates from the early 1900s? He said, they have observations of the sky that we can't replicate, right? The resolution would be terrible compared to what they have today, and it's a huge problem trying to map them onto each other, but they're critical. Many sciences do care about time, astronomy being one of them. So it wouldn't surprise me at all if modern astronomers saw those kinds of tables and said, I care about them. One other anecdote: with the Roman de la Rose, in the past, when we first started, the first institutions that contributed manuscripts actually asked us to put a password on the site, so people would email us and ask for the password. An ornithologist at Cornell asked me for the password once. And I said, sure — but why are you looking at the Roman de la Rose? He said, there are bird observations in there. Observations — I mean, I have a science background too. He said, look, of course it's not like a modern observation, but I can't go back to the 1400s or 1500s and look at those birds. Some of those birds may not exist today. So there's a potentially long thread we can draw from the humanities to the sciences.
Now, one other thing I want to point out — I'm not an expert in digital humanities, but a lot of the digital humanities work that I see ends up borrowing methods from the sciences. That's perfectly fine: if you can take the methods and repurpose and reuse them, that's great. But know that there is interpretation in those methods too. Scientists do interpret. They may not be open about it — it may not be wise to talk about that right now — but there is a lot of interpretation that takes place in the sciences; it gets formalized in their methods. So when you apply those methods in the humanities, you are taking on their interpretation, whether you know it or not. Annotations are one place where I think that arrow can be reversed, because for a lot of the text mining, data mining, and visualization, what scientists do arguably involves more complex use cases than what the humanities do. That's not true with annotations. Scientists do care about annotations; they mark up their images in the field. I showed a geologist at Hopkins some of the content from AOR. I said, have you ever seen annotations like this? His response was: was this person mentally ill? Why would you write things at an angle? Why would you refer to things that are not there? Why would you have shopping lists in your scientific documents? The richness of the annotations that humanists study is far more complex than what scientists have been doing. So whatever methods and interpretation we can bring from this area into the sciences — we may finally turn the arrow the other way. I think that's an important point to keep in mind. So the final point that I want to make is to reaffirm the sense of the infrastructure components. Yes, the XML schemas are important. And yes, we definitely want you to interpret, to celebrate that, and to make the schemas as rich as possible. The work that we've been doing is very much iterative, interactive, and collaborative.
So the schema and the data model work hand in hand — that's what I'd suggest. But with something like SHARE, or with something like our own Rose project, we had a schema in place and never thought about the data model when we did that work. You can still map it to a data model. It may not be a perfect mapping; it may not be a complete mapping; but it's still possible to do. And as I mentioned, the IIIF data model is what we're using for the annotations, and the RMap data model is what we're proposing to use not only for the pathways between the content, but for the scholarly pathways that Jaap talked about as well. So imagine Tony Grafton, who's already using this resource, going through the Archaeology of Reading corpus. Suppose he's okay with us basically saying: can we watch and track where you went through this corpus? You can keep it private for as long as you wish. You can then show it to your graduate seminar as a first pass. You can then show it to your undergrads as a second pass. And then you can share it with the world. Would it be useful to see where Tony Grafton goes through something like this? I think so — and where he goes into other places as well. And the last point — I actually had it on an earlier slide. When I told our software developers what I was going to talk about — and this is fairly typical of Hopkins — I say stuff in version 1.0, and they go, yeah, that's interesting, Sayeed, and then we iterate over time and make it richer and hopefully more presentable. One of the software developers, Mark Patton, whom Jaap mentioned earlier, said: don't forget to mention the HTTP APIs. That's how you express your data model. That's how you let people know what you can do with those data models. And that's a key part of the IIIF spec and the RMap spec as well. So we did want to leave plenty of time for questions and discussion.
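On that last point about HTTP APIs: "expressing a data model" through one means, minimally, that every resource in the model has an address and a machine-readable representation. The routes and payloads below are invented for illustration; they are not the actual IIIF or RMap endpoints.

```python
# Sketch: a data model "expressed" as addressable resources. A real service
# would sit behind a web server; here a dict of paths and a resolver stand
# in for the routing layer. All paths and payloads are hypothetical.

import json

RESOURCES = {
    "/annotations/1": {"id": "/annotations/1", "type": "Annotation",
                       "body": "vide Cardanum", "target": "/canvases/181r"},
    "/canvases/181r": {"id": "/canvases/181r", "type": "Canvas"},
}

def get(path):
    """Resolve a path to (status, JSON body), like a minimal read-only API."""
    if path in RESOURCES:
        return 200, json.dumps(RESOURCES[path])
    return 404, json.dumps({"error": "not found"})

status, body = get("/annotations/1")
```

Note that the annotation's `target` is itself a resolvable path: following links between representations is exactly how a client discovers what the data model lets it do.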
These are some acknowledgements that I think are important to put up, in terms of our funders and our core collaborators. There are lots of other people working on this as well. If you have any questions, Jaap and I would be happy to try to answer them. Thank you.