I work on the Siddham Epigraphic Database, which at the moment is essentially a repository of Sanskrit inscriptions from India. The plan is that it will in future include non-Sanskrit inscriptions and some from outside India too, broadly from the geographical region covered by our project, which is South, Southeast and Central Asia; in the distant future, who knows, it might expand to other areas. Most of the work I do is digitizing pre-published inscriptions, so there is very little brand-new research involved, and for that reason my data management might seem very simple. What do I do? I take a printed edition of an inscription and encode it in an appropriate way; it becomes a digital edition, which is then disseminated on the one hand through a dedicated website, siddham.uk, conceived as a dynamic thing, a research tool, and on the other hand on Zenodo, which I mainly see as a static archive for my work. So the backbone is the text. A text converted into a digital edition uses the EpiDoc format, which was mentioned yesterday and which I'm not going into: it is a subset of TEI XML specifically geared for working with inscriptions. Of course, things are not as simple as they look at first; in case we wish it were that simple, no, it is not. An inscription is not just a text. Even though I'm basically a philologist, I must admit that an inscription is a lot more than a text, and therefore it comes with a lot of baggage, a lot of metadata that we need to work with. So this talk is set up as follows: first, I'll try to be very brief, but I have to explain what sort of data I work with and how they fit together.
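As a rough illustration of the format, here is a minimal, hypothetical EpiDoc-style edition division of the kind described above. The element names (`div type="edition"`, `lb`, `supplied`, `unclear`) follow general EpiDoc conventions; the text content is invented for the example, and real Siddham files are far richer.

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical EpiDoc-style edition snippet: just the
# <div type="edition">, without the TEI header. Text invented.
SNIPPET = """\
<div type="edition" xml:lang="san-Latn">
  <p>
    <lb n="1"/>siddham <supplied reason="lost">mahārāja</supplied>
    <lb n="2"/>śrī<unclear>ca</unclear>ndraguptasya
  </p>
</div>"""

div = ET.fromstring(SNIPPET)
print(div.get("type"))                         # edition
print([lb.get("n") for lb in div.iter("lb")])  # ['1', '2']
```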
I've given previous talks on that, including one in Vienna that some people here have heard, so I'll summarise that part very quickly and concentrate this time on how those data are managed right now, within the project and within my own workflow. But first, a little about the data themselves. We have metadata about the object bearing the inscription: how big it is, where it was found, what museum it is kept in. There are physical data about the inscription itself, such as the size of the inscribed area and what its characters look like. We have some internal metadata about the inscription, such as when it was composed, if it has a date or one can be estimated. In addition we have bibliographic data: where the inscription was first reported, who has edited it and where it was published, and major articles and books discussing the inscription, its text, or its implications. We are also working with images. This has not been a high priority; we haven't been actively trying to amass high-quality images, but I have been trying at least to keep track of the images I worked with while re-editing the inscription texts. I had to look at facsimiles and photographs, and I have been trying to archive those. In addition to these, there are some, to my mind, very easily conceivable, very easily realizable additions we could make. Some steps have been made toward adding pre-published translations, but new translations could just as easily be created if somebody is willing to do the work and re-translate the inscriptions; I'll do some myself, and anybody is welcome to do others. Commentaries can also be added to the inscriptions.
There is also a lot of gray data: half-baked commentary I have been writing as notes for myself while re-editing the inscriptions. It is not ready for the public, but it is preserved, and I hope to be able to convert it into a semi-readable, more-than-half-baked commentary at a later time. We could also add geolocation information. So how do the metadata and the text fit together? First of all, there is a way to include the metadata in the XML itself, and perhaps that should be sufficient; EpiDoc was designed with that in mind. A part of the file called the TEI header can include metadata. The main problem I see with that is that it is not really object-oriented; it is very much text-oriented. In particular, there are cases, not many admittedly, but definitely there, where an object bears several different inscriptions from different periods of history, or where an object consists of multiple parts, either because it has been broken into fragments, all or some of which are still extant, or because it was conceived as a composite object to begin with, for example a set of copper-plate land grants. To represent these complex relationships, several inscriptions on one object, or one inscription on several objects, it is easier, at least from our point of view, not to use EpiDoc as the primary medium, but instead to conceive of our data as several databases: an object database, an inscription database and a bibliographic table, plus edition snippets, XML snippets containing just the text edition. The Siddham website can then work from a combination of these, when you visit siddham.uk, or when you visit it a couple of weeks or months from now, when it works better than it does right now; it's coming, I'm told.
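The relational shape just described, separate object and inscription tables linked by ID, with edition snippets kept as standalone XML files, could be sketched like this. All field and file names here are invented for illustration, not the project's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of the data model: objects and inscriptions live in
# separate tables; an inscription points at its object, and at the XML
# snippet holding its edition. Field names are invented.
@dataclass
class InscribedObject:
    object_id: str                  # e.g. "OB00001"
    description: str
    part_of: Optional[str] = None   # parent ID for component objects

@dataclass
class Inscription:
    inscription_id: str             # e.g. "IN00001"
    object_id: str                  # always the superordinate object
    edition_file: str               # standalone EpiDoc edition snippet

# One object carrying two inscriptions from different periods:
rock = InscribedObject("OB00001", "Junagadh rock")
early = Inscription("IN00001", "OB00001", "IN00001.xml")
later = Inscription("IN00002", "OB00001", "IN00002.xml")

on_rock = [i.inscription_id for i in (early, later)
           if i.object_id == rock.object_id]
print(on_rock)  # ['IN00001', 'IN00002']
```

The same shape covers the inverse case (one inscription spanning several objects) via the `part_of` link: component objects point at a superordinate parent, and only the parent is linked to an inscription.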
At that point, what you see in your browser is what the website engine has put together from a bit of XML plus the databases, and eventually we hope it will also be able to export EpiDoc, so that the exported EpiDoc contains all the metadata for that particular inscription, generated on the fly; I'll come back to that in ten minutes or so. So, very quickly: we have an object database. Here is one inscribed object, the big pillar on the right, with metadata recorded for it; you don't actually need to be able to read the small print, but is it legible? Okay, I'm glad to hear it. That is the sort of data recorded about an object. Similarly, here is an inscription, what is written on the pillar you've just seen, with its own metadata recorded: inscription metadata. Then we come to the bibliography, which is meant to be recorded as database records, but in a way that is compliant with a structured TEI bibliography. That again is something we don't have yet, but what I hope we will achieve is that the website will be able to export a full, or even a partial, bibliography in TEI format. A TEI bibliography lets you use analytic and monographic levels, with different bits of information recorded at each. What we have right now are references to the bibliography in various parts of the database. There is one, for example, to what I classify as a book, although it is in a bit of a gray area, with a corresponding entry in the bibliography table at the monographic level. There is another bibliographic reference over here to a journal article, with its own bibliography table entry; and within that entry there is a reference to the journal, which from a TEI bibliography perspective is again something at the monographic level, with a separate entry of its own.
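The analytic/monographic distinction can be made concrete with a TEI `biblStruct` for a journal article: the article sits at the analytic level, the journal it appeared in at the monographic level. This is a hedged sketch built with Python's ElementTree; the author, title and date are illustrative examples, not entries from the Siddham bibliography table.

```python
import xml.etree.ElementTree as ET

# Sketch of a TEI <biblStruct>: the article at the analytic level,
# the journal at the monographic level. Content is invented.
bibl = ET.Element("biblStruct")

analytic = ET.SubElement(bibl, "analytic")
ET.SubElement(analytic, "title", level="a").text = "A Gupta-period pillar inscription"
ET.SubElement(analytic, "author").text = "J. F. Fleet"

monogr = ET.SubElement(bibl, "monogr")
ET.SubElement(monogr, "title", level="j").text = "Indian Antiquary"
imprint = ET.SubElement(monogr, "imprint")
ET.SubElement(imprint, "date").text = "1888"

print(ET.tostring(bibl, encoding="unicode"))
```

A book would have no `analytic` part at all, only a `monogr` with `title level="m"`, which matches the monographic-level table entries described above.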
Similarly, there are analytic entries for book chapters and monographic entries for books, which I'm not showing here. And of course there are the images, which on the one hand need archiving, and perhaps some retouching or enhancement, and on the other hand need basic metadata recorded about them. For example, for the pillar and inscription you have just seen, I have a couple of images with basic metadata recorded, and we can show all of those images on siddham.uk. So, to show in one image what I have said so far: we have an object database and an inscription database, each incorporating metadata about the object side and the inscription side. The inscription database draws in the XML snippet containing the edition, and potentially it could also draw in similar XML snippets with a translation and a commentary. Objects can consist of multiple parts, so there may be component objects, but there is always a sort of hypothetical or virtual superordinate object, and it is only that superordinate object, the master or parent object, that is linked to an inscription. I don't want to go into more detail than that, except to mention quickly that in this way more complex relationships can also be represented in a fairly simple way. This does not, of course, map all the intricacies of reality, but I believe this level of complexity is a good middle line for our purposes. To go back to my earlier outline of data management: that was the simple view, and this is what actually happens. It's a bit tangled, and I hope it can be simplified a little. But let's look at a few parts of this diagram, starting by zooming in on the sources; I'll even try to magnify them, which I think I can do here. Yes, I can. That's the simple bit.
People have talked about similar things before. Basically, I start out with the printed edition, but to some extent the printed edition is supplemented by fieldwork, which I might do, or someone from the project might do, or at a later time someone from outside the project may do and contribute. Images and metadata are all sourced from the printed publications, but may also be supplemented or augmented through fieldwork. And all three, the images, the edition and the metadata, go their own separate ways. What ways are those? To close in on the main thing we are doing, transformation, and what we produce in the way of digital data: not much happens to the images at this stage, just a bit of post-processing and recording the image metadata, as I showed before. The edition is quite a lot of work; without going into the details, I need to add all the markup for the EpiDoc XML to produce the EpiDoc edition snippet, which contains just what is called the edition division of an EpiDoc file, but not the TEI header. The metadata need to be recorded in a structured way; we now have the tables to do that, so recording the metadata is pretty straightforward. Once we have the snippet and the structured metadata, that is fuel for the Siddham site to work with, but it is not really what we want to archive on Zenodo. When archiving things on Zenodo, we want to put up things that are more compatible and more transparent. The best way to do that is to produce the full EpiDoc edition files I mentioned at the beginning, which actually contain all the metadata for the inscription in the TEI header. This, of course, results in some level of redundancy.
If there is an object, say the Junagadh rock, which has an inscription by Ashoka and another inscription by a vassal or feudatory of Skandagupta, then the metadata for that rock will be included in the EpiDoc editions of both of those inscriptions. So far, since we are working on the Gupta period, we don't have the Ashoka inscription digitized, but we do have the Skandagupta-period inscription. In future somebody will hopefully add the Ashoka inscription, and at that point, when somebody gets the EpiDoc files, there will be two files containing data about the rock. We'll have to live with that. In any case, we need to incorporate those metadata into the XML file. At the moment Gethin Rees, sitting at the back here, does that: he has been kind enough to write a script which extracts the metadata from my tables and puts them into the XML files, merging them with the XML snippets. I think that about covers it; I haven't talked in detail about how we publish and archive the images and the bibliography, because that is still in the works. So, how does this metadata integration work? On the left-hand side you can see the basic structure of an EpiDoc edition. It has a TEI header, which is where the metadata go, and then a body part containing different kinds of divisions: the edition division can contain the pre-made XML snippet for the text edition, and the translation division can just as easily ingest a translation XML snippet if somebody creates one. We have a number of translations typed up by subcontractors, not yet in XML, but since we are only going to add very basic markup to these, such as paragraph structure, they can be converted in a matter of hours for all the translations we have, once we have the means to incorporate them in the EpiDoc edition.
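A toy version of that merge step might look like the following. This is not the actual script, only a sketch under simplified assumptions: a metadata record as a dict, a pre-made edition snippet as a string, and a much-reduced TEI header (real EpiDoc headers carry far more structure).

```python
import xml.etree.ElementTree as ET

# Toy sketch of the metadata-integration step: combine one metadata
# record and one edition snippet into a full EpiDoc-shaped file.
# Element choice simplified; field names invented.
def build_epidoc(meta: dict, edition_xml: str) -> ET.Element:
    tei = ET.Element("TEI")
    header = ET.SubElement(tei, "teiHeader")
    file_desc = ET.SubElement(header, "fileDesc")
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = meta["title"]
    pub_stmt = ET.SubElement(file_desc, "publicationStmt")
    ET.SubElement(pub_stmt, "idno", type="internal").text = meta["id"]
    body = ET.SubElement(ET.SubElement(tei, "text"), "body")
    body.append(ET.fromstring(edition_xml))  # merge in the snippet
    return tei

meta = {"id": "IN00023", "title": "Junagadh rock inscription of Skandagupta"}
snippet = '<div type="edition"><p><lb n="1"/>siddham</p></div>'
tei = build_epidoc(meta, snippet)
print(tei.find("./teiHeader/fileDesc/titleStmt/title").text)
# Junagadh rock inscription of Skandagupta
```

Because the header is generated from the tables on each export, the tables remain the single source of truth, and the redundancy between two EpiDoc files describing the same rock is harmless: both are regenerated from the same record.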
And similarly we can do that for a commentary, and for a proper bibliographic reference list linked to a master bibliography. So what I showed you before as EpiDoc is actually the edition snippet that goes in the edition division, and in the same way metadata from the table go into the EpiDoc header. For example, the header has a place for a title, and the title of an inscription from the metadata table goes in there; the header has a place for an inscription identifier, an ID number, and our internal ID number goes in there. I should probably say a little more about the ID numbers, which are key to keeping track of all the work we do. The essence is that it is just an arbitrary number. It could be any sort of number; we came up with IN for inscription numbers and OB for object numbers, plus five digits. That is a stable identifier for all the inscriptions and all the objects in our database, and every file that exists in addition to the metadata tables, such as the XML files with the inscription editions and, in future, possibly XML files with translations or commentaries, will be identified using that number. That's the essence. At the very beginning I wondered for some time, just as Mark talked about his dilemma yesterday about how to put the Pyu inscriptions in sequence, whether there would be any point in, say, starting all the Gupta dynasty inscriptions from a particular number. There isn't really a point, because on the one hand the classifications are not always clear. How do you know an inscription is Gupta dynasty? Suppose it is by a Gupta vassal; maybe he was also a Vākāṭaka vassal, or maybe we just don't know whose vassal he was. And on the other hand, you can't account for new discoveries. You can always leave a number of empty slots, but there is just no point. It's just a number; it doesn't need to be meaningful.
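The identifier scheme described above, a two-letter prefix plus five digits, is simple enough to sketch in a few lines. The function name is invented; the point is that nothing about the inscription is encoded in the number itself.

```python
# The Siddham-style identifiers described above: an arbitrary but stable
# prefix plus five zero-padded digits, "IN" for inscriptions, "OB" for
# objects. A minimal sketch; nothing meaningful is encoded in the number.
def make_id(kind: str, number: int) -> str:
    prefix = {"inscription": "IN", "object": "OB"}[kind]
    return f"{prefix}{number:05d}"

print(make_id("inscription", 23))  # IN00023
print(make_id("object", 7))        # OB00007
```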
The inscription ID doesn't need to tell you anything about the inscription, and the object ID doesn't need to tell you anything about the object; the metadata do that, and the ID is just an ID. An object number can also go in there; it doesn't matter. So how is the work shared? Most of it I do myself. As I mentioned, Gethin, thankfully, does the metadata integration. What complicates the picture a little is that we now have subcontractors who take on some of the partial tasks, such as encoding published editions in XML, recording metadata in our structured tables, or typing up translations. We also have some contributions from project members and, in future, hopefully from people outside, who will either post corrections to metadata or submit photographs, rubbings or other images to us. All these data are stored mainly on Google Drive. For all kinds of data there is one master version, for which only I have edit privileges; everyone else has view access to the whole dataset. When, for example, a subcontractor uploads an XML file they have transcribed, or a partial metadata table they have filled out, it goes up to Google Drive and comes down to my computer, and it is I who merge that new input into the master set, verifying it as I go. So the master set never gets changed or overruled, or rather, never gets changed or overruled by multiple people at the same time; it only gets gradually expanded. I think this Google Drive arrangement is good enough for our purposes at the moment. It is safe enough: just in my own household it syncs daily to three separate hard drives, plus the copy in the cloud. And I do hope, though I'm not certain, that some of the other project members, including some of the PIs, actually sync it down to their own local hard drives sometimes and don't just keep it in the cloud.
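The merge discipline just described, where contributions only ever extend the master set and conflicting input is checked by hand rather than overwriting anything, can be sketched as follows. The function and the sample records are invented for illustration.

```python
# Sketch of the merge discipline described above: incoming rows that are
# new are added; rows that clash with an existing master row are flagged
# for manual verification instead of overwriting it. Data invented.
def merge_into_master(master: dict, incoming: dict) -> list:
    conflicts = []
    for key, row in incoming.items():
        if key in master and master[key] != row:
            conflicts.append(key)   # needs manual review by the editor
        elif key not in master:
            master[key] = row       # safe to add: master only expands
    return conflicts

master = {"IN00001": {"title": "Pillar inscription, reading A"}}
incoming = {"IN00001": {"title": "Pillar inscription, reading B"},
            "IN00002": {"title": "Stone inscription"}}

print(merge_into_master(master, incoming))  # ['IN00001']
print(sorted(master))                       # ['IN00001', 'IN00002']
```

The existing master row survives untouched; only the editor, after verification, would resolve the flagged conflict.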
But even if it is just in the cloud plus on my hard drives, that is probably safe enough. And every now and then, the unfinished, work-in-progress files are also archived on Zenodo. Now, a little more about what I hope will simplify this workflow in the not-so-distant future; work is going on on this. As I said, the siddham.uk website is a bit rudimentary at the moment, and there are bits of it that are broken, but it is being improved. One of the main things we hope to have in the near future is a working CMS, a content management system, which will simplify our workflow in two ways. One is that we can get rid of the Excel files and Word docs and whatnot that I have on my Google Drive: the master version will simply be the version on siddham.uk, and either I or a subcontractor, anyone who wants to add or correct something, can enter metadata or inscription bits directly into the Siddham site through the website interface. The same content management system will, I hope, take care of what Gethin does manually or semi-automatically now, and create those exportable full EpiDoc files containing the metadata in the headers, which can then either be downloaded by end users or archived on Zenodo at regular intervals. That's all I was going to say. Thank you. Questions? Please. [Q:] Maybe I missed this, but you mentioned Junagadh. For Junagadh, for example, we have Harry Falk's book, which contains a great deal of detail about the physical site, GPS coordinates, not an edition of the inscription, but a lot of material about the surface and so on. It seems that when you have that kind of information, what I heard you say is "we could maybe, in the future", but that information is already available and known to be reliable. [A:] Yeah. Okay.
For most inscriptions and most objects there is a lot of known and reliable information out there. There are two things we can do: one is to duplicate at least part of that information on the website, and the other is simply to include a bibliographic reference. Obviously there are entire books dedicated to some of these objects or sites, and we are not going to transcribe the full text of those books; even if there were no copyright restrictions, manpower restrictions would prevent it. So what we want to be is, on the one hand, an index with a bibliography, the place you go to find out where the further information is; and on the other hand, the website is also a tool for working with the inscription: you can click and display the inscription as a diplomatic edition, click and display it as an emended editorial version, and hopefully you will be able to search the inscription texts, or across multiple inscription texts, with various restriction criteria. That's what we want to do. Obviously we're not going to be able to ingest every single bit of information and put it out on the web. Does that answer your question? [Q:] Yes and no. I mean, it's not the kind of question that has an answer; it's an ongoing question in any kind of project: how much do you want to send your users elsewhere for everything, and how much do you want to, as you said, ingest? And I wonder, obviously, ideally, all the inscriptions get re-edited, but there is a lot of material, for example the older publications, which is basically public domain and which you could actually upload into the system itself, the Indian Antiquary, for example, I believe. You could upload the publication itself. [A:] You could, but I'm not sure it is worth the while.
What we are trying to create here is something machine-readable. Now, somebody has scanned the Indian Antiquary. It was out there on the DLI, and though the DLI is no longer extant, most of it is on archive.org, and people can hopefully find it for themselves. But a portal outside Siddham would not be able to read a scanned Indian Antiquary that I uploaded to Siddham. So yes, it would be nice, and it would simplify some people's work a little, if we at least linked to the Indian Antiquary on archive.org, but that is not the main thing we are interested in. If this whole thing takes off and becomes a really crowdsourceable venture, like the Chinese Text Project, for example, then I hope users and outside contributors will in future add links of that sort: this Indian Antiquary volume is at that archive.org link, where you can download it. It's not a priority right now. [Q:] That would be nice. Are there any aspects of this design which would... because the corpus you are using here is relatively small, epigraphically, compared to later periods and other regions of India, and there you are faced with other issues, for example large numbers of inscriptions that are unpublished or only available as estampages at the ASI. Do you think this could be expandable to those? [A:] I certainly hope so. I can't really predict the future. The way I feel about this is that we are at the start of something really big. It's like the little rocks that begin to slide from a mountain top and become an avalanche. Some of those little rocks will not be part of the avalanche; some will just stop on the way down. But I'm hoping to be one of the bigger rocks that actually generate the avalanche. All of this kind of work will be a lot easier ten years from now.
Several times during this conference I have been reminded of Douglas Adams, who, I forget the name of that hero, in his youth spent, I don't know, three days trying to make the computer play Three Blind Mice. And when asked, wasn't that rather futile, he said, well, no, because it taught me a lot about working with computers. So what we are trying to do right now is pave the way, to make this kind of work simpler. As it gathers momentum, I think maybe five years from now, either for me or for someone else, adding ten inscriptions to a similar database will take as much time as it takes me now to add one. And in ten years' time, maybe somebody will be able to add ten thousand inscriptions in as much time as it takes me to add one now. But first we have to do the groundwork, and that's what we're doing now.