Good afternoon. I'm going to talk to you today about digital outputs of collaborative humanities research projects. So I'm not going to talk about my own research, but rather give you an overview of some of my experiences working on research projects. Just to give you a little background about myself: I wrote a PhD on the rock-cut monasteries of the Western Ghats in India, and a GIS database formed a central plank of that thesis. The appendix of the thesis included many pages of printouts of spreadsheet files, which are completely unusable today, so I've learned the hard way about data management. I've also worked as a software developer and as a database administrator, and I now work for the Beyond Boundaries project at the British Library as GIS research curator. Throughout the presentation I'm also going to discuss a previous project I worked on, mapping the Jewish communities of the Byzantine Empire, also ERC funded. We did some things well on that project and we did some things quite badly as well, I'm happy to admit. All in all we delivered the digital outputs on time (they were two weeks late in the end, which is not bad as these things go), the primary one being an online web mapping system, and I'll refer to that a couple of times throughout the rest of the presentation. Okay, so digital outputs of collaborative humanities research projects can be divided into three different types, I would argue: data, resources and tools. I've arranged them in this pyramid-like fashion because as we move up the levels of the pyramid there is increasing technological complexity and increasing effort involved, with a potential payback in usability and in the ability to change research behaviour. But that also comes with risks: firstly, that the digital output will not be used, and secondly, that the longevity of these digital outputs declines as we move up the pyramid.
So I would argue that any project needs to build a strong foundation at the base before moving up, and too many digital projects have gone straight in at the level of tools without thinking through the other aspects carefully. Now you'll have to forgive me for this one, but we're not working in ancient Egypt, we're working in India, so perhaps a shikhara (this is the Kailasa temple at Ellora) is a better analogy than a pyramid anyway. So let's begin at the lowest level, data. Everyone on the Beyond Boundaries project is generating data, and you'll be seeing their presentations throughout the day today. These data are going to be deposited in a raw form for download, hopefully in a more organized fashion than this. Now, there are plenty of different repositories available for depositing data, but our project deals with Zenodo. Why are we depositing these data on Zenodo? Well, the first reason is that you have to: to comply with grant rules and our obligations to the ERC, we have to deposit the data. But I would go further than that and say there are very many good reasons for doing so too. Firstly, there are all the reasons associated with open data generally. Justifying the taxpayers' investment is an important one, and if the data are on Zenodo those data are not going to be lost. Secondly, I think there are some particularly important reasons for our project related to developing countries. We should allow researchers in developing countries to see our work: we're dealing with Asia, and given the project's aims of trying to break down regional historiographies and boundaries between bodies of work, open data is important. Thirdly, I think it benefits the researchers themselves: you can come back to your data in the future and use it.
I mean, how many of you have had to fish through an old hard drive to find a data set that you used six or seven years ago and found it totally unusable? It takes you days to work out what's going on, and you can't recreate the analysis or the digital outputs that you made at that time. So if you clean your data properly at this stage and deposit it somewhere where you can find it again, it's going to be good for you. The process of data cleaning and deposition also provides new insights. If you look at your data, and you know your data so well that it is perfectly clean and someone else could use it, you're going to have new insights generated from those data, and it also makes the findings of your publications more believable. So Zenodo is a great platform as far as I'm concerned. There are limitations, and various people have discussed some of those today, but flexibility is one of the key benefits of using Zenodo, though it also creates a lot of potential for problems. The power of Zenodo is cumulative: the more data that go on there, and the better the keywords used across Zenodo, the more power it has and the greater the ability to link data sets together. A final benefit I'm going to discuss is citation. When you deposit your data on Zenodo, cite your data in your publications. I would argue that that is an excellent incentive to structure the data that you've put on there, and it allows people to find your work as well. People often tend to cite print publications. I do this myself: if there's a digital publication and a print publication, I often just go and find a reference to the print publication in a library catalogue, even if I'm actually using the digital catalogue, because that's what I'm used to doing. But Zenodo does make it easy to cite. The citations have this type of structure, and you can actually download the citation in various different formats on the platform as well.
So as I said, flexibility has its own problems, and we need to make the data we're depositing useful. Useful and usable. There's no point in doing it otherwise, and with the flexibility of Zenodo comes a lot of scope for getting this wrong. So I've got some tips here, and these are for the project researchers primarily, but the rest of you can use them too. Deposit the data in the most basic formats you can find, to avoid obsolescence. So, for textual data: I'm not going to comment on all of these formats, but PDF/A is an archival PDF format. I would not recommend depositing in PDF, but if you have to, make sure you use PDF/A. It's easy in Acrobat: go to Save As, and there's an option for PDF/A. It means the fonts and any other files associated with your PDF are actually embedded within that PDF, so it's not reliant on other files on the computer. The other point is: don't deposit in JPEG. I think Robert mentioned that earlier. It's a lossy format, which means it loses information when resaved. Now let me draw your attention to the TXT format. Most of you are working with text, and you need to save in Unicode, that is, UTF-8. It's a massive pain that Excel doesn't export UTF-8 by default, so every time you're exporting from Excel to a CSV file (that's Comma-Separated Values), you have to use Save As and choose UTF-8. Otherwise you're going to lose all your lovely diacritics, and I know you Sanskritists love those and would be very upset without them. So do use Save As. And if you take one thing away from this presentation: do not upload to Zenodo in DOC or XLS format. In the Beyond Boundaries communities there are quite a lot of DOC and XLS files. Don't do it. If you want to deposit those alongside, okay, I suppose, but these basic formats have to be the priority. So what other tips have I got here? Structure and clean your data. Data should be consistent.
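To make the encoding point concrete, here is a small sketch of my own (not part of the project workflow; the file name and column values are made up for illustration): if you generate a CSV with a script rather than exporting from Excel, you can state the encoding explicitly, and the "utf-8-sig" variant adds the byte-order mark that helps Excel recognise the file as Unicode when reopening it.

```python
import csv

# Rows containing diacritics that a non-UTF-8 export would mangle.
rows = [
    ["id", "site_name"],
    ["001", "Kānherī"],
    ["002", "Nāsik"],
]

# "utf-8-sig" writes a byte-order mark so that Excel
# recognises the file as Unicode when it is reopened.
with open("sites.csv", "w", newline="", encoding="utf-8-sig") as f:
    csv.writer(f).writerows(rows)

# Read the file back to confirm the diacritics survived the round trip.
with open("sites.csv", newline="", encoding="utf-8-sig") as f:
    data = list(csv.reader(f))

print(data[1][1])  # Kānherī
```

The same "Save As, choose UTF-8" discipline applies whether the file is written by hand, by Excel, or by a script: the encoding is a property of the file, and it has to be chosen deliberately.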
We don't want multiple values in the same column, inconsistent capitalization, a lack of standardized terms, or unexplained abbreviations. Use headings that are meaningful in your data, and meaningful to other people, with acronyms explained in the metadata. Importantly, look at how other people have deposited data and call things the same as they have done. Don't have everyone using different words for what is essentially the same entity. Keywords. Keywords are a very important part of Zenodo. Now, you're going to make these up; we all do, right? But also look at what keywords other people are using. Zenodo searches keywords with diacritics intact, so be careful about that. And if you're stuck, the Library of Congress catalogue has a list of canonical subject headings that you can use as keywords. Now we come to file names. These are some of my favourite horrendous file names. Yes, a friend of mine really did send me a photo with that name. So no, there's no place for humour in digital outputs, okay? I'm not going to go through all of these, but I'm sure you're familiar with having received files from colleagues with some of these names. This one is actually one of Daniel's file names, which I was very pleased to receive: within the context of the repository it makes sense, because it's the ID for the inscription on our Siddham platform. So do fix your file names, because otherwise every person who downloads your dataset, and I know there will be thousands of them, has to rename the files. On JSTOR, for example, you have to rename the file every time you download it; they don't give it to you in a nice format, they use their own ID. Licensing: Zenodo gives a Creative Commons BY licence by default. That's fine for me, but some of you may not want your Sanskrit used by a commercial company.
And so you may want to change that to prevent commercial use. Okay, something to think about too. And finally, breaking down the data. That's really something for discussion: do you upload all your data as a single zip file, as we discussed this morning, in a single record with all the information there, or do you break it down into individual sections so that individual pieces of the data can be cited separately? That's something the project needs to discuss further. Right, so that's the data. Once we've got that in place, the base of the pyramid, I think we can move on to resources. Now, many of you contribute to digital resources, and I'm talking about specialist academic resources like GRETIL, SARIT or the Archaeological Data Service, but also about broader resources like Flickr and Instagram; people on the project may want to think about publishing in those areas. It's quite gratifying to see your images reused in people's blogs and elsewhere, so that can be nice too. I'm defining resources as domain specific, in that they deal with a single type of data, often from a specific period, geographical area or language, and they're usually presented on the web in a format that's understandable and accessible, so you don't need to download a file to look at the data. That's how I would define a resource; we could argue about that. Now, why deposit your data in resources? There's a community already in place. They have a larger and broader audience than if you deposited your data separately. They're often more accessible: for example, they're optimized for search engines. And you can take advantage of added-value functionality, such as display on a map.
And they often give you a canonical URL for citation as well, which may be more useful in a publication than a Zenodo DOI, because you can cite all your different pieces of data from the same place. Now, choose wisely when selecting resources. There's often a lot of effort involved in structuring your data to put on the resource and to meet the standards that the resource requires. What if the resource goes down? How long will it be available for? You should also read the terms and conditions, and for some of these resources you have to pay as well. So, we have our very own Beyond Boundaries resource, Siddham (siddham.uk). It recently went live, but it's still under development, and I think Daniel will be talking about it in more detail tomorrow, so I'm not going to go into too much detail here. We're hoping scholars will contribute their inscriptions to Siddham, and over the remainder of the project we'll be trying to promote it and its use by further projects in the future. It deals with South Asian inscriptions, and inscriptions from Asia more widely too. So we'll come back to that. I'm running on, and I don't know how much time I've got left, really. I was going to talk about linked data for a short time. Without going too far into the technical considerations: you may know of this as the semantic web. It's a way of exposing, sharing and connecting pieces of data, structured using particular variants of XML or JSON. There are many ways to use linked data, but what interests me for our project is connecting digital resources that have something in common, whether that's a place, a person, a time or another entity. By simply using an identifier that's common across these resources, it's possible for machines to harvest data and aggregate them in the kinds of portals we heard about this morning with numismatics.
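As a minimal sketch of the linked-data idea just described (the record URLs here are hypothetical, and the Pleiades-style place URI is purely illustrative, not drawn from any of the projects mentioned): two records from different resources share one gazetteer identifier, which is all a machine needs in order to aggregate them by place.

```python
import json

# The shared identifier: a gazetteer URI for a place.
# (Illustrative only; in practice this would come from a
# gazetteer such as Pleiades or Pandit.)
PLACE_URI = "https://pleiades.stoa.org/places/727070"

# Two hypothetical records from two different resources, an
# inscription and a coin find, each pointing at the same place URI.
inscription = {
    "@id": "https://example.org/inscriptions/123",
    "@type": "Inscription",
    "findspot": PLACE_URI,
}
coin = {
    "@id": "https://example.org/coins/456",
    "@type": "Coin",
    "mint": PLACE_URI,
}

# A trivial "portal": group records by the place they reference.
records = [inscription, coin]
by_place = {}
for r in records:
    place = r.get("findspot") or r.get("mint")
    by_place.setdefault(place, []).append(r["@id"])

print(json.dumps(by_place, indent=2))
```

This is the whole trick: no central database, just a common URI. Formats like JSON-LD formalise exactly this pattern so that aggregation can happen across the web at scale.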
And I would argue that every project needs to think about this, because we need a world on the web where people can get from resource to resource through a graph like this. If you have an interest in a particular place, you should be able to access data from all the resources about that place, not just one resource; otherwise you're cut off. And in South Asia there's Pandit, which is a gazetteer and a prosopography. I don't know how new it is, but it's certainly still in its infancy, I would say. And wouldn't it be great if, when I had an interest in a particular place, I could just click through and find data on it from inscriptions, from coins, from texts? Linked data is really the only way we're going to achieve this, and any humanities project needs to be thinking about how it can contribute to this collective effort. So what does this mean for you on the project? Well, come and speak to me, and we can see if we can get your places into Pandit, or another gazetteer, and what we can do to build links between your data and others'. So we're on to the final digital output now, which is tools. I'm defining a tool as something that facilitates an analysis of a data set, a tool to interpret or manipulate data, often providing innovative results. There can be great variation in the types of manipulation or analysis these tools do: visualization, annotation, calculation; you might be able to feed in the data yourself, or the data set might exist within the tool. Examples of tools include READ, or the map in the Jewish communities of the Byzantine Empire project I mentioned before; that was a web GIS, a set of web maps for browsing data on those communities, and that's also a tool.
So now it's rant time, unfortunately. As someone who has built a few of these tools myself, and witnessed a lack of uptake in their use, I would say it's very important that in the humanities we understand that these tools take a lot of effort to produce, yet have a finite life, owing to the speed at which technology moves on. They really need to burn quite brightly in the period in which they're useful to make them worthwhile. Not enough effort has gone into tool production regarding their use, the target audience, the impact they might have, and how that impact might be measured. Web design in academia does not have the harsh feedback of profitability that exists in the private sector, and so large projects can gain funding and continue to be produced for quite a long time before any feedback is sought whatsoever on the interface, on whether they're having an impact, or on how they might be reconfigured to do their job better. So in essence, tools are often treated like publications, but it's not clear that they are objects of research in their own right. If I have time, I'm going to talk briefly about one such tool, which is ORBIS. Now, I love ORBIS. I think it's great. It allows Roman communication costs to be estimated in terms of time and expense, and reveals the true shape of the Roman world. Basically, you specify an origin and a destination across the Roman world, and you can find out the time it takes to travel from one to the other. Now, this is great. It's a very specific use case. It's had a huge amount of money thrown at it by Stanford. I think it's a great system, very easy to use, and it's had a lot of publicity; there have been articles about it in the Telegraph, Forbes and the Hindu. It has an impressive interface, there's a lot of scaffolding around it, and they held workshops on using the tool.
It's been embedded within some practice. But if we look at the 150 citations for this tool on Google Scholar (the tool is now six years old, so 150 citations is not bad), virtually none of them use the tool for what it was intended for, which is measuring distance. People say, oh, this is a lovely tool; or, I'm not happy with the GIS that's underpinning this; or, they've not taken account of this problem or that problem. But there are very few citations that actually use it to measure distances between places and improve the research as a result. That's an indication of how difficult it can be to embed a digital research tool in scholarly practice. To sum up: projects have high-level goals and eye-catching problems to solve. Many have ambitious aims, which helps them to gain funding. Mapping the Jewish communities was supposed to help medieval historians use GIS; that's a big ask, requiring real uptake of GIS. Beyond Boundaries is supposed to break down regionalism in historiography, a lack of interdisciplinary research, the national orientation of research, these kinds of questions, which are very admirable aims. But can the digital outputs help to address them? I think the data and resources can: by depositing our data in resources, they can help meet these needs in projects more generally, and theoretically digital tools can also help meet these high-level goals, but the successful uptake of these tools is difficult and has to be thought about carefully. We should also not forget that for each of these tools there can be many traditional publications, and those in and of themselves can move a paradigm forward, address these questions, and provide examples for future research. So these digital outputs do need to be thought about carefully. Thanks. Questions?
I'd like to hear your opinion on the relative merits of early and late adoption. I think the general experience in numismatics has been that the most successful projects have been those which adapt tools that are well established and well entrenched, rather than developing new tools. I'm curious about that distinction, between developing new tools and waiting until the tools are well established in other fields before bringing them in.
Yeah, I think that if you have a tool that's established in disciplines close to the one you're working in, and it has been used well there, then there are practices in place, there are methods of transferring skills between disciplines, there's a familiarity there, and apart from all that there is also a proven use case for the tool, which would suggest that there's a market available for it. Take GIS in archaeology and history, for example: GIS was picked up early in archaeology and slightly later in history, and in that early period historians went through a lot of effort to try and build bridges between the use of the tool in the two disciplines.
We train about thirty documenters every year; we bring them out to London and train them here, and we also train people in country, so we go to Africa and to China and train them there. One of the things we always tell them is: if there's a really cool research project that develops a tool for linguistic annotation, don't touch it, do not go near it, don't be a beta tester, because when the funding runs out it goes down and you don't get the data out. So that is one of the problems of short funding cycles: a tool is created for a particular project, tailored to that particular project, and you don't know if it can be
sustained afterwards, because funders don't fund the maintenance and further development of something; they fund new stuff, new stuff, new stuff. So there's a structural problem within the funding stream that doesn't allow for the further development of tools. In linguistics, for example, one of the most bizarre situations is that the most stable tools, the lexical databases that linguists worldwide use, are tools developed by missionaries, by Christians, by SIL. People don't want to use them, but they are the most stable, best-developed software out there, and there is nothing on the market that can be compared to them as open-source, non-proprietary alternatives. They have a constant funding stream, whereas everything else, developed by others such as the Max Planck and the DoBeS project, had a life cycle of three to six years and then went down, or the development moved elsewhere, to the CNRS.
Yeah, the web moves forward very quickly, and unless you can build real momentum behind a tool it's very difficult for it to have any longevity. Making the tool open source can help to some extent, but building an open-source community around a code base is very challenging, very challenging. Unless the people working on it use that code on a day-to-day basis, full time, and have to make changes in order to do their job more effectively, open-source code base projects generally fall flat. People will do things in their spare time, of course, but major software upgrades are a lot easier when people are using the code in a commercial setting, or not just a commercial setting but in their day-to-day work.
All of this said, I'm just curious whether you can think of any digital tool developed in a funded humanities research project that has been an unqualified success.
There is one, actually, that I've been using, called Recogito.
It's an annotation platform: you can annotate any type of text, though it's mainly being used by classicists, and the annotation produces the linked data I was talking about before. At the moment it works with place names: when you annotate a text and you see, say, Rome, you link it to the URL for Rome in a gazetteer like Pleiades. The edition is then published among the Recogito editions, which are made accessible, so people searching for Rome can find that text, and they'll know which Rome it is; with Alexandria, say, they'll know which Alexandria it is. There has been a big uptake of that, actually; there are a lot of editions in the Recogito edition base. But whether that has carried through to users, how often people are using it in their research, I don't know yet.
I think there is another very obvious example: the Portable Antiquities Scheme, by far the most successful digital humanities tool project in the UK. But it's supported by the British government at enormous cost, it has enormous numbers of staff and institutional backing, all the things we literally just talked about, and it runs off technology which conceptually is very old, and was almost quite old even when it was put in place. So all the things we just discussed are present there. I was told recently that over a hundred PhDs now depend on Portable Antiquities Scheme data, so this is a clear example of a project that has succeeded and has led to research, enormous amounts of research.
And then, if I may, about JPEGs: we digitized two photo albums at the BL, from the Archaeological Survey of India Burma Circle, and the BL kindly gave me JPEGs and TIFFs for every image. And I
don't know, I can't upload the TIFFs: they're too big. I left one uploading to Zenodo for twenty hours and at some point the connection just dies. So I've uploaded the JPEGs, because I was able to.
That's fair enough, that is fair enough. It's a good point: it's not always possible to upload these big formats. I would say that in most cases, for natural pictures, that is, a photograph rather than line art or a colour diagram, a good-quality JPEG is not going to make a difference that matters. The issue with JPEG is migration: each time a JPEG is opened and resaved, it loses a little more data. So if all you're doing is archiving the image, the risks are much smaller than if you're actually working on the images, opening one, retouching a little part, resaving it as a JPEG, and then doing that again ten times in succession.
But it's still okay in this case?
Yeah, okay, we'll let you away with a couple of JPEGs.
Okay, I have a naive user's question, which is not about Daniel's brilliant file name, but about what was missing from it: any reference in the file name itself to a version number or something like that. Shouldn't that be an important part of the easily accessible information about the file, so that I can look at it and see where it fits inside the Zenodo versioning system? Who knows if he's gone back, fixed his file and re-uploaded it on another date?
There's a version date available in the metadata. If you want to go back and change something on Zenodo, you can only upload a new version. When you upload something to Zenodo it gets a DOI and becomes fixed from that point on; you can't go back and change it. You can upload a new version on top of it, but you can't go back
and change the original. So I would argue that a Zenodo output should not have a version number in the file name; that belongs in the metadata.
The example you gave was a Zenodo file. If I've downloaded that file onto my computer and didn't take notice of the version number on Zenodo...
Then that's your problem!
Wait, wait, no, before you blame me: how am I supposed to take notice of that? Where am I supposed to store that information? In the place where I downloaded it from? Then you're talking about the same problem we talked about with JSTOR, which is that now I have to retype a file name. So you want me to put the date that I downloaded it into the file name?
Not all metadata can fit in a file name.
I understand that, but what I'm saying is, and I'm serious about this, where am I supposed to keep track of that information, unless what I'm meant to do is not download the Zenodo file at all, but go back and retrieve it every single time?
Unless you specifically want an earlier version, you would just always download the latest version from Zenodo, and that's that.
So I'm not going to keep anything locally?
Well, you may have a version of the file locally already, and then you go onto Zenodo: oh, there's a new version, I'll download it.
How do I tell it apart from the old version?
That is a fair point you're making, yeah, and you could make your file name match the new DOI version as well. Ideally you could do things automatically: if you're relying on some set of files for a project of yours and you want them to be up to date, you could have it pull those files every night, or once a week or something like that, and then you'd always have the most recent.
I'm not arguing for a particular position, I'm just wondering how, on the assumption that I'm starting from the best position, I could build something which
is not going to need to be fixed later. Let's think about things like that: where should that information live?
That is partly asking the person depositing the data to handle the downloader's own workflow problems. But what is most interesting about this discussion is what it tells us: great, you put everything on Zenodo, but who knows how to use it? That's the thing. It's quite an advanced kind of knowledge system: knowing that you work with the live version, that you don't keep local copies, that the view from the repository is the one that counts, because that is the logic behind it. This is what we see with our language documenters: they have never worked with the digital collections in our archive, because those collections aren't used in teaching and training, so they have a really hard time understanding how to. Zenodo is great, it's fantastic, but there is a job to do in training users in how to use it and how to download from it. So with the data sitting in Zenodo, one of the discussions is: who are we serving? Because it's a very, very small community of users that is able to manage these kinds of data sets. Sharing with developing countries via Zenodo is, in that light, kind of a funny idea.
Well, yes and no. I've been standing here for quite a long time now, so I know I should wrap this up, but I would say, firstly, that Zenodo is a very simple platform to use. And I've worked with researchers from many parts of the world, and I think they're perfectly capable of downloading TXT files and viewing them, or of opening a CSV file in a
spreadsheet; it's not too difficult. Obviously not everybody can do that, but I take the point. Thank you.