We want to pose a question, really, to ourselves and to you, and it's very simple: is there a case for shared digitisation? We feel it's something we should probably be thinking about as a community. We undertook a small trial and we will present that evidence to you. We are going to go through this quickly. It's seven Ps we are going to talk about, so apologies if we are a bit flustered. Who knows what happened to our slides, they were fine earlier. So: the problem, what could we solve with shared digitisation; the proposition; the pilot; the process; the product; the potential; and the related project. Now, quite often you wake up, listen to the seven o'clock news in the morning, scientists have just discovered something, and you think, well, that's obvious, why did you have to do a study to tell us that? We will apologise up front, because for a lot of what we are going to tell you, you will likely say, well, that's sort of obvious. [Welsh passage garbled in transcription.] We did want to undertake a pilot to tease out to what extent the problems are real or not. The problem, to pose it, is that many research libraries are starting to undertake mass digitisation. That is the position we claim we are in. We tried to define mass digitisation the way big data gets defined: big data is that next step ahead of your current comfort zone. [Welsh passage garbled in transcription.]
[Welsh passage garbled in transcription; the speaker gives the National Library of Wales as an example, mentioning mass digitisation in its strategy to 2025 and a figure of some 200,000 items.] The University of Edinburgh finished, last year, 17,000 PhD theses, and they're currently 100,000 pages through their session papers project. Trinity have a new strategic aim of opening up the analogue legacy collections for global access through large-scale digitisation. This is their robotic scanner that sits there all day long, digitising automatically. So those are some examples from three different libraries. So the problem then, or the hypothesis, I suppose, that we're putting out there, is: if there is an increase in mass digitisation, will there be some level of duplication in that digitisation? And if we therefore assume that duplication is wasteful, or muda, as it's called in Japanese if you're into your lean methodologies and things like that, then if we share that digitisation and reduce the duplication, which outweighs which: the savings that we make by reducing the duplication, or the cost of coordination? That's sort of what we'll look at with this presentation. Or, to put it another way: with shared digitisation, can we save resources by working together? So we thought we'd run a pilot to test that. So we very carefully scoured the landscape of all the RLUK members and picked three top-class institutions. Simply because we're three ex-colleagues; there was nothing more scientific than that.
So Laura Shanahan, who used to be at the University of Edinburgh and is now at Trinity; myself, who used to be at Edinburgh and am now at the National Library of Scotland; and our colleagues at the University of Edinburgh. So we thought we'd get together and do this pilot, you know, a bunch of friends. We can work together, we can make things happen quite quickly. So we have two legal deposit libraries and two university libraries, and that will play into this a little bit later when you see how we undertook some things. Right, I'll talk about the next three Ps: the pilot, the process, and the product, or the results. In setting up the pilot, we decided that it would have to be both achievable and meaningful. To make it achievable, we settled on a sample size of 100 items; that seems reasonable enough. And we would select those items from our older general collections. This way we were hoping to sidestep thorny issues of copyright, as well as the distractions of copy-specific details. To make it meaningful, we thought the sample would have to be fairly randomly selected, so that we could generalise from the results. And we would have to agree on what duplication means, and we settled on the edition level. Finally, we had to agree on certain standards, acceptable to the three of us, for imaging the materials and then sharing the digital files. Our approach was very much jumping right in and learning by doing, rather than trying to plan everything out in advance in minute detail. In order to generate our pool of items, we set two very simple criteria. One is that the items would have to be published in the year 1919, so that's a kind of neat 100 years. And the other is that the word 'war' would have to appear in the title; that's a reference to the recent centenaries of the First World War. We were hoping that the COPAC Collection Management Tool would help us with this.
And as it turned out, it could, but there was a little snag that I'll come to now. The great thing about COPAC is that it allowed us to select the three holding libraries. It allowed us to do a keyword search on the title, so that was great. And it allowed us to do the very precise calibration of the level of deduplication that we needed, so that was great as well. What wasn't great is that we couldn't limit the search to the year of publication. In fact, you can't apply any other limits to your result sets. So those friends who are working on the NBK, please take notes. Right, so what we had to do, quite simply, then: we got a result set of over 60,000 records, exported those to Excel, extracted the publication year, and homed in on those 1919 publications, which gave us a list of four hundred and five items. We sorted them by the COPAC ID number and then we simply took the first hundred items. And there we had our fairly randomly selected sample set. Right, so in a fairly quick time, and as I said, it's learning by doing, jumping in, doing it quick and dirty, we had a report on our holdings and overlaps. Or did we? Because, as all of you here know, COPAC is only as good as the data that we feed it. So we decided to do a very thorough check of our catalogues, our shelves, and then of the precise metadata. In Trinity, we decided to actually collect all of the items that we found and have them in one space for the duration of the pilot, so if we needed to check something quickly, we could. That's a work-in-progress image. Right, so we had our list of a hundred titles. Each of us had to check the full list in terms of our holdings, and then we did it in thirds in terms of providing detailed metadata, so that we could be sure that we actually had the same edition. And then finally, initially almost as an afterthought, we decided: oh, let's check what else has been done. So which items of these hundred have been digitised already and are freely available out there?
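The selection steps just described (export the result set, filter to 1919 publications, sort by COPAC ID, take the first hundred) can be sketched as a few lines of Python. This is a toy illustration: the field names and IDs are assumptions, not the actual COPAC export format.

```python
# A toy stand-in for the COPAC export: in the pilot this was a ~60,000-row
# spreadsheet. Field names and IDs here are illustrative assumptions only.
records = [
    {"copac_id": "C103", "title": "The Hill of Vision", "year": "1919"},
    {"copac_id": "C101", "title": "Progress of Aviation in the War Period", "year": "1919"},
    {"copac_id": "C205", "title": "A History of the War", "year": "1920"},
]

# Step 1: keep only items published in 1919 (COPAC couldn't filter by year,
# so this step happened after export).
candidates = [r for r in records if r["year"] == "1919"]

# Step 2: sort by COPAC ID and take the first N as the fairly random sample.
SAMPLE_SIZE = 100
sample = sorted(candidates, key=lambda r: r["copac_id"])[:SAMPLE_SIZE]

print([r["copac_id"] for r in sample])  # → ['C101', 'C103']
```

With the real 405-item candidate list, the same sort-and-slice would yield the hundred-item sample the pilot used.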
And so, again, that was a matter of dividing up the hundred items into thirds, and we checked 33 or 34 each. And that was a very significant step. In retrospect, of course, we would start with that check before we did anything else. And Stuart is going to talk more about the significance of that. When it comes to digitisation, we really took the minimal approach. We homed in on three items that each of us held, and each library digitised one book, so we had three in total. Standards: 400 dots per inch, TIFF; we'd share the master files by email; and then we didn't specify what each library would have to do with those three files, whether we'd put them in our repositories or not. As it happens, each of us has so far published our own through our repositories. This is the Increase of Rent and Mortgage Interest (War Restrictions) Act 1915; the next is Progress of Aviation in the War Period: Some Items of Technical and Scientific Interest; and finally, The Hill of Vision by Frederick Bligh Bond. Now, remember I said we wanted to avoid copyright issues? Well, this item has probably three authors, who dabbled in automatic writing. For those of you who don't know what automatic writing is: it's channelling spirit communications. So, for two of those authors we have a death date, which is luckily far enough back that we are out of copyright. For one of them we are not sure, so we'd possibly have to register this on the orphan works registry. So, what does the overlap look like? I've finally come around to giving you some results. Luckily, this nice Venn diagram does display. No surprises: with general collections, the legal deposit libraries predominate, and more than half of the items were shared between them. Interestingly, if you delve down into the overlap, you can see that the university libraries have a slightly bigger overlap than the two libraries in Edinburgh.
So, it's only 5%, five items; it's a small sample, so we can't generalise from that yet, but it's something to note. The point here is, of course, that we have significant overlap: over two-thirds of the 100 items are shared by at least two libraries. Therefore, if we were to create a big mass digitisation project with these kinds of materials, yes, there will be duplication, and there will be potential waste. So, at this point, I'll hand over to Stuart. The next question we wanted to ask is: of our 100 items, how many have actually been digitised already somewhere else? Because we could say, in a sense, that would be another measure of waste, because we don't want to digitise something if it's already open and digital elsewhere. The number for that one actually surprised us; again, it was quite high: 59% of those items had already been digitised somewhere else. When you cut that down to how many are digitised and open, it comes down a little, to 36%. But actually, you know, over a third of what we could have digitised is already out there, openly. So we don't need to digitise those. We only checked three sources, HathiTrust, Internet Archive and Google Books, and we didn't search further: if we found an item in one, we didn't bother checking the others. So the conclusion from that is: if we were to do much more mass digitisation of general collections, there will be overlap. So, really, we should be checking what's out there already. And then, I suppose, a supplementary question for us, as librarians with our own collections and our own catalogues: should we, or do we, link these externally digitised resources from our physical collections? You know, they're not our copies that have been digitised, but it's pretty much the same thing, so should we point people to them?
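The overlap figures quoted above come from comparing the three partners' holdings lists at edition level. As a rough sketch, with entirely made-up item IDs, the "held by at least two libraries" count is just set arithmetic:

```python
from itertools import combinations

# Hypothetical edition-level item-ID sets for the three partners.
# These IDs are invented for illustration, not the pilot's real data.
holdings = {
    "NLS":     {1, 2, 3, 4, 5, 6},
    "Trinity": {4, 5, 6, 7, 8},
    "UoE":     {2, 5, 6, 8, 9},
}

# An item is "duplicated" if it appears in any pairwise intersection,
# i.e. it is held by at least two of the three libraries.
shared = set()
for a, b in combinations(holdings.values(), 2):
    shared |= a & b

all_items = set().union(*holdings.values())
print(f"{len(shared)}/{len(all_items)} items held by at least two libraries")
# → 5/9 items held by at least two libraries
```

In the pilot the same comparison, run over the hundred sampled editions, gave the "over two-thirds shared" figure.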
And if we do, do we say, well, we trust some sources more than others? HathiTrust, that's great, we trust them. Google Books, maybe we don't trust them; but actually most of what's in HathiTrust came from Google Books anyway, so should we just put all these links in? That's something I think we need to think through. So, we'll speed up a little bit. The cost savings of this: surprise, surprise, when you're just digitising 100 books, or in our case just the first three to start with, there's a cost. You know, we put 33 days into the project and we got three books out; that's not efficient. We're not going to do that again, don't worry. So there's no cost saving there. So we thought, well, let's extrapolate this: at what point, if ever, does it become worth sharing digitisation? If we were to go back next week and finish this project with 100 books, we reckon it goes up to about 51 days; that's 17 days per partner. So, actually, now the cost comes down. We calculate all this in FTEs rather than pounds and pence or euros, but we're saying it's more like 0.17 days of a person, per partner, to get each book; so we're starting to get down to values which do save us money. So we thought we'd extrapolate this further. What would happen if everyone in this room were to collaborate? Say we get 30 members and 10,000 books: what does that then look like? If we assume, again, that we're only digitising two-thirds of these, because a third of them are already digitised; we put in about a month each of administration; we put in maybe two FTEs of central coordination to actually make this happen. For the metadata, we reckon about 3,000 days for everybody to check their holdings. Digitisation, at four books a day, is 1,600 days. It adds up quite a lot, and we're looking at well over 5,000 days' worth of effort to do 10,000 books.
So, you're looking at about half a day per book, and that isn't, again, necessarily efficient; we can digitise better than a book every half day. However, split that up per partner, by what each is essentially contributing, and we're down to something like two or three pounds per book. So, if we were to work together, for less than the cost of one FTE for one year you would get 10,000 books digitised from your collections back. When you're working at that sort of scale, the numbers do add up. So the conclusion from that, and this is one of those no-surprise findings, is that collaboration actually costs money. It will probably, in a sense, double the cost of digitising each book; but split that over a large number of partners and the unit cost becomes much, much cheaper, and probably worth looking at. However, one of the things you'll probably be thinking is: actually, a lot of what we're digitising right now is our unique collections, not our general collections. And if it's our unique collections, that duplication doesn't happen. So one of the questions is: is that the case? Are we really just choosing unique collections, or at what point will we start moving into more general collections? So, what have we learned or confirmed along the way during the project? It's things like licensing that get in the way. We agreed up front how to crop our images, but all three teams cropped them differently; things like that. How do we do project management and coordination? As Christof said, there were a lot of problems with COPAC data, or indeed with our own data that then feeds into COPAC. If you just look at the false negatives here, this is where COPAC said we don't have things, but actually we do have them: 20% of Trinity's results were false negatives. They actually have those items, but COPAC isn't representing that. 10% for us at the National Library of Scotland, and so forth.
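The back-of-the-envelope extrapolation above can be reproduced as a few lines of arithmetic. All the inputs (effort figures, books digitised per day) are the rough estimates quoted in the talk, not measured values, so treat the totals as order-of-magnitude only.

```python
# Rough inputs from the talk (estimates, not measurements).
partners = 30
books_total = 10_000
already_digitised = books_total // 3          # ~a third found openly online already
to_digitise = books_total - already_digitised  # only digitise the remaining two-thirds

admin_days = partners * 20                    # ~a month of admin per partner
coordination_days = 2 * 220                   # ~2 FTEs of central coordination
metadata_days = 3_000                         # everyone checking their holdings
digitisation_days = to_digitise / 4           # at ~4 books digitised per day

total_days = admin_days + coordination_days + metadata_days + digitisation_days
print(round(total_days))                      # well over 5,000 days in total
print(round(total_days / books_total, 2))     # roughly half a day per book
```

Divided thirty ways, each partner's share of that effort comes in well under one FTE-year, which is where the "less than one FTE for 10,000 books back" claim comes from.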
And actually, our cataloguers, when they looked at it, said, this is awful; and they went back and recatalogued all 100 items, simply because they said, we can't leave it like this, it's not good enough. So, one thing to leave you with to think through as a community: is there something we should be doing here, and if so, what are the different things we could be thinking about? Number one: actually, let's do nothing. What we're digitising right now tends to be unique collections, so let's just leave it at that; that's fine. Number two: should we say, well, if nothing else, could we do some sort of tracking and monitoring of who's planning digitisation, what's underway, and what's been completed? Because we don't really share that. Certainly at the National Library of Scotland we've now got a big multi-year selection plan. None of you have probably ever seen it; should we share those more routinely? Number three: should we undertake a much larger trial? Actually say, well, let's do this at the RLUK level and see whether it works, whether it adds value for us. Number four: could we go the whole hog and have a big vision? We could almost set up a UK equivalent of HathiTrust, and that brings whole new options as well, because you're saying, well, actually, we can have a central body that coordinates this, stores it once, preserves it once, that sort of thing. Or are there other options? This is, in a sense, what we'd be interested to hear from you. And as it says here, if you're stuck for a topic at dinner tonight, you don't know who you're sat next to and you want something to talk about, well, talk about this and let us know what you come up with. So, two slides just to finish off with. "You're probably wondering...", which is what it says in that gap, but the slide's not there.
You're probably wondering: wouldn't it be great if there was a single global register of all digitised texts? Because, forget the fact you're librarians: if you're at home and you just want to know whether this book has been digitised, you can search Google, you can search Internet Archive, HathiTrust, but at some point, an hour later, you have to give up, because, well, I can't find it anywhere; but that doesn't mean it hasn't been done. So, would it be good if such a thing existed? It would help us in terms of our digitisation selection, looking for duplication. It could help anyone to find open books. And then there are digital scholarship benefits: wouldn't it be great if you could just take a big cross-section of all open digitised books on a particular subject, download them automatically, and do your work with them? So, we thought that. We've formed a partnership, which has been really good: the National Library of Scotland, the National Library of Wales, the British Library, HathiTrust in the US, and the University of Glasgow, through the Information Studies Department, as academic partner; and RLUK is a member of this as well, so Matt is sort of representing that. We sought AHRC funding to investigate this, so we put that in under their research networking call, which was out in autumn last year. And we've been funded to do this project, looking at developing a global dataset of digitised text. The purpose is to explore the feasibility of a global registry of digitised works, and the value that such a registry could bring to digital scholarship and libraries. This will be taking place over the next 12 months. We've got four workshops: two in the US, two in the UK. I think one will be at the British Library on the 21st of June; that will be the first UK one. And we're basically going to look at: is this feasible?
Can we take HathiTrust, who obviously, in a sense, have this data for North America, and merge in some of the large UK library databases, and just see, you know, is this feasible, is it a tractable problem? We're almost looking at it as a planning grant, basically: could we do this? Because if the conclusion is yes, it could be done, then that's a much, much bigger project. So we're not actually going to do it; we're just looking at the feasibility of it. So, that's the end. Really, I suppose, we leave you with that question. We've presented a trial, a small-scale trial, some evidence, and then that's the question: is there a case for shared digitisation, and what should RLUK do about that? Thank you.