I'm going to talk about our images collection and how we can enrich its metadata to make the images searchable and usable. We do a lot of digitisation at the library, and the point of it is really to get people engaged. If your collections aren't searchable, people aren't going to pay attention; they'll drift off and go somewhere else, so this is very important to us. Our collections have links to many famous names associated with Edinburgh, such as Jekyll and Hyde, Peter Higgs, Harry Potter and James Bond, and of course we like to claim Sean Connery as well. Altogether we hold around 400,000 items in our collections and about 60 kilometres of archives, so there is an enormous volume of material to deal with.
This is our images portal. It's a way into about 25,000 publicly available high-resolution images, and my role is basically to look after the images, get them online and devise workflows that make life easier for our users and photographers. The guy who actually put this page together has a minor obsession with skeletons. I don't know if you can see it, but he's picked three different skeletons to represent the collections. I'm not sure why he did that, but I quite liked it and I've decided to run with it.

The tool we use as our front end is called Luna. It's a commercial product, not open source unfortunately, but we have a very strong relationship with their support staff, and as a result we've been able to influence a lot of the tool's direction and features. Basically it's an image repository based on a VRA template. It has excellent image management, and it offers high-resolution zooming through JPEG 2000 tiling, plus page-turning software for rare books and things like that.

To talk about the workflow, I want to use this as an example. It's from our incunabula, which are rare books dating back to the earliest days of the printing press, so anything up to 1500. This one is from a German book printed in 1483. These images are very popular, not just because they're rare but because they tend to be quite gruesome and violent. I'm not sure things were quite as politically correct in those days.

Now, this is the crux of the matter. This is the slide where I start to dice with keeping my job, because what I really want to talk about is the constraints our digital imaging unit is under. We've basically got two photographers and one camera, and they're expected to digitise several lifetimes' worth of material for the public. I say this facetiously, but they are under a lot of pressure.
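The JPEG 2000 tiling behind Luna's high-resolution zooming works, in general terms, by serving each image as a pyramid of tiles at successively halved resolutions, so a viewer only fetches the tiles for the current view. The sketch below illustrates that general arithmetic only; the 256-pixel tile size and the "halve until one tile" rule are assumptions for illustration, not a description of how Luna or any particular JPEG 2000 codec lays out its data.

```python
import math


def tile_pyramid(width: int, height: int, tile: int = 256) -> list[tuple[int, int, int, int]]:
    """Return (level, level_width, level_height, tile_count) for each zoom level.

    Level 0 is full resolution. Each further level halves both dimensions
    (rounding up) until the whole image fits inside a single tile.
    """
    levels = []
    level, w, h = 0, width, height
    while True:
        tiles = math.ceil(w / tile) * math.ceil(h / tile)
        levels.append((level, w, h, tiles))
        if w <= tile and h <= tile:
            break
        w, h = math.ceil(w / 2), math.ceil(h / 2)
        level += 1
    return levels


if __name__ == "__main__":
    # Hypothetical 12,000 x 8,000 pixel scan.
    for lvl, w, h, n in tile_pyramid(12_000, 8_000):
        print(f"level {lvl}: {w}x{h} px, {n} tiles")
```

Most of the tiles sit at full resolution, which is why a viewer that requests only visible tiles can open a half-gigabyte master almost instantly.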
A previous librarian actually suggested that our two photographers could work shifts, so that the camera was constantly capturing images and one of them could sleep while the other did the work. I think that possibly infringes their human rights slightly, so we haven't gone down that road, but it illustrates the pressure they're under. The other thing we're up against is that we don't have anybody to catalogue this material, except when we occasionally get project staff and volunteers in, and that's obviously an issue. You can see here how the purse strings work in education: an institution is often far more likely to spend a lot of money on a really high-end camera than to take on a new member of staff, because of the long-term implications involved. So we need to find creative and resourceful ways of putting meat on the bones of the records, and I won't say that again.

Okay, so how do things start? Occasionally, if something is considered sufficiently historically significant, the department will digitise the entire work. More often, though, the request comes from a reader who has opened a book, seen something they liked and thought, let's get that digitised and put online. To do this, they actually have to fill out a paper form, much to my utter chagrin. I have built them a web-based tool for it, but things move very slowly in special collections, so we're still working this way. The form captures identification data, things like title, author and shelfmark, but very little descriptive metadata, and descriptive metadata is exactly what we need to make the collections searchable. What we've found, certainly with our backlog, is that there just isn't much of that data kicking about.
The request then moves on to the photographers, who capture the high-resolution image, put it into Photoshop, crop it and archive it off into storage. These images are such high resolution that each one takes up about half a gigabyte together with its derivatives. From there, the photographers fill out an Excel data sheet. We have talked about building a web-based form for this, which would get around issues such as multiple users editing at once and problems with saving, but Excel can do so many powerful things that are very hard to imitate when you try to build them yourself, and the photographers have decided they want to keep using it. So we've got some code underneath the spreadsheet that does everything necessary to push the data downstream into the relevant systems.

For metadata capture, the photographers get a lot of machine-generated data for free: the EXIF data captured by the camera. That's good because Luna can read and display it, so we get it for nothing. Beyond that, they're effectively working with whatever the reader has given them, but we've tried to make things as easy as possible. For example, they can enter a short rights code that gets expanded into information about the work's rights (in this case it's an orphan work), the reproduction rights (in this case the University of Edinburgh) and a Creative Commons licence. We also give them things like filtering to help them enter data quickly, and the code underneath reworks the data into the various formats it needs for display.
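The rights shorthand described above amounts to a lookup from a short code to several full rights fields. This is a minimal sketch in Python rather than the spreadsheet's own macro language; the codes, field names and licence wording are all hypothetical, since the talk doesn't list the real ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RightsInfo:
    work_rights: str
    reproduction_rights: str
    licence: str


# Hypothetical shorthand codes; the real codes in the photographers' sheet may differ.
RIGHTS_CODES: dict[str, RightsInfo] = {
    "OW": RightsInfo("Orphan work", "University of Edinburgh", "CC BY"),
    "OOC": RightsInfo("Out of copyright", "University of Edinburgh", "CC BY"),
    "UOE": RightsInfo("Copyright University of Edinburgh", "University of Edinburgh", "CC BY"),
}


def expand_rights(code: str) -> dict[str, str]:
    """Expand a short rights code into the full rights fields for one record."""
    key = code.strip().upper()  # tolerate stray spaces and lower-case entry
    if key not in RIGHTS_CODES:
        raise ValueError(f"Unknown rights code {code!r}; expected one of {sorted(RIGHTS_CODES)}")
    info = RIGHTS_CODES[key]
    return {
        "Work Rights": info.work_rights,
        "Reproduction Rights": info.reproduction_rights,
        "Licence": info.licence,
    }
```

Rejecting unknown codes loudly, rather than passing them through, means a typo in the sheet is caught before it reaches the public record.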
Something else the underlying code needs to do is embed data into the image itself, in case we ever lose the parent system it comes from. For that we call an open-source command-line tool called Exiv2, which writes IPTC and XMP data into the image; you can ultimately see it on the File Info tab in Photoshop. It has no Windows support whatsoever, and because we run it from an Excel macro we are completely on our own when things go wrong, but it's a very handy tool.

The main piece of work on the data side is CSV generation. Luna expects a CSV of data to accompany the images that get loaded, and we generate these from the macro on the back end. As the template gets more and more unwieldy, so does the code that generates the CSV, but we've got a handle on it. The number of collections we're feeding is growing as well: we're up to about 20 that are expected to be populated through this process, so there's a fair amount of housekeeping to make sure everything goes to the right place.

Once we get to that point, everything is on Luna. I call this a skeleton record (I apologise for mentioning skeletons again), but it really gets the point across: we've got good identification data here but not a lot of description. What's there is typical of what we get, and quite simply, much of the time what's actually depicted in the image isn't anywhere in the metadata. This is a good example, because if you search Luna for "skeleton", this image won't come up, since nobody has ever tagged it. So we need to think of other ways of getting these records enriched.
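Assuming the command-line tool is Exiv2, the two per-image jobs the macro performs (embedding metadata in the file and building the CSV that accompanies uploads to Luna) might look roughly like this in Python rather than VBA. The IPTC keys and Exiv2's `-M` modify commands (`set`, `add`) are standard Exiv2 usage; the CSV column names are hypothetical, since Luna's actual template isn't shown here.

```python
import csv
import io
import subprocess
from typing import Iterable, Sequence

# Hypothetical column headings; a real Luna template defines its own.
LUNA_COLUMNS = ["Filename", "Title", "Author", "Shelfmark", "Work Rights", "Licence"]


def exiv2_command(path: str, title: str, rights: str, keywords: Sequence[str] = ()) -> list[str]:
    """Build an exiv2 command that writes IPTC fields into the image file itself.

    'set' replaces a single-valued field; 'add' appends another instance of a
    repeatable field such as Keywords.
    """
    cmd = [
        "exiv2",
        "-M", f"set Iptc.Application2.ObjectName String {title}",
        "-M", f"set Iptc.Application2.CopyrightNotice String {rights}",
    ]
    for keyword in keywords:
        cmd += ["-M", f"add Iptc.Application2.Keywords String {keyword}"]
    cmd.append(path)
    return cmd


def embed(path: str, title: str, rights: str, keywords: Sequence[str] = ()) -> None:
    """Run exiv2 (it must be on PATH); raises CalledProcessError if it fails."""
    subprocess.run(exiv2_command(path, title, rights, keywords), check=True)


def luna_csv(records: Iterable[dict[str, str]]) -> str:
    """Render records as CSV text: missing fields become empty cells, extra keys are ignored."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=LUNA_COLUMNS, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

Passing the arguments as a list, rather than one shell string, avoids quoting problems with titles that contain spaces or quotation marks.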
A lot of the technology kicking about just now is there to help with exactly this, and we've started looking at visual and facial recognition software to see whether it can help us get machine-generated data. I think that area is still a bit obscure and we haven't managed to get much out of it yet, but as the tools improve I think we can look forward to them helping us.

We have certainly worked with the crowd, though. About a year ago we developed some fairly lowbrow metadata games, in the loose sense of the word. They're basically simple PHP forms that throw up random images and ask users to tag what they see on the screen. We skinned them in an old-school 1980s computer style to emphasise that this was not rocket science. We ran them at events where we got students to enter tags through pure bribery: we offered them coffee vouchers, lollipops and things like that. We were really quite surprised by how well it worked. We got 10,000 tags over the last few months, which we've been able to push back into Luna. So that's been great, and I think people thrive on the competitive aspect of having a scoreboard and getting to vote on each other's tags.

Now, there are issues with doing this. You need to decide how many votes a piece of metadata needs before it's approved. Your output system needs to mark clearly that the data has come from the crowd, that it isn't academically sourced and should be taken with a pinch of salt. We also built a moderation module, which we thought would be easy to keep on top of, but as I say the games were more popular than we expected, and we currently have 20,000 tags sitting in purgatory that we've got to do something about. So I think we'll probably get rid of that module.
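The talk leaves open how many votes a tag should need before it's approved. As a hypothetical sketch (the games themselves were PHP; this is Python, and the three-user threshold, the tag normalisation and the `Tag Source` wording are all assumptions), approval could depend on how many distinct players submitted the same tag for an image, with the output rows flagged as crowdsourced before being pushed back into Luna:

```python
from collections import defaultdict
from typing import Iterable

# Hypothetical label that flags crowd-sourced data in the output system.
CROWD_SOURCE_NOTE = "Crowdsourced tag, not academically verified"


def normalise(tag: str) -> str:
    """Lower-case a tag and collapse internal whitespace."""
    return " ".join(tag.lower().split())


def approved_tags(
    submissions: Iterable[tuple[str, str, str]], min_users: int = 3
) -> dict[str, list[str]]:
    """Map image_id -> sorted tags submitted by at least `min_users` distinct users.

    Each submission is (image_id, user_id, tag). Repeat submissions by the same
    user count only once, so nobody can approve a tag on their own.
    """
    voters: dict[tuple[str, str], set[str]] = defaultdict(set)
    for image_id, user_id, tag in submissions:
        norm = normalise(tag)
        if norm:
            voters[(image_id, norm)].add(user_id)

    approved: dict[str, list[str]] = defaultdict(list)
    for (image_id, tag), users in voters.items():
        if len(users) >= min_users:
            approved[image_id].append(tag)
    return {image_id: sorted(tags) for image_id, tags in approved.items()}


def to_rows(approved: dict[str, list[str]]) -> list[dict[str, str]]:
    """Turn approved tags into CSV-ready rows, flagging their crowd origin."""
    return [
        {"Image ID": image_id, "Crowd Tags": "; ".join(tags), "Tag Source": CROWD_SOURCE_NOTE}
        for image_id, tags in sorted(approved.items())
    ]
```

An automatic threshold like this is one way to shrink a manual moderation queue: only tags that fall short of the threshold would still need a human decision.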
But the main issue, I think, is engagement. Although the games are online and there for anyone to use at any time, we find that people only play them when we put them in front of them at events, because we haven't yet built that engagement with the community. We've looked at other crowdsourcing tools as well. This one here is from a group called Tiltfactor at Dartmouth in New Hampshire. They work heavily in this area, they work with the British Library, and they've got a lot of images at the moment. We've also worked with the Public Catalogue Foundation, who do things with the BBC and put a lot of material on their site, but there wasn't as much of our material there, so we didn't get as much from that.

At the end of the workflow, once we've got our tags (and we do this ad hoc, whenever we've got enough material to justify it), they're extracted from MySQL, turned into CSV and sent through authenticated API calls into the Luna database. So that's the technical workflow as it stands.

But we realise that, in terms of getting this material used and getting people interested, engagement is the key. For that, we try to make the data as open as possible so that people can get hold of it and use it. One thing we've done is build a Flickr API integration that uploads material to Flickr behind the scenes. You get a lot back from there, lots of views, many more than we actually get on Luna. We've also made everything publicly available through OAI-PMH. I don't know if you're familiar with it, but it's a protocol for harvesting metadata using URL requests and XML responses, and it delivers the metadata in whichever format you require. That's why there's a tractor on the slide: it's literally harvesting data.
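An OAI-PMH harvester sends plain HTTP GET requests such as `?verb=ListRecords&metadataPrefix=oai_dc` and reads the XML that comes back, following `resumptionToken` values until the list is exhausted. A minimal standard-library sketch, assuming a Dublin Core (`oai_dc`) feed; the base URL in the usage note is a placeholder, not the library's real endpoint.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(
    base_url: str, metadata_prefix: str = "oai_dc", resumption_token: str | None = None
) -> str:
    """Build a ListRecords request; in OAI-PMH a resumptionToken is an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str | bytes) -> tuple[list[dict], str | None]:
    """Extract identifiers, titles and subjects, plus the next resumptionToken if any."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "titles": [(t.text or "").strip() for t in rec.iterfind(".//dc:title", NS)],
            "subjects": [(s.text or "").strip() for s in rec.iterfind(".//dc:subject", NS)],
        })
    # An empty or missing resumptionToken marks the end of the list.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS).strip()
    return records, token or None


def harvest(base_url: str) -> Iterator[dict]:
    """Yield every record, following resumption tokens page by page."""
    token = None
    while True:
        with urlopen(list_records_url(base_url, resumption_token=token)) as resp:
            records, token = parse_list_records(resp.read())
        yield from records
        if token is None:
            break
```

For example, `for rec in harvest("https://example.org/oai"): print(rec["titles"])` would walk a whole repository. A production harvester would also check for OAI-PMH `<error>` elements such as `noRecordsMatch`.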
We also push our material into Europeana, which, as I'm sure you know, is a big aggregator of European institutions and their objects. This is great because it puts us on a level footing with a lot of big institutions, there's lots of work going on building APIs and working with the data, and people are getting at it. The problem is that there's so much in there that it's a massive pool to swim in, and we do think it's quite hard to get our material noticed against everything else. But we are in there, as intended.

I should say, regarding the open data, that we have found people are using it. We've had informatics students building apps from our metadata and images. Quite recently, an artist came up from London who wanted to use our images to create collages. She specifically wanted the Google Analytics data, because she wanted the images nobody had looked at before. It was quite an unusual request, and I have to say that once she had looked at them, there was a whole existential question of whether they were then out of scope. But what she made was absolutely beautiful, and it was a really interesting thing to do with the collections.

Beyond making the data open, making images open by default is something we are pushing towards as a statement, to try to get more engagement with our collections. This is partly because we've realised that trying to sell high-resolution images hasn't been particularly lucrative and has been constraining for us. But we're also starting to feel that it's simply more important for people to get hold of our material, work with it and find out about it.
For anything where we know the copyright status, whether it's ours, out of copyright or an orphan work, we can effectively slap a CC BY licence on it, which you can probably see there. We're also able to make the export resolution in Luna's front end as big as possible, in this case 12,000 pixels on the long side. I don't think you can expect anything bigger than that.

The natural conclusion of this is to make our material completely open. We are starting to get involved with the International Image Interoperability Framework (IIIF), which Stanford University has been heavily involved in developing. The main idea behind it is that your images are opened up and only ever have to be hosted once. Where you might otherwise download an image and manipulate it before putting it online, IIIF basically eradicates the need for Photoshop, because everything you do to the image is passed in as parameters in the URL. To give you an idea: you can mirror the image by passing the relevant parameters, you can squeeze it, you can greyscale it, you can rotate it, and you can crop out a detail, all through the URL. That's what we are moving towards. Luna is going to support it in the next release, and we're looking forward to getting all our material into that format.

So that's basically where we are. Just to recap the stages we feel you can go through to put meat on the bones of your records, as we put it. First, let the reader do the work: get them to give you as much as possible from the outset. Get a robust workflow in place to support the people who are actually capturing the metadata, so help them out with shorthand codes and that kind of thing. Get as much machine-generated data as you can, whether it's EXIF or whatever you manage to get from image recognition.
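Under the IIIF Image API (version 3.0 syntax is assumed here), each manipulation mentioned above is a path segment in `{identifier}/{region}/{size}/{rotation}/{quality}.{format}`: a `!` before the rotation mirrors the image, a `w,h` size that ignores the aspect ratio squeezes it, `gray` quality greyscales it, and an `x,y,w,h` region crops a detail. The helper below only builds request URLs; the server base URL and identifiers are hypothetical.

```python
from urllib.parse import quote

QUALITIES = {"default", "color", "gray", "bitonal"}


def iiif_url(
    base: str,
    identifier: str,
    region: str = "full",
    size: str = "max",
    rotation: float = 0,
    mirror: bool = False,
    quality: str = "default",
    fmt: str = "jpg",
) -> str:
    """Build an IIIF Image API 3.0 image request URL."""
    if not 0 <= rotation <= 360:
        raise ValueError("rotation must be between 0 and 360 degrees")
    if quality not in QUALITIES:
        raise ValueError(f"quality must be one of {sorted(QUALITIES)}")
    rot = ("!" if mirror else "") + f"{rotation:g}"  # '!' means mirror before rotating
    ident = quote(identifier, safe="")  # identifiers must be URI-encoded, including '/'
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rot}/{quality}.{fmt}"


def crop(x: int, y: int, w: int, h: int) -> str:
    """Region parameter selecting a rectangle in full-resolution pixel coordinates."""
    return f"{x},{y},{w},{h}"


def best_fit(max_side: int) -> str:
    """Size parameter fitting the image within max_side x max_side, keeping its aspect ratio."""
    return f"!{max_side},{max_side}"


if __name__ == "__main__":
    BASE = "https://images.example.org/iiif/3"  # hypothetical image server
    print(iiif_url(BASE, "0001", mirror=True))                     # mirror
    print(iiif_url(BASE, "0001", size="600,200"))                  # squeeze
    print(iiif_url(BASE, "0001", quality="gray"))                  # greyscale
    print(iiif_url(BASE, "0001", rotation=90))                     # rotate
    print(iiif_url(BASE, "0001", region=crop(1000, 2000, 500, 400)))  # crop a detail
```

The 12,000-pixel long-side export mentioned in the talk would correspond to a size of `best_fit(12000)`, leaving the server to compute the short side.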
And then, if you need more, take it to the crowd and get them to give you what they can. And finally, we need as much engagement with the collections as possible, and we get it by making the data openly available and by making the images open.

I've put a few links up there. The images site is the main front end, but we also have a collections repository that sits behind it, which launched fairly recently. And the metadata games I was just talking about are on this site here, Library Labs. Thank you.

Yes, for my sins, I'm afraid that once I get slightly obsessed with an idea like the skeletons, I just run with it. I apologise for that.

That's also really cool. But the serious question is: have you tried, or spoken to, any departments dealing with computer vision or digital humanities to help you with clustering images and tagging them?

Well, we've spoken to people like the Zooniverse (I don't know if you're familiar with them), the people at Oxford who are embracing the whole citizen science idea and finding as many ways as possible to get a community in place. They've got a lot of tools you can actually plug into and start working with. So we're looking at that, but we're conscious that we need to do more to get out of our little silo, rather than just building our own things for communities that only we can reach. We need to push out more in that direction. Possibly digital humanities too: I was at a summer school a couple of years ago and looked into it. Basically, I think the key at the moment is being able to tap into a community, and that's what we haven't done yet.

Thanks very much for this. It was good.