Can we dim the lights a little bit? Sorry Peter, I have to have the right mood. Okay, good morning everybody. How much time do I have, until 20 minutes? Yeah, 20. Okay, cool.

So I'm going to talk to you today about PRIME, which is a JISC-funded project in the UK that aims to establish a mechanism for metadata exchange between publishers, subject repositories and institutional repositories, to link up the data that's in various locations and make it more discoverable. I'm from Ubiquity Press. We're a spin-out of UCL, a researcher-led press, and we aim to do things a little bit differently from a lot of the established publishers. We're fully open access, for example, and we try to work very closely with the research community to reflect their requirements, which is one of the reasons we take part in projects like this one for JISC.

I'll give you a very quick overview. That's an archaeologist, because the PRIME project is centred on archaeology as its focus, just to get things started. I'm going to talk about the why, where and how of the PRIME project, to try and make it as relatable as possible. The first thing is why exchange metadata in the first place: why have we decided that this mechanism would be useful, and where to do it. One of the mechanisms we're using to try and incentivise and enable the deposit of datasets is that of data journals, and I'll show you some examples of those. The PRIME project is then really about how to disseminate the data once it's been deposited and made available. That will all make a lot more sense in the coming slides.

The big thing about why, one of the things we focus on when asking why you would share data in the first place and then make it more available by sharing the metadata, is why researchers engage with sharing data at all. What's their motivation to do it? This is my experimental cartoon, to see if people in different countries laugh at it. One of the reasons is that researchers want maximum dissemination for their research. They want, as Richard Dawkins does, to be a household name, and they want all researchers to be influenced by their ideas. In order to do that, they need mechanisms to get their research out further. They can't just put it up on the web or something; they need systems to help them.

This is a slide you partly saw yesterday. Sorry, this is another slide. These are some of the things we look at as the motivations for researchers. There are a lot of public benefits, such as public trust in science and re-use by small and medium enterprises. There are a lot of benefits for the researchers themselves: enhanced reputation, career recognition, citations, et cetera. And there are a lot of benefits for the research community, such as being able to share data and create new science by mashing up datasets from different disciplines. Everybody in this room knows there are a lot of benefits to sharing research data in all of these areas. It's just about, first of all, getting researchers to understand that, getting publishers especially engaged in this, and building systems that make it more possible.

This was the slide you partly saw yesterday: the very first journal published in the world, in 1665. The whole point of it was that in order for science to advance, we have to share information.
We have to let other people validate what we've done and build upon it; the whole principle of standing on the shoulders of giants is what makes the research world advance. This is what we call the social contract of science. There was a very good point made yesterday that if people aren't following the social contract of science, if they're not sharing their research and they're not sharing their data, it's basically scientific malpractice. And I think it was a really good point that publishers, just as much as researchers, can be guilty of this if they don't build systems which enable the efficient exchange of information. What we really need from publishers, and from all the systems we're talking about building and interoperating here, is effective, efficient distribution models that can get researchers' information out to the widest possible audiences, which is what they want. This is also something we see now: research funders are demanding it, governments are demanding it, and it will become the main model. It's just a question of how long it takes and who gets dragged kicking and screaming into it.

So what we're looking at in the PRIME project is this: once we've incentivised people to deposit and share data, they've often done it in a very wide range of places. For anyone who has read the XKCD cartoons, this is my adaptation of one of their maps. There are lots and lots of different types of data repositories: general ones, subject ones, national ones, institutional ones. Researchers can be putting their data in any of these places, often with different types of metadata, and what we're trying to do is find a way to make that more discoverable. For example, one of our data journals deals with all of these different repositories for archaeology data, and if someone puts their data into one of them, people who are searching in the other ones probably want to know about it as well. How can we get all these repositories to talk to each other and share information about their holdings, to make data much easier to find?

So, one of the ways we first try to incentivise people to do this is through data journals. My slides are a little bit of a surprise to me; I'm a little bit sleepy this morning. The way a data journal works is that you have data in a repository (you can also have software, it's not a big difference), and ideally that repository is a good one: it has a sustainable business plan, it has a proper preservation plan, and it gives persistent identifiers like a DOI to the data. One of the reasons I talk a lot about DOIs is that researchers already understand them; they use them to cite regular research papers, so we think it's very, very important to use them for data as well, because they understand the benefit and the use of them for citation. Then the data journal comes along, and basically you have a series of data papers which also have a DOI, and the data and the data paper link to each other through those DOIs, so there's a permanent link there that can't be broken; it will persist.
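To make that two-way DOI link concrete, here is a minimal sketch, not taken from the talk, of how the two records might point at each other. It borrows the relatedIdentifier idea from the DataCite metadata schema; the DOIs and titles are invented placeholders.

```python
# A minimal sketch (not from the talk) of a dataset record and a data paper
# record linked through their DOIs. Field names follow the DataCite schema's
# relatedIdentifier idea; the DOIs themselves are invented placeholders.

dataset_record = {
    "identifier": "10.1234/example-dataset",  # hypothetical DataCite DOI for the dataset
    "title": "Excavation records, Site A",
    "relatedIdentifiers": [
        {
            "relatedIdentifier": "10.1234/example-paper",  # hypothetical Crossref DOI of the data paper
            "relationType": "IsDescribedBy",               # the data paper describes this dataset
        }
    ],
}

data_paper_record = {
    "identifier": "10.1234/example-paper",    # hypothetical Crossref DOI for the data paper
    "title": "Excavation records from Site A: a data paper",
    "relatedIdentifiers": [
        {
            "relatedIdentifier": "10.1234/example-dataset",
            "relationType": "Describes",                   # inverse link back to the dataset
        }
    ],
}

def linked_doi(record, relation):
    """Return the DOI that a record points to with the given relation type."""
    for rel in record["relatedIdentifiers"]:
        if rel["relationType"] == relation:
            return rel["relatedIdentifier"]
    return None

# Following the link in either direction resolves to the other object's DOI,
# which is the "permanent link that can't be broken" described above.
assert linked_doi(data_paper_record, "Describes") == dataset_record["identifier"]
assert linked_doi(dataset_record, "IsDescribedBy") == data_paper_record["identifier"]
```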
So the data paper simply describes the data, and it's then referred to from research articles; they're not the same thing. If you write a research paper and you want to refer to the data you used, you refer back to the data paper. That sounds like a bit of a roundabout way, but they have very distinct purposes: the person who actually created the dataset could be the lead author on the data paper, but they might only be the fifth or sixth author, or not even an author, on the research paper. So it's a way of giving people credit for having created data, having released it and made it open.

The big benefit is also that, because the data paper has a DOI, you can do all of the normal citation metrics. You can see how many people have cited it, and you can start looking at impact metrics, which we were talking about for the REF before and which are very, very important: how many times has that data actually influenced an article on Wikipedia, who's tweeting it, who's liking it on Facebook, et cetera, what's happening in the wider community. So that's really the aim of the data journal: it's there to incentivise people to create data and put it into a repository, and it's also a way to link up all of those repositories into one place where you've got a rich layer of metadata to make all those siloed locations findable.

And very quickly: the reason we use DataCite DOIs for the data is that that's what researchers are used to, but we use a Crossref DOI for the data paper because it's the only way to track the citations. You can't track the citations of a DataCite DOI yet, but you can from a Crossref DOI, and that's why the data paper is so useful.

A data paper, very quickly, just to make it really clear, describes the methodology with which a dataset was created, the dataset itself, and its re-use potential. The re-use potential is very important; it's really what we're advertising with the paper, saying come and find this data in this location and re-use it, and these are the reasons. So it's not a research paper, and it's not a replication of the information in a repository; it's much richer information to help that data be found and re-used. And the very important part is that it is fully peer reviewed, and the big part of the peer review is that it's about encouraging best practice. The researcher only gets to publish the data paper if they have followed best practice: they have to put the data into an acceptable repository with a sustainable model and a good preservation plan, they must give it an open licence, and they must make it actionable and open, all the things that were talked about yesterday to make it "intelligent", was the word. Then they get the paper. So the peer review is really about enforcing best practice. The data could be rubbish, it could be awful; we still want people to publish it, because it can then be used to invalidate research papers, which could be talking about items in the public interest, for example. So that's the way that works.

We started off at Ubiquity Press doing a trial with a journal called the Journal of Open Archaeology Data, which works just like that, and that's the one we're using for the PRIME project. In order to test PRIME out we're sticking within the confines of archaeology, but at the same time this is now expanding out. We've got psychology data, for example; there's been a lot of data fraud in the Netherlands recently, so we're launching a journal to help people validate the data behind research publications in psychology, and also in public health.
We're doing the same thing for software as well, so we have an open research software journal to encourage people to make their software citable. But for the purposes of PRIME we're focusing on the archaeology data journal.

So now to jump into the PRIME project. The aim is to develop a system to exchange metadata between, in this case, the UCL Discovery EPrints repository at University College London, the Archaeology Data Service (ADS) subject repository in York, and the Journal of Open Archaeology Data. So we're taking a data journal, a subject-specific repository and an institutional repository, and trying to get the metadata efficiently exchanged between them, the principle being that once we've done that, it could be rolled out to other institutions, other journals and other subject repositories. We're focusing on archaeology, building on some other JISC projects people might have heard of: Dryad UK, REWARD and SWORD-ARM. The REWARD project, very quickly, was a precursor to PRIME; it basically tested whether we could use data journals to get people to put their data into institutional repositories, and we found that was very successful at UCL, so we're using that workflow again in PRIME. We also worked on things like preparing data management plans. I'll jump over that.

So once again, the aim of PRIME is to find out where the data is, and to help researchers discover data they're interested in. I'm just going to show you three quick use cases to finish things off.

In the first use case, a researcher from UCL has put their data into the Archaeology Data Service, and UCL wants to know about that; they want to have a record of it because it's one of their researchers. So the UCL researcher takes their data and puts it into the subject repository, and the metadata about it and the DOI are sent to the institutional repository to create a record there. The aim is that UCL has a complete record of all its outputs, which is very important for the REF, which was mentioned earlier, but the data is still held and curated in the place the researcher thinks is best, the subject repository, where people have the right expertise.

That can go exactly the other way. In the second use case, a researcher puts some archaeology data into the institutional repository, and the system knows that the ADS is very interested in archaeology data, so the ADS receives a record about it as well. But once again, the data is held, and only held, in the main place, which is referred to by the DOI.

The third use case involves the journal. In the course of writing the data paper, all of the metadata is collected and sent to the relevant subject repository. This is based on the way we did the Dryad UK project, which was about getting researchers who were submitting articles to evolutionary biology journals to also submit the data behind those articles to the Dryad data repository. We're doing the same thing, really: the researcher comes to the journal and writes their data paper, which is basically a high-level metadata record about the data, and that gets sent to the repository. They're given a link which says click here to deposit your data in the repository, and when they go through, everything is already filled out for them because the metadata has gone ahead; they simply upload the dataset into the repository. And then, because we also know that the institutional repository is interested in this, a record is sent on to the institution as well.
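As an illustration only, and not PRIME's actual implementation, the exchange in these use cases might look something like the sketch below: a small record, keyed on the DOI and carrying roughly the minimum fields mentioned on the final slide (creator, funder, title, subject, identifier), pushed to a hypothetical endpoint on another repository so that a local record can be created there. All names, URLs and field values here are invented.

```python
# Illustrative sketch only: not PRIME's actual implementation. The endpoint
# URL and record contents are invented. It shows the shape of the exchange
# in the use cases above: a minimal metadata record, keyed on a DOI, is
# pushed to another repository so it can create a local record that points
# back to where the data itself is held and curated.
import json
import urllib.request

record = {
    "identifier": "10.1234/example-dataset",  # hypothetical DOI minted by the subject repository
    "title": "Excavation records, Site A",
    "creator": "A. Researcher (UCL)",
    "funder": "Example Funding Council",
    "subject": "Archaeology",
}

req = urllib.request.Request(
    "https://repository.example.ac.uk/api/records",  # hypothetical deposit endpoint
    data=json.dumps(record).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

with urllib.request.urlopen(req) as resp:
    # On success, the receiving repository now has a discoverable record
    # that resolves, via the DOI, to the repository actually holding the data.
    print(resp.status)
```

The key point of the use cases is that only this thin record travels between systems; the dataset itself stays in the one place best able to curate and preserve it.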
So the aim is that the researcher only has to enter the metadata once, and it's replicated to all the stakeholders that are interested in it, but the data itself is only held in the most optimal place for it to be curated and preserved.

Right, so I'm out of breath. Very quickly, my last slide is the methodology we're employing for this; the project is just getting underway now. Let me go backwards. We're creating a metadata profile, and initially this is going to be based simply on the DataCite metadata profile. We don't see any reason we need to extend that at the moment, though at some point things like CERIF elements could possibly be very interesting. This is really about the minimum amount of metadata that needs to be exchanged between repositories: who created the dataset, who is the funder, what is its title, and what is the persistent identifier. Even just that small amount of information, along with the subject information, is enough to make a useful record in any of these repositories.

Then, for the metadata exchange, initially we're going to use Symplectic Elements, which was also mentioned in the last talk. It's a system which harvests data from various publishers and other places and makes it available to the university repository, as a way of putting the information into UCL Discovery. This way, if someone puts some data into the ADS, the Archaeology Data Service, for example, Symplectic will discover that and send it through to UCL, and the researcher will be asked: is this your data, are you willing to claim it? If they just click yes, it creates a record in the repository for them. So Symplectic harvests that, and later on we're going to extend this to any EPrints repository, even if it doesn't use Symplectic.

And then we're going to collect case studies as well, from all the researchers who take part and whose data is propagated through the system, to try and understand what motivates them to use the system, whether they feel it's a useful thing, and whether it has much impact on the discoverability of their data.

And that's it, thank you.

Thank you, Brian. Please...