Helen is the UQ Project Manager of Scholarly Communication and Repository Service. Today Helen will talk about data publishing at UQ. Helen, over to you.
Hi, good afternoon everyone. I was really pleased when Natasha asked me to talk about data publishing, because here at UQ Library we've been trying to build a solution around it, particularly for the long tail of research data at UQ. I have said "data publishing at UQ Library" specifically because I'm very aware that some groups around UQ are doing fabulous work in this space, but I will focus on what we're doing here in the repository. Thinking about data publishing raises several questions: why we're going to do it, how we're going to do it, and how our researchers are going to gain credit for the data they've produced, particularly credit that is separate from, and in addition to, the analysis of that data in publications. What we're really trying to do is build meaningful connections between publishing the data and publishing the scholarly work. Data publishing is actually my favorite part of the data lifecycle, because it can be both the beginning and the end. You might be tidying up your data at the end of a project and looking to archive it. But if you go one step beyond archiving your work and start depositing or publishing your data, you are giving it the start of another project. You're putting your data out there so that it becomes the beginning as well as the end. There's been talk for a while now about making data a first-class scientific output. This paper from 2012 discussed achieving that by formalizing methods for data citation and publication, thereby incentivizing data sharing.
I think what's really important when we go around talking to researchers here at UQ is making sure they understand the incentives behind sharing data. If we talk to them about data as a primary research output, that really starts to click with them, and they start to understand where we're coming from. A crucial point of difference that we discuss with researchers is archiving data versus publishing data. Archiving your research data can obviously be very beneficial for preserving it. But publishing it allows for things like validation and peer review of the data, which enhances science as a whole. So when we go to researchers, we talk to them not only about the academic credit they'll get, but also about the fact that the results of their work will be verified by others and that they'll be able to expose their data to peer review. For some of them that can be quite scary, especially because we're talking to the long tail of research data: researchers who perhaps aren't as familiar with the idea of data sharing. But we're really trying to provide a mechanism to ensure the quality of the datasets available. So at UQ, what do researchers want when we go out and talk to them? What are we saying that they want? I think they would like somewhere to archive and preserve their research data, a way of sharing it, and a way to publish it that treats it as a primary research output. That's crucial to why we've implemented data publishing infrastructure in our institutional repository. We very much wanted researchers to feel that they were going through the same process of publication, in as simple a way, as they would with their other scholarly work.
And we do talk to them about peer review, about making sure their results are validated and reproducible, and about getting academic credit. But I'm putting all these words in their mouths. Do researchers really know that they want that? We've done a lot of work in this area, and we're very lucky here in the library to have a team of librarians who work in research output services as well as client service liaison librarians, so we're able to go out and ask researchers what they actually want. We did a couple of things: we've continually evaluated our data management service since 2014, and we've collected user stories. Researchers tell us that the largest datasets they work with perhaps aren't that big. We are really trying to provide this facility for people who don't have other options, who aren't working in the big-data areas that perhaps provide nice, fancy workflows for them. So they're telling us they're not working with huge datasets, but that they do have many different types of data. Their storage locations for archived data are a little concerning: they store it on an external hard drive or on their own computer. So we know they want to preserve their data into the future, but perhaps they're not sure how to do that. We know that 53% of them wanted to keep their data permanently, so the idea of data archiving isn't something they're averse to. They're happy to keep their data permanently; it's the next step, actually publishing and sharing their data, that we're trying to facilitate. These are some real user stories from real researchers, not things I've made up. This researcher wants to store their research data in such a way that others can cite it.
Getting credit for their work seems to be really important to them. Another needs access to institutional repository storage for their data, as required by the journals they intend to publish in. We recently did an environmental scan in which we analyzed everything UQ has published over the past five years by journal and by funder. One of our data librarians then identified the top 25 journals by productivity, that is, by sheer number of UQ publications, and the top 25 by overall times cited, that is, by the total number of citations to UQ papers in those journals. That gave us two lists of 25 journals, and we then looked up the data policies for each journal. On the productivity list, only 7 out of 25 required data sharing, which is still a lot. But on the highly cited list, 18 out of 25 had a data sharing policy in place. So we know UQ researchers are publishing in journals that require them to share their data: 18 of the top 25 by citations. This researcher isn't unusual. The most frequent phone call our team gets at the moment is from people trying to publish in a journal that requires them to deposit their data somewhere, and they're looking for a solution to that problem. They would also like statistics on who downloads their data. That's a little more difficult for us to work through, but they are interested in who's looking at their data. Another researcher said they needed to be able to securely store their sensitive data but also share it with other researchers and collaborators. So we knew we had to build infrastructure that made sense to people whose data might need mediated access.
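The scan above comes down to simple proportions. As a minimal sketch, using only the two counts quoted in the talk (7 and 18 journals out of 25), the share of each top-25 list with a data sharing policy works out as follows; the function and variable names are illustrative assumptions, not UQ Library's actual tooling.

```python
# Illustrative sketch only: the counts come from the talk, the names are hypothetical.

def policy_share(with_policy: int, total: int) -> float:
    """Return the percentage of journals on a list that have a data sharing policy."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * with_policy / total


scan = {
    "top 25 by productivity": (7, 25),
    "top 25 by times cited": (18, 25),
}

for list_name, (with_policy, total) in scan.items():
    print(f"{list_name}: {with_policy}/{total} = {policy_share(with_policy, total):.0f}%")
```

Run as written, this reports 28% for the productivity list and 72% for the times-cited list, which is the gap the talk highlights.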
This person needed to be able to permanently store their research data in a way that was open and accessible in order to meet the requirements of a funding agency. As well as analyzing UQ's research output by journal requirements, we did the same for funding agency requirements. We looked at all the funding agencies named on UQ research outputs in the last five years and found that multiple agencies, both Australian and international, are putting pressure on researchers to make sure their data is open and accessible. We also knew researchers wanted to store all their research data along with everything that goes with it, so they need facilities to add data dictionaries, metadata and lab notebooks so that the data can be used by other researchers in the future. These are all great user stories from our researchers, and they are good use cases that we're able to accommodate using our institutional repository. Over the years we've been talking to researchers, the conversation has started to change, and so has the terminology: people are beginning to talk to us about data publication instead of data sharing. I think that is the start of a culture change here at UQ, which is very good to see. The idea that researchers should share data to advance knowledge and promote the common good is quite an old one, but in recent years we're seeing a lot of enthusiasm for it. I think that's because people are starting to look at how they can get academic credit, and because it leads to a conversation about research integrity and an audit trail from raw data to published data, and then from the published data to the publication. I think that's where you get very strong trustworthiness.
This is what we're working towards: data being deposited at the same time as any scholarly output is published. When a UQ researcher publishes a paper, we want to give them an easy workflow and a trusted system for depositing the data that goes with that publication and linking the two together. By integrating data publishing with other publishing, we're giving it real credibility and trustworthiness. As this paper says, the relationship is best accomplished in systems and repositories where the custodian has trusted status within their relevant communities. Again, I think that's why it fits so well in the repository and with the library. But it also requires robust infrastructure that's quick and simple to use. We first implemented the form, which I will show you shortly, in our repository a couple of years ago, and it has been through a number of iterations in which we've tried to make it very user-centric and straightforward for researchers. We do want them to deposit their data and describe it, so we're trying to make the form something they can use with confidence, with a straightforward workflow. If data publishing is going to become part of normal scientific practice, it really does have to be easy to achieve. When researchers come and talk to us about publishing their research data, we'll quite often ask whether there's a discipline-specific repository, because I feel those are very relevant to certain researchers. Instead of archiving their data on an external hard drive, they might use a repository like that, and we also tell them about UQ eSpace.
We tell them they can describe their research data in UQ eSpace, we talk to them about the idea that the data underpinning a journal article should be made available concurrently, and we explain that we can link their dataset metadata record with their publication metadata record so that the two are shown as related objects. That's when they really start to understand the value of what we're trying to achieve. We make the data discoverable: we send all our research data metadata through to Research Data Australia, and we also send it to the Data Citation Index, so we're able to track citations of their datasets. That has been key for people to comprehend the impact this can have. I'll show you a bit more about how it works, but if researchers need extra help, they can email data at library, a generic address that comes through to our team here in the library. We're very lucky to have some very skilled, specialist data librarians working here. It's a relatively small but very dedicated team that works very hard to process these records as they come through, and to talk with researchers, clearly articulating the relevant funder and journal requirements: that they can use the institutional repository, that it's known and trusted, and that it integrates with their other publication workflows and links to related publications or datasets. We try to keep it very researcher-centric and build them a profile of their datasets. We can give them DOIs for their datasets. We show them how to license a dataset, how to cite it, and how to show other people how to cite their data correctly.
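To make "proper citation" concrete, here is a minimal sketch that turns a dataset record into a citation string. It assumes a "Creators (Year): Title. Publisher. DOI link" pattern, loosely following the element order DataCite recommends for data citation; the field names, the example creators and the DOI `10.5555/example-dataset` are placeholders, not a real UQ eSpace record.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    creators: List[str]   # e.g. "Surname, I."
    year: int             # publication year of the dataset
    title: str
    publisher: str        # e.g. the institution or repository holding the record
    doi: str              # bare DOI, without the https://doi.org/ prefix


def format_citation(record: DatasetRecord) -> str:
    """Build 'Creators (Year): Title. Publisher. https://doi.org/DOI'."""
    if not record.creators:
        raise ValueError("a citation needs at least one creator")
    creators = "; ".join(record.creators)
    return (f"{creators} ({record.year}): {record.title}. "
            f"{record.publisher}. https://doi.org/{record.doi}")


# Placeholder values for illustration only.
example = DatasetRecord(
    creators=["Researcher, A.", "Researcher, B."],
    year=2017,
    title="Example dataset title",
    publisher="The University of Queensland",
    doi="10.5555/example-dataset",
)
print(format_citation(example))
```

The point of the design is that the citation is generated from the same metadata the researcher already entered, so the DOI and creator list in the citation cannot drift from the record.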
We still find a lot of people who just acknowledge a dataset or mention it somewhere in the paper, so we're really trying to push them towards proper citation. If their data is stored in a trusted subject-specific repository, we can link out to it, or they can upload their data if it's a fairly small dataset. They can choose open access, linking to or uploading the data, or they can choose mediated access, where they provide a contact person so that people know the dataset exists but the researcher mediates access to it. We can also add an embargo period if required: if somebody tells us they need a six- or twelve-month embargo on a dataset, we can facilitate that as well. This is what the UQ eSpace home page looks like. When you log in, you go to My UQ eSpace. You'll notice here that I can see my UPO options; that's something only admins can see. A researcher starts with the tabs My Research, My Possible Research and Add Missing Publication. Then they have two more options: My Research Data and Add Missing Research Data. I really think having datasets up there, prominently alongside the publications, gives the right message: it gives research data the status of a primary research output. Researchers get a list of their research publications, they can claim publications that might possibly be theirs, which the system presents to them, and they can add publications if they think we're missing them. They can also get a list of their research data, which looks like this: "The datasets below are currently attributed to you." People really like this page. They can also go to Add Missing Research Data, and this is what they get.
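The access options just described (open, mediated through a contact person, and an optional six- or twelve-month embargo) can be modelled as a small record with a couple of validation rules. This is a minimal sketch under assumed names (`Access`, `AccessConditions`, `add_months`); it is not how UQ eSpace actually stores access conditions.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Access(Enum):
    OPEN = "open"          # file uploaded, or a link out to another repository
    MEDIATED = "mediated"  # record names a contact person who brokers access


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the last day of the target month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


@dataclass
class AccessConditions:
    access: Access
    contact_email: Optional[str] = None  # hypothetical field; required for mediated access
    embargo_months: int = 0              # e.g. 6 or 12; 0 means no embargo

    def __post_init__(self) -> None:
        if self.access is Access.MEDIATED and not self.contact_email:
            raise ValueError("mediated access needs a contact person")
        if self.embargo_months < 0:
            raise ValueError("embargo cannot be negative")

    def files_available_from(self, published: date) -> date:
        """Date on which embargoed files are released."""
        return add_months(published, self.embargo_months)


conditions = AccessConditions(Access.OPEN, embargo_months=6)
print(conditions.files_available_from(date(2017, 8, 31)))  # 2018-02-28
```

Clamping the day matters for embargoes that start near the end of a month: six months after 31 August lands on the last day of February rather than on an invalid date.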
It's a fairly simple form. As I said, we've gone through a couple of iterations, and we're actually looking to redesign all the forms in eSpace, at which point it will get a bit of a facelift, but we're pretty happy with the fields we've got at the moment. The researcher goes in and adds a modest amount of metadata, not too much. All the mandatory fields are at the top, so they can fill those in and get a lot done very quickly. Then they can add access conditions: this is where they tell us whether they'd like open access or mediated access. This is also where they pick a license for the dataset, which we discuss with them in great detail, because if you're making your data available online, you need to make sure you're releasing it under conditions you're comfortable with and that also allow for reuse. We talk to them about the restrictions the different licenses impose, and about copyright and whether or not copyright exists in their data. If copyright doesn't exist in their data, which is quite often the case in Australia, we talk to them about the UQ terms and conditions, a very simple statement that says: you're very welcome to use my data, do anything you want with it, but I'd like you to attribute me. So we discuss the various licensing and access options to make sure they're comfortable. For some people it's quite a new idea that they're going to put their data out there or publish it online. Then they go through the remaining steps. They can upload their work, or add links to the location of their data if it's held in PANGAEA, Dryad or another repository, for example. Then they tick a deposit agreement confirming that they're the creator or co-creator and that they're authorized to deposit the data.
They also confirm that they have permission to include any third-party content, that the data is original and doesn't infringe any legal rights, that by depositing it they're granting UQ eSpace a license to reproduce it and make it available, and that UQ eSpace will respect the data creators' moral right to be associated with it. Before the record is published, it's checked by one of our specialist research output librarians. Every time a researcher goes to Add Missing Data Collection and fills in the metadata, the record isn't automatically published online; it comes through to our team. We check the record very carefully, and we'll quite often contact the researcher to talk about the metadata they've provided and make sure it's a rich resource, because I feel that if you're publishing data, the metadata you provide around it is very important. Making sure that data and metadata are of a consistently high standard is certainly something we do here at UQ Library. Then you end up with a final record. This is a record from the E-FISH Genomic Database repository. They're a great group here at UQ. They analyze all these amazing fish and sharks and obtain genomic information, of which they say they use only about 3%, and they're very happy to make the full amount of information available online. You can see we've got the file attached here so people can just download it, and we also have a link through to the full-text publication. So we're making sure there's a trail from the dataset to the publication, and also to any other related publications or datasets. I think that's the main thing here: putting all this information directly into the institutional repository.
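The deposit workflow described above has two gates: the researcher must complete the mandatory fields and accept the deposit agreement, and a librarian must review the record before it is published. A minimal sketch of that flow follows; the field names in `MANDATORY_FIELDS` and the status strings are hypothetical, since the talk does not list the actual eSpace fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical field names; the real eSpace form fields are not listed in the talk.
MANDATORY_FIELDS = ("title", "creators", "description", "access_conditions", "license")


@dataclass
class Deposit:
    metadata: Dict[str, object]
    agreement_ticked: bool = False
    status: str = "draft"
    notes: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Mandatory fields that are absent or empty."""
        return [name for name in MANDATORY_FIELDS if not self.metadata.get(name)]

    def submit(self) -> None:
        """Researcher submits: the record is queued for review, never published directly."""
        missing = self.missing_fields()
        if missing:
            raise ValueError(f"missing mandatory fields: {', '.join(missing)}")
        if not self.agreement_ticked:
            raise ValueError("the deposit agreement must be accepted")
        self.status = "pending review"

    def review(self, approved: bool, note: str = "") -> None:
        """A research output librarian checks the record before it goes live."""
        if self.status != "pending review":
            raise RuntimeError("only submitted records can be reviewed")
        if note:
            self.notes.append(note)
        self.status = "published" if approved else "returned to researcher"
```

For example, a complete submission moves to `"pending review"`, and only `review(approved=True)` makes it `"published"`; a librarian can instead return it with a note such as a request for a data dictionary, mirroring the conversations with researchers described above.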
It really gives us that advantage, and that integration with other aspects of publishing is where I think you get credibility with researchers. This is the second half of the record. You can see they picked a Creative Commons Attribution-NonCommercial license. It tells you the type of data. It's all very standard metadata, but enough to go on if you're trying to discover the dataset. Looking to the future, some of our plans here at UQ Library centre on creating more of this researcher-centric data management infrastructure. We have a couple of projects on the go at the moment, funded by what I guess you would call the Enhancing Systems and Services suite of projects here at UQ. They aim to provide umbrella, university-wide infrastructure that will help researchers sort out their workflows, including the management and use of their data from the DMP all the way through to storage, preservation and reuse. We expect this will tie in very closely with the existing information we have in eSpace. We know from our user research that researchers need easy-to-use infrastructure that's available to them at no cost and that allows for best-practice work with minimum administrative intervention. We're not trying to give them an administrative task; rather, as John said, we're trying to collect metadata earlier in the process, so that by the time it comes to publishing they don't have to remember everything, because they already have a well-established set of metadata. Currently at UQ Library we have a DMP online tool, but there's no flow of metadata from it into the repository, no link to storage provisioning and no link to published record metadata.
However, we are well positioned to capture that information in eSpace, because we've got the infrastructure I've just shown you and complementary projects around data sharing. We know we can do the licensing and the DOIs, and we can send records through to RDA and to the DCI. So we know we're in a good position to do this. We've done an awful lot of brainstorming, and I like the little note there on the wall that says "can do". We know we can do this. There's also one that says "it's my data, I'm not publishing it", but I don't believe that one. So we're going to work towards the idea of project-level minimum viable metadata, which can be fleshed out into a DMP and have other information added to it. At UQ we're looking across a huge number of disciplines, which all require something slightly different, and a few of them have different ideas about what data publishing even is. By keeping to minimum viable metadata at the project level, we keep it very simple, which I think gives us the widest coverage we can possibly get at UQ. We're not trying to reach everyone. As I said at the beginning, there are people at UQ doing this really well without us, so we're not trying to impose an overarching system on all of them. But for people who don't have working systems, the new system will allow research-project-level metadata captured in a DMP to cascade through the data lifecycle and automatically provision data storage. We can then use that information to publish one or more dataset metadata records, linking back to the original data and forward to the set of publications that came from that project and data collection.
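The planned cascade can be pictured as project-level metadata captured once and reused downstream. Here is a minimal sketch, assuming hypothetical class names, an invented storage path scheme and placeholder identifiers; it only illustrates the links described (back to the project and original data, forward to publications), not the system UQ Library is building.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Project:
    # Hypothetical "minimum viable metadata", captured once at project level.
    project_id: str
    title: str
    lead: str
    storage_location: str = ""  # filled in when storage is provisioned
    publications: List[str] = field(default_factory=list)  # e.g. DOIs of papers


@dataclass
class DatasetMetadata:
    title: str
    creator: str
    project_id: str                  # link back to the originating project
    data_location: str               # link back to the original data
    related_publications: List[str]  # links forward to publications


def provision_storage(project: Project) -> None:
    """Stand-in for automatic storage provisioning; the path scheme is invented."""
    project.storage_location = f"/research-storage/{project.project_id}"


def publish_datasets(project: Project, titles: List[str]) -> List[DatasetMetadata]:
    """Cascade project-level metadata into one or more dataset records."""
    if not project.storage_location:
        raise RuntimeError("provision storage before publishing dataset records")
    return [
        DatasetMetadata(
            title=t,
            creator=project.lead,
            project_id=project.project_id,
            data_location=f"{project.storage_location}/{i}",
            related_publications=list(project.publications),
        )
        for i, t in enumerate(titles, start=1)
    ]
```

Because every dataset record copies its creator and links from the project rather than asking the researcher again, the metadata entered at the DMP stage is what flows through to publication.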
I think that's a really good situation to be getting into, and that's certainly the vision. It's probably not coming for another 12 months or so, but it's certainly the direction we're heading in. I've got a quote here from Vincent Smith, who says: "The power of published data is amplified by ingenuity through applications and uses unimagined by the original and distant from the original field. Without connecting these disparate datasets, the true potential of data reuse and repurposing is lost." That's from his paper "Data publication: towards a database of everything", in which he suggests that perhaps we can, I want to say, coalesce everything into one huge database that can be queried to solve all kinds of interesting problems. So I really do think publishing data is worth investing a lot of thought and a lot of infrastructure into, and it's something we're very excited to be part of here at the library. Thank you.
Thanks, Helen. We look forward to hearing more from you in a year's time, but for now, Susanna, let's move to question time. Any questions?
Yes, there is one: is it possible for UQ to share the list of journals that require data publishing? I'm about to start working this out for the journals our researchers publish in, and it would be great to have a central repository for this information.
Yes, I'm very happy to share that information. We looked very specifically at UQ's publications overall, but I'd imagine the picture would be very similar across the university, so I'm very happy to share it.
That sounds fantastic. The next question is: what software are you using?
It's an in-house system; it's what was used to build our institutional repository. I believe it's all open source and online, but it's all in-house development, apart from the DMP online tool, which is an implementation of the DCC's DMPonline from the UK.
[inaudible] That's all the questions we have for the moment.
Perfect, thanks everyone.
Thanks for your time.
Thank you.