Good afternoon, everyone. My name is Xiaoping Shen, and I'm from ANDS. Unfortunately, my colleague Natasha is not feeling well today, so I will be your host. My colleague Susanna is also co-hosting today's webinar with me. This webinar is part of a webinar series co-sponsored by ANDS and the Council of Australian University Librarians on the theme of research data information integration. Our previous three webinars have already covered DMP tools, systems for managing ethics, and data storage. The recordings of those three webinars are available on the ANDS YouTube channel, and today we will talk about data publishing. First of all, we would like to acknowledge our co-sponsor, the Council of Australian University Librarians, and thank them for their support. Secondly, we would like to acknowledge the Commonwealth Government for their support of ANDS and the NCRIS program. So, with that, let me introduce our first speaker, Dom Hogan. Dom is from CSIRO's Research Data Service support team, on the Information Management and Technology side. Today, Dom will talk about data publishing at CSIRO. Dom, over to you.

G'day, everyone. Thanks for having me along. Just to explain the broader context of data in CSIRO, I have a little slide here showing what things looked like a bit more than 10 years ago. We had a number of different units, called divisions at the time (I've forgotten the exact count, about 20 or so), and each of these pretty much ran its own show. They got their portion of CSIRO's funding, and they had their own departments and their own libraries. There was collaboration between divisions, and there was collaboration between the libraries through the CSIRO library network, but all in all there were varying standards of information management throughout the organization, simply because the divisions were run separately.
So, around that time, Information Management and Technology (IM&T) was formed: one service for all of the divisions in CSIRO, which at the time included IT, libraries and records. That allowed us to take a unified approach to things like data storage, networking and computing infrastructure. Two of the things that came out of this were the Publications Repository, where we merged our legacy publication citations and set up a unified approval system for new publications from 2009 onwards, and data.csiro.au, our data repository, which we also got working on. Now, as I said before, among the many divisions we had within CSIRO, some had very high data management standards and others had much lower standards, and not because of the needs of the research. So when we brought in this data repository, some parts of CSIRO felt they already had things pretty much under control and weren't really in need of a new repository, whereas other parts of the organization had been crying out for this sort of thing. The goals of the repository were to provide consistent access to data, and also version control, which enables the reproducibility of scientific outputs, because you can get down to the individual versions of what you have. The other goal was self-service: we wanted CSIRO researchers to be able to just log on, create their own data collections and write their own metadata. Now, that can sound a bit scary, as researchers are effectively being asked to write that metadata themselves. So far, what we've found is that the people using it are the people who really want to use it; they want to get their data out there, so they usually put in the effort to write fairly decent, standard metadata.
There's also an approval system, so approvers can go through and suggest reviewers, and a little bit of peer review can go on there. Another goal was scalable storage. As we have many terabytes, even petabytes, of data coming towards us in the future, we need a way to store it. The way things were done before, you would have a file server with as much disk space as you could fit in it, files sorted however they happened to be, and an expectation that those files would be available to me right now. But that's a very expensive way to host data, especially when you're getting into the petabyte scale. So what the DAP, and other parts of CSIRO storage, are going for is a set of storage categories, so that data that needs to be preserved but doesn't necessarily need to be accessed instantly can sit on tape, and when it's requested, it's loaded from tape onto disk, where it can then be accessed. I've taken this slide from Ian Corner and Renate Tie-House, who work in the storage area of CSIRO. It's a bit of a model of their idea of a scientific workflow, where in our cloud we have the various storage categories. The researchers don't need to think about the individual machines hosting any of this data; it's somewhat abstracted from them. All they have to know is the address, and the storage team takes care of the rest. They don't need to migrate between servers when there's a hardware upgrade; they can just keep pointing to the same address. And then you have the different storage categories: input, working, verified and static. These are actually stored on different media that are optimized for different uses, so the static reference data may sit on tape and be reloaded later when we need to reprocess it.
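To give a feel for how storage categories like these might be modelled, here is a small, hypothetical Python sketch. The category names follow the slide; the media assignments and the recall logic are my own illustration, not CSIRO's actual implementation.

```python
# Hypothetical sketch of tiered storage categories: researchers address
# data by a stable path, and the storage layer decides which medium it
# lives on and whether a recall from tape is needed before access.

MEDIA_BY_CATEGORY = {
    "input": "disk",      # raw incoming data, fast access
    "working": "disk",    # actively processed data
    "verified": "disk",   # quality-controlled data
    "static": "tape",     # preserved reference data, recalled on demand
}

def access_plan(category: str) -> str:
    """Return what has to happen before a researcher can read the data."""
    medium = MEDIA_BY_CATEGORY[category]
    if medium == "tape":
        return "recall from tape to disk, then serve"
    return "serve directly from disk"

print(access_plan("working"))  # serve directly from disk
print(access_plan("static"))   # recall from tape to disk, then serve
```

The point of the abstraction is exactly what Dom describes: the researcher keeps using the same address, and only the storage layer knows (or cares) which medium is behind it.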
You can also see the idea that as a project goes on, the data moves through various quality-control processes and eventually to publication. The publication part, I guess, is where the Data Access Portal comes in, but not always; sometimes it can be a domain repository, and various parts of CSIRO have their own ways of managing data. But we're trying to move towards one unified repository that at least catalogs everything in CSIRO, and we're making progress. So, the Data Access Portal itself. Now, hopefully no one can read this slide, because I realize there are a few errors in the diagram, but what I'm trying to represent is that the Data Access Portal is not really just one system; it's actually a few systems that play together nicely. So a CSIRO researcher enters metadata into the user interface and, at the moment, uploads their data to an SFTP staging server, which is all fairly straightforward; most researchers get that done without ever asking for help. Then you've got a database that stores the metadata, and also assesses the data files that come in and records some metadata about the files as well. That all gets sent off to this thing called the logical collection manager. This is the component that takes in requests for data, or takes in submissions of data, and then decides what to do with them. So if someone in our research community asks either the user interface or the web services interface for data that happens to be on tape, the logical collection manager says: OK, that's sitting on tape; I'm going to need to load that off onto disk so that somebody can then download it.
The thing about tape is that the main delay isn't the transfer; the tapes themselves move data very fast once they're loaded. It's waiting for the robot to actually load the tape. So even for a collection of, say, hundreds of gigabytes, it typically only takes about 15 or 20 minutes before somebody gets the notification saying, hey, here are the files, you can start downloading. I also wanted to talk about some of the other systems that feed into this, because what we're discovering is that we're going to need to set up what we're referring to as the data ecosystem: the various services and utilities that interact with each other to provide a broader network of data capability. Here I've got this big database store, SAP. That's our organizational information system, although I believe there's actually more to it than just SAP, so take that with a pinch of salt. It's one system, but it doesn't necessarily present its data for easy use by other systems. So one of the teams in IM&T has created a series of web services, which are really just for developers, since I don't think there's much in the way of a user-friendly interface to any of this, that take information from SAP and from other sources in CSIRO, like the Publications Repository, and format it in a way that any other service or application inside CSIRO can then use. So the Data Access Portal grabs our business unit information and our project information from SAP, but through these web services. We don't really need to know how SAP works; we just need to know how these web services work. They're a really great piece of infrastructure that sits underneath things. I think a lot of the glory goes to the end applications that wind up doing great things with them, but these basic web services are what support and enable that to happen.
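To illustrate what a thin web-services layer like this buys you, here is a toy sketch: consumers see clean, stable fields instead of the source system's internal ones. The field names below are invented for illustration; they are not CSIRO's actual SAP schema or web service API.

```python
# Toy sketch of the web-services idea: translate a raw record from the
# organizational system into the stable, friendly fields that downstream
# applications (like a data repository) consume. Field names are
# invented for illustration only.

def to_public_project(raw: dict) -> dict:
    """Map a raw, SAP-style record onto the fields other apps consume."""
    return {
        "project_id": raw["PSPID"],          # internal project code
        "title": raw["POST1"].strip(),       # project description text
        "business_unit": raw["VERNR_NAME"],  # responsible org unit
    }

raw = {
    "PSPID": "R-01234",
    "POST1": "  Wave hindcast  ",
    "VERNR_NAME": "Oceans and Atmosphere",
}
print(to_public_project(raw))
```

The downstream application only ever depends on `project_id`, `title` and `business_unit`, so the organizational system can change underneath without breaking it, which is the point Dom makes about not needing to know how SAP works.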
So this is just a screenshot of the organizational data coming through to the Data Access Portal. The user is entering the information; they've selected their business unit, and then they select their team. There's other information in there too, so you can get project information, information about who the project leader is, and things like that. And of course we could integrate more of that in the future. For instance, there's an interface to our Publications Repository, so conceptually we would be able to say: what are the related publications? Which of these list you as an author? That sort of thing. Another thing that happens with the DAP is that we've got various research groups that already have metadata in their own systems and databases, and they wanted to put those things in our repository. These formed pilot projects. We had a microscopy group with a very complex database and a lot of information about their microscopy images, and they wanted to transfer that over. So they've got their own interface for doing that, which semi-automates the process: the system grabs the metadata out of their databases and then gets the user to just polish up the record and finish off the complete DAP collection record, and then they have it in the repository. A similar thing goes on with the astronomy collections. The first one we set up was for pulsar observations. That represents a very large volume of data that also gets used quite a bit, including internationally, so it's been quite a success, and because they have very specific information about the radio astronomy data, they have a custom search. A recent addition is another set of radio astronomy data, this one from the Australian Square Kilometre Array Pathfinder (ASKAP) project, and I have a slide about that now.
This is just an indication of the workflow. It really ends at one point in the Data Access Portal, but there's a lot that happens before that. These are antennas, and eventually there will be something like 32 of them. They transfer a tremendous volume of data to what's called the correlator. In fact, the volume of this data is really too big to store; it would be extremely expensive to set up a supercomputer that could do that. So the correlator compresses that data down into something that can be transferred to the Pawsey Centre, where it gets processed into what is actually stored. At full capacity I think it's going to be something like five petabytes a year; I didn't write the number down, but it's a truly scary volume of data they're dealing with. So they crunch this down and store what they're going to store in the data archive. Then they have several different ways of interfacing with it. They have what they call the CASDA application, the CSIRO ASKAP Science Data Archive, and the astronomy community generally uses virtual observatory tools, which are like programming interfaces for accessing and querying the data. Because you can't really download this volume of data, that would be crazy, they use these programming interfaces to query for just the data they're interested in. But part of it has also gone into the Data Access Portal, so people can use that user interface to search for data and request just the small portions of data they actually want to download and use. And then this is another one of the systems the Data Access Portal interacts with: an OPeNDAP/THREDDS server. This is just one example of a fairly popular collection we have, a hindcast of ocean waves.
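The "query, don't download" pattern can be sketched like this. It's a toy, in-memory stand-in for what a virtual observatory query does; the real CASDA and VO tools work over standard services (such as TAP), and the source names and coordinates below are illustrative only.

```python
# Toy sketch of the "query, don't download" pattern: instead of pulling
# the whole archive, a client asks the service for just the records that
# fall inside a region of interest. Catalogue contents are illustrative.

catalogue = [
    {"source": "J0437-4715", "ra": 69.3, "dec": -47.3},
    {"source": "J0835-4510", "ra": 128.8, "dec": -45.2},
    {"source": "J1939+2134", "ra": 294.9, "dec": 21.6},
]

def box_query(records, ra_min, ra_max, dec_min, dec_max):
    """Return only the records whose coordinates fall inside the box."""
    return [r for r in records
            if ra_min <= r["ra"] <= ra_max and dec_min <= r["dec"] <= dec_max]

hits = box_query(catalogue, 60.0, 130.0, -50.0, -40.0)
print([r["source"] for r in hits])  # ['J0437-4715', 'J0835-4510']
```

The archive stays where it is, and only the (tiny) matching subset travels over the network, which is the only workable approach at petabyte scale.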
What they do is create NetCDF files, and NetCDF files have embedded metadata: information about what the fields are, and we can see over here information about each layer and what units it's in. There's a whole catalog of different datasets stored this way, and the beauty of the OPeNDAP services is that they provide various methods of accessing the data just by default. Here I've shown an example of the mapping service. This is just running in my browser, but in theory another spatial portal could link to the data services that run on top of this and access the data in its own custom mapping service, for whatever purpose it happens to have in mind. So what we have in the Data Access Portal is a set of metadata for the collection at large, which just points to these OPeNDAP services. At the moment we only have a handful of collections that use this, but they are very large, and the hope is to improve this so that any researcher can point to an arbitrary OPeNDAP server and say, this is where my data is, and that could include services at NCI or other institutions. Now, Oceans and Atmosphere, which used to be called Marine and Atmospheric Research, have had their own metadata catalog for quite some time, and they have their own data centre with various databases and ways of accessing things. What came up for them is that the new ship, the RV Investigator, is collecting a lot more data than the previous vessels ever did, because it has a lot more instruments, and higher-resolution instruments, so they were looking for a way to store this data securely and also minimize the problems of transferring very large volumes of data over the network.
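As a concrete example of OPeNDAP-style access, here is a sketch of how a client subsets a remote dataset by appending a DAP2-style constraint expression to the dataset URL, so the server returns only the requested slab rather than the whole file. The server URL, variable name and index ranges below are made up for illustration.

```python
# Minimal sketch of OPeNDAP-style subsetting: rather than downloading a
# whole NetCDF file, a client appends a constraint expression to the
# dataset URL so the server returns only the requested hyperslab.
# The URL, variable name, and ranges below are invented for illustration.

def opendap_subset_url(base_url, variable, time_range, lat_range, lon_range):
    """Build a DAP2-style constraint like var[t0:t1][y0:y1][x0:x1]."""
    constraint = "{}[{}:{}][{}:{}][{}:{}]".format(
        variable,
        time_range[0], time_range[1],
        lat_range[0], lat_range[1],
        lon_range[0], lon_range[1],
    )
    return "{}.ascii?{}".format(base_url, constraint)

url = opendap_subset_url(
    "http://example.org/thredds/dodsC/waves/hindcast.nc",
    "significant_wave_height",
    (0, 23), (100, 140), (200, 260),
)
print(url)
# http://example.org/thredds/dodsC/waves/hindcast.nc.ascii?significant_wave_height[0:23][100:140][200:260]
```

Because this is just a URL convention, any client (a browser, a mapping portal, or an analysis script) can request exactly the slice it needs, which is why a catalogue record can simply point at the OPeNDAP endpoint and let consumers help themselves.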
So what we do is grab metadata from their system, so they can keep writing it the way they're used to, and they transfer the tapes to our system. We can plug those tapes straight into our tape library, and the data behaves like any other DAP collection that has already been prepared. What happens then is that they're able to share the data with their university collaborators for the quality-control process of the data that comes off the ship, and then prepare it into data products that go public. Now, here's another example: Land and Water, a group within CSIRO, which has also had a very high standard of data management over the years. They have a metadata catalog that they use internally, through the working life of their projects. They have a file server set up with very strict naming conventions for folders and files, and as they go, what they encourage their researchers to do is write metadata; this is just ANZLIC-standard metadata in their system. What we've done, for this group and for the marine and atmospheric one, is enable the ability to just upload the XML, so at the moment ANZLIC or Marine Community Profile metadata can come in through here. This is an example of one that's come through: the upload has created the DAP record, they can move their files in there, and they can make it public or share it with individual people. We have different restriction levels that we can apply to different collections: we can restrict access to the files but leave the metadata public, or we can restrict access to even the metadata, so there are some collections in the system that I can't find even as an administrator, because I'm restricted from accessing them. Now, one thing I'll just get back to on this previous slide: you'll see here that this one gets a DOI.
We've got some policies around what gets a DOI and what doesn't. For instance, this one has restricted files; we can see a little padlock on the files, and when we click on that, we're told we'll have to log on to access them. So for whatever reason, maybe commercial sensitivity or the licences on the source data, they aren't able to share this data openly, and what they get instead is a Handle. We are currently discussing among the team whether to relax this so that metadata records can get a DOI; there's a little bit of debate going on there, but certainly a lot of researchers are very keen to use a DOI as the preferred identifier for anything they cite. And this is another example: a licensed dataset. I didn't want to show one that I can't show to anybody because it's restricted, so this is just one that you can get through Geoscience Australia, and we've got our own copy in there available to CSIRO staff so they don't have to go and download it again. When we do that, we don't mint a DOI or a Handle; this just uses an internal ID system, and the collection will only show up for CSIRO staff who log on. Then we've got version control. This is an example of a software collection that's been through a few different versions. Each time a new version gets created, if it's a minor update, say just fixing a typo, then the DOI is maintained: you'll get several versions, but you won't get a new DOI. But where the data changes, so that it's an actual new version of the software, they get a new version and a new DOI. You can see there are also subsequent versions with new contributors to the data, so if they've changed something significant, the attribution statements, the people, or the title of the collection, they get a new version and a new DOI for each one. So, what's in the future? Oh yes, one thing I have completely neglected
to talk about is the development we're doing of a data management plan tool. If we think back to that workflow, where we had the different categories of storage: while researchers are working on their data, before publication, they're already starting to collect information about the files and about what happens to those files as they go. Combined with the data management plan tool, where researchers describe what they're planning to do and how they're planning to store things, we would like that metadata to feed into DAP collections, so that things are already written by the time researchers actually go to create the DAP record. If people maintain that metadata as they go, there are much lower transaction costs, as one of my colleagues says, when they're creating the metadata, rather than having to remember everything right at the end. Another thing we're working on: a number of research groups have been very keen for us to set up features that would support linked data. So we've got semantic web features coming up, like a persistent URL service, which is a generally useful thing. What we find is that a lot of researchers want DOIs but may not really understand the policies around using and maintaining them; they might think of a DOI as just a persistent URL that points at whatever you want it to point at, which is more like what a PURL would be. So I'm seeing a need, certainly for linked data, for persistent URLs where you can define the policies around them. An institution-wide persistent URL service, which the Data Access Portal could use but which any research group could also use, could track any object: a person, a data file, a piece of software. And this would lead in towards things like provenance tracking, where you can
actually identify each part of the research workflow and record it, which should improve the transparency and reproducibility of the research itself. We're also looking at vocabulary services, because that will, if nothing else, improve the way we enter keywords into our collections, but there are numerous applications for vocabularies. The main thing about these semantic web features is that we're not saying we're going to implement them all in the Data Access Portal, because there are parts of CSIRO that would like to use these services too; something like the persistent URL service may be its own entity. I think what researchers are really looking for is persistence and reliability, that it will still be there in 10 or 15 years' time. They might be working on short-term projects and can't guarantee that sort of support themselves, but they're hoping the organization can support it and have that commitment. The other thing we're working on, with the web services interface, is programmatic creation of collections, particularly for data collection that is fairly routine. We have, say, some geologists who are taking a lot of samples and scanning them, and they would like to feed those scans, and information about those scans, straight into a data collection that they can then reuse later. Rather than manually going through and creating a record each time, which is really infeasible, they want a program that can do it for them, and we're in a testing phase of that right now. I can't get through this without acknowledging the support of the Australian National Data Service, who funded quite a lot of the development of the DAP, and noting that I took one of those slides from Ian Corner and Renate Tie-House, from their presentation at eResearch last year. There is a cast of thousands who have worked on this over the
years, and this is by no means all of them; these are just a few people I'm working with lately. So thank you to them, and thank you for listening. Do we have any questions?

Yes, Dom, we do. There are a couple of questions here. What do you use for project IDs? Is there a national service?

No, we're just using internal identifiers for projects. They're really codes that wouldn't make much sense outside of CSIRO, which is why I didn't go into a huge amount of detail about them. Certainly, for projects that do have that sort of national scope, there's no reason why a national service couldn't be set up, but at the moment they're codes that are specific to how SAP works.

There's also a comment from the same person, who says they're pushing ORCID to supply these. I'm not quite sure that would work, as ORCID is pretty much just a personal identifier, as far as I remember.

Yeah, well, ORCID is definitely on the list of things we're trying to implement. We want to register ORCIDs, but because it's an opt-in thing, we can't just say, hey, everyone in CSIRO, here's your ORCID; they have to actually volunteer for that. But we definitely want to link ORCIDs, from both CSIRO researchers and external collaborators listed on records; those are identifiers that I think would be very useful. And when I was talking about persistent identifier services, that's to fill in the gaps, because there are all sorts of objects that could benefit from this that might not be covered by an internationally recognized service.

The next question is: what is the policy you mentioned around DOIs? Is it publicly available? I'm not quite sure whether they're asking if the policy is publicly available or whether the data is publicly available.
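The persistent URL service Dom describes, a persistent identifier whose target can be repointed under an agreed policy, can be illustrated with a minimal resolver sketch. This is a toy, not CSIRO's design, and all the paths and URLs below are invented.

```python
# Minimal sketch of a persistent URL (PURL) service: the public,
# persistent identifier never changes, but the target it resolves to
# can be repointed when data moves. All paths and URLs are invented.

class PurlService:
    def __init__(self):
        self._targets = {}

    def register(self, purl_path, target_url):
        self._targets[purl_path] = target_url

    def repoint(self, purl_path, new_target_url):
        """Update where an existing PURL resolves, without changing it."""
        if purl_path not in self._targets:
            raise KeyError("unknown PURL: " + purl_path)
        self._targets[purl_path] = new_target_url

    def resolve(self, purl_path):
        return self._targets[purl_path]

purls = PurlService()
purls.register("/sample/rock-0042", "https://old-server.example/scans/rock-0042")
# Hardware is upgraded and the file moves; citations keep working:
purls.repoint("/sample/rock-0042", "https://new-server.example/archive/rock-0042")
print(purls.resolve("/sample/rock-0042"))
# https://new-server.example/archive/rock-0042
```

This is the property researchers actually want when they ask for "a DOI": the citation stays stable while the institution manages where it points, and the policy around who may repoint it is what distinguishes a governed PURL from an ordinary redirect.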
OK, so on the policy on DOIs, I'm probably going to have to pass the buck on this, but I believe we can share it. I'm not sure I could point to one website that posts it at the moment, so if I can get you to take a name down for that, I'll be happy to get back to you.

She's come back to say it's the policy that she wants to have shared.

Yeah, I believe that is actually covered. If Sue Cook is in here, you could pass control over to her; I'll give her the voice, I think she has some comments to make about this. She has made a comment that says the CSIRO DOI business rules are on the ANDS website.

Oh, there you go. OK. Others have asked for those too, so they are on the ANDS website. OK, another question: are any of the OPeNDAP components available for other research institutions to implement or use?

Yeah, I believe so; I don't think we have a custom implementation of OPeNDAP. That's actually managed by a data services team in CSIRO linked to the high-performance computing team, and I'm going to guess that none of them are in this webinar, but as I understand it, it's open software that they've implemented, so I don't think there's anything special about what we've done. Certainly I know NCI have some THREDDS servers going, and I'm sure the Bureau of Meteorology do too, so I don't think there's anything particularly novel there. Maybe there are a few implementation details, but I can definitely put you in touch with Gareth Williams, who would be more than happy to talk about that.

Gerry Ryder, who's from ANDS, has made a comment: with regards to PURLs for projects, ANDS has a service for ARC and NHMRC grants.

OK, and I think for the moment that's all of the questions that have come in. Thank you very much. All right, thanks, Dom.
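Before moving to the next speaker, it's worth sketching the minor-versus-major versioning rule Dom described for the DAP's DOI business rules: typo-level fixes keep the existing DOI across versions, while significant changes (new data, new contributors, a changed title) mint a new one. This is my paraphrase of the talk, not the actual CSIRO business rules, and the DOI strings are invented.

```python
# Toy sketch of a version/DOI policy: minor changes (e.g. typo fixes)
# keep the existing DOI; significant changes (new data, new
# contributors, changed title) produce a new version with a new DOI.
# This paraphrases the rules described in the talk; DOIs are invented.

def next_identifiers(current_doi, version, change):
    """Decide the DOI and (major, minor) version for a new release."""
    significant = change in {"data", "contributors", "title"}
    major, minor = version
    if significant:
        new_major = major + 1
        base = current_doi.rsplit(".v", 1)[0]
        return ("{}.v{}".format(base, new_major), (new_major, 0))
    return (current_doi, (major, minor + 1))  # typo-level fix: same DOI

doi, ver = next_identifiers("10.9999/example.v1", (1, 0), "typo")
print(doi, ver)   # 10.9999/example.v1 (1, 1)
doi, ver = next_identifiers(doi, ver, "contributors")
print(doi, ver)   # 10.9999/example.v2 (2, 0)
```

The rationale matches what Dom says about attribution: a citation must keep resolving to the thing it credited, so anything that changes what (or whom) the record attributes warrants a new citable identifier.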
Our next speaker is Helen, the UQ project manager of scholarly communication and the repository service. Today Helen will talk about data publishing at UQ. Helen, over to you.

Hi, good afternoon, everyone. I was really pleased when Natasha asked me to talk about data publishing, because here at UQ Library we've been trying to build a bit of a solution around that, particularly for the long tail of research data at UQ. I've titled this specifically data publishing at UQ Library, because I'm very aware there are some groups around UQ doing fabulous work in this space, but I will focus very much on what we're doing here in the repository. Thinking about data publishing really raises questions of why we're going to do it, how we're going to do it, how researchers are going to gain credit for the data they produce, and particularly how they're going to gain credit separately from, and in addition to, the analysis of those data in publications. So what we're really trying to look at is how we can build meaningful connections between publishing the data and publishing the scholarly work. Data publishing is actually my favourite part of the data lifecycle, because it can be both the beginning and the end. It might be that you're tidying up your data at the end of a project and looking to archive it, but if you go one step beyond archiving your work and really start looking at depositing or publishing your data, you're giving it the start of another project; you're putting your data out there to become the beginning as well as the end. There's been talk for a while now about making data a first-class scientific output, and in this paper from 2012 they discussed achieving that by formalizing the methods for citation and publication, and thereby incentivizing data sharing.
I think that's really important when we go around talking to researchers here at UQ: making sure they understand the incentives behind sharing data. If we talk with them about data being a primary research output, that really starts to click, and they start to understand where we're coming from. Crucially, a point of difference we talk about with researchers is archiving data versus publishing data. When you archive your research data, that can obviously be very beneficial in terms of preserving the data, but when you publish it, it allows for things like validation and peer review of the data, which really enhances science as a whole. So we go to researchers and talk to them not only about the academic credit they'll get, but also about the fact that the results of their work can be verified by others and that they'll be able to expose their data to that sort of peer review, which to some of them can be quite scary, especially when, again, we're talking to that long tail of research data, researchers who perhaps aren't as familiar with the idea of data sharing. But we're really trying to provide a mechanism to ensure the quality of the datasets available. So at UQ, what do researchers want when we go out and talk to them? They would like research data archiving, somewhere to preserve their research data; a way of sharing it; and a way to publish their research data that treats it as a primary research output. That's crucial, I think, to why we've implemented the data publishing infrastructure here in our institutional repository.
We very much wanted researchers to feel that they were going through the process of publication in as simple a way as they would with their other scholarly work, and we do talk to them about peer review, verifiable results, making sure their results are validated and reproducible, and the idea of getting academic credit. But am I just putting all these words in their mouths? Do researchers really know that they want that? When we go and talk to people, and we've done a lot of work in this area, we're very lucky here in the library to have a team of librarians who work in research output services, as well as the client services liaison librarians, so we're able to go out and talk to researchers about what they actually want. So we did a couple of things: we've continually evaluated our data management service since 2014, and we've collected user stories from people. They tell us that the datasets they work with aren't that big; we really are trying to provide this facility for people who don't have other options, who aren't working in those big areas which perhaps provide nice, fancy workflows for them. So they're telling us they're not working with huge datasets, but that they have many different types of data, and their storage locations for archived data are a little concerning: they'll store it on an external hard drive or on their computer. So we know they want to preserve their data and save it into the future, but perhaps they're not sure how to do that. We know that 53 per cent of them wanted to keep their data permanently, so the idea of data archiving isn't something they're averse to; they're happy to keep their data permanently, but it's taking that next step, actually publishing and sharing their data, that perhaps we're trying to facilitate. So these are some of the real user stories; real researchers said this, these aren't things I've made up.
they tell us that they want to store their research data in such a way that others can cite it. Getting credit for their work seems to be really important, but they need access to institutional repository storage solutions for the data, as required by the journals they intend to publish in. We did a bit of an environmental scan recently: we looked at everything UQ has published over the past five years, analysed those publications by journal and by funder, and then one of our data librarians dug out the policies for the top 25 journals by productivity, that is, by sheer number of UQ publications, and also for the top 25 by overall times cited, the total number of citations for papers in those journals. So we had two lists of top 25 journals, one for productivity and one for times cited. Where those journals had policies, only seven out of 25 on the productivity list required data sharing (still a lot), but on the highly cited journal list, 18 out of 25 had a data sharing policy in place. So we know UQ researchers are publishing in journals, a huge number of which are requiring them to share their data, and this researcher represents the most frequent phone call the team is getting at the moment: people who are trying to publish in a journal that requires them to deposit their data somewhere, and who are looking for a solution to that problem. They'd also like statistics on who downloads their data; that's a little more difficult to provide for them, but they are interested in who's looking at their data. Another researcher said they needed to securely store their sensitive data but also share it with other researchers and collaborators, so we knew we had to build infrastructure that made sense to people whose data perhaps needed mediated access. This person needed to be able to permanently store their
research data in a way that was open and accessible, in order to meet the requirements of a funding agency. So as well as analysing UQ's research output against journal requirements, we did the same for funding agency requirements: we looked at all the funding agencies named on UQ research outputs in the last five years, and we found multiple funding agencies, both Australian and international, putting pressure on researchers to make sure their data is open and accessible. We also knew researchers wanted to store their research data along with everything that goes with it, so they need the facility to attach data dictionaries, metadata and lab notebooks, so that the data can be used by researchers in the future. These are all really great user stories from our researchers, really good use cases that we're able to accommodate using our institutional repository. Over the years we've been here talking to researchers, the conversation has started to change, and we really are changing the terminology now: people are beginning to talk to us about data publication instead of data sharing. That conversation is, I think, the start of a culture change here at UQ, which is very good to see. The idea that researchers should share data to advance knowledge and promote the common good is quite an old one, but in recent years we're seeing a lot of enthusiasm, I think because people are starting to look at how they can get that academic credit, and because it leads to a conversation around research integrity: an audit trail from raw to published data, and from the published data to the publication. That's where you get very strong trust, and what we're working towards is the idea that data is deposited alongside, and at the same time as, publication of any
scholarly work. Our aim is that at the time a UQ researcher is publishing a paper, we give them an easy workflow and a trusted system to deposit the data that goes along with that publication, and to link the two things together. By integrating data publishing with the other publishing, we're giving them real credibility and trust. As it says in this paper, data stewardship is best accomplished in systems and repositories where the custodian has trusted status within their relevant communities, and again I think that's why it fits really well in the repository and really well with the library. But it also requires robust infrastructure that's quick and simple to use. We first implemented the form, which I'll show you very shortly, in our repository a couple of years ago, and it has been through a number of iterations where we've tried to make it very user-centric and very straightforward for researchers to use. We do want them to deposit the data and describe it, so we're trying to make it something they can use with confidence that it's a straightforward workflow; if it's going to become part of normal scientific practice, it really does have to be easy to achieve. When researchers come and talk to us about publishing their research data, we will quite often ask whether there's a discipline-specific repository, because those are very, very relevant to certain researchers, and we suggest that instead of archiving their data on an external hard drive they use a specific repository like that. We also tell them about UQ eSpace, and the fact that they can describe their research data there. We talk to them about the idea that data underpinning a journal article should be made concurrently available, and about the fact that we can link the data's metadata record with their publication's metadata record, so the two can be shown to be related objects, and I
think that's when they start to really understand the value behind what we're trying to achieve here. We make it discoverable: we send all our research data metadata through to Research Data Australia, and we also send it through to the Data Citation Index, so we're able to track citations of their data sets, which has been really key in helping people comprehend the impact this can have. I'll show you a little more about how. For anyone who needs some extra help, we have a generic email address, data at library, which comes through to the team here in the library. We're very lucky to have some very skilled, specialist data librarians working here: a relatively small but very dedicated team who work very hard to process these records as they come through and to have those conversations with researchers, articulating clearly the relevant funder and journal requirements, and the fact that they can use the institutional repository, that it's known and trusted, and that it can integrate with the other publication workflows and link to related publications or data sets. We try to keep it very research-centric. We build them a profile of their data sets, we can give them DOIs for their data sets, and we show them how to license a data set, how to cite it, and how to show others how to cite their data correctly. We still find a lot of people just acknowledge a data set or mention it somewhere in the paper, so we're really trying to push them towards a proper citation. Then, if their data is actually stored in a trusted subject-specific repository, we can link out to that, or they can upload their data if it's a fairly small data set. They can choose mediated access to their data, or they can choose open access; so they can link to it, or upload it, or they
can simply list a contact person, so that people know the data set exists but access to it is mediated. We can also add an embargo period if required; if somebody tells us they need a six- or twelve-month embargo on a data set, we can facilitate that as well. This is what UQ eSpace looks like: the home page when you log in and go into My UQ eSpace. You'll notice some extra options here that only administrators like me can see. A researcher starts with the tabs My Research, Possibly My Research and Add Missing Publication, and then they have two more options: My Research Data and Add Missing Research Data. I really think that having the data sets up there, that prominent, along with the publications, sends the right message; it gives research data the status of a primary research output. So they know they're getting the list of their research publications, they know they can claim publications that might possibly be theirs, which the system presents to them, and they can add publications they think are missing; but they can also get a list of their research data, which looks like this: "The data sets below are currently attributed to you". People really like this page. They can also go to Add Missing Research Data, and this is what they get: a fairly simple form. As I said, we've gone through a couple of iterations, and we're actually looking to redesign all the forms in eSpace, at which point it will get a bit of a facelift, but I think we're pretty happy with the fields we've got at the moment. The person goes in and adds a small amount of metadata, not too much; all the mandatory fields are up top, so they can fill those in and get a lot done very quickly. They go through and add access conditions, which is where they'll tell us whether they'd like it to be open access or mediated access, and this is where they'll
pick a licence governing access to the data set, something we talk to them about in great detail, because obviously, if you're making your data available online, you need to make sure you're releasing it under conditions that you feel comfortable with and that also allow for reuse. So we talk to them about what the restrictions on the different licences mean, and about copyright and whether or not copyright exists in their data. If copyright doesn't exist in their data, which is quite often the case in Australia, we talk to them about the UQ terms and conditions, a very simple statement that says: you're very welcome to use my data, do anything you want with it, I'd just like you to attribute me. So we talk through various options around licensing and access, just to make sure they feel comfortable; I think for some people it's quite a new idea that they're going to put their data out there online, or publish their data online. Then they can upload their work, or add links to the location of the data if it's in PANGAEA or Dryad or another repository, for example. Then they tick a little deposit agreement that says they're the creator or a co-creator, that they're authorised to deposit it, that they have permission to include any third-party content, that it's original and doesn't infringe any legal rights, that by depositing it they're granting UQ eSpace a licence to reproduce it and make it available, and that the creators' moral rights will be respected by UQ eSpace. Before the record is published, it's checked by one of our specialist research output librarians. Every record that comes through, every time a researcher clicks Add Missing Data Collection and fills in the metadata, it doesn't go automatically online; it comes through to our team, we check very carefully through the record, and we quite often will contact the
researcher to speak about the metadata they've provided and make sure it's a rich resource, because if you're publishing data, the metadata you provide around it is very important, and making sure that data and metadata are of a consistently high standard is certainly something we stand for here at the UQ Library. Then you end up with a final record. This is a record from the E-FISH genomic database repository; they're a great group here at UQ who analyse all these amazing fish and sharks and collect their genomic information, of which they say they use roughly 3 per cent, and they're very happy to make the full amount available online. You can see here we've got the file actually attached, so people can just download it, and we also have a link through to the full-text publication, so you've got that trail from the data set to the publication, and also to any other related publications or data sets. I do think that's the main thing here: by putting all this information directly into the institutional repository, we get that advantage and that integration with other aspects of publishing, which is where you get credibility with researchers, I think. This is the second half of the record: you can see they picked a Creative Commons Attribution Non-Commercial licence, and it tells you the type of data, all very standard metadata, but enough to go on if you're trying to discover the data set. Looking to the future here at UQ, the plans we have really centre around creating more of this research-centric data management infrastructure. We have a couple of different projects on the go at the moment, funded by the Enhancing Systems and Services suite of projects, as I guess you would call them here at UQ, which are trying very much to provide umbrella, university-wide infrastructure that's
really going to help researchers sort out their workflows, and that includes management and use of data from the DMP all the way through to storage, preservation and reuse. We expect this will tie in very closely with the existing infrastructure we have in eSpace. We know from our user research that researchers require easy-to-use infrastructure that's available to them at no cost and that allows for best-practice workflows with minimum administrative intervention. We're not trying to give them an administrative task to do; we're trying, almost as John said, to collect that metadata earlier in the process, so that by the time it comes to publishing they're not having to remember everything, because they've already got quite a well-established set of metadata by that point. Currently at the UQ Library we do have a DMP online tool, but there's no flow of metadata from it into the repository, no links to storage provisioning, and no links to published record metadata. However, we are well positioned to capture that information in eSpace, because we've got the infrastructure I've shown you, we've got the complementary projects around data sharing, we can do the licensing and DOIs, and we can send records through to RDA and to the DCI, so we know we're in a good position to do this. We've done an awful lot of brainstorming, and I like the little note on the wall that says "can do": we know we can do this. There's also one that says "it's my data, I'm not publishing it", but I don't believe that one. So we really are working towards the idea of project-level minimum viable metadata, which can be fleshed out into a DMP and have other information added to it. At UQ we really are trying to look across a huge number of different disciplines, and they all require something slightly different; a few of them have different ideas as to what data publishing even is. So by
keeping this idea of minimum viable metadata at the project level, and keeping it very simple, we allow for as wide a coverage as we can possibly get at UQ, although we're not trying to reach everyone. As I said at the beginning, there are people at UQ doing this really well without us, so we're not trying to overreach and pull in all of those people. But for the people who don't have working systems, the new system will allow research project-level metadata captured in a DMP to cascade through the data life cycle, automatically provision data storage, and then let us use that information to publish one or more data set metadata records, linking back to the original raw data and forward to the set of publications that came from that project-level data collection. I think that's a really good situation to be getting into, and that's certainly the vision, although it's probably not coming in the next 12 months; give me a year for this. It's certainly the direction we're heading in. I've got a quote here from Vincent Smith, who says that the power of published data is amplified by ingenuity, through applications and uses unimagined by the originator and distant from the original field, and that without connecting these disparate data sets, the true potential of data reuse and repurposing is lost. That's from his paper on data publication, towards a database of everything, in which he has the idea that perhaps we can consolidate everything into one huge database that can be queried to solve all kinds of interesting problems. So I really do think that publishing data is something worth investing a lot of money, a lot of thought and a lot of infrastructure in, and something we're very excited to be part of here at the library.

Thanks, Helen, we look forward to hearing more from you in a year's time. For now, Susanna, let's move to question time. Any questions?
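The cascade described in the talk, project-level minimum viable metadata flowing through the data life cycle into one or more published data set records that link back to the raw data and forward to the publications, can be sketched roughly as follows. This is an illustrative sketch only: the field names, the `project_to_dataset_record` helper and all values are hypothetical placeholders, not UQ eSpace's actual schema or API.

```python
# Hypothetical sketch: deriving a DataCite-style data set metadata record
# from project-level "minimum viable metadata". Not UQ eSpace's real schema.

def project_to_dataset_record(project, files, related_publication_dois):
    """Build a publishable data set record from project metadata,
    linking forward to publications and back to the project collection."""
    return {
        "title": f"Dataset for: {project['title']}",
        "creators": project["investigators"],
        "publisher": project["institution"],
        "publicationYear": project["end_year"],
        "resourceType": "Dataset",
        # Default licence if the project DMP did not specify one.
        "rights": project.get("default_licence", "CC-BY-4.0"),
        "files": files,
        # The audit trail: publications this data set supplements,
        # plus the project-level collection it is part of.
        "relatedIdentifiers": (
            [{"relationType": "IsSupplementTo", "doi": d}
             for d in related_publication_dois]
            + [{"relationType": "IsPartOf",
                "identifier": project["project_id"]}]
        ),
    }

# Invented example project metadata.
project = {
    "project_id": "proj-0001",
    "title": "Reef fish genomics",
    "investigators": ["A. Researcher", "B. Collaborator"],
    "institution": "The University of Queensland",
    "end_year": 2017,
}
record = project_to_dataset_record(
    project, ["genomes.tar.gz"], ["10.1234/example.doi"]
)
```

The design point is that the researcher supplies the project metadata once, early, and the publishing step only adds what is new (files, related DOIs), rather than re-entering everything at deposit time.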
Yes, there is one: is it possible for UQ to share the list of journals that require data publishing? The questioner says they are about to start working this out for the journals their own researchers publish in, and it would be great to have a central repository for this information. So yes, I'm very happy to share that information. We did look very specifically at UQ publications and then sliced the data that way, but I'd imagine it would be very similar across universities, so I'm very happy to share it. That sounds fantastic. The next question asks what software we're using: it's an in-house system, the same one used to build our institutional repository. I do believe it's all open source and online, but yes, it's all in-house development, apart from the DMP tool, which is an implementation of the DCC's DMPonline from the UK, and obviously you're welcome to it. Perfect, thanks everyone, thanks for your time. Thank you.
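As a coda to the question about sharing the journal list: the environmental scan described in the talk (seven of the top 25 journals by productivity, and 18 of the top 25 by times cited, requiring data sharing) reduces to a simple tally once the policy information has been gathered per journal. The sketch below shows that tally; the journal names and policy flags are invented placeholders, not UQ's actual lists.

```python
# Hypothetical sketch of the tally behind the environmental scan: given a
# top-journals list annotated with whether each journal has a data-sharing
# policy, count the journals with a policy and list them for sharing.

def policy_summary(journals):
    """journals: list of (name, has_data_policy) pairs.
    Returns (number_with_policy, total, sorted names with a policy)."""
    with_policy = sorted(name for name, has_policy in journals if has_policy)
    return len(with_policy), len(journals), with_policy

# Invented placeholder data, not the real UQ journal lists.
top_by_productivity = [
    ("Journal A", True),
    ("Journal B", False),
    ("Journal C", True),
    ("Journal D", False),
]

count, total, names = policy_summary(top_by_productivity)
print(f"{count} of {total} journals require data sharing: {names}")
```

Running the same summary over a productivity-ranked list and a citation-ranked list gives the two headline figures quoted in the talk.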