Welcome everyone to another ANDS webinar event. It's a pleasure to have you all here online from near and far as part of the greater ANDS webinar series, which to date has included topics such as data management, data licensing and data citation, to name a few. My name's Alexander Hayes and I have with me here on this sunny Canberra day Gerry Ryder, research data analyst from ANDS, who's flown all the way from fair Adelaide, South Australia, to join us for this important event and, of course, a myriad of meetings that she's doing. Welcome, Gerry. Welcome, everybody.
For your interest everyone, and to acknowledge the significance of this webinar topic, it's important to note that we've got attendees registered for this webinar from the University of Canterbury New Zealand, University of Tasmania, the Australian Antarctic Division, University of Edinburgh, Geoscience Australia, La Trobe University, University of Canberra Australia, Deakin University, University of Melbourne Australia, Wiley Publishers, University of Western Sydney, Griffith University, University of Queensland, Research Data Storage Infrastructure (RDSI), Monash University, and that's just to name a few. These are obviously organisations for whom data publishing is of great interest and already an integral part of their research activities.
So we've got two very distinguished guests joining us today, whom we're privileged to have on board given that the topic at hand is data journals. Jane Smith is the SHERPA Service Development Officer at the Centre for Research Communications, University of Nottingham. In this role Jane's involved in a number of projects around open access information, including RoMEO, JULIET, OpenDOAR, FACT and JoRD. Those of you who have been involved in institutional publications and repositories will be familiar with at least some of these acronyms.
Jane's here today to talk about the JoRD project, the Journal Research Data Policy Bank, which has a particular focus on journal publishers' data sharing policies. We also have with us Dr Fiona Murphy, who is the publisher for Earth and Environmental Sciences journals at Wiley, working with a number of titles, societies and other publishing partners. Fiona is also increasingly involved with emerging initiatives that promote good management practices of research data, including reuse, use, citation and linking from primary publications. Among other activities this has led to being a core partner in the PREPARDE project on peer review and publication of data sets, and to membership of the STM Association Research Data Group and the World Data System Data Publication Working Group.
Now for a very brief background on ANDS activities. During late 2012 ANDS staff undertook a desktop survey to identify data journals across a range of disciplines in order to define what a data journal is; to review data journal policies, in particular looking for requirements for DOIs, data deposit and data citation; and to assess the status of the data journals surveyed, taking into account years established, peer review processes and whether they're indexed by Thomson Reuters' Web of Science. So we're pleased today to be able to bring together these lead international initiatives and these guest speakers at a webinar that we're sure will shed some light on the policies devised by academic publishers to promote linkage between data journals, journal articles and underlying research data.
So I'd like to welcome Fiona Murphy, who has been involved in a sister project to JoRD but also has some experiences in her role at Wiley as well. So Fiona, I think we should be able to see your screen now.
Yeah, can you?
Yes, beautiful. Thank you.
Yeah, just showing screen here. Okay, right. I'll take it away then. Thank you. I'm going to say good morning everybody.
And obviously good afternoon to most people here. Thank you very much for inviting me to speak to you today. I'm going to talk a bit about some of the things that I've been doing around publishing research data and also touch on some other things that are going on as well. I wanted to start, though, with just a couple of "what" and "why" slides, to make sure that we're all starting from the same point. So here you go.
So what do we mean by publishing data? I think it's analogous to, but not precisely the same as, publishing primary research. No, I'm just trying to get your square a bit smaller because it's really massive and I can't actually see my slides very well. There we go. It's not precisely the same as publishing primary research, in so far as a primary research output is generally a finished product, whereas the data underlying it is often raw or in various states of partial processing. So data should also be... and now I can't. It should be permanently or long-term archived in a reliable repository, and I put "reliable" in quote marks because I think that can be a problematic concept all on its own. It should be allocated a persistent identifier. I would say a DOI, but I think there are also problems around the nature of different kinds of data sets, which can mean that a URI or even a web link is the only thing that's possible in certain cases. And there should be a critical level of metadata to allow discoverability, to enable people to find a particular data set and to know what it is that they're looking at.
And then the why. Why would we publish data? Well, one very good reason is to provide academic credit to the scientists, particularly the kinds of scientists who traditionally haven't been able to accrue publications and the status and career path that go along with that. And also the publication path is one which is known within research communities and could be incorporated into current research and grant proposal workflows.
Hopefully it ensures that the data set is uploaded to, again, a trusted repository, and you can have some reliance on archiving and curation practices. And again I think that's something that's emerging as a need for a more general and better understood best practice, or standards and accreditation rules.
I've got peer review processes, and this is another thing that I think... I'm sorry, I can't switch my Skype off. If that's annoying people, I hope it's not too bad. So, peer review. I think that's another part of the data publication process which is again analogous to the primary research piece, but it's also not exactly the same. It's something that people have a great deal of issue with because of the size of potential data sets and the time and the skill that might be required to actually manage a peer review, and the fact that it's quite well known that reviewers are already under a great deal of strain and time pressure. So that's a potential pain point in the process.
Publication of data: again, if we're saying that it would then become more discoverable and more permanently available, then hopefully it would be more visible for people who aren't necessarily in the know immediately to be able to find and reuse. And transparency: it should also support the movement towards accountability to the public and to the funding agencies, given that a lot of money does get spent on research and you want to see what it is you've got.
And that's the other really good reason why you'd want to publish research data: it's the way that the wind is just generally blowing. Many of you are probably aware that the White House Office of Science and Technology Policy had a big meeting last week about public access to federally funded research, and they spent half the time talking about data as opposed to just the regular standard research output.
The Science as an Open Enterprise report came out at the end of 2011, I believe, and it's a very interesting report. It does make the case that science and all kinds of research should be opened up to the people that paid for it and anyone that wants to use it, and people should be able to find their way around it. In fact I've heard the report's main author speak, and he makes a very clear case that librarians are key to this new paradigm.
Horizon 2020 is another one I just picked. It's the EU's, that's the European Union's, programme for research and innovation. They've got a budget that was €80 billion; I think it might have been cut a little bit. But they're absolutely placing a top priority on opening up research and facilitating the capacity to build data set knowledge and to be able to find interoperability and synergies to drive new insights and business models and growth and jobs, and generally they see this as being really key to Europe's long-term viability and prosperity.
So what do publishers do? Well, one of the things that we can do, and I guess it's our sort of knee-jerk reaction to most things, is start a new journal. So Geoscience Data Journal is a partnership between Wiley and the Royal Meteorological Society, and it's also been supported by NERC, which is the Natural Environment Research Council in the UK. In particular the British Atmospheric Data Centre has been very helpful, giving us a lot of time and people space to work through how we might set this up and make it work. As you can see, we publish short data papers which are cross-linked to, and which cite, data sets that have been deposited in an approved data centre and awarded a DOI or another permanent identifier. I've also put a little description here of what we believe a data article is and why it's a good thing to do.
As you can see, it's the when, the how and the why the data was collected and what the data product is, and it's a way of pulling together all the parts of the project, the output that would enable reproducibility and also reusability.
So I've also put a splash page here of what an article looks like, and I wanted to draw your attention to the fact that there we've got the DOI of the article itself, but we've also put the DOI of the data set up there on the front page. We thought it was really important to have it sitting up there amongst the front matter and really prominent, so that people can see how it relates to the article overall. I did want to mention as well that we also put the DOI in the reference list, because we want to support the general citation of data sets. We're also mindful of the Thomson Reuters Data Citation Index, which is being pulled together at the moment, and we want to make sure that we support whatever workflows they eventually come up with.
So I've got here on the left a picture of the workflow as we envisaged it at the beginning, and as you can see it's pretty complicated. There are a lot of processes, and there are a lot of parts where the researcher is almost being batted between, say, the publisher and the editorial process of the primary research paper, but also the repository and the data set itself, and we felt that this is a potential barrier to people really picking up and running with this sort of publication.
So we started isolating and trying to name the issues we felt were key ones. There was a workflow and cross-linking issue: the journal and the repository need to be able to speak to each other. We need to know something about the repository in order to be able to work with it, including whether it's going to be here next year and what happens to a data set that goes into that repository. I thought that it was intrinsic to calling something a journal that there should be peer review, again as I mentioned earlier.
Peer review of data sets is a big ask, and people aren't really clear what it is they're supposed to do. And just generally, and I think Jane touched on it as well, researchers are being pushed towards behaving in this sort of way, but they're also having to operate in the real world, where if you've painstakingly compiled a data set you don't want to just release it without achieving any credit yourself. It's important to be able to engage with people and answer questions and address concerns and adapt as required. So we felt that that also boiled down to the need for a better understanding of how this sort of publication, a journal and a repository, would interact.
In which case we started working on the PREPARDE project; that's where that came in. And again, like JoRD, it's JISC-funded, under their Managing Research Data strand. So I've put up here the key partners and the contact details of the project leads, and as you can see we're coming towards the end of our cycle, so we've got some outputs and we've got some findings I'd like to point out.
So one of our work packages, one of the areas we've been investigating, is repository accreditation. Because clearly if you've got a data paper and a data set there needs to be a very strong, durable link, but there are a lot of questions. We might know on an individual basis if a repository is trustworthy, but then we need to have some way of sealing that, of publicising that, and of generally ensuring we don't have to keep duplicating that work every time we either start a new journal or another publisher wants to work with that repository. So we were looking to see how to start pulling that insight and information together. As you can see, we've put up a list of the characteristics that we've been looking at around repository accreditation and how you assess whether a repository is good to work with.
And actually we're in the process at the moment of refining some recommendations, which I'll give people a note of how to interact with in a moment.
The second key area of our study is the peer review of data. As you can see, we had a workshop a couple of months ago and we decided that we had three recommendations. First, the need to connect data review with data management planning: basically to pull together the data management plan, which hopefully happened much earlier in the process, with the review, which then happens at the end, so you can connect what the project was supposed to achieve, what data was supposed to come out, and how it was supposed to be collected or used or assessed. Second, we wanted to show that there are two sets of review: there could be a scientific review and there could be a technical review, and these both needed to be reflected in the curation and the information that was held about the data set. And third, we also wanted to connect the processes of the data review with the article review.
As you can see, we've got a formal document for comment, which is up on that URL there, and there's also a mail list, the data publication email list, which is very easy to join and to comment on, and we'd be very happy if you were to do that. If you join the list and then go back through some of the previous posts, you can also find the material around the repository accreditation.
So most recently we were looking at cross-linking workflows. We've only just had the workshop on that one, and so I've just put up some preliminary findings from that.
The loudest voice actually was, as I was touching on before, the need for there to be some central registry or broker. That's partly, I think, related to the accreditation issue, but it's also to do with the fact that at the moment all the links are bilateral and any information that's sent between them is largely manual, and it's just not going to be workable to try and build that up. If you can imagine a world where a lot of data sets are being cited, and you're wanting to capture information on who's cited what, and an article or a data paper has pulled together multiple data sets and then cited them, they could be sitting in different data centres, and we just can't keep track of that manually. I think something along the lines of CrossRef brokerage, or maybe something around Thomson Reuters and the ISI, might be a possibility, but it feels that if people are to be incentivised to publish data sets, we need to be able to collect the citations, and that then needs to be done in a manageable way.
As we said, data citation, I think, is also emerging as a currency that's understood amongst research communities. And in fact data citation, like publication of data, is analogous to but not exactly the same as citation of primary research. If you can imagine a long-term observation data set in the atmospheric sciences, the data set could be cited, and then the same data set could be cited a year later and it would in fact be a different data set. So there's a certain amount of something being fixed and yet not fixed, which you certainly don't get with primary research articles, but the concept of citing that data set is something that I think many of us are familiar with.
So I also wanted, in the interest of fairness, to mention that obviously Wiley is not the only scientific technical publisher. I'm aware of, and in fact applaud, the fact that many publishers are exploring this area, and I think it's a sign of its growing importance and of people just realising that this is going to be critical for underpinning scholarly communication going forward. So I've just put a few journals and publications down here to illustrate that. This is by no means exhaustive, but we actually do have quite a good list on the PREPARDE website. That was one of the things that we did, to pull together a list of data journals, which again people are welcome to have a look at.
Earth System Science Data is an EGU publication. It's open access, it's been going for about four years, and it's fairly similar to Geoscience Data Journal. It has an open peer review system, and at one point it was also publishing supplementary information, which we decided we didn't want to do. I think they've now tightened up their criteria.
Scientific Data from Nature was announced very recently, and I think that was a real signal of the importance this topic is starting to assume. Scientific Data is going to be publishing what they've termed Data Descriptors, which I think are pretty much data papers or data articles, but it hasn't yet formally launched, so I think that's a space to watch and get more information about as we go forward.
GigaScience, by BioMed Central, is very much in the life and medical sciences. It's a big data project, and GigaScience also undertakes to hold the data set as part of the publication.
And Faculty of 1000 Research, F1000Research: this is another quite new entrant into the field. Again they have open peer review, which is also post-publication, which is another one maybe to have a look at. It's also in the life sciences and biomedical sciences, but it's also built up partnerships with some of the data centres such as figshare, which will
take your uninteresting data sets and allocate a DOI. They're interested in publishing negative results, to just generally build the canon of scientific technical knowledge.
So I also thought it might be useful to mention a couple of things that you can go and do after this session. You can have a look at our site, which has got quite a lot of information about the work packages and the outputs that we're producing, and it also has a blog and the mailing list as well, which you're very welcome to join or interact with. More widely, there's a JISC mailing list which relates back to the website I mentioned earlier, and that's quite interesting. It's very international, and it's got a lot of librarians, data centre managers, publishers and interested researchers who are all trying to engage around this, and it is picking up a lot of the things that are going on that other organisations are engaged with.
The Research Data Alliance is another organisation, which I know ANDS is very involved with, and I think it's at a point where it would be quite interesting, quite useful, to at least engage by joining some of the mailing lists, because there are a lot of working groups which are just getting off the ground at the moment. Some of them are around things like data publication, data citation, capacity building and so forth, so at the very least you can keep an eye on what's going on by being aware of them.
The World Data System is another international organisation which is encouraging membership. Again, I'm aware that the Australian Antarctic Data Centre is certainly a member, and the Australian Bureau of Meteorology. It has a mission to support best practice in the stewardship and curation of research data, and you're invited to support the mission. You can become an associate member, which doesn't involve paying any money but which does involve being called to the table to actually engage with and support the policies as they're emerging and as they're worked through, and it's with an idea of
joining things up and supporting interoperability and not reinventing the wheel as well. I think that there is a potential issue, because there are so many things going on at the moment that people could well be working in isolation and, say, reinventing the wheel in several different places at once. But I think the Research Data Alliance and the World Data System are very much looking to see what's already out there, what's good practice and where the low-hanging fruit is, and to actually build from there and support the things that are already happening.
So, the future: a little bit of a blue-sky moment. Hopefully I'll know a lot more about the future tomorrow, because there's actually a meeting in Oxford. I've put the programme there. We're hoping to have at least some of the sessions broadcast or recorded, and hopefully there'll be a Twitter feed as well, and we'll try and make sure there are some outputs that come out from that.
More generally, I think the sense is that, as stakeholders in scholarly communication, we're in a shifting landscape. I think it's really important that we speak to each other and that we're adaptable, and I think that there's so much to do that there should be space for all of us within that. And I think that journals and scholarly communications are going to start really changing in the not too distant future. I think there'll be more enriched content and there'll be more tools for query. I think things like copyright and ownership are going to adapt, they're going to change. I'm not going to say they're not going to be so important, but I think they're going to be important in a different way. And that's it.