Okay, let me introduce you properly. Dr. Fiona Murphy is a publisher for Earth and Environmental Science journals at Wiley, working with a number of society titles and other publishing partners. She's also increasingly involved with emerging initiatives that promote good management practices for research data, including use, reuse, citation and linking with primary publications. Most significantly, the PREPARDE project; among other activities, she's also a member of the STM Association Research Data Group and the WDS-RDA Data Publication Working Group. Okay, I see that she's ready. Could you switch the slides? Just that, okay.

Thank you very much, Inga. Thank you very much for the invitation to come here today. I'm aware, as was just said, that I'm what stands between you and lunch, so I will try to be brisk. Someone's left their phone, yeah. Right, so the brief that I was given, and the talk I want to give today, is again around data publication, so hopefully it won't duplicate too much of what's already been mentioned. I wanted to talk about it from my point of view as a publisher, obviously, but given the audience and the mission of this particular workshop, I also wanted to draw some parallels between ourselves and librarians as fellow stakeholders in the research communication ecosystem.

So first of all, I put together a couple of slides. I wasn't sure what the people before me would be covering, so this is just to make sure that we're all on the same page; I think Peggy in particular has covered this reasonably well. From my point of view, publishing data is analogous to, but not exactly the same as, publishing primary research, insofar as once a data set is published it can then be republished in a slightly altered format, say if it's a long-term observational data set.
Absolutely, we need a long-term archiving arrangement in a reliable repository, with a persistent identifier, which might be a DOI or another mechanism such as a URI. And, I think this is really critical, a level of metadata to allow discoverability, hopefully reuse, and hopefully just to really highlight what is actually sitting there.

So why would we do it? First, academic credit for the people who have created and managed the data; and, as has also emerged already today, there isn't really a standard way of deciding who is an author of a data set, and I think that's work someone needs to do at some point. You also publish data in order to feel comfortable that it's going to remain, that it's going to stay, that you're going to be able to find it again. Peer review, I think, hasn't been mentioned so much. It's partly a quality check of the data set itself: that the correct fields are filled in, that you haven't got a corrupt file. But by plugging a data set into the scholarly research ecosystem, you're also extending that system whereby people look at it and check what the scientific content is, and that's valuable too. And, as I think was also mentioned in the keynote, ideally the level of metadata and the discoverability of the data set should then promote its understanding and use by people who aren't in the immediate charmed circle of people who knew about it beforehand. Transparency is particularly important, given the way that policy conversations are going about how research and innovation need to be conducted, and the increasing awareness among the public that they are funding the research and need to benefit from it. And, of course, the money: increasingly, various jurisdictions are becoming more active and saying that research data is a valuable commodity, in commercial terms and in terms of research and innovation.
And increasingly, the funder mandates are becoming more crystallized: people are going to have to spend some time thinking about where the data go, at the beginning of the proposal and at the end, and reuse is going to be increasingly important as well. So I've put up the NSF and the White House for the US, and also Horizon 2020, which I know will be key for a number of people here. I also mention the Science as an Open Enterprise report, which has been very influential, certainly in the UK, and which seems to indicate the way the wind is blowing in terms of access to, and behaviour around, science and research generally. Geoffrey Boulton, the lead author of that report, emphasizes within it the critical place that librarians should have in this future. I have heard him proclaim, at a meeting with a completely different set of people, that the wrong people were in the libraries, which I thought was really harsh, actually. But revisiting the whole tenet of his talk, looking at the report and seeing the interpretations that have come out since then, I think what should be understood is that a huge number of extremely capable, and the right, people are within the library. But like a number of us who are key stakeholders in the scholarly research system, we need to be interrogating our current workflows, our deliverables and our skill sets, and be open to the possibility of adapting and changing some of the things we do, as open access and open data converge with other movements into this new world of open science or open research. For instance, I think the emphasis changes from maximizing access for your institution to maximizing efficiency, discoverability and also trust, because an open world is also potentially an unruly world, and people can get lost and confused by that.
I also wanted to mention that Susan Reilly of LIBER was also at the meeting in Oxford last week that Peggy mentioned, and I think she made a sterling case for the pivotal place that librarians hold as research data management stakeholders. In particular she pointed out that you're part of the institution: people are there, on the premises; they're part of the current research infrastructure, which is what needs to be built upon; they're trusted knowledge management partners; and the personal and departmental relationships are there to be grown and explored.

So as a publisher, I'm very used to the narrative of a changed paradigm, because that's the conversation we've been having internally as well, and I think the question is how to embrace change, new technologies and new business models without being, that's a UK phrase, a turkey voting for Christmas. So I wanted to share a diagram we've been staring at for a while internally. As you can see, research workflows are the bullseye, and around them we've got a cycle of the different stages a researcher would go through from the beginning to the completion of a project, and then the name of the process that takes them from stage to stage. We then broke that down into the more concrete tasks, the nitty-gritty, that each one actually involves. As publishers, we were a bit chastened internally to realize that we were only really occupying the bottom-left and a bit of the top-left quadrants: we're coming in right at the end of the process. We feel that something we are going to be looking at a lot more is taking all these stages and thinking, what can we do to support researchers in this future that's going to look different? What points of interaction can we have?
And I would hope that would be something useful for librarians to think around as well. Different institutions, departments and people are going to have different experiences, clearly, but given how much of this is about workflows and research data, I think it's possible for a number of librarians to ask: can we systematically be more involved and supportive, and enhance the impact of our institution, by becoming partners in this? Some of the ideas that come out of that have been mentioned before, and I hope that by mentioning them myself I help build the case. Data management planning at the proposal stage: consulting with researchers as they have to produce a data management plan as part of their funding proposal. Compliance with funder mandates, for data management and for publication practices as well; I know RCUK, for instance, is going to start requiring a data accessibility statement as part of all their funded research outputs, which again I think Peggy mentioned. We think these are going to have to be standardised, and researchers are going to have to know what format, which journals, and how they go about doing this. Also building policies and best practices, building up from pilots or from other knowledge. People have historically been able to ignore a lot of the open access and other scholarly communication debates, but not any more: researchers are going to have to engage with these or they're not going to be able to build a proper career.

So I wanted to show you some of the things that are already out there. OpenAIRE, obviously, is a very key project and is doing some interesting things. Opportunities for Data Exchange has closed now, but it was a project where librarians, publishers, data centres and researchers came together to think about research data publication issues.
They defined some of the drivers and some of the barriers to progress, they built up some case studies, and they projected what might happen in the future and how each of those stakeholders might find a place. It's a very good website, actually, if you Google it. Another one is actually ongoing: SIM4RDM, which is, I always have to write this down, Support Infrastructure Models for Research Data Management. As you see, in the run-up to Horizon 2020 they're looking to support researchers' effective knowledge of data infrastructures. Again, researchers are going to need support, and they're going to need skills, or to be able to call upon other people's skills, in order to maximize their own impact and that of their institutions and projects. I also wanted to give an honourable mention to the Australian National Data Service. They're very, very active: they have a webinar series which you can download from YouTube; they've got a great website with a very clear mission about supporting the reuse, curation and archiving of data; they've got a lot of really nice, short explanatory guides just defining things and giving you somewhere to go to look for things; and they have a really international outlook.

So, what do publishers do? Before I get to this one, I wanted to say that, as you may have noticed, a number of data journals and data projects have come out recently. Faculty of 1000 has brought out F1000Research. Nature has just announced the launch of Scientific Data. Hindawi has also launched some new data journals. BioMed Central, a couple of years ago, launched GigaScience. So it's basically a need that we publishers feel we have to respond to. And Geoscience Data Journal is the Wiley version.
We publish it with one of our partners, the Royal Meteorological Society, and we had some support from NERC, the Natural Environment Research Council, which someone has mentioned already, in particular the British Atmospheric Data Centre. That was partly in terms of helping find and identify a good editor, and partly in supporting the general mission. We had a lot of encouraging noises from them, and this was a good way to help support measurable impact of research. They just want to be able to guide people, encourage them to use their data well, archive it responsibly, and ideally reuse it rather than go out and collect it all again.

So, a short description of the journal. We publish short data papers which are cross-linked to cited data sets that have been deposited elsewhere in an approved data centre. So there are two pieces: there's a data paper in the data journal, and there's a data set in a separate repository, and we link the two. It's a CC-BY licence, certainly for the data paper, and for the data set it's whatever the licence situation is at that data centre, though normally we'd expect it to be open. And a data article, there's a description there: it could include software, it could have other additions, but basically it's a description. It gives people the when, how and why the data were collected, and what the data product actually is, ideally supporting reuse and giving people enough metadata to find what they're looking for. So this is a typical splash page, the front page of one of the articles. I want to draw your attention to the fact that the DOI for the article, the data paper, is there, and the DOI for the data set is there on the front page as well. We thought it was important that their relative importance be illustrated by both being right at the front of the page.
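Those two DOIs, one for the data paper and one for the data set, are what make the data set formally citable in its own right. As a hedged illustration (the field names, creators and placeholder DOI below are entirely hypothetical, and the layout loosely follows the DataCite-recommended citation form rather than any Wiley-specific rule), a reference-list entry for a data set can be rendered from its repository metadata:

```python
def format_data_citation(creators, year, title, version, repository, doi):
    """Render a reference-list entry for a data set, loosely following
    the DataCite-recommended form:
    Creators (Year): Title. Version. Repository. Identifier."""
    authors = "; ".join(creators)
    return (f"{authors} ({year}): {title}. {version}. "
            f"{repository}. https://doi.org/{doi}")

# Entirely hypothetical example entry (placeholder DOI).
entry = format_data_citation(
    creators=["Smith, A.", "Jones, B."],
    year=2013,
    title="Surface temperature observations, 1950-2010",
    version="v1.0",
    repository="British Atmospheric Data Centre",
    doi="10.XXXX/example",
)
```

Because the entry carries the data set's own DOI, an indexing service can pick it out of a reference list and count it alongside article citations.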
We have also stipulated that the data set needs to be part of the reference list as well, because we're very mindful of the fact that Thomson Reuters, for instance, is setting up a Data Citation Index, and however that eventually turns out, I want to make sure that citations to data sets in this title, and in all our titles, can be collected and counted and give people the correct amount of credit.

So it's a different sort of journal, and we also realized that it needs a different sort of editor and a different sort of editorial board from the general kind of academic expert it has certainly been my pleasure to deal with to date. Dr. Rob Allen is an academic expert: he's a science officer at the Met Office, and he's also project manager for a historical data rescue project which, as you can see, feeds a climate reanalysis. There is a direct scientific purpose to the project, but it is also a case of finding old log books, digitizing them, making them discoverable to people, reinterpreting things, bringing things into the scholarly canon if you like. I've also put up some of the editorial board members; you can see Sarah Callaghan at the top of the list, which is alphabetical. Whereas we do have people with atmospheric, geological and other geoscience expertise, we're also looking for people with information management expertise who can help break down the boundaries between the researchers, the journal and the data centres, and basically support this mission and the workflows that are going to be required.

Okay, so, because there are issues: you can see there the workflow. On the left you've got the repository half of the workflow, and on the right you've got the journal part, for the two pieces, the data set and the data paper. And it is complicated.
From our point of view, certainly, we realized that every data centre has a slightly different workflow, and that meant we had to think about how to adapt our workflow to take this into account. We also needed to make sure that we could trust the repositories we were working with. There are things like the Data Seal of Approval, and there's DataCite; there are organizations that deal with this already, but not necessarily in a journal context. Again, I mentioned peer review, and people who have worked with journals know that reviewers and referees already have a lot asked of them every time they review a primary research article. The scientific peer review of data felt very important, but it also needs to be a realistic ask, and people need to know what it is they're being asked to do. And engagement: those of us who keep an eye on what's going on in policy circles are aware that this is coming, that people are going to have to rethink how they behave with their data sets. But the researcher who is working extremely hard has got no time to rethink it, on this treadmill of proposals, research and publication output. They need to have the information given to them in a palatable way that they can actually cope with, at the right time.

So it all boiled down to the fact that, unlike a usual journal where the publisher can take responsibility for managing all of the content processes, this is very much a joint venture, and one that you need to undertake with a number of partners if, as we do, you want to work with a number of data centres. So we set up a project, which I thought might have already come up today, but luckily I get to mention it: PREPARDE, Peer REview for Publication and Accreditation of Research Data in the Earth sciences.
The project manager is Sarah Callaghan, which is lucky for us, and these are some of the main principles that we're investigating and reporting back on. The project will actually finish towards the end of next month, although we might well spend some extra time processing and reporting on some of the findings and other things that have come out of it. That's probably all we need to say about that for the moment. I did also want to draw attention to the footer at the bottom there, as it lists all the key partners. I know the DCC is represented here today as well. We've got Faculty of 1000, so we've actually got a competitor publisher working cooperatively, which also felt important. We've got a couple of international collaborators: the California Digital Library and also NCAR, the National Center for Atmospheric Research in Colorado, as well as JISC, who are the initial funder, and some universities in the UK. So it's a group of quite diverse stakeholders who have been trying to pull together some recommendations and best practices around data publishing. And I've put the website up at the end of this presentation for people to pick up if they want.

I did also want to mention the JoRD project; I don't think anyone is here today to talk about that. It's a sister project funded from the same strand, and as you see, its mission has been to collate and summarize journal data policies. The fact that somebody felt the need to do that, and the fact that when they went through the exercise under half of the journals they looked at actually had a research data policy, points again to our being in a time of great transition: people aren't going to be able to just send a submission to a journal in the future.
Someone is going to have to have done the research to make sure that what they're sending is appropriate and acceptable, and again, that's something that could be supported within the institution.

So I wanted to give you some tips for going forward. There's our project site. As I said, we're quite near the end of the project now; we do have one more workshop next month at the British Library, and there might be one or two spaces left if anyone is able to come and participate. We're actually looking at the peer review guidelines which we've already put together and trying to come to a final conclusion. There's a JISCMail mailing list, data-publication@jiscmail.ac.uk. That's not specifically linked to the project; it's about the data publication issue generally, and it's quite international, with people posting local meetings into it. It's been quite active recently, discussing peer review and repository accreditation, for instance, so it's quite a nice one to join just to keep informed. The Research Data Alliance has also got a mailing list, and it is also, in its early days, in the process of setting up working groups and task groups to investigate and deliver back on various research data issues. So if you look at the site and there are any working groups that interest you, I encourage you to make contact with people. I've also mentioned the World Data System, another organization which is looking to promote best practice in the stewardship of research data. You can become a member, or you can simply monitor the communications coming out of it, and I think some of those are going to be quite interesting.

So, a slightly different quote from the one I heard earlier, Geoffrey Boulton again. That's quite a strong statement.
I think it's something to aspire to: at the very least, it's good practice to make sure that people can find what underlies a paper, in order to verify conclusions and promote transparency. And thank you very much.

Thank you, Fiona. Any questions?

Actually, it's the same question I asked before, I think, one of them. Interesting, the view of the publisher. So my question is: you have these links, DOIs, to the data sets in your articles online. What about the metadata that we harvest in our library discovery tools, are they also available there, so we can use them in our front ends? The metadata from the DOIs, from the data sets, in the metadata for the articles that you traditionally publish?

Okay, the Geoscience Data Journal, my journal, our journal, is published with a CC-BY licence.

I'm not interested in the licence, I'm interested in metadata. Maybe I didn't phrase it quite well. At libraries we harvest metadata; we have, for instance, I'm going to use some of these bullshit-bingo words, a unified integrated discovery tool front end, where we present the metadata of the articles we have access to at the university, and people can then find a link to the full text. But I would like to also have a link to the data set in these harvested metadata. Traditionally this is not the case, because we haven't had this until now, but my question is: do you as a publisher also provide those additional metadata fields, the links to the data sets, to third-party or local presentations?

Okay, we're not systematically providing any feeds at the moment, although we have had a couple of preliminary conversations with, say, OpenAIRE. If you wanted to talk to me about that at lunch or afterwards, then we could certainly discuss it. Does that help?

I already have a lunch appointment, but maybe later. Same email.

Any other questions? Before we end for lunch.

Hi, it's Robin Rice from the University of Edinburgh.
It just occurs to me, regarding that previous question, that this is a data journal, so something to keep in mind is that the item itself is the data in this case, although I think your question probably applies to a more traditional journal. But, sorry, my question was about, I don't know if you'll consider it a fair question because, again, your data journal is not a regular journal, but I can't help thinking about regular journals and the increasing trend, or at least debate, about whether to do peer review of data in normal scholarly communication, outside of data journals, and how similar or different the process is. Some of the things you mentioned in terms of peer review of data sets, such as making sure the variables work and that kind of thing, I would have thought would be analogous to what the editor does at a journal, not the peer reviewers, because if I do peer review and I'm correcting typos, that's not a good use of my, say, scientific expertise, not that I have scientific expertise. So is there an analogy there? I know you have a special editorial board that's willing to look into data. Why is that needed? I know that peer review is considered time-consuming as it is, and it's said that it's not a good mechanism for catching fraud, for example, because reviewers can't look at the data that hard. So I was just wondering, and like I said, this might not be fair, but where do you see peer review going in the traditional journal process, where at least the data set is being asked to be made available?

Okay, there's quite a lot there, isn't there? I think, taking the peer review issue as a whole, we were inspired to investigate the peer review of data sets because of this particular journal, but I agree it's something that should be extrapolated out generally, particularly as data sets, I think, will become more regularly linked with primary research publications.
I think you can sometimes look at it as almost a two-fold process, where you might have, say, a mechanical, machine-read stage which simply checks that the data set is what it says it is: that there are no void fields, empty fields which should be filled; that there are no values beyond the parameters they should be within; and that the file is not corrupt. After that, yes, there are issues about what's humanly possible, and that's going to vary from community to community because of varying levels of complexity. Sometimes people will need specific software in order to be able to read something. We might get things in different formats, some of which have been partly digested and some of which are more raw, and that's part of what we're trying to do in pulling together the peer review guidelines for data, which is the piece that we're literally on at the moment. So do please have a look at the website, because we're still in the process of consulting, and there is a document there which we're very happy to get feedback on.

Okay, thank you, Fiona. We'll stop here; it's time for lunch now. Thank you again to all the speakers of this morning for your presentations. Lots of things to discuss during lunch. We start again at a quarter to two, so we'll see each other here then. I would just like to ask the speakers who haven't given their presentations yet to come here so that we can upload them.
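The mechanical first pass described in that closing answer, run by a machine before any human scientific review, can be sketched as follows. This is a minimal illustration with made-up field names and ranges, not the PREPARDE guidelines themselves; it covers the empty-field and out-of-range checks mentioned, while file-corruption checks (e.g. checksums) would sit alongside it:

```python
def technical_checks(rows, required_fields, allowed_ranges):
    """Machine-run first-pass checks on a tabular data set:
    required fields must be present and non-empty, and numeric
    values must fall within their stated parameter ranges.
    Returns a list of problem descriptions (empty list = pass)."""
    problems = []
    for i, row in enumerate(rows):
        for field in required_fields:
            if row.get(field) in (None, ""):
                problems.append(f"row {i}: missing value for '{field}'")
        for field, (low, high) in allowed_ranges.items():
            value = row.get(field)
            if isinstance(value, (int, float)) and not low <= value <= high:
                problems.append(
                    f"row {i}: '{field}' = {value} outside [{low}, {high}]")
    return problems

# Hypothetical weather-station records; the second row should fail both checks.
rows = [
    {"station": "DUB01", "temp_c": 14.2},
    {"station": "", "temp_c": 91.0},
]
issues = technical_checks(rows, ["station", "temp_c"],
                          {"temp_c": (-90.0, 60.0)})
```

Only data sets that pass this stage would go on to human reviewers, keeping the ask on referees realistic.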