Welcome everyone to another ANDS webinar event. It's a pleasure to have you all here online from near and far as part of the greater ANDS webinar series, which to date has included topics such as data management, data licensing and data citation, to name but a few. My name is Alexander Hayes, and I have with me here on this sunny Canberra day Jerry Ryder, a research data analyst from ANDS who's flown all the way from fair Adelaide, South Australia, to join us for this important event, and of course a myriad of meetings that she's doing. Welcome, Jerry, and welcome everybody. For your interest, and to acknowledge the significance of this webinar topic, it's important to note that we've got attendees registered for this webinar from the University of Canterbury in New Zealand, the University of Tasmania, the Australian Antarctic Division, the University of Edinburgh, Geoscience Australia, La Trobe University, the University of Canberra, Deakin University, the University of Melbourne, Wiley Publishers, the University of Western Sydney, Griffith University, the University of Queensland, the Research Data Storage Infrastructure (RDSI) project and Monash University, and that's just to name a few. For a few of these organisations, data publishing is obviously of great interest and already an integral part of their research activities. So we've got two very distinguished guests joining us today, whom we're privileged to have on board given that the topic at hand is data journals. Jane Smith is the SHERPA Services Development Officer at the Centre for Research Communications, University of Nottingham. In this role Jane is involved in a number of projects around open access information, including RoMEO, JULIET, OpenDOAR, FACT and JoRD, and those of you who have been involved in institutional publications and repositories will be familiar with at least some of these acronyms.
Jane's here today to talk about the JoRD project, the Journal Research Data policy bank, which has a particular focus on journal publishers' data sharing policies. We also have with us Dr Fiona Murphy, who is the publisher for Earth and Environmental Sciences journals at Wiley, working with a number of titles, societies and other publishing partners. Fiona is also increasingly involved with emerging initiatives that promote good management practices for research data, including reuse, citation and linking from primary publications. Among other activities this has led to her being a core partner in the PREPARDE project on peer review and publication of datasets, and to membership of the STM Association Research Data Group and the World Data System Data Publication Working Group. Now for a very brief background on ANDS activities: during late 2012, ANDS staff undertook a desktop survey to identify data journals across a range of disciplines, in order to define what a data journal is, to review data journal policies, in particular looking for requirements for DOIs, data deposit and data citation, and to assess the status of the data journals surveyed, taking into account years established, peer review processes and whether they're indexed by, for example, Thomson Reuters Web of Science. So I'm pleased today to be able to bring together these leading international initiatives and these guest speakers in a webinar that will shed some light on the policies devised by academic publishers to promote linkage between data journals, journal articles and underlying research data. I'd now like to introduce to you Jane Smith from the University of Nottingham.

I hope that everyone can see the presentation appearing. I've been working on the Journal Research Data policy bank, or JoRD for short. Just before I talk about what happened with the project and its findings, I'm going to give a bit of background.
I'm sure you're all quite familiar with this background if you're tuning into the ANDS webinars, but just bear with me. Data has become an increasingly valuable resource in its own right. People want access to the data behind journal articles, not just the data presented in the article itself; they're looking to access the dataset. Research councils now want publicly funded research data to be made more available and shared across communities, as much as anything as an indication that their money is being spent appropriately. And with changes in research practice and technology, it's now possible to make use of these datasets, combine different datasets from different researchers, and extract additional information across the board. As I'm sure you know, in 2011 ANDS held an international workshop, and one conclusion that came out of it was that it would be a great advantage to collect journals' policies on research data: what the journals and the publishers want authors to do with that data. So Jisc, who funded JoRD through its Managing Research Data programme, incorporated this idea and asked for a feasibility study of whether such a service actually makes sense. Other aspects of the programme included establishing research data management strands in various institutions, so there's a bit more infrastructure being developed, and if institutions develop the infrastructure for researchers to deposit data, those researchers will start wanting to know what the journals will let them do. So in some ways we've been calling it, somewhat cheekily, the RoMEO of data, to help people understand.
So JoRD was a six-month feasibility study; it ran from July to December last year. It was commissioned by Jisc and run by the Centre for Research Communications, the Research Information Network, our colleague Paul Sturges at Loughborough University (just down the road from Nottingham) and Mark Ware Consulting, and together we scoped and shaped a potential service that would provide a ready source of information covering the journal policy landscape for research data. We did this in three stages. Sorry, I need to have my notes in order. Our aims were to identify the scope and format of a service to collate and summarise journal data policies, but also to investigate and recommend business models, as Jisc wanted the service to be financially self-sustaining. So, those key stages. First, we wanted to investigate the current state of journal policies on research data: are there any out there, how good are they, what do they cover, that sort of thing. We also wanted to consult with stakeholders, and I'm not just talking about researchers, but research managers, funders, publishers, and the people who support researchers, like librarians and repository staff. And, as mentioned, we wanted to look at business models and the service options available. For the literature review, we wanted to see what had been done already: had anyone done something similar, and did they have any recommendations on how to do these studies? The general conclusion was that there wasn't a great deal of literature in this area, particularly on journal policies for research data; there might be material about research data, but not necessarily about journals having policies. However, there were some key studies, and these found that a large percentage of journals lacked policies on data sharing. Those studies are the likes of McCain in 1995 and, perhaps more famously, Piwowar and Chapman in 2008.
I don't have the full references here, but I can provide them to anyone who wishes. These studies also indicated that there were no standard procedures for how a journal should create a data sharing policy or what those policies should advise. There was also a large degree of inconsistency: some policies were very vague, some very clear-cut about what was wanted. There was also little guidance available to authors. However, some subject areas, like the biomedical sciences, were leading the way, and, perhaps as a result of this lack of guidance, researchers' data sharing habits were also quite inconsistent. So with this knowledge we started looking at what policies the journals actually have. We decided to look at both the highest and lowest impact factor journals, picking a hundred of each from the two subject areas covered by the Thomson Reuters citation indexes, science and social sciences. However, as you'll notice, we only looked at 371 titles; that's because there was some duplication across the two lists. Of those 371 titles, 162, or 44%, actually had policies. In fact there were 230 policies, which I'll explain a bit later, but it does make sense. There was quite good subject coverage, with 36 subject areas covered across these two lists. We did consider whether to include journals we already knew had policies, but decided in the end to remove these, because that could introduce bias, and we didn't actually know where they sat on an impact factor scale. So this is a graph of who had policies. As you can see, the majority of the journals we looked at had no policy. We have some listed as unknown; that's where we were unable to find a journal website, so we couldn't tell whether they had a policy, and we decided not to contact journals directly, due to the timescale of the project.
However, worth noting were the journals with multiple policies, about 15%. This is where there might be a policy on data sharing, a policy on data preservation, a policy on the formats of the data, and so on; the journal's data policy was spread across multiple separate policies. We used Piwowar and Chapman's definition of strong and weak policies, which in summary is that a strong policy is one where data deposit is a condition of publication (for example, if you don't deposit the data, you can't publish), whereas a weak policy merely suggests or recommends deposit. On this basis, of the journal policies we found, nearly three quarters were weak, with only a quarter being strong. Perhaps not too surprisingly, the high impact journals were more likely to have a strong policy, and the lower impact journals were more likely just to recommend or suggest that authors share data. However, as indicated in our literature review, approaches varied between subject disciplines, with some more established than others. We noticed that, in addition to the biomedical sciences, some of the chemical structure journals had more established practices. So, in addition to finding whether a journal had a policy, we also wanted to know what was in that policy. We looked at data types, by which I mean what type of data they want the author to deposit. Most of the time we found it was datasets, multimedia or other data, fairly general terms. Very few asked for specific types of data, but those that did asked for things like program code or protein crystal structures to be deposited somewhere. We also looked at where they were asking authors to deposit. The greatest percentage of the policies requested that materials be put on a website, fairly general again, or just on the journal website.
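As an aside, the strong/weak distinction described above is simple enough to model mechanically. Here is a minimal sketch in Python of how a policy bank might record and classify policies; the record layout and field names are invented for illustration, not JoRD's actual schema.

```python
from dataclasses import dataclass

# A sketch of a policy-bank record. Field names are invented for
# illustration; this is not JoRD's actual schema.
@dataclass
class JournalPolicy:
    journal: str
    deposit_required: bool     # deposit is a condition of publication
    deposit_recommended: bool  # deposit is merely suggested/encouraged

def strength(p: JournalPolicy) -> str:
    """Classify per Piwowar and Chapman: 'strong' if deposit is a
    condition of publication, 'weak' if it is only recommended."""
    if p.deposit_required:
        return "strong"
    if p.deposit_recommended:
        return "weak"
    return "none"

sample = [
    JournalPolicy("Journal A", True, False),
    JournalPolicy("Journal B", False, True),
    JournalPolicy("Journal C", False, False),
]
print([strength(p) for p in sample])  # ['strong', 'weak', 'none']
```

A real service would of course need far richer fields (data types, deposit locations, timing, sanctions), but the core classification is just this one rule.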
However, when we did some stakeholder consultation, it was revealed that a lot of publishers were actually quite keen on well-managed subject repositories, but few were actually specifying them in their journal policies. When were authors asked to deposit? This again was quite inconsistent across the policies. 23% of the policies we looked at asked for the data to be made available to peer reviewers, but not necessarily available to readers after that point. 51% mentioned actually depositing alongside the article, and with some of these percentages a policy might tick several of these boxes: it might ask for data for reviewers and for it to be deposited and made available later, so it's more complicated than it looks. At least one journal did allow the inclusion of an institutional website URL as an endnote to the article, as long as there was a statement that the data hadn't been peer reviewed and might be updated; so it did allow for that tying-in of the background data to the article. Regarding sanctions, very few, only 22 of the policies we found, made any indication that if you didn't deposit the data you might not be published. We then set about consulting stakeholders, and these were really across the board: scholarly publishers, research funders, research administrators and repository staff, library staff, and the researchers themselves. We wanted to look at how they currently share data. Do they agree with the idea? Do they have any concerns about sharing data? Would they use a service listing journal data policies? And would they be interested in assisting with its upkeep? So we conducted 23 in-depth interviews, mainly with publishers, librarians and support staff; we also held a focus group of researchers and a workshop with publishers, and we ran an online survey directed at researchers.
Across all this we found a complex situation, with the different stakeholder groups making assumptions about each other's views and actions. However, the majority did support making data open and listed quite a few benefits of doing so: for example, preserving data for the future, promoting knowledge, reducing fraudulent claims, and enabling the data to be scrutinised by the community. However, there were some concerns, barriers and caveats. Researchers were concerned about who would own the copyright to the data. Would the data be available in a form that would be valuable to share? Spreadsheet numbers might not make any sense to another researcher; do they need another layer, a sort of basic analysis report, to be shared as well? And in some cases researchers, particularly early career researchers, were concerned that making their data available before they submitted their PhD could mean their PhD was worthless. So, to give some of the comments from three of the main groups. Researchers indicated that they thought a journal policy bank would be quite valuable, because it would allow them to assess whether a particular journal's policy fits their form of data, their data sharing ethos, or the requirements of their funders, and it could be a point of reference for accessing other researchers' data. The librarians and repository staff, those with a history of librarianship, had not so much knowledge about curating data, but they had analogous experience curating journal and monograph collections and thought this knowledge could be transferred. However, in spite of this potential, there wasn't much happening in the UK at the time of the stakeholder consultation; since then, the same Jisc programme has resulted in several research data management programmes at the various institutions taking part, so that picture may be changing.
However, the librarians did think that a policy bank would be quite valuable: it would enable them to support and develop research data management at their institution and would help them provide publication guidance to the researchers who were interested. As for the publishers, obviously we wanted to see what they thought. They felt the audience for JoRD was a little unclear: was it researchers, was it publishers, was it librarians? However, they thought that an accessible list of information on data policies could be useful for funders, policy staff and authors themselves, and particularly effective for researchers needing to ensure compliance with funder and institutional demands. So, some summaries from the stakeholder consultation. All of the stakeholders recognised the importance of linking between journal content and underlying data, particularly where data is stored in subject-based repositories. There was consensus about the importance of making data freely available, but a less unified approach to actually doing so in practice. Some common features came out of the stakeholder consultation about what should be in a JoRD service. There was quite a wide-ranging specification of requirements; listed all together, they would be quite hard to satisfy fully for everyone. However, these are the five common features that came out. Stakeholders wanted clear, automated and simple instructions on the service; clear documentation on the service's aims, policies and procedures; for the journal policies, they wanted to know what the conditions of deposit were, whether the data could be reused, how to access it, and any restrictions on it; they wanted guidelines for recommended file and data types and metadata, and policy wordings for how to write the policies; and they wanted to know where the data could be archived.
Almost 80% of the respondents to our online survey, which targeted researchers, answered that they would use such a centralised service recording the data sharing policies of academic journals, so there was certainly interest in the service. But the big question is: can it be self-sustaining? Based on the stakeholder consultation, my colleagues developed three basic service options that were then market tested; we spoke to the stakeholders about which they were more interested in. The first suggestion was a very basic service: a minimal web interface that would have, excuse the acronym, an API, an application programming interface, which would allow machine-to-machine interaction with the database, but it wouldn't be much more than that. The second was an enhanced service, the same as the basic one but with additional data integration: it would link through to compliance with funder policies, possibly institutional policies, and it might list recommended repositories for deposit. Lastly, there was an advisory service, the same as the enhanced one but with a more advisory layer on top: guides to best practice for writing policies, policy frameworks, or policy language suggestions. In general, the stakeholders preferred either of the first two. However, when it came to speaking to budget holders, although they were quite positive about the idea on the research data management side, they were less keen on finding the funding; they didn't think they could persuade their organisations that there was sufficient benefit to warrant it. Conversely, others were quite keen on funding it, but wanted a lot more in the service, which would possibly make it impractical to start off with. However, based on these three service options and the stakeholder recommendations, a full business case was submitted to Jisc as part of the feasibility study. Now, a quick summary of the findings of the project.
Regarding data sharing, it was felt that this was quite an interesting subject and certainly a growing area. There are publishers developing data-only journals, and as a rough guide from the previous studies of McCain in 1995 and Piwowar in 2008, bearing in mind the different population sizes, there did appear to be an increased number of policies each time someone looked. So it's certainly a growing area. However, when it comes to actually sharing the data, uptake is much slower. Researchers were perhaps more likely to share with their immediate colleagues, but not necessarily with colleagues on the other side of the country or the other side of the world; for reasons similar to those of the hypothetical PhD student mentioned before, they were concerned that other people would beat them to publication. The policies that did exist, despite a possible slight increase in their number, were still generally poor, not very clear, and missing in some subject areas. There's general support for a JoRD service, but the requirements differ between the research and publishing communities; although there are the five common features, there could be some issues about how to go about it. However, data sharing is a growing area, so a JoRD service could benefit the future of this area, and it could help build better practice and discussion now, while the numbers of policies are smaller. JoRD recommended to Jisc a two-phase procedure to go ahead. Phase one would be grant funded, and it would build a simple service, focusing on getting a good dataset of the policies with simple technology, then using that to build engagement with stakeholders, build awareness and establish a need for the service. There could still be a machine-to-machine interface, with third parties creating applications on top of it, and phase one would also further develop a self-sustaining model.
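To make the machine-to-machine interface idea a little more concrete, here is a hypothetical sketch of the kind of JSON record a JoRD-style API might return to a third-party application, and how a client could summarise it. The response shape and every field name are invented for illustration; no such endpoint was specified by the project.

```python
import json

# A hypothetical JSON record that a JoRD-style machine-to-machine
# API might return for a journal lookup. All field names invented.
sample_response = json.dumps({
    "journal": "Example Journal of Data Studies",
    "policy": {
        "strength": "strong",
        "deposit_location": "subject repository",
        "available_to_reviewers": True,
    },
})

def summarise(raw: str) -> str:
    """Turn a raw API response into a one-line human-readable summary."""
    record = json.loads(raw)
    p = record["policy"]
    return f"{record['journal']}: {p['strength']} policy, deposit in {p['deposit_location']}"

print(summarise(sample_response))
# Example Journal of Data Studies: strong policy, deposit in subject repository
```

This is the sort of thin client that third parties could build on top of a simple database-backed service, which is why the basic service option needed little more than the API itself.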
Phase two would implement the self-sustaining model. There might be a need for some additional funding to break even, but there could also be opportunities for grant-funded research and development activities. So, some final thoughts on what we found for the JoRD service. The user base for a JoRD policy bank would probably mainly be the people who work within and support the research community. A lesser number of users would be publishers and funding bodies, as their representatives acknowledged some use for the collation of the journal data policies that we found. Such a service could provide easy access to journal data policies; provide clarity on when, where and what to deposit; provide guidance on file and metadata formats; and help librarians and support staff to enable researchers. As I mentioned, there's currently a small number of policies available, in the hundreds if we take into account previous studies, so building a JoRD-type service would be much simpler, and likely built more sturdily, if done now: a little introduction of good practice before the policy numbers increase dramatically and no one has any idea what to do with them. At the moment we're awaiting a decision from Jisc on how to take the JoRD concept further, as they consider our feasibility study. So my recommendation to you is: get involved in research data. If your institution has a research data management plan, get involved; if it hasn't, encourage the powers that be, because it's a good idea. There are a few references there in short form; as I said, I can provide them in full if required. So, any questions?

That was a really interesting presentation, and having done a very small desktop survey of journal data policies myself, I applaud the rigour of the work that was done by JoRD and recognise it was no small task.