[Opening remarks unintelligible in the recording.]
of money, and we basically fund research within the UK academic sector: mostly academic, but some funding also goes to other institutes and now to some commercial organisations as well.

So what I want to do is give a research funder's perspective on data: talk a little about why we value data, talk about some of the data policies we have in our research councils, and then, as I was asked, look a little into the future. Some of the future things I was going to talk about Joe very kindly covered in the earlier talk, such as the work on more enhanced, richer publications, so I'll probably scoot over some of what I was going to say in that area.

So, data. Why am I interested in data? My research council, the Natural Environment Research Council, funds research in the environmental sciences: things like climate change, water issues, water resources, biodiversity and suchlike. This little Wordle shows we're interested in data and information. It is actually developed from our science information strategy within the organisation. It's just to show that all the research councils really see data as fundamental to the work they do, both in terms of underpinning the research and helping deliver the exploitation of that research.
So first off, what do we mean by data? This is a definition of data from my own research council's data policy. I should apologise at this point: I'm probably the world's worst person at using PowerPoint, so I tend to put up slides which have too many words on. I don't expect you to actually read all the words; it's just to show we have a definition of data. Other research councils have other definitions of data, and interestingly there was a recent audit, from our audit service within the research councils, looking at some of the work of our data centres across some of the councils. One of their recommendations was that we really must do something about coming up with a more standardised definition of what we mean by data.

But what we're talking about here is basically the primary data produced by the research we fund, and it's not just digital data. As the previous speaker said, it can be sound clips, it can be pictures. In environmental sciences it can be analogue measurements, it can be fossil specimens, it can be samples and material. So when we talk about data we have to think in the broadest context. Even though a lot of the discussion tends to focus on what is digital about data, we have to remember that quite often there is a long tail of other stuff which isn't just digital data.

In our example here we also talk about the output from models, and that's quite crucial if you work in, say, the climate sciences, where the model output itself can be a vast amount of digital data. What do we do with that? Is that a primary output of the research process? What about the models themselves: are they classed as data? They're all part of the research process. I have no easy answer to some of these questions, but these are the sort of things you have to consider when you think about policies to do with data.

So why do we as research funders value data? There are probably two main reasons. The first
one is that the data are an integral part of the research record, so having access to the underlying data helps to support the robustness, the transparency, the integrity and, crucially, the richness of the research record. Certainly within the climate sciences, people may have come across a little controversy in the UK a couple of years ago called Climategate, where there was an unauthorised leak of emails from the Climatic Research Unit at the University of East Anglia. Now I have my own opinion about that, which I won't go into, given the partiality of some of the emails that were released, because I was actually funding some of the work going on behind the scenes to recreate data sets for public distribution. But an awful lot of researchers have probably said, "There but for the grace of God go I." What happens if someone comes to you and asks you for your research data? As a researcher, have you got it to hand? How are you set up to deal with requests that come under things like freedom of information legislation? A lot of our approach now is driven by what is called the transparency agenda: the transparency of research and the transparency of government, which I'll come on to.

The other key reason why we value data is to enable it to be reused. It's what some people call data sharing, and what I tend to call reuse and repurposing. Data sharing implies the data is just going to be used within the research community: it's going to be shared with other researchers. By reuse and repurposing, what we mean is that it allows other people to do other stuff with the data, and that's not just research. The crucial thing here is specific to the UK. I'm not so familiar with how the legislative environment sits in other parts of Europe, but certainly within the UK a lot of the debate, a lot of the government policy, is about openness of government information, and by opening up this information, allowing other people to do
stuff with that information, with that data, it drives innovation, and through innovation you drive growth, and through growth you help drive economic benefit and prosperity. So what we're saying in the UK is that if we're opening up research data, it's not just for other researchers. It's to enable other people to do stuff with those data, maybe things the research community hasn't even thought about, but there may be other benefits of using those data to drive innovation, and, as I said, from innovation comes growth.

So, as I mentioned, the government has this agenda about open data and transparency, and there is something called data.gov.uk, modelled somewhat on the data.gov principles in the USA, about making public data freely and openly available for others to do stuff with. That is also part of what's known as freedom of information legislation. In the UK we have at least two key pieces of legislation to do with freedom of access. One is our Freedom of Information Act, and the other is what's known as the Environmental Information Regulations, which is actually European legislation deriving from the Aarhus Convention, and that is basically freedom of information for environmental information.

Basically, if you're a researcher in the UK in the university sector, apart from a few special cases, those data are classed as public data, because universities are publicly funded institutions. Therefore the data come under the auspices of the freedom of information acts, and you have to be very careful when people request data, because you're not at liberty just to say no. So if you think about a data policy, one of the things you have to think about is how that policy would fit in with your national legislation on freedom of information, because a majority of universities and research institutions will tend to be public institutions: not all of them, but certainly from a UK context a majority. So if we're developing
policies, and I'm going to come on to describing policies in more detail, what instruments do we have as research funders to actually achieve our aims? Our aims are basically, as I said, to ensure the openness, transparency and robustness of the research record, and to enable other people to do stuff with the data generated through research.

Well, the first thing is we can have a policy, and the policy basically says: if you take our money, we expect you to do something in relation to data. So NSF has a policy, and it says we expect you to have a data management plan which is going to tell people how you propose to share your data with others. Second, if we fund you, we can fund you to deliver stuff related to your data, so we can actually fund you to do data-specific activities. And also, as research funders, we can provide, or support through our funding mechanisms, actual data infrastructures themselves. Again, Joe was talking about the work of the EBI: some of that is funded through Europe and some of that is funded through the Wellcome Trust, which is a major charitable research funder in the UK. So that's where funders are stepping up and actually funding infrastructures for data.

So what are the data policies for the research councils in general? Basically, data generated through research-council-funded research should by and large be available for data sharing and for reuse, though, and this is a very strong proviso, protections and constraints are in place. I mentioned that we can be no more restrictive than the freedom of information legislation allows, but freedom of information legislation doesn't mean you have to give everything away to everybody. There are plenty of protections in place to cover issues, say, around consent and confidentiality in the medical and the social sciences. There are plenty of protections in place to cover
inappropriate release of data. You don't want to release data too early, otherwise the researchers don't have a chance to publish their research and actually complete the research process. That is not seen to be in the public interest, because it damages research, so there are protections in place to deal with that. But I think the key message is that when you're developing policies you must realise that, if you have to live within a freedom of information regime, if you're a public institution, you can be no more restrictive than the legislation allows. So the key thing is that it's not up to researchers now in the UK to say, "I like that person, so I'm going to share my data with that person, but I don't like that person, therefore I'm not going to share with them." If there's no legitimate reason not to share, in theory they have to make their data available.

Within the research councils, the basis of our policies is our common principles on data policy, and they are almost readable. I do have a slide somewhere which explains what they are in more detail, but basically the principles are that data are a public good, they should be made openly accessible to others, but they shouldn't be inappropriately released, and it's legitimate to use public funds in their management, and so on. The URL is there. I won't go into too much detail, but it's basically to say we have a set of principles we've developed. They themselves were developed from the OECD: the Organisation for Economic Co-operation and Development has a set of guidelines on, I'm trying to think of what they're called, principles on public access to publicly funded research data. For my sins I should know, as I was one of the authors of those OECD principles as well. But it's to say that we are seven research councils, but we have a set of common policy principles which underpin all our policies on access to data.

Another driver, which I've already mentioned, is this issue
of research integrity. One of the drivers for us is to ensure that the research we publish is as robust as possible, and part of that quest for robustness is for other people to be able to access the underlying data, the data behind the graphs, so to speak. As I mentioned, there's currently a draft policy from the research councils on their new position on open access, which is still on the web for people who want to see it. Basically it says that research papers must now include a statement on how the underlying research materials, data, samples and so on can be accessed. Some research councils, my own included, already have a policy statement to that effect, but not all research councils do. So again this is another driver for us to get researchers to think quite carefully about how they deal with their data and, crucially, how the research institutions also deal with these issues.

Another driver for policies around research data is codes of practice on research behaviour. As research councils we have our policy on the conduct and governance of good research practice. Many research funders have these policies about how you should undertake research activities, and this is just pulled out of the RCUK code of practice. It notes that unacceptable conduct includes mismanagement of the underlying data. Basically, it is unacceptable if you don't make the relevant primary data and research evidence accessible to others for a reasonable period after the research is published. We say by default round about 10 years or so, but for major projects, especially of, say, medical or environmental impact, where government policy decisions are based on them, we expect a longer period. So it's basically saying: we as research funders, this is how we expect you to undertake your research. So
these requirements speak both to the individual researcher and to the researcher's institution as well, and the responsibilities also fall upon the research institution.

Now I'm going to look at the policies of two individual councils: my own, the Natural Environment Research Council, and the Engineering and Physical Sciences Research Council. The reason I've chosen these two is that to some extent they are at different ends of a spectrum. NERC, along with the Economic and Social Research Council, has had a data policy in place now for over 20 years, whereas EPSRC, I think it was last year, implemented their data policy. So they've come to the game fairly late with a formal data policy, but they've had the opportunity of seeing how others have done it, and they've actually done it quite differently to some extent. Actually, if I was starting again I would look very seriously at this policy, because EPSRC puts the onus on the research institution. It speaks to the research institutions that they fund and says: right, you're the people who are responsible for the research teams, the research is going out under your name as the institution that's done the work, so we expect you as an institution to take the main responsibility for the issues around data, and to have that responsibility for the long-term management of data beyond the end of one project or programme.

Now this is one end of a spectrum of how we can think about responsibilities, whereas NERC's policy very much places the onus of responsibility on the researcher and says it's the researcher's responsibility to do stuff with data. So we have a policy, and again this slide tries to explain why we as a research funder have a policy, but it's to ensure the environmental data that's collected under our funding are readily available for anyone to use in the longer term, and also, crucially, a second point, to help
in the formal publication of data, which I'm coming on to, and to meet legislation. But in our policy the responsibility tends to sit with the PI, and I think if I was writing this policy again I would actually say that some of that responsibility has to be shared by the institution. By default it tends to become an institutional activity, but we're less specific about saying it is up to the institution to do it. So that's something to consider as organisations develop their policies: how much is the responsibility of an individual, and how much is the responsibility of the institution they work for?

Now, within NERC our key principles are that we will support the long-term management of data, and we will make those data openly and freely available, and by free we mean available for free to anyone to do anything with. So it's very much driven by this agenda of reuse, repurposing and exploitation, and crucially we'll make the stuff available for anyone for free, apart from a very few special cases where, for example, the data sets we hold contain data provided by a third party. So if it's a data set which contains, say, a topographic map or something within it, those are produced by the Ordnance Survey in the UK and they are quite tightly controlled in terms of intellectual property. But the majority of the data we hold we will make available for free for anyone to do anything with. We also require that anyone we fund offers back to us copies of the data they collect, which we will then manage and disseminate through our own network of data centres.

So I've been talking about NERC and I've been talking about EPSRC. There are differences in policies across the research funders, and these tend to be based on discipline differences. A good example here is the medical sciences, where consent and confidentiality issues mean it's much more tricky, or rather, you have to take more issues into consideration if you want to share and make
available data which is about human subjects, because of all the issues around consent, and informed consent, to having your data shared, as opposed to data just collected about the environment, which we tend to make a lot more freely available. So there is an axis of disciplinary differences which will inform a policy. I've also talked about this axis of responsibility, the individual as opposed to the institution: where do those responsibilities lie?

A third axis to think about in terms of a data policy is where the infrastructure to manage the data is coming from. Is it very much just at the institutional level, like the facilities being set up at Cornell which we were hearing about earlier, or is it via a centrally funded provision? For example, my research council and the Economic and Social Research Council make available central data management facilities.

There is also the question of how you can actually access funds to support data management activities. Here, if we're developing policies, you have to be quite clear about the difference between those activities that take place within a project and those activities that take place beyond the end of any one project or programme. Within a project, by and large, from our perspective in the research councils, you must bid for the resources you need. So if you're going to need a bioinformatician or an informatician in your project to help deliver data management activities, you have to make sure you bid for the money to pay for that person. However, for post-project data management my research council will provide a central facility, so you don't need to bid for funds for those activities. Another institution, however, may require that, to support their infrastructure, you have a certain amount of overhead within any grant application to put into the long-term funding of any infrastructure that delivers long-term data management. So again this is a key policy difference which varies
across funders, in terms of what their perspective on supporting research infrastructures is. As I said about EPSRC, they very much say it's the responsibility of the institution, and that money must come from grants within projects, or from block grants, or from the money received from the funding councils, and so on. Whereas at NERC we provide seven environmental data centres, which we fund for the long term to manage the data that our research generates. We can do this to some extent because we have our own network of wholly owned research centres, which are part of the UK's national infrastructure in environmental research. As we have a long-term commitment to supporting our own research infrastructures, part of that is providing data management infrastructures within that.

On top of that we then provide a data discovery service, a data catalogue service, which allows you to look across all the data we hold and to search it as a seamless whole. We are moving more towards trying to deliver a more integrated approach for data, so that as a user you don't have to know which of our seven data centres stuff is held in. We're trying to come up with, in a technical phrase, a more common web-services-based architecture, where we'll provide a standard set of web services and other people can build their services on top and dip into our data. The great thing about that is it means that people who may want to innovate, say in commercial activities, but who want some environmental data, can dip into our services to pull data out and mash it up with data from somewhere else. We're also talking to the social scientists and the people in medical science about how we could extend this model more widely to research data, so that we can start to make it easier for people to do more cross-cutting, cross-disciplinary research by pulling data out of different discipline areas using the same protocols, through a set of common service layers. So this is still very much, you
know, future thinking. But if institutions are holding research data, we've got to get it out there and get it used. It's pointless just holding and managing stuff if we're not making it available for other people to use.

Now, in terms of how we actually go about implementing our policies, too often I just hear people say, "Well, it's all to do with carrots and sticks," and as I say, give me a big enough carrot and I'll hit someone with it. As research funders we really have to recognise that we can't just implement policies from on high. It's pointless to say, "Well, I've got a policy, go away and do it." I think it may be argued that in the US NSF have developed their data management policy but have been fairly light on how they're actually going to make sure it happens; you might want to comment about that afterwards. Whereas we in the research councils are certainly saying that it's pointless having a policy if we can't make it work. So we're finding ways to incentivise people to do stuff. We're also looking at ways of monitoring their performance, so to speak, and to censure them if they don't do the right things.

We have a project at the moment called PIP, our policy implementation project, to implement our data policy in full, as our new data policy came into force a couple of years ago and we're still actively going through processes to make it a living activity within our research community. What we're going to be asking for is both outline and full data management plans. If you're putting in a grant proposal, we want to see you've done a little bit of thinking about data. Nothing major: it's a paragraph or so in the case for support. But the sort of things we want to know about are: what data of long-term value do you really think you're going to generate? When do you think you're going to have these available? And, crucially, who is the person who's
going to be responsible for the data issues during the research activity? It's basically to show that the researcher has at least engaged with some of the data issues while they think about putting their science proposal together, rather than coming at data as an add-on, almost as the science is finishing, saying, "Oh, we'd better do something about meeting the data policy now."

Now, if you're actually funded, we expect you to produce a full data management plan within about three months or so of getting your award letter, and that will be a collaborative activity between yourself and our own data centres. That full plan is very much the contract between the principal investigator, the PI, and one of our data centres. It will outline the key things you're going to do in terms of data: the who, the what, the where and the when. What data, who's going to do it, where it's going to be done and, crucially, when it's going to be done. It also says what data of long-term value you intend to deposit with the data centre.

This picks up again on one of the issues that Joe raised in her talk: we can't manage everything forever. So how do we start to make those decisions about what data we're going to keep and what we're going to throw away? What we say is that we will take on board data of long-term value, and we'll guarantee to keep those in our data centres. So we're developing this concept of what we call our data value checklist. It's trying to quantify, working with the research community, how we identify data which we should keep and how we identify stuff which is less useful to hang on to and can be thrown away. As one of our previous chief executives used to say, "Well, if it's stuff to do with goldfish physiology, I think we can throw that away, because all we need to do is put another goldfish in a liquidiser," kind of thing. But joking aside, it was a nice idea, but it has actually proven to be quite a difficult concept to actually formally
write down: how do I identify data which we want to keep? We have a working version and we're still consulting on it, but it's not an easy answer to give. Crucially, though, my chief executive wants to know he hasn't signed a blank cheque to commit to managing the whole corpus of data generated through NERC research, as it would probably bankrupt us at the end of the day.

So very briefly, as my time is rapidly running out, where are we going to go in the future? I think we've been talking about data publication, and that's one way of incentivising researchers: being able to formally publish data sets through the issuing of DOIs, digital object identifiers, for data sets. We now have that facility within NERC to issue DOIs through DataCite. Sarah Callaghan, one of my colleagues, is here, and you may want to talk to her informally, if Sarah waves her arm around. She is the project manager working for me on our data citation activities.

There's a role for publishers in producing much richer publications. We've talked about some of the work EBI is doing, but we also shouldn't forget about the things that commercial publishers may be able to deliver in this area. I think, crucially, we have to clarify the role of data centres, like those my research council funds, which very much have that long-term vision of managing data for reuse, and the role of repositories in managing data. Because if all repositories become is a way of putting data into a secure box whose lid you never then open apart from in extreme circumstances, is that going to be useful to anybody? Knowing that the data is there is one thing, but as I said, if we're managing these data we want to get them out there and used by other people.

And finally, under CODATA's banner, we're thinking about something called the agenda for data, which is to talk to the really large-scale
international programmes of research and try to get them to adopt common principles around data. So for these big international programmes, like the World Climate Research Programme and the new ICSU programme on global change which is just starting up, the Future Earth programme, we are trying to get them to come up with a set of common principles around data which articulate the value of data and clearly explain that, if you're going to be doing this global-scale science, these are the things you must do in terms of data. And I'll leave it there. Thank you.
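
[Editor's illustration] The full data management plan described in the talk (who is responsible, what data, where it will be deposited, when, and which data of long-term value go to a data centre) could be captured as a small record. The following is only a minimal sketch under stated assumptions: the class names, field names, example values and the 90-day reading of "about three months" are all hypothetical and are not taken from any NERC system or template.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List


@dataclass
class Dataset:
    """One dataset the project expects to produce (hypothetical fields)."""
    name: str                # the "what"
    available_from: date     # the "when"
    long_term_value: bool    # assumed yes/no outcome of a data value assessment


@dataclass
class DataManagementPlan:
    """Who / what / where / when of a full data management plan (illustrative)."""
    principal_investigator: str   # the "who"
    data_centre: str              # the "where"
    award_date: date
    datasets: List[Dataset] = field(default_factory=list)

    def plan_due(self) -> date:
        # Assumption: "about three months" after the award letter is taken as 90 days.
        return self.award_date + timedelta(days=90)

    def to_deposit(self) -> List[str]:
        # Only data judged to be of long-term value is offered to the data centre.
        return [d.name for d in self.datasets if d.long_term_value]


# Example values are invented for illustration only.
plan = DataManagementPlan(
    principal_investigator="Dr Example",
    data_centre="Example Environmental Data Centre",
    award_date=date(2012, 1, 10),
    datasets=[
        Dataset("river flow gauge readings", date(2013, 6, 1), True),
        Dataset("instrument calibration scratch files", date(2012, 9, 1), False),
    ],
)
print(plan.plan_due())     # 2012-04-09
print(plan.to_deposit())   # ['river flow gauge readings']
```

Keeping the long-term-value flag on each dataset, rather than on the plan as a whole, mirrors the point made in the talk that not everything a project generates needs to be kept.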