Hello, and good afternoon. It's John here at the UK Data Service, and I'd like to welcome you all to today's webinar on licensing and governance for research data. Just a quick introduction from myself and Louise, to explain who we are. I'm John Sanders, and I'm Research Support Manager at the Administrative Data Service, which is based here at the UK Data Service, and I'm here with Louise. And I'm the Director of Collections Development and Data Publishing here at the Data Service. We're broadcasting from the University of Essex in Colchester in the UK. I'm aware we've got a good spread of people connected to today's webinar, so thank you very much to everyone who's joined from overseas. I appreciate it's not the middle of the afternoon for everyone listening today, but hopefully it'll be a good hour-long session for you. Just a quick acknowledgement of some of our colleagues who are helping us today with the delivery of the webinar: Laura, Susan, Scott and Matthew, who've all contributed towards the content of today's webinar. Laura and Susan are here today, helping to field any questions you may have. You should have the facility to log questions through the chat box in your webinar software. We'll collect questions as the webinar goes along, and then we'll have a slot where we can address anything that comes up at the end of the formal presentation. So, quickly, I'll outline what we're going to cover today, then I'll hand over to Louise, who's going to take us through the majority of the presentation, and you'll hear me again in the final section. There are five main areas we're going to talk about: first, core principles of data publishing; second, legal and ethical issues surrounding the sharing of data.
Third, we'll talk about pathways to access for data, particularly highlighting the Five Safes framework, which underpins a great deal of the delivery we do here in the UK, and especially at the UK Data Service. Fourth, we'll talk about the legal documentation around data sharing governance: data licences and access agreements. And finally, there'll be some worked examples covering the practicalities of operating governance for data access. So I'll hand over to Louise for the first section. OK, so the first part, a short part, covers some of the core principles of data publishing. Looking at the audience, I'm sure many of you are familiar with these principles and already operate data publishing mechanisms, but I should just stress some of the important points. First of all, we know there are many different publishing routes, from the DIY, handing something out on a memory stick or a CD, or publishing something online, through to the trusted digital repository, where we have certification and we're preserving data according to international standards. So there's a whole spectrum of publishing routes. And if you're not aware of it, the re3data repository registry now contains over 1,500 repositories, covering a huge range of subject areas, and you can browse it in quite a nice way. Just before we get on to some of the legal issues, there are some basic data publishing requisites. When a dataset is received, various things need to be done to it to make it available and meet the needs of good access: a usable format; making sure it's preserved for the long term and backed up; making sure the user documentation is as self-explanatory for users as it can be. The data needs to be non-disclosive, where promised, and the rights need to be in place to distribute it. So governance and licensing are very important means of enabling this.
And if you're not familiar with the FAIR principles, these are some of the basic requisites for FAIR data: Findable, Accessible, Interoperable, Reusable, and having a persistent identifier wraps that all up nicely. A little bit before we get on to the ethical and legal side: if you're running a repository, you do need a collections development policy of some kind, so you have a defined scope for the kinds of things you're going to accept, license and provide access to. You need a robust and auditable appraisal and selection process, so you know how to judge data and what to bring in, and that needs to be audited and recorded in case things are questioned in future. You need a really robust rights framework through which you can bring data in and push it out again. You need to be able to manage any access conditions over the longer term, using a robust legal and technical framework to do that. And you need to be able to store, curate and host data, if you're going to choose to do that, and have a preservation role through a trust framework. Moving on to legal and ethical issues: we all know from social research, or any kind of research involving people or organisations, that we are generally subject to ethical review, particularly in the UK. Not every country requires ethical review, but the UK does, and many other countries do too. We need to comply with the relevant laws around research conduct, and also around copyright of material and around data protection. We need to uphold standards of research integrity, and that's about being transparent about research methods: how we generated our data, how we're analysing it and how we're publishing it. And of course we need to avoid social and personal harm when we do research. Obviously these are just high-level obligations.
As you know, in the data sharing world there are various concerns and challenges. There are concerns about appropriate re-use of data, whether that's people understanding it properly or doing the right thing with it. And then there's the challenge of personal, confidential information, normally around identifiers that are hard to conceal: that could be particular characteristics of people or places, or fieldwork locations. And of course, the more you merge, match and link data together, the more the risk of disclosure increases. There are a number of ethical frameworks available, and a more recent one in the UK has been what's called the Data Science Ethical Framework. It's not just about data science, it's about the principles of any research; it's issued by the UK Cabinet Office, and it gives you a set of principles to think around if you're going to do high quality research. I'm not going to go over each of them, but there will be a link to this document; it's actually incredibly useful in helping you make decisions around the different principles for doing ethical research. So, a little bit on personal data. This isn't a seminar about the GDPR; there are many of those available at the moment, and like many institutions, we are working our way through the implications for us: how we process data, how we make data available, the role of consent. So at the moment we're just bringing in general principles; we're not going to tell you how to do GDPR. But basically, it applies to researchers in the EU who process personal data about people anywhere else, and it also applies to people outside the EU who are collecting and processing data about individuals within the EU. So it's quite broad. It's not intended to inhibit ethical research, although in many ways it's felt that it does.
So I think the clarification GDPR brings actually enables us to think more carefully and transparently about how we reuse data for research. And there are various legal gateways that are required to process and handle personal data. A couple of examples of legal gateways: in the UK we have the Statistics and Registration Service Act, which dates back to 2007, and it allows information sharing between public authorities and the statistics office for creating national statistics. There are rules around confidential information, where unlawful disclosure is actually a criminal offence, punishable by a fine or imprisonment. So that's a gateway through which we can have access to data from the national statistics office. And there are various processing grounds under GDPR, for example consent, public interest task and legitimate interest, which all mean we can actually work with personal data, subject of course to the appropriate safeguards set out in GDPR. Just some high-level things on data protection, because they're important to think about when you're creating licences and access mechanisms: you need to collect data for specific, specified purposes; the processing needs to be lawful, transparent and fair; people have the right to have their personal data forgotten. We need to think about data minimisation, keeping only datasets that are actually relevant. We need to think about the accuracy of information. We need to think about transfer of data, shifting data across borders and boundaries, and we need to think about retention as well. As I said, I'm not going to go into these in detail, but the message of GDPR is accountability: much more transparency and documentation of processes, and in fact we think that will help the archival and dissemination task quite a lot.
And don't forget, anonymised or de-identified data don't fall under this legislation, and we'll talk a little about what that means later on. Just briefly touching upon consent: it's important in the UK that research thinks about gaining consent from people. Most research requires written informed consent in the UK; I know that's not necessarily the case in other countries, but it's important for us to think about. The consent needs to identify and explain possible uses of the data, which we know is a challenge, but it can be done as thoroughly as possible. And the new addition there is about freely given consent on a granular level. For example, in a long-term study, every time you collected a new piece of data, whether it's a survey, a test or other samples you're collecting, it would probably need a slightly different consent, and then the future governance around the data needs to recognise these individual, different consents. The last thing I'm going to touch on for legal issues is copyright, because it's important when you're licensing to recognise that individuals do have intellectual property rights over data, and that can scale up to organisations and funders. It can be quite complex, but normally in social research there's not a lot of monetary value in the things we create, so we don't have great big lawsuits like you do in the music industry; nevertheless, there are intellectual property rights. In the UK we have something called Crown Copyright, where material, including data, created by the civil service and government departments is subject to a particular kind of rights. There are also database rights in the EU and the UK, which are a class of literary works, covering the arrangement, selection and choice involved in putting data together, and also the structure of the database. So in a collection of data, it's important to think about who owns which parts.
A little bit about access before we get onto the legal documentation. We use these four pillars for enabling safe access to data. First of all, particularly in the UK, there's informed consent for long-term data sharing when we're dealing with individuals or organisations. The second is that where protection of identities has been promised, we need to do that as far as we can. Third, regulated access where it's needed; that doesn't have to cover a whole dataset, it could be various parts, and we can think about restricting access by various different categories. For example, by group: perhaps only academics can see the data; by use: use might only be non-commercial; and by time: you might decide to put an embargo on the data for a number of years, and traditionally archives have used embargoes quite a lot. Fourth, we need to think about securely storing personal data separately from the bulk of the research data. So those are the four principles for enabling safe access to data. We have a mantra here, 'open where possible, closed when necessary', which means that when we're negotiating over data we ask people to think about the various options across one dataset: the bits you can open, try to open; the bits you need to close, we'll discuss which bits they are. We operate a very simple spectrum of access. We've had this for quite a long time; I think we were one of the first archives to publish a data access policy, a while back, and it was our director who came up with the three tiers: open, safeguarded and controlled. I know we're using the word 'risk', and we'll come to that in a minute, but open data typically carries essentially zero risk, probably aggregate data with really no chance of identifying anyone or anything. It tends to be under an open licence, with very few restrictions on reuse, and we'll have a look at some of the licence types for dealing with this kind of data.
The second type is safeguarded, and the majority of our holdings fit into that middle category. This is where there's zero to very low risk, and it requires authentication and authorisation, so typically logging on and identifying who you are. You have to sign a registration agreement, which we'll talk about, and we'll show you how we do that in a minute. Safeguarded access can also include extra conditions placed upon the data, depending on the nature of the data owner and whether they want to add additional restrictions. The final category is data which really does carry risk: a risk of disclosing personal information. It requires project approval, which John's going to talk about later, user vetting and training, access via a safe haven, and checking of any outputs as well. So those are the three categories of access. We use the Five Safes framework, which was developed by ONS some years ago, and we've been implementing it for quite some time. It's a really nice, simple framework that allows us to provide safe access to data that meets the needs of data protection, yet fulfils the demands of open science and transparency. So the Five Safes are: safe data, where we're thinking about treating the data to protect confidentiality, so open data would be safe and controlled data would not be. Once we've decided on that, we then invoke some of the other safes. Safe people is about making sure that our users and researchers are trusted and trained to use data safely. Safe projects means that their projects are approved as meeting the public good, or whatever the demands of the data owner are.
Safe settings is where we have a secure live environment that contains data with some risk, and safe outputs is making sure we're screening the outputs, the tables, publications or text from the analysis, to make sure they're not disclosing personal information. And if you want, have a look at our little video: we've got a three-minute video that sets out how that works. So, just think about how safe is safe. We could discuss this all day, but when it comes to data it is a relative term; there's no such thing as absolute safety, and as you know, with so much data out there, the more you begin to link sources together, public registers and so on, the more tenuous safety becomes. But it does involve reduction of risk, and in a manner acceptable to the data owner: they are the ones who sign off the level of risk. So it's often a negotiated process, and a real think about what risk means and what risk they're prepared to take. We think of the Five Safes as a balancing act: if you have open data, that's safe data, and you don't need any of the other four safes; if you have personal data, you need all the other safes to be implemented, so that you're meeting the needs of data protection, and ethical needs too. So it's a very nice device.
Just to give an example of this: here's one of our very well known cohort studies, from the British cohort studies, running since 1958, called the National Child Development Study. You can see that it's deposited as one dataset, yet we have multiple different collections available under different access conditions. There's the safeguarded one at the top, where you just register to get it; there's a special licence version, which is another category where additional conditions are placed upon it; and finally there's the secure access, or controlled access, version, where extra variables are included in the dataset containing geographies or variables deemed too sensitive for the other categories. So not everyone can get these equally; it may be that for the third one you need to go through a lot more hoops to get access, whereas the majority of users are pretty happy and satisfied with the first one. But it means there's an access pathway for a dataset as it comes in, and it can meet various different tiers.
Just an example of a safe haven: there needs to be a demonstrated research need for more detailed data, and quite often people think they need data at that level, but when you ask them to explain why, they really don't know. So there really is a triage process around making sure that those people really do need the additional variables or geographies, and that they've got a good public-interest case for doing so. It's very expensive to operate controlled access, something like 30 times as expensive as a simple registration, so not only can it clog up the system, we really need to know people are trained and actually need to do that research. So the Five Safes are invoked in this way: you have an approved research application; you become a safe person, signed off by your institution; you get face-to-face training; an access committee, which John will talk about later, will sign off a project as being safe; you will go into a safe setting and use data safely, in a remote access system or in a safe room; and your outputs will be checked. So that's the Five Safes. Moving on to licences now, we want to show you what we feel is a very simple legal model. We've negotiated thousands of datasets over the years; we have around 8,000 collections in our holdings, and Susan, who could have been sitting here, has been here a long time and has been negotiating licences for many years. We've seen all kinds of legal documents thrown at us, saying 'this is how I want my data licensed'; in fact, sometimes it's just a lawyer's paradise getting these things drafted. But we are firm believers that there is a really good model for a dataset licence: a quite standard licence that applies to most kinds of data as a baseline. The idea is to license your data using a standard licence, or one of the different kinds of licence we'll talk about in a minute, and to make sure it's one document, with various
appendices within it. If you can do that through a trusted broker, it really helps. Then there's a user agreement that the broker or the repository invokes with the data user; again, it's a legally binding agreement saying 'I'm going to use this data and I agree to all the things you've told me to do in that agreement'. So there are two legal frameworks that sit there: the licence and the agreement. What we don't feel is really needed, and what many data owners actually don't want, is a direct relationship with the data user. This is how we have our relationship with the Office for National Statistics: they had hundreds of users knocking at their door many years ago and decided they didn't have the capacity to deal with research users and PhDs, so we became a trusted broker for them. The idea really is to try to get rid of the individual data sharing agreement, which can be very legalistic and difficult; many data owners probably don't need that relationship unless they're dealing with a few users, but once you're dealing with hundreds it becomes really hard to operate, particularly if something goes wrong and you need to invoke breach procedures. So having a broker to do that is really useful. A licence agreement is a legal arrangement: it's a legally binding document and a contract between the depositor, or owner of the data, and the repository. It clarifies who owns the data and whether they do have the right for the repository to publish it; a depositor can sign on behalf of someone else, given their permission to do that. And there's also some thinking around making sure they do know if they own parts of the data, or if there's any problem in there; if there are derived datasets they want to share, they need to make sure they've got copyright clearance, or that there's an understanding of what can be republished
and what can't. The licence agreement grants the repository a non-exclusive right to preserve and disseminate the data, which means the owner can have as many contracts as they want with anyone else; that's quite a useful thing to have. And finally, it does set out what the user is allowed to do; part of that will be showing the depositor what the end user agreements look like, and the licence type should always be displayed to users, so they know what they're supposed to do. Just an example: having firm copyright holders and a licence means that you can then invoke a citation. We know here that the National Centre for Social Research are the owners of this data, and therefore they get the citation as the ones who created it, so it enables transparency around who actually put the effort in. The person named there will be the person who signed the licence. There are many other kinds of licence, and we're not going to go through all of them, but a common one, which we actually use for some of our open data, is Creative Commons. These seem to have gained quite a lot of popularity, and seem to be everywhere; they're available in both human- and machine-readable forms, and they're appealing because the rights are very well set out and quite easy to choose from. But one has to be careful, because once chosen they're not actually revocable: if you disseminate something under the terms of a Creative Commons licence, even if the owner decides to change their mind and stop distributing, the user's still got a right to use it. So think carefully about the kind of licence you want to use. It is a bit of a minefield; there are lots of different adaptations and types. Just to give an example, because it's a very clear way of presenting this, there are various clauses in there: around attribution, so the user needs to attribute the data; share-alike, where the materials can only be shared if they're the same
kind of flavour. Some don't allow any derivatives or derivations of the data; some allow only non-commercial usage; and then there are various different combinations of those. The ones we tend to use are attribution, which we always have on the data we release under Creative Commons, and non-commercial, depending on what the data owner wants. And 'commercial' really does mean selling the data; it doesn't mean having the data used in a book or on a television programme, it means the selling and marketing of data. There's a nice licence selector out there, I think there's more than one, where you can work through the different scenarios and things you want to do, either for software or for data, and you can go and have a look at those later. But I just want to talk about what we use here, because we do use a variety of licence types. Our first one is a grander one, a much bigger document, in fact a very, very long document with lots of appendices, and it's called a concordat. It sets up a formal relationship and a contract with a government department, in this case our national statistics offices, for example the Office for National Statistics and the ones in Scotland and Northern Ireland. That means there's an annual review of the relationship between us: for example, when ONS create a social survey, we get access to the data in a timely manner and we make it available in a timely manner too, which means there's a really nice flow of national statistics microdata to users, and it seems to work very, very well. We are building concordats with other government departments, because it's a nice way to share data. Then there are the open data licences; we don't have that many such datasets, but they tend to be aggregate datasets, teaching data that are cut-down versions of bigger surveys, or some historical collections where, you know, the people are definitely dead and the data can go open. And then there's our standard licence, which is
used for almost all of our holdings, and it deals with the safeguarded and controlled datasets. It also allows the data owner or copyright holder to define the access clause: you saw the three-tiered open, safeguarded, controlled model; they can choose which route they want, and any specific things they want to add. It could be vetting of publications, which is quite rare but can be done, set up in an appendix. So the standard part is four to five pages, and other things can be put at the end; and it is preferable to trying to negotiate a brand new legal agreement with lawyers, because that can take a long time. Just as an example, we have an end user agreement with standard terms and conditions; it's legally binding, and it's actually click-through. Then there are additional terms and conditions that can be placed upon data: sometimes you click through to agree to them; sometimes they require you to submit an application, which needs to be approved by the data owner or a committee; and other times there's secure access, where you need an application form and you need training. So there are various different ways of dealing with different conditions on the licence. And just to say briefly, most user agreements around data cover broadly the same things. There are something like 18 clauses in there, but at a high level you've got issues around security and storage; making sure you don't try to identify people; making sure you're keeping data safely and destroying it according to best practice; research integrity, so if you do find errors, report them, because it's really useful for the next user; using data only under the conditions agreed, so don't try to sell data while it's under a non-commercial licence; reporting any non-compliance that you know about; making sure you cite the data; and accepting that any non-compliance will lead to penalties. So that's around good practice. And then finally, users have to agree that personal data collected about them can be used for service
purposes, and it's very useful to have that in the agreement as well. For controlled data there are very similar things to agree to, but a little bit more, because users are analysing data in a safe setting and undertaking their analysis in there. They have to access personally identifiable data, they have to agree to conditions for handling personal data, they have to agree to an extra set of breach penalties, and they need to agree to be trained as an approved researcher. So there are a few more things to agree to, and in this case the institution has to countersign rather than just the user, so it's a more complex agreement and it takes a bit longer; but that's because we have a legal gateway to go through. A little bit on non-compliance: it's very important to have a non-compliance policy, and we do have one on managing licence compliance, which sets out the kinds of penalties and the scaling up of penalties that might happen. The smallest would be that you no longer have access to the data, maybe permanently, maybe temporarily, depending on the offence; then there might be the suspension of institutional or individual access to data services, or even of funding; and if you're accessing data under the Statistics Act, there may be a big fine or there might be a prison sentence, so one has to be very careful. There are sometimes breaches, but they're normally really minor, unintentional, accidental breaches, and when they're self-reported, maybe the user needs to be approved again or retrained. They're normally things like accidentally copying information from a screen, or looking over someone's shoulder in the secure lab; things that are unintentional and can be corrected, and that's a positive thing. We haven't had any major breaches. I'm now going to pass over to John on some of the practicalities of operating data access, because it does require human effort and technical effort,
and it doesn't happen by itself. Right, thank you very much, Louise. So really this final section is just to illustrate some of the practical arrangements, and some of the thinking about service delivery: how you enact some of the principles that Louise has talked about. Really this comes down to the process of actually handling an application. I'm going to focus in on a particular strand, just to illustrate some of the practical things that you have to do, some of the actual working arrangements. Obviously there are considerations: you can deal with this as a spectrum in terms of how you enact these principles. You can think about who it is that's going to grant permission for access, and you can put different solutions in place for that, whether it's about having individual people who sign off on data, or whether you have quite structured mechanisms. Not to pre-empt what's going to come, but to demonstrate the practicalities I'm really going to focus on how you need a data access committee to discharge that kind of function. And, just to highlight, there are some things to think about when you establish the practical governance arrangements: are you creating single points of failure? Is there clarity in your process? Is it clear to people who have to come and apply exactly what they have to do to gain access? The whole example here is really looking at controlled access, and that is the top end in terms of governance and rigour. It is the most service-intensive, and it's the most difficult and time-consuming for applicants to get through. So there are elements of this which may cascade down to data that doesn't require so much rigour, and you can think about
that. But it's easier to talk about the whole and complete picture, the most rigorous way of doing things, and then take from that what you need, if necessary. A really important point to make is that although it's very good to be clear about arrangements and have them well defined, they may need to change over time, because they may be dependent on context and circumstance. So it's not that there's no opportunity for variation or flexibility; but what's really important about governance is that it's enacted in a consistent and reliable way, as much as it can be. Just to pick up on the Five Safes model that Louise has already talked about: what I'm really going to focus on here is the safe projects strand, and how governance is enacted around that. It's probably worth pointing out that although some of the language used about this, things like 'approved researcher', acts as a gateway to project use, it is really possible to draw a distinction between how we think about the governance and approval around the people doing research, and how we think about the governance and approval around projects and the research intention itself. So I'm really going to focus on the project application journey and the governance around that. There are a few different names for the kind of body that normally handles project applications within a data research governance process, but I'll refer to it as a data access committee throughout. Just a brief definition to explain what that really looks like in practice: basically it's a process where a group of people look at individual applications for projects that propose to use data, and collectively come to a decision about whether to grant that access or not. As I said, it's the most time- and resource-intensive way
to govern data access and so it needs to reserve for those situations that really warrant that kind of level of scrutiny um and that amount of diligence it is a useful mechanism for situations where there are a lot of complex factors to consider because you've got a number of people around the table so to speak they can really consider different dimensions that might apply to whether or not access should be granted uh particularly if there are certain perspectives you need to bring um and I'll talk about membership in a moment or if there are very difficult decisions to make that are quite subjective it's easy to think about this as being kind of applicable in situations where data sensitivity is very applicable um and where there's very sensitive data and there's got to be a caution around the land access um but there might be some other things that drive you into using this kind of solution as well so if you're in a situation where your service will actually um kind of have a lot of overhead in granting access and facilitating it you might want to um think carefully about how much kind of work you take on that and then again if you've got data which in itself is actually finite and there's only so much available for example you've got buyer samples you might want to think about uh ensuring you don't have duplication of use or you're having usage of data in ways that means there's sort of greater potential for further reuse in the future so I'll just walk through um the sort of set up preparation and operation of that data access committee uh just to really straight all the thinking that needs to happen around this really so there are sort of five key things about setting up a DAC um that needs to be in place and the first and most obvious one of those really is um having a decision making framework that um on which access decisions are based and so there's a link through here to uh the criteria that apply when oh and X will make a decision to grant access under 
through research for example and it's really positive if you can be really clear about what that criteria against which any application is going to be judged are uh both because it helps people applying for external transparency it's also very helpful if you can be clear about what kind of thresholds need to be reached of things to be approved or under which things will be rejected but it might not be easy to be that clear because of the subjectivity maybe of some of the judgments that are being made um thinking about the remix of the decision making framework it's also important for any committee to think about how it relates to any other kind of decisions that are going on um around research projects so the most obvious and easiest to illustrate is when you've got a requirement in academia say for ethical review uh you need to make sure there's a awareness in your process of of the other process and how the two interact whether they're in dependencies you don't want to be getting to a position where you're creating a decision making uh process which is replicating or contradicting something that's been done elsewhere in a really robust and solid sense in practice you need to think about who is who sits as part of this uh committee this panel um having a single person making decisions is a is a big risk for resilience um you may need to have a range of different people involved in arriving at the decision and it's likely that membership is going to reflect the criteria for the assessment and what kind of uh issues are being taken into consideration when approval is granted or not so potentially people who have specific knowledge of the the data or the research area potentially people who are representative of um the data subjects lay members potentially if you're going to look at public benefit as a criteria potentially it's all there and worth worth thinking about you don't need to consider as well that this whole mechanism will need supporting and staffing with 
some sort of secretariat, just to run the process and make sure it operates effectively, correctly and efficiently. The final thing to establish in the set-up phase is what kind of material you're going to put in front of the panel to allow them to make decisions about individual projects. Most obviously that's some sort of application form or process that allows applicants to say what data they want and what they want to do with it, but there might also be reports or assessments that you wish to produce from within the service to accompany those applications and give some context or commentary.

In terms of service functions to support a DAC, there are a number of things that have to happen before a meeting can take place and the committee can actually consider applications. The most obvious is to have a good, slick application form that captures the right information for the DAC, to allow them to make a decision easily, based on good evidence. Obviously the application form will do very well to be tailored towards the DAC's decision-making criteria and to present the information in a way that's both easily intelligible and clear. In that sense it's really helpful to have a system based around online forms, for example, because it allows you to automate a lot of the work of supporting applicants in putting the right information in. Then there are two sides to the service, both proactive and reactive, to support the individuals who are trying to make applications for these data. I've linked there to a couple of documents produced by NHS Digital — some web content which is guidance for people who are applying, to help them refine and establish their application and give it the best chance of meeting the requirements of an assessment and allowing the panel to judge it.

There is obviously also — and Louise did mention this earlier — some level of manual effort from within any given service to support, screen and triage the applications coming in. Really you want to be able to help people when they're enquiring about access to data: potentially signpost them to alternative, less disclosive data sources if that's appropriate; make sure you don't have people going through a governance process when they really don't need to; and also help screen the applications they've put together, to assist them in addressing any obvious errors or omissions.

Just to highlight the importance of this, on this slide I'm pointing out some of the key information which, invariably, you're probably going to have to capture in any particular application, just to make sure as a baseline you're getting the right information. Obviously a description of the project itself, but probably something quite concise and intelligible; something that very clearly lays out the data required — potentially, where there's any data linkage, the waves, the years, the variables required — and, importantly, a justification of why that data is required for that purpose, so the applicant is able to demonstrate their confidence that this is what they need to get the outcome they require. It's also useful to capture some information about the team doing the work, so you can understand the intention and the background, and potentially whether it's been through other processes like peer review in order to progress as a project — so you can refer back, in your process, to assessments made of that project earlier in its lifetime. At this point, obviously, as a service you can also be collecting
some key information about the facilities these applicants are going to require: where their data access will happen, when it will happen, how many people. To an extent this relates to some of the functions you'll need to carry out around approving individual researchers as well. A final point here: one of the key things is to understand what kind of output is going to be produced from the research. That's very helpful from the service point of view, so you can understand what kind of output checking will be required, but also for the panel, so they can understand the purpose and outcome of allowing access to the data.

In practice there are a few key things to think about when operating a DAC. The most obvious is a cycle of meetings — some kind of cycle within which the panel consider the individual applications coming forward. That will need to be tailored to, and respond to, a number of factors: the volume of work going through the DAC; potentially the timescales of dependent processes, which would allow you to establish a pattern for your own access committee; obviously the capacity of the secretariat supporting it; and potentially the commitment and availability of the different members making up the panel. The method of actually operating the committee can differ too — it could be virtual, it could be face-to-face — and you need to think about what's most appropriate for your scenario. If you're establishing a DAC, it might be that you want to start off with something slightly more intensive and face-to-face, so the operation beds down, then move to potentially more virtual ways of operating.

Equally, there's a range of options for how you actually put information in front of the committee and present it so they can then make their decision. This can range from simply tabling paperwork, or circulating it beforehand, to having some sort of formal presentation with advocacy from the service that supported the application's development, or potentially the applicants themselves presenting directly. There are pros and cons to these different approaches, and it's worth considering which is going to be most suitable. One of the most obvious things that will happen with any access committee is that they may not be able to decide, from their first pass at an application, whether to approve it or not; they may need more information, and that's where you need to think about different ways to deal with that kind of workload — through email correspondence within the committee, or potentially phone calls to applicants, that sort of direct communication.

The final outcome from the committee is their decision, which ultimately comes down to approval or rejection, and there will need to be communication of outcomes to the applicants themselves — who will probably have a fairly keen expectation of a quick turnaround — but some thought should also be given to publishing those outcomes more widely. It's very helpful, from a transparency point of view, to be clear about what decisions to approve or not have been made, and what the reasons for those were. There's a link through there to the register that NHS Digital, again, publishes for all their disseminations of data, and it's really helpful to be able to be transparent about data releases, full stop; there are a range of different ways of doing that. Obviously from a researcher's point of view the interesting part starts when the governance really comes to an end and they can use the data to do their work, but there are a few final governance steps to think about after access is
complete and research outputs have been produced. Obviously there's disclosure control and output checking — very much a service element provided by data brokers like the Data Service, to keep some kind of control around what data ultimately gets released from secure access environments. Potentially there is a role for data access committees back here as well, if there's a wish to bring the actual outputs back for final approval by the data owners themselves — though given that's maybe not a very open sort of restraint to put on the use of data, it might simply be that data owners are keen to be aware of the outcome of the research, so they can be prepared, at the point of publication, for any enquiries about the findings emerging from the use of their data. It's also very important, where earlier in the application process we've looked at commitments by applicants about what they're going to do and what kinds of outputs they're going to produce and disseminate — potentially around things like public benefit or commercial exploitation — to be able to track that back and see what the ultimate outcome of the data use was.

So that's a very whistle-stop tour through some practical elements of running governance processes. Louise, I'll hand back to you to summarise on the final slide, then we'll take some questions.

So just in summary, a few points to take away. We need to treat the data, to deal with the personal data disclosure risk in there, to maximise opportunities for use. We can check the data for rights issues prior to publication. We need to think about using standardised licences. A clear data access policy that covers the spectrum of access is really desirable. Set up methods for enabling access using the Five Safes. Make sure the application processes, where you do have data behind a gate, are fair and transparent. Use end-use agreements that are standardised. Enable access to disclosive data, if you need to, through legal gateways. And then provide accountability across the life cycle — that's good documentation and good auditing, and particularly if you're using the Five Safes and safe havens there's an awful lot of auditing that you'll need to go through. So that's just a summary of some of the points.

Yes, we will be making the slides available, both as a recording of the webinar and as a PDF set of slides on our past events page, and we can send you the links for those. There's an awful lot of resources on our website, from data management to advice on deposit, copies of our licence, our data access policy — all those things are hopefully as transparent as they can be. Also, we've agreed with the University of Glasgow — and I think colleagues there are on the line — that we're going to run a day together on the practicalities of licensing and governance in the autumn, because we think there's a lot of demand for this kind of thing, and working through examples and templates appeals to people who like the intricacies of this area. We've got various websites and Twitter streams you can use to keep connected with what we do, and we run a whole programme of webinars on lots of different topics. Thank you very much for listening, and we're now going to invite some questions — so I'll just coordinate the questions a bit. Thank you for sending them in, and thank you, Marta, for asking us lots of difficult questions.

I'm going to start with the first one, on retention, and hand over to my colleague Susan Goodogan, who will cover that one for you.

Okay, I hope you can hear me okay. Right. As an archive we have a role for long-term preservation and curation, so we take materials into the collection, into the archive, in perpetuity — and many archives will have this arrangement, and will have successor
arrangements in place should they cease to exist. There are sort of two strands to this: there's the archival role, and there's also the data collected in order to undertake the research project in the first place. How long you keep that depends on the purpose — whether you want to re-interview the participants, whether you're doing a longitudinal study — and you have a duty to hold any personal information safely, and for no longer than is necessary. Whether GDPR applies will depend on whether or not the data you're holding is personal. I hope that's answered your question, and obviously we can get back to you in greater detail if you'd like.

On 10 years: I don't think that figure comes from anywhere in particular. It tends to be a rule of thumb, and often it's used by universities. There's nothing wrong with that, but you need to think about whether you've got a longer-term preservation role, and whether it's personal data or not, because different rules will apply. So hopefully, if any more clarity is needed, we can get back to you on that — but again, there are no hard and fast rules, apart from: if you're collecting personal data, make sure you're doing it according to the law. A colleague has replied to say that they find 10-year statements elsewhere too, which can be helpful.

There's a couple of questions on DACs, which I'm going to bring together. We've been asked whether we ourselves run a DAC for controlled data. No, we don't, because that's not really our role. We could run a DAC on behalf of depositors — and we are actually part of some DACs — but we don't run one ourselves, and most of the people offering controlled access already have a decision-making committee that meets regularly; depending on the size, they tend to meet every month or so to review applications. So there's no reason why you couldn't convene one, but you may want to think about who is actually responsible for doing this, and who the ultimate owner of the data is. It is very important to think about, though, because what the Data Service does do is provide that sort of secretariat and supporting function to enable a DAC to work: handling applications, and ensuring that only useful business goes forward to the data provider — so it acts as a kind of filter there. And it is very important to work closely with the DAC itself to establish the right kind of screening and triage mechanisms to make that work well. Just to say, that can be an awful lot of work, because the detail needs to be there for them to make the decisions, so it's really important that a human is screening it beforehand, and quite often you'll have somebody presenting the case, who needs to have a really good handle on it.

As for what proportion of our data are controlled: we think it's a very small proportion, actually — it tends to be additional variables on top of the datasets we have, maybe something like 30 to 40 — and they're all managed by a data access committee; you have to have approval through a form of approvals panel or committee for all of these datasets. They're normally run by the owning agency — for example, for the longitudinal studies where we hold additional variables, they're run by a data access committee at the study centre, which meets every month and is made up of the owners of the survey and various other experts — and, as I said, we actually sit on that one, to help make decisions about disclosure risk.

Let's have a look. We would never have just controlled access; we would normally try to make sure we've got a safeguarded dataset as well, because otherwise you're really limiting the number of people who can go in — at the moment it tends to be people who've got the analysis skills and are doing academic
work, particular kinds of users. So certainly at the moment you don't have undergraduates or master's students using those datasets; again, it's quite restrictive putting everything under controlled access.

Do you want to do the top one? Yeah — another question: can researchers from outside the UK deposit with the UK Data Service? Yes, they can. We have an appraisal process — I can send links to the information — and there's no cost, but we would be very careful about whether it's worthy and has enough value for us to put it through our controlled access mechanism. Again, it goes back to having various access levels of datasets in the collection, because most people will want to use an end user version and specialist users will want the controlled access version, and there are in-house costs associated with access to a controlled access data collection. So one of the criteria in deciding at appraisal whether something would go to controlled access will be the number of people already wanting to use it, and whether there's a big demand for it — if there wasn't, we probably wouldn't do that — and how live the data is, the frequency of the information contained in it; I guess that is your piece.

I'm just going to answer the one on streaming data. Yes, we are having experience with live streamed data: we're actually working with UCL on setting up a smart meter research portal, which will stream live data from households — for, I don't know, forever I think, or at least as long as the service is in place. There are quite a few issues that raises, mainly around ownership and consent, because with things like smart meter data it's actually the householder who owns it, and you need very explicit consent about what can be done with that. So: clarifying consent around these kinds of data; how you're going to deal with ingesting it, or streaming it, on a daily basis; whether you've got the sort of technologies in place to be able to do that — we've actually set up a Hadoop system to be able to handle such large volumes of data; then also thinking about the curation role, how you do that; and then, for publishing, how do you add citations to it? Is it around the chunks, when people take bits away, or are you time-stamping it every day, every week, every month, and getting citations that way? There's quite a lot of issues around that, and yes, we are working on all of these things at the moment, including the licensing side of the governance — it will need it, because it's smart meter data, and streaming data is often only useful when you interlink it with other personal household attributes. This will be available in a controlled access environment, and it needs licensing, it needs governance, it needs data access committees and everything.

We are actually working on our licence at the moment — there was a question about whether we're revising our licence to reflect GDPR, and we are working on it, and hopefully we'll have something to share shortly. We will be upgrading our data management and deposit information with GDPR-compliant licence forms, end user agreements, and website content around this whole area; we've got a team on that at the moment. It's not ready now, because we're still having meetings with funders and things like that, but we'd certainly want to share it when we do have it, so please watch this space.

Okay, I'm going to read the top one: are there special specifications for researchers based outside the UK for data requests and access? Do you want to answer that one? Yeah, sure. So there are different specifications for different datasets. For example, secure access data, which can only be accessed in the Secure Lab, can only be accessed in the UK, so non-UK users would need to be a visiting researcher or something like
that at a UK institution. Otherwise, special licence datasets — a lot are available anywhere in the world, but there are some that are limited just to the UK. And then EUL data, the end user licence data — mostly that's just available once you're registered, so you can be a user from outside the UK and use that; anyone can register free with the UK Data Service. Mostly the datasets are for non-commercial purposes, so as long as you can justify your project and you're not going to be selling data, you can come and use it, because it's mostly downloadable data. We do issue credentials as well, to get into our system. We have federated access, so people who are part of UK universities can join easily, but if you don't have those credentials we can issue them, and then you can become a registered user quite easily. We do have users from all over the world using the data. I think we're probably coming to a close — yeah, I think we've covered all the questions.

Okay, so there's a question around teams that dissolve, and this is something that John covered, actually. Because we've been archiving and sharing data for 50 years, you know, people 30 years ago said "well, I want to know who's using it; I'm going to give permission; you ask me" — and of course, within 10 years they've retired or gone. It's not really sensible to have an individual person giving permission, so we try to guard against that and say: please either delegate to someone who's going to be around longer than you, or let your institution give permission. But actually there does need to be some perpetuity around a DAC, and that needs to be built in, so it would be up to the department, I guess, or the university — I think every institution handles it differently. In a couple of cases depositors have devolved it to us, with really old oral history studies; sometimes I, because I was involved in the study, can make a decision myself, but more often than not, either we try to renegotiate away the condition so it no longer needs permission — which might mean reworking the data — or, if it's many years later, the people might be dead anyway. So we've had a large programme to remove a lot of these bespoke vetting conditions. It could be done, anyway; I would suggest probably a central department, or somebody with a curation role in the institution, could do it, though it can be devolved if you want. It's really up to the depositor to decide what they want to do, and that would probably involve some kind of contract putting them in there, as the copyright holder, to make the decision.

I think it's also worth thinking about — as I said earlier, really — whether the DAC model is really necessary for the situation. If you've got something that's quite transient, in terms of collecting the data, is there the depth of potential reuse to really justify that rigorous level of governance? Would it be more effective — especially if these sorts of problems about going back to associated people are anticipated — to think, from a service point of view: let's not have something you'd have to put that level of rigour around. There will still be plenty of opportunity for reuse if we can put the data into a situation where it's safeguarded or open; it doesn't have to have that level of control around it.

Okay, I think we've covered all the questions and come to the end of our session. So we want to thank you very much for listening to us, and we hope we've answered your questions; again, feel free to mail us offline if you want to ask any more. We will try to summarise some of the questions, put them up as an FAQ, and make them available as well. So thank you very, very much, and we hope to see you at another webinar or an event. Okay, bye. Thank you very much.