Hello, I'm Mark Elliott. In this talk I'm going to introduce the anonymisation decision-making framework. The framework starts from the observation that data do not exist in isolation: they sit in an environment, and it is the interaction between the data and that environment that determines the level of disclosure risk. Risk, rather than any absolute notion of anonymity, is the key concept here; anonymisation is about reducing risk to a negligible level, not to zero. The framework breaks the anonymisation process down into a series of components, and I'll list them off here.
Now, I'm going to go through each of these in turn, so I'm not going to go through them on this particular slide, but what I will say is that there are three different activities. The first is the data situation audit, which is made up of the first five components. The second is disclosure risk assessment and control, which is the technical part of the process. And the third is impact management, which is about how you manage the consequences of sharing or releasing data once it has left your hands. So, starting with the data situation audit: the first step is to describe your data situation. How is the data flowing? Where has it come from, and where is it going? Is the data situation static or dynamic? Understanding which you are dealing with helps you to frame the risk. So here we have a data situation where we are moving data, in fact, into three environments. We might imagine an organisation that is collecting data and passing it to a local authority as part of some legal process, and the local authority wants to release some aggregates from the data collected by that third party; releasing aggregates means essentially publishing open data. Now, each of these data environments will have its own risk profile, and you need to consider each of them in turn as the data move through the flow.
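Releasing aggregates is itself a disclosure control problem. As a minimal illustrative sketch, not part of the framework itself and using invented data, here is how a simple small-cell threshold rule might look, where any count below a chosen threshold is suppressed before publication (note that a real implementation would also need secondary suppression, since a suppressed cell can sometimes be recovered from published margins):

```python
from collections import Counter

# Hypothetical microdata: one record per person, each with an area code
# and a benefit-claim status. All values are invented for illustration.
records = [
    ("AREA1", "claiming"), ("AREA1", "claiming"), ("AREA1", "not claiming"),
    ("AREA2", "claiming"), ("AREA2", "not claiming"), ("AREA2", "not claiming"),
    ("AREA2", "not claiming"), ("AREA3", "claiming"),
]

THRESHOLD = 3  # cells with fewer than this many people are suppressed

def aggregate_with_suppression(rows, threshold):
    """Tabulate counts per (area, status) cell, suppressing small cells."""
    counts = Counter(rows)
    return {
        cell: (n if n >= threshold else None)  # None marks a suppressed cell
        for cell, n in counts.items()
    }

table = aggregate_with_suppression(records, THRESHOLD)
for cell, n in sorted(table.items()):
    print(cell, "suppressed" if n is None else n)
```

With these invented records, only the cell that reaches the threshold survives; every other cell is marked as suppressed rather than published as a small, potentially disclosive count.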
The next step is to understand the legal issues. Is the relevant legislation engaged at all here? Which pieces of legislation apply, and what do they require of you? You will need to think about this for each stage of the data flow, because the legal position may differ in each environment, and getting it wrong at any stage introduces risk. Who are the data controllers for the data? Are there any other parties involved as data processors? What are their responsibilities? Is the data about people, and is the base data personal data? Data could be about people and not be personal if it has been anonymised, and similarly, data can sometimes appear not to be about people but actually still be personal data. And then the data subjects: who are they? Are they a vulnerable or a sensitive group? What is the relationship between the data subjects and the data? Have they given any sort of consent to its reuse, and so on?
What type of data do you have? Is it quantitative or qualitative? Is it in the form of microdata or aggregates, or some other form? What types of variables do you have? Do you have any variables that would be regarded as sensitive, either in law or just generally in terms of how they're understood? Do you have any standard identifiers? Then there are properties of the dataset that might be relevant. The quality of the data: this is slightly paradoxical, but lower quality data is actually lower risk, because it's less easy to find somebody if the data is of lower quality. Is the data time-linked? Is it hierarchical or flat? Is it drawn from multiple sources? Is it a population, or is it a sample of a population? All of these properties can affect the risk. Now, understand the use case. Again, you may not be entirely clear about why you need to do this. What will the data be used for? It may be that there's a specific request to share the data: a specific organisation wants to use it for a specific purpose. Understanding that purpose in detail will allow you to arrive at what is effectively a minimum specification for the data that are needed. So what variables are needed? Is all the data needed, or will a sample suffice? Who will hold the shared data? Who will access it, and how? Essentially these are definitions of the data and the data environment in that new situation. If you've got a well-understood use case, and you then go back and start thinking about what sort of data you're able to release in terms of its risk, you can have a dialogue between yourself and the potential user. In more general use cases, where perhaps you're disseminating a data set for research purposes or as open data, you can still usefully think about how users would like to use the data and what actually is of most value. Understanding your ethical obligations beyond the legal constraints is also important. Where are the loci of consent with these data?
Now, this can be quite complicated. The data subjects may not have been involved in any direct consent process. This often happens when there are multiple levels of data subjects. For example, with GP data, consent to access those data is often given by the GPs but not by the patients; the data are actually data about GPs, and they're also data about the patients as well. So there's a complex mix of different types of data subjects, and who has consented to what is actually quite important in terms of understanding your ethical obligations. And who is aware of what? This is not just to do with the notion of consent but also with awareness. Are there reasonable expectations that a data subject might have as to what is going to happen to the data about them once they've handed it over for one purpose? Is it reasonable for them to expect that it won't be used for another purpose? Or would it be within the normal expectations of a data subject that their data would be reused? That's a very fuzzy area, but it's something that's important to think about in terms of your data flow. Is the data situation sensitive? Is the topic of the data sensitive: is it about a particularly stigmatising disease, for example? Is the population that the data is about a vulnerable population; perhaps it's data about children? And are there any sensitive variables, again either legally or in terms of what's generally understood to be sensitive? Okay, now we move on to the technical disclosure control part of the framework. Identify the processes you will need to use to assess disclosure risk. Now, there's a separate talk on disclosure risk assessment and control; here we'll just go through the main points very briefly. The first of these is scenario analysis, and this is very important. You are answering the question which you set yourself earlier on, which was: how might a privacy breach occur? Until you know that, you can't possibly go about measuring the risk.
So it's not an abstract notion of risk assessment but a very located one: this is the thing that I'm imagining happening, and this is how it might happen, and then you can measure the risk of that particular event. Usually it's some form of re-identification, but what resources is this imaginary person who's going to do the re-identification going to be using? That's where we start thinking about the data environment. These data are going to be in this environment, so a potential adversary will have access to these resources in order to do the re-identification, and you can think about the mapping between those; that's the function of scenario analysis. Disclosure risk assessment is then a formal process of measuring the risk once you've defined the sets of key variables, and again I'll talk about that in more detail in the particular talk on disclosure risk assessment. Penetration tests are a simulation of an actual attack: here we say to an individual, okay, here's the data set, see if you can find somebody in there. There are processes formalising how you go about doing those penetration tests, either in-house or indeed as a crowdsourced hacking challenge. Comparative data situation analysis takes the idea that you consider your own situation at the moment to be safe and secure, and therefore treat it as a gold standard. If you're going to do a one-to-one share with another organisation, this can be particularly relevant, because you can think about your data as you currently have it in your data environment and then ask what the comparative risk is in a different data environment; if that is less than the risk in your current data environment, it's probably sufficient to say that it is sufficiently safe.
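To make the idea of formal risk measurement concrete, here is a minimal sketch. It is not the framework's own method, just one standard illustration on invented data: it counts how many records are unique on a set of key variables (assumed here to be age, sex and area, as identified by scenario analysis), and then shows how one data-side control, coarsening age into bands, reduces that measure:

```python
from collections import Counter

# Invented microdata: (age, sex, area) are taken to be the key variables
# an adversary could match on. All values are made up for illustration.
microdata = [
    (34, "F", "AREA1"), (34, "F", "AREA1"), (35, "M", "AREA1"),
    (61, "F", "AREA2"), (62, "F", "AREA2"), (27, "M", "AREA3"),
]

def uniqueness_rate(rows):
    """Proportion of records that are unique on the key variables."""
    sizes = Counter(rows)
    uniques = sum(1 for r in rows if sizes[r] == 1)
    return uniques / len(rows)

def coarsen_age(row, band=10):
    """Control: replace exact age with a 10-year age band."""
    age, sex, area = row
    return (age // band * band, sex, area)

print(uniqueness_rate(microdata))                            # risk before control
print(uniqueness_rate([coarsen_age(r) for r in microdata]))  # risk after control
```

On this toy data, banding the ages merges previously unique records into larger equivalence classes, so the uniqueness rate falls; a real assessment would use more sophisticated measures, but the logic of measure, apply control, re-measure is the same.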
A final point here: consider using a thermostat approach. This is a really good strategy if you're thinking about releasing open data. Go for a really cautious level of risk to start with, release that, and hopefully nothing will happen; the environment will become used to your data existing, enquiries will be made about it, and you'll get an understanding of the demand for different types of data, so you can enrich your understanding of the use case. Then you can go for a slightly more liberal approach, just tweaking the thermostat up a little. This technique has been used, for example, by the German statistical agency in determining the data that will be released into their research data centres. Once you've done your risk analysis, you know where you've got to bring the risk down to, that negligible level, and what controls you're going to apply. Essentially there are two types of controls. You can restrict access in some way or other: who, how, what, where and to do what. Or you can place controls on the data: you might, for example, only release a sample of the data rather than the full data set; you might decide to aggregate or suppress variables; or you might perturb the data by adding some noise in some way. I'm not going to go into the details of those; again, there will be some more on that in the next talk. Output disclosure control can also be applied. If you're allowing access in a restricted environment, there's still a question of what people take out of that environment. We don't just do an analysis for the fun of it: we usually want to publish, we usually want to use our outputs for other purposes. So what outputs are you going to let out, if your means of controlling risk is to restrict access so that only a particular environment is used, such as a data centre? OK, now we're moving on to the impact side of the anonymisation decision-making framework. The first point is identifying who
your stakeholders are and planning how you will communicate with them. Who needs to know about your share? Who needs to know about the release of data: the data subjects, the wider public, the users? These are questions you need to address. Once you've identified them, what is the engagement you're going to carry out? Will they be involved in the design of the data? That might be particularly relevant to users, but it could involve data subjects as well. Is there going to be a consultation? Where you might be changing policy around the use of particular data, a consultation with the wider public might be really important in terms of thinking about how it's going to play out when it gets publicised, and not by you. Transparency might well be important too: being transparent in your own public announcements about the data, about the data processes, about who you're sharing with and why, is important in terms of building understanding about why you're doing what you're doing, and this will reduce the impact of a breach should it happen. What do they need to know?
Well, as to the data: are you going to publish details of your anonymisation process? There's some advantage to that; it might reassure people about the security. However, for some anonymisation processes, publishing the details actually increases the risk of a breach. Plan what happens next. Once you've shared and released the data, you should be monitoring use, and you should continue to consider risk. Risk will change over time. Some elements of time will make the risk go down: the data themselves will be getting older. But some elements will increase the risk: new data are entering the data environment, new technologies are available to do linkage, and so on, and you need to be looking at that constantly. Another issue is that you are not generally just considering releasing a single data set; this will be part of a series of releases as new data come in on the same subject. So you're actually setting up a precedent and a policy, which you'll then have to shift back from if you later consider the risk to be too high, and that itself will need to be managed. Finally, you need to plan what you're going to do if things do in fact go wrong. Remember, we have accepted that we're not working at zero risk; therefore, by definition, there is a residual risk, therefore there is a possibility that there will be a breach, and therefore you need to plan for that as part of your anonymisation strategy. Avoid catastrophising: these aren't the same sort of problems as a nuclear power station blowing up, and it's very easy, when thinking about data privacy, to cast them in that light. Disclosure event mapping is a key thing: what is it that's happened? If you've done your breach scenarios well, you should have a map for that already. What is it that's going to happen, and what's going to happen next? It's not simply that there's an event and that's the end of it; something will happen next. There'll be play in the media, there'll be your own
communications, and there's the question of whether you talk to the adversary, if it's an actual attack that's led to the breach. The idea is to be active and to have a plan. OK, to conclude: the anonymisation decision-making framework is a tool which allows you to think constructively about your data situation. It moves us closer to a harmonised idea of anonymisation which ties together the technical and the legal aspects of that process. We have an open access book forthcoming, and if you look on www.ukanon.net you will see more information about the likely release date. Thank you.