Okay, so I'm going to talk to you about anonymisation theory and practice. This comes out of a whole set of different work: applied work in the field, working with various organisations on their own particular data situations, and theoretical, statistical and technical work dating back to 1996, both here at Manchester and at other institutions.

So what am I going to talk about? We're going to divide the talk into two parts, as per the title. First we'll talk about what anonymisation is, and then we'll talk about a practical approach to anonymisation using the Anonymisation Decision-making Framework. Now, the theory was developed in the most recent piece of work, starting in 2012, springing out of the UK Anonymisation Network. What we did was bring together a group of about 30 people for a series of workshops over 18 months. These were people with various kinds of expertise relevant to the question of what anonymisation is, which turns out to be quite a difficult question to answer. Over those 18 months we built up an understanding from different perspectives: from all sectors of the UK economy, and from different disciplines, both legal and computing- and information-technology-based disciplines, and statistical, mathematical, data-analytical disciplines. With all of those inputs we came to a consensus answer, and that answer forms the guidance in the Anonymisation Decision-making Framework.

I'm going to talk now briefly about three different concepts: privacy, confidentiality and anonymisation. These often get used interchangeably, and often in a confused way. So let's talk first about privacy. Privacy is primarily about people; it's not primarily about information. Privacy concerns a whole range of things outside the information sphere. Privacy is about personal control of your own spaces, access to your resources, access to your property, access to your person. It does also concern access to your information. So control and personal autonomy are very important in understanding what privacy is; it's a complex term. Confidentiality, by contrast, concerns information and data, and that's the key distinction: people versus information. Confidentiality is about keeping data in a particular state, with assurances about whether and how it will or won't be shared. Anonymisation is a technical process for keeping confidentiality assurances. We anonymise data in order to protect confidentiality. Now, as a consequence of that there may be some benefits to privacy, but these are consequential; they are not part of the primary process, which is the protection of confidentiality.

Now I want to talk about confidentiality risk, because again it's something that is often talked about in a confused way. Any mature model of risk considers two key components: the likelihood of an event happening, and the impact that event will have on individuals, on society in general, or on whatever environment is under consideration. Likelihood is often the sole focus; with confidentiality risk, impact is often ignored, and that in itself presents a problem. When considering likelihood, another form of confusion arises: the distinction between could and will. Often, when asking questions about an anonymisation situation and confidentiality risk, organisations find themselves trying to tackle 'could' questions. And the problem with 'could' questions is that they are almost impossible to solve.
So here's an example. When I leave the webinar studio and go back to my office, and then leave the building and go home, what is my risk of being hit by a meteor? Could that happen? Yes, it could. There is a very, very small possibility that I'll be hit by a meteor and die a terrible death. However, is it likely to happen? No, it is not. And crucially, it's so unlikely that I'm not even going to think about it on the way home; I make no decisions whatsoever on the basis of my likelihood of being hit by a meteor. It's a risk that I'm going to ignore, and that's very important. We'll come back to that concept of ignoring risk later. So that is an important way of understanding likelihood.

Impact, on the other hand, is something that tends to get ignored, and there is a tendency to treat the impact of confidentiality risks as apocalyptic: we think about a breach of confidentiality as if it were a nuclear power station blowing up. That's clearly nonsense. A mature understanding of risk will actually think through: what are the impacts? What are the costs? Therefore, what level of protection, in terms of likelihood, does that warrant? And what are the benefits coming from the sharing of the data? You take all of those things into account, in the round.

Part of this is thinking about how that risk is constructed. A very helpful equation, dating back now to 1991, comes from Marsh et al.: the probability of identification is the probability that an attempt is made to re-identify somebody, multiplied by the probability of an identification given that attempt; that is, pr(identification) = pr(attempt) × pr(identification | attempt). Now, a lot of technical approaches focus on the last term in that equation, the probability of an identification given an attempt. But we can have a significant impact on the overall probability by manipulating the probability of an attempt, and this is often not taken into consideration. Within some privacy models, such as differential privacy, it is completely ignored: it's treated as if it were one, and there are circumstances where that's clearly wrong. So one needs to think about things in the round, and the Anonymisation Decision-making Framework is about attending to the whole of this: the whole of the equation, and indeed the whole of the notions of likelihood and impact. There's a small worked sketch of the equation at the end of this passage.

So in our model we consider a much bigger system of precursors and consequences. The precursors are the things that lead up to an event: the things that make it a possibility, that enable it to happen, or that reduce the risk of it happening. Then you have some concept of what the event is, which we'll talk about later; it's important to be much more precise about that event than 'there's a vulnerability in the data', which is where data-focused approaches tend to stop. And then: what are the consequences? There's been a breach; what happens next, and how does that play out?
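To make that equation concrete, here is a minimal sketch in Python. This is my own illustration, not from the talk, and the numbers are purely invented; it simply shows how reducing the probability of an attempt drives down the overall identification risk even when the data themselves are unchanged.

```python
# Marsh et al. (1991): pr(identification) = pr(attempt) * pr(identification | attempt)
# Illustrative numbers only -- they are assumptions, not figures from the talk.

def identification_risk(p_attempt: float, p_id_given_attempt: float) -> float:
    """Overall probability of identification for one data situation."""
    return p_attempt * p_id_given_attempt

# Open data environment: anyone can attack, so an attempt is near certain.
open_release = identification_risk(p_attempt=0.9, p_id_given_attempt=0.05)

# Controlled access: licensing, audit and penalties deter attempts,
# even though the data themselves are identical.
controlled = identification_risk(p_attempt=0.001, p_id_given_attempt=0.05)

print(f"open release: {open_release:.4f}")   # 0.0450
print(f"controlled:   {controlled:.7f}")     # 0.0000500
```

The point of the sketch is the one made above: focusing only on the conditional term, as some privacy models do, misses the leverage you get from controlling the environment and hence the attempt probability.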
But what is anonymisation in this context? Anonymisation is a process. The wording matters, because anonymisation is not a state of the data. We don't talk about anonymisation in terms of an end state; we talk about processes by which personal data are rendered non-personal. This was a key understanding that arose from the workshops I talked about earlier: the legal definition of personal data is tied to a technical process, which we are calling anonymisation. We convert personal data to non-personal data. That's very important.

Now, we need to be very careful when using success terms, and the success term in this case is 'anonymised'. Why is that a problem? Because it denotes an end state: we're there, we've reached the goal. Actually the situation is a lot more fluid than that. We're living in an environment where data resources are changing constantly. The data environment, as we call it, and that's a term I'll bring out in more detail shortly, is growing and changing; new data forms are being created, new data is being created every second. So the idea that one might have reached an end state in an anonymisation process is ill-conceived.

We do have to use the word 'anonymised' in certain circumstances, partly because the constraints of the English language make it quite difficult to talk about data without using the term from time to time. When we use it, we use it in a very specific way: in the same sense that the word 'reinforced' is used in 'reinforced concrete'. We don't expect a building made from reinforced concrete to withstand a meteor strike. We might reasonably expect, though, that it would withstand the rigours of its expected use. So: anonymised as in reinforced concrete.

Now, the problem with 'anonymised' is that everybody grapples with the question of how to decide when their data are anonymised, and this leads to some unfortunate adjectives. One of these is 'truly anonymised', used because people want to say something more than is normally meant by the word 'anonymised'. Of course this is meaningless: truly anonymised is just anonymised, and we get a sort of inflation going on without any gain in meaning. The other day I even heard the term 'really truly anonymised', which is clearly absurd. So avoid the word 'anonymised' if you can, and if you do use it, be careful how you're using it: use it to mean 'I have reinforced these data against privacy attacks'.

Okay, I want to distinguish between two terms: de-identification and anonymisation. As an aside, a complication here is that in different jurisdictions the terms are used differently. In Australia, Canada and the US, the term de-identification is used to mean what we mean by anonymisation here. By de-identification, we mean the removal or obscuring of direct identifiers. It tackles one part of the definition of personal data contained in the Data Protection Act, and indeed the General Data Protection Regulation: identification directly from those data. That's where you can look at an item of data and go, 'oh, there is that person', without any external information being required. Anonymisation, on the other hand, also tackles identification indirectly, from those data and other information: when you bring other information to bear in order to make a re-identification. Obviously that's the much more complex problem to solve.

So here are the tenets of our approach. Anonymisation is not about the data. That may seem slightly surprising, but it's actually quite critical to understanding an appropriate and, as we call it, functional way of delivering anonymisation. Anonymisation is instead about data situations, and these arise from data interacting with data environments. That's the term I introduced earlier, and I'm going to explain what I mean by it.
So here's the formal definition that we produced in an article in 2014. A data environment is a set of formal or informal structures, processes, mechanisms and agents that either act on the data, provide context for those data, or define, control and interact with those data in some way. Now, this boils down to four things: agents, which are usually people but in the days of AI could be computer systems; infrastructure, including security infrastructure; governance; and, critically, other data. Those four elements make up any data environment; you can describe a data environment using them. And the interaction between those four elements and the data you are specifically concerned about is what determines the risk. The risk is not inherent in the data itself.

Now, anonymisation as a term, or its alternative de-identification in those other jurisdictions, gets used in four different ways. The first of these is the concept of absolute anonymisation, which is beloved by some branches of computer science. What it means is that there is zero possibility of re-identification under any circumstances. This is a straw person; let's kick it into touch straight away. Essentially, re-identification is about using information in the data, and information in the data is exactly what you need in order to carry out good and proper analysis using the data. The exact thing that gives the data value is the exact thing that allows re-identification to happen; they go hand in hand. So you cannot have zero risk of re-identification under any circumstances, unless you have no data at all or completely trashed data. Absolute anonymisation is not a practical possibility.

Formal anonymisation is the other side of the coin, and it's de-identification as we defined it earlier: simply dealing with the direct identifiers. This includes the concept of pseudonymisation, which is the replacement of the direct identifiers with a pseudonym. Typically here we're talking about names and addresses, ID numbers and so forth: things that directly identify people. This is an important thing to do if you're engaged in the process of anonymisation, but it is very, very rarely sufficient; I'll show a small illustrative sketch of why shortly.

Statistical anonymisation is a more sophisticated set of ideas, derived primarily from the research field of statistical disclosure control. It's about a set of technical processes applied to data in order to reduce the risk of disclosure, which I shall talk about in a little more detail in a moment; part of disclosure is identification. Now, the problem with statistical anonymisation and SDC is that they tend to be overly focused on the properties of the data themselves and do not tend to consider the environment in which those data exist. Functional anonymisation, on the other hand, essentially states that you cannot determine whether data are anonymised or not just by looking at the data alone; you need to consider the relationship of the data to their environment.

Let's go back to that notion of disclosure. Unintended disclosure, that is, disclosure of personal information not intended by the data owner, consists of two possible processes: identification and attribution. The key thing here is that attribution is the act of disclosure. Identification is the association of some data with an individual population unit, which for the purposes of this discussion is normally a person.
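Here is the sketch I promised of why pseudonymisation alone is rarely sufficient. It's a minimal Python illustration of my own, not from the talk; the record fields and the secret key are hypothetical. Direct identifiers are replaced with stable keyed pseudonyms, yet the remaining indirect identifiers survive untouched and could still support re-identification against other data in the environment.

```python
import hmac
import hashlib

SECRET_KEY = b"hypothetical-key-held-by-the-data-controller"

def pseudonymise(direct_identifier: str) -> str:
    """Replace a direct identifier with a stable keyed pseudonym (HMAC-SHA256)."""
    digest = hmac.new(SECRET_KEY, direct_identifier.encode(), hashlib.sha256)
    return digest.hexdigest()[:12]

record = {
    "name": "Jane Example",   # direct identifier
    "dob": "1974-03-12",      # indirect identifier -- survives untouched
    "postcode": "M13 9PL",    # indirect identifier -- survives untouched
    "diagnosis": "asthma",    # target / sensitive variable
}

released = {**record, "name": pseudonymise(record["name"])}
print(released)
# The pseudonym blocks *direct* identification, but dob + postcode may still
# single this person out once combined with other data in the environment.
```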
And attribution, to complete the pair, is finding something out about that identified person; the total system creates a disclosure.

Now, here is a scheme for visualising how identification happens. On the bottom row, labelled 'target file' on this diagram, we have a de-identified dataset. It consists of a set of demographic and other variables, but there are no identifiers on the data: no name and address. At the top we have a thing called the identification file. We're imagining here a scenario where we have an intruder. There are other ways in which breaches can occur, but we'll stick with the idea of a malicious intruder. This intruder has the identification file, which could be an actual database or could be a set of information that is simply in the intruder's head. The intruder knows the name and address of a particular person, and they also know some other things about that person; that other information overlaps with the information on the de-identified dataset. In the purple boxes, these form what are called key variables: variables which are common to the target and identification files and can be used to link the two together. Now, there are all sorts of complex processes here; anybody who has ever been involved in data linkage will know it is a complex business, and we're not going to go into the details. But there are mathematical systems for working out the risk of a match actually being a true link. The link here attaches the name and address, which we're assuming is sufficient to identify the person, to the target or sensitive variables, over on the right in blue, which the intruder did not know about before. That is the disclosure; that is the identification process.

Now, there are various things which affect identification risk. We'll talk about this only briefly, because again it's a very complex and detailed field. First, the power of individual variables: how well they differentiate the population. You'll notice there was date of birth on the previous slide; that has greater power than age, because age is captured as a set of bins, with a much larger group of the population in each bin than in any particular date of birth. The skew of variables is also important, because skew creates small categories: with ethnic group, for example, minority ethnic groups are much smaller subpopulations and therefore at greater risk of re-identification. So there are a number of things to take into account. The quality of the data is also important: lower quality data is actually at lower risk of re-identification, because the data by definition might be wrong, and therefore the intruder will make errors in carrying out matches. And of course the availability of the data in the environment is quite important: if you're collecting information which is generally widely known, demographics being the obvious example, then that is an available key variable. Before we move on to the next example, here's a small illustrative sketch of the linkage mechanism I just described.
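This is a minimal sketch of the linkage mechanism in Python with pandas. It is my own illustration, not from the talk; the people and variables are invented. It shows how key variables join an identification file to a de-identified target file, attaching names to sensitive values.

```python
import pandas as pd

# Identification file: what the intruder already knows (names + key variables).
identification = pd.DataFrame({
    "name":     ["Alice Smith", "Bob Jones"],
    "dob":      ["1974-03-12", "1981-07-30"],   # key variable
    "postcode": ["M13 9PL",    "M1 4BT"],       # key variable
})

# Target file: de-identified release (key variables + sensitive variable).
target = pd.DataFrame({
    "dob":       ["1974-03-12", "1990-01-02", "1981-07-30"],
    "postcode":  ["M13 9PL",    "M2 5GP",     "M1 4BT"],
    "diagnosis": ["asthma",     "diabetes",   "depression"],  # sensitive
})

# Linking on the key variables attaches names to sensitive values.
linked = identification.merge(target, on=["dob", "postcode"], how="inner")
print(linked)
# In practice matching is probabilistic and error-prone; this exact join is
# the simplest possible case.
```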
Now we're going to move on to a problem, to demonstrate the notion of attribution. This is a table of counts, which I hope is something familiar to most of the audience. In this table of counts we have professors and pop stars, and we have various income bands. The question, which we're going to run a poll about in a moment, is: what is the problem with this table from a disclosure risk point of view? Where is the risk in this table?

So we're now going to present the poll. You will see that there are four possible answers: the small count of five in the bottom right-hand corner; the fact that income is a sensitive variable; the zero in the top left-hand corner; or 'don't know'. I'm just going to put the slide back up for about ten seconds so you can decide on your answer, and then we'll put the poll back up. Ah, apologies, apparently we can't cancel the poll and restart it; the technology didn't work how we were expecting, so only a few people got to answer. Anyway, of those of you who did answer, most answers were towards the low count of five in the bottom right-hand corner, and that is very common; this is where people tend to focus. Now I'm going to run a little scenario and explain why that's not the answer.

Okay, so we're putting the table up again. Now, imagine I'm at a cocktail party of professors and pop stars; it's a real rave. And imagine that I have this table of information about all 305 people at the cocktail party; don't worry about how I've got this information, but I have the table. Now, I go around the party and I hear somebody say, 'at the lecture I gave last week...'. What am I able to learn about them from that partial information? I've immediately learned that they're not highly paid. So I've learned something about their income, even though they've not chosen to disclose that information to me. Similarly, if I hear somebody boasting about the ten million pound income they had last year, then I've identified them as a pop star. So the problem in this table is created by the zero. It's the cell you're not immediately drawn to, because there's nobody there, so how can anybody be at risk? But the zero creates the problem for everybody else in the table. Now, as it happens, and we'll see in a moment, there is a problem with the five too, but it's less direct.

Now we're going to imagine a gatecrasher arrives at the party. Here he is, and I have to say he's always gatecrashing my parties. That leads us to the situation here. We're not going to make the mistake of putting the first poll up again, because I'm going to give you another poll now. The question is: have we now solved our problem? Have we solved the problem with the table, given that we now don't have any zeros in it?

Okay. The vast majority of you have identified that we haven't solved the problem. Probably you're thinking 'this is a trick question', or maybe you've spotted what I've spotted: I know who this person in the top left-hand corner is. It's Brian Cox. So I can just say, 'oh, there's Brian over there', do the next thing, which is take Brian out of the table, and hey presto, we're back to the original table. We have a zero again, and I can still make those inferences. This is called a subtraction attack.
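Here is a minimal numeric sketch of that subtraction attack. This is my own illustration; the slide's exact figures aren't reproduced in this transcript, so the counts below are invented to match the spirit of the example (305 party-goers plus one gatecrasher).

```python
import numpy as np

# Rows: professor, pop star. Columns: low, mid, high income.
# Published table after the gatecrasher (Brian) has been added: no zeros left.
published = np.array([
    [100, 100, 1],   # professors; the 1 is Brian, added to the old zero cell
    [ 50,  50, 5],   # pop stars
])

# The attacker knows Brian is the sole high-income professor, so subtract him.
known_individual = np.zeros_like(published)
known_individual[0, 2] = 1

recovered = published - known_individual
print(recovered)
# [[100 100   0]
#  [ 50  50   5]]
# The zero is back: anyone identified as a professor can again be inferred
# not to be highly paid, exactly as before the "fix".
```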
And this is the reason why small cell counts are a problem: the smaller the number in a cell, the more likely it is that I know all of the individuals in that particular cell, and therefore the more likely I'm going to be able to knock holes in the table and start making inferences about people I already have partial information about. So that's why there is a lot of discussion about small cell counts. It is not because of an identification problem, because there isn't any particular difference between identifying somebody in the low pop star cell and identifying somebody in the low professor cell. In order to say, 'ah, that person is in that cell', I have to already know all of the information in the table that places them in that cell. So the identification doesn't actually do anything to the person identified: I already knew that they were a lowly paid pop star, and I haven't learned any new information from the identification. The process of identification doesn't cause a problem for the person identified; it might well cause a problem for other people. So that's the complication.

Okay, having given you two simple examples of the sorts of things that are thought about in the disclosure risk context, I'm now going to move on to talk about the Anonymisation Decision-making Framework, which involves a much broader set of considerations than disclosure control alone. So what is the ADF? It's a system for developing anonymisation policy within an organisation, and a practical tool for understanding particular data situations. Critically, it's not a checklist: it's not a system you can employ by going through ticking off boxes and, hey presto, you have an anonymised dataset. It's a live, active system; there are feedback loops, and things you do at stage five can go back and affect what happens at stage one. What it is, critically, is a system for thinking about data situations that enables you to anonymise effectively.

Okay, so what, really, is your responsibility? This is another critical output from the UK Anonymisation Network workshops, and this definition has been agreed between ourselves and the Information Commissioner's Office. What do you need to do? You've got to understand how a breach of privacy might occur, and there is usually a precursor to that, which is a breach of confidentiality. You must understand the possible consequences of that breach, and reduce the risk of a breach occurring to a negligible level.

Now, what do we mean by negligible? You'll remember that earlier I talked about meteors. The point of that analogy is that I ignore that risk: I ignore the risk that I'm going to be hit by a meteor. That means it's a negligible risk for me. So here's what we mean by negligible: a risk that a reasonable person would ignore, and I'm hoping you'll accept that I'm a reasonable person here. It's a framing concept for thinking about risk. Now, the really good news is that it is relatively easy to get your risk down from moderate or high to negligible, compared with the problem of getting it from negligible to zero. Negligible is a much less demanding standard than zero. And there's a ten-step process for delivering this.
I'm not going to go through this slide point by point, because each of these steps is covered on subsequent slides; it just brings the whole ten-step process together on a single slide.

The first of those steps is to describe your data situation. Is it static or dynamic? By static, we mean that the data is sitting in a particular place, and we really just want to do a risk assessment of that situation. By dynamic, we mean that the data is moving: maybe it's being shared between organisations, maybe it's being released to a certain set of users, maybe it's being published on the internet. That's a dynamic data situation. The second point is that you then have to define your environments. What is the existing environment? What environment are you moving the data into? And how does your data relate to those environments? That is the data situation.

Now, here's an example of a dynamic data situation, and this is actually quite common. Organisation X is collecting data and passing that data to environment two, which in this case is a local authority but could be another organisation. You'll notice that we've used the term 'anonymised subset'. This indicates that organisation X has applied a set of processes before passing the original data over to environment two which sufficiently reduce the risk of re-identification, either in transit or within environment two, so that it is negligible: we have reinforced the data. The local authority uses that anonymised subset under a data sharing agreement, and then releases a set of aggregate tables into the open data environment; that is, it publishes them. There will again be a risk analysis process on these aggregate tables. You'll remember I mentioned the problem of subtraction attacks earlier on; a kind of standard has developed over the last ten years whereby cell counts below 10 really need to be justified. So the standard expectation is that there will be no cell counts of less than 10, and for aggregate tables of counts, that is probably where you should start when considering what those tables should look like. For different data the numbers might be different, but it's something to bear in mind; I'll show a small sketch of this rule at the end of this part.

The next thing you need to do is understand your legal responsibilities. What legislation is relevant, and how do individual pieces of legislation interact with each other and with regulation? A recent example, which I'm grappling with at the moment, is the regulatory responsibility of pharmaceutical companies to publish information about clinical trials via the European Medicines Agency. That is a transparency regulation, but against it there are privacy demands arising from the General Data Protection Regulation. How do those two pieces of legislation interact? We have similar examples in UK legislation, and indeed elsewhere. You also need to think about what happened to the data before it reached you. What consent processes were involved? Who has a stake in the data in terms of upstream involvement in the data processing?

Know your data. One of the things you need to think about here is that upstream consideration of where the data come from: how were they collected, who are the data controllers, who are the data subjects? And that can sometimes be complex.
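Coming back to the cell-count rule of thumb mentioned above, here is a minimal sketch of flagging and suppressing low counts in an aggregate table. The table is invented for illustration; the threshold of 10 is the one named in the talk.

```python
import numpy as np

THRESHOLD = 10  # the rule-of-thumb minimum cell count mentioned above

# An invented 2x3 table of counts, purely for illustration.
table = np.array([
    [120, 45, 3],
    [ 80, 12, 9],
])

# Primary suppression: blank out any cell below the threshold.
# (Real SDC practice would also apply secondary suppression so the blanks
# cannot be recovered from row/column totals -- omitted here for brevity.)
suppressed = np.where(table < THRESHOLD, -1, table)  # -1 marks a suppressed cell
for row in suppressed:
    print(["*" if v == -1 else int(v) for v in row])
# [120, 45, '*']
# [80, 12, '*']
```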
To return to knowing your data: a good example of that complexity is GP records, which are personal data with respect to both the GPs and their patients. There are complex issues there.

Then you think about what's called a basic data specification. Is the data about people, and is the base data personal data? Now, these two things are not the same; they are often related, but there are instances where one holds and the other does not. I'm not going to go into the details of that; there are some examples in the book, which we'll give you a reference to at the end. Consider the type of data, and whether there are mixtures of types, which all create specific problems. Are the data individual-level or aggregate, or again a mixture? Then we go into the detailed data specification. What variables are involved? Are these standard key variables? In one of the appendices of the book, you'll see a set of standard key variables which are used to represent different types of scenarios by which a breach might occur, and these are often the first port of call for thinking about your data situation in these terms. Is there any sensitivity? Are there vulnerable populations? What's the quality of the data? Does it have any special features: is it time-linked or hierarchical, are there multiple sources? All of these will impact on the risk.

Next, understand the use case; it's really important to take this into account when you're doing your anonymisation. What will the data be used for? Sometimes that's easy to define, because there's a specific use case in mind; in other cases the notion is more general, which is typical, for example, with open data. What information is actually needed for that use? Users will always tell you that they need all the data; if you ask a user, they'll say 'we need more data'. But what do they actually need to deliver the outcome they're after? That is what they should be given: the minimum amount. There's a principle of data minimisation here, which applies regardless of identification risk: always give the minimum data that will deliver the utility required. One issue that often comes up is whether all the data are needed, or whether a sample will suffice. That's quite important, because sampling is a really good way of controlling risk: it creates uncertainty as to whether any individual population unit is actually in the dataset, and it should be considered in any data share (there's a small sketch of this, alongside the other data controls, a little later). Who will hold the shared data, and how will access to it be controlled? These governance concerns are actually quite important downstream, so make sure that well-formed governance is created around the data share.

Then think about your ethical obligations. Obviously these map onto the legal considerations and the data collection processes; as I say, all of these components interrelate with one another, so it's a system of thinking, not a checklist. What is the relationship between the data subjects and the data? In some senses that's the fundamental ethical question, and it breaks down in a variety of ways. What are the loci of consent? Who has consented to the use of the data, who should have consented, under what circumstances, and to what exactly did they consent? Was it genuinely informed consent? Related to that, what is the intended or expected use of the data by what might be called a reasonable data subject? What would a reasonable data subject actually expect you to do with their data? That might be something beyond what they have actually consented to.
This is tied up with the notion of contextual integrity employed by Helen Nissenbaum. And related to that: how aware are the data subjects of the data and its intended use? That again bears on the relationship between the data and the data subjects. There is plenty of data which is collected without the data subjects' knowledge at all; CCTV images are one example.

Then we get to the technical processes. The thing I'm going to draw your attention to here is the first bullet point: scenario analysis. Don't go straight to the data; that's a mistake that is often made. Think first: what is it, specifically, that we are concerned about happening? Who is going to be doing something to these data that we don't want them to do? How would they do it? What resources would they need? What is their motivation, and what goal are they trying to achieve? In the book we describe a system for doing scenario analysis; it's a twelve-point system, and it helps you develop that framework. The output from a scenario analysis is a set of key variables, which you can then use to produce a structured risk assessment.

If you're doing something completely new, something nobody has done before, so that you have no background or precedent to think about the risk against, then you may need to carry out a penetration test: an actual simulated attack on the data. The example I'll give here is the work we did with the Office for National Statistics, where they were considering releasing a whole load of data, about 200 datasets that were previously available under end-user licence, under the Open Government Licence. We carried out a penetration test treating the new data situation as if it were open data. I won't go into the details now; there's a paper on this in the Journal of Official Statistics which essentially describes the process and the legal analysis we did. It showed that the risk was in fact too high, and therefore ONS made the decision not to carry out that transformation.

You can also do what's called comparative data situation analysis, and this is very useful for sharing data. If you have evaluated your home situation, where the data are sitting now, as safe enough, and you're happy with that, then effectively you have set a standard against which any other data situation can be compared. That is actually a much simpler exercise than carrying out risk analyses from scratch: you can compare the components of the other data environment with your own data environment on a one-to-one basis, and it becomes a lot easier to see where the additional risk factors are, if there are any.

Next, identify the disclosure control processes that are relevant. You may want to restrict access; these are the various ways of controlling the environment rather than controlling the data. You may want to restrict who has access, how they access the data, where they do that access, and what they're allowed to do with it. You may also want to employ controls on the data themselves. I mentioned sampling earlier on; you can also aggregate data, suppress some data, or perturb the data in various ways, for example by adding noise.
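Here is a minimal sketch of those data-side controls: sampling to create uncertainty about inclusion, aggregation of a fine-grained variable into bands, and simple noise addition. This is my own illustration of the generic techniques just named, not the ADF's prescribed implementation; the dataset and parameters are invented.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Invented microdata, purely for illustration.
df = pd.DataFrame({
    "age":    rng.integers(18, 90, size=1000),
    "income": rng.normal(30000, 8000, size=1000).round(),
})

# 1. Sampling: an attacker can no longer be sure a target is in the data.
released = df.sample(frac=0.1, random_state=1)

# 2. Aggregation: replace exact age with a coarse band.
released["age_band"] = pd.cut(released["age"], bins=[17, 29, 44, 64, 90])
released = released.drop(columns="age")

# 3. Perturbation: add random noise to the sensitive value.
released["income"] = released["income"] + rng.normal(0, 500, size=len(released))

print(released.head())
```

Each control trades some utility for risk reduction, which is exactly why the use case, discussed under the earlier components, has to be settled before you choose among them.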
Now, those can be input controls, but where there's an output process you may also want to apply controls to the outputs of those data before, for example, they're published, as per the situation we were describing earlier.

Finally, there's a set of things that we also think it's important to consider, to do with impact. First, identify who the stakeholders are, and think about how you'll communicate with them and at what stages. Who needs to know about the share? Do we need to inform the data subjects? Is there an engagement process for the wider public? Do we need to talk to the users? Talking to the users is really important if it helps you define the use cases under component three earlier on. You might want to engage in a public relations exercise, or in a public engagement exercise where you actually ask for the public's input. This, for example, was very important in setting up the Administrative Data Research Network, where an extensive series of focus groups was carried out across the country. That both defined the operating procedures for the Administrative Data Research Network and gave us a firm basis for going forward, knowing that we had sampled public opinion and come to the view that it was generally in favour of what we were trying to do.

Also, you need to work out what it is that people need to know. You can, for example, publish details of your anonymisation process, but that's not a straightforward decision: if an intruder knows what anonymisation process you've conducted, they may be able to use that knowledge to increase their chance of successfully re-identifying something.

Now, what happens next? The key thing here is that this is not a release-and-forget system. You need to continue to monitor risk, because the environment will change, and so you should be constantly reviewing the risk associated with any decision about data. Finally, you need to plan what to do if things go wrong. Remember, we're only putting the level of risk at negligible; that's our acceptable level of risk, and it is agreed with the Information Commissioner's Office, so here we're on solid ground. But if you're not working at zero risk, then the event you're hoping to prevent could happen, and therefore you need a breach policy. Avoid the lure of catastrophisation, but think about what happens next. What will you do if there's a breach? Will you communicate with the intruder? Will you communicate with the media? And so on. How will you contain the breach? How will you react to it, and deal with it? Be active; be planned.

Okay, so, concluding remarks. The ADF is a tool which allows you to think constructively about your data situation. It moves us much closer to a harmonised idea of anonymisation, and it's gaining traction across the world: we have consultations going on with a number of countries, and only last week the Australian edition of the Anonymisation Decision-making Framework was published. It's called The De-Identification Decision-Making Framework, which speaks to my comment earlier about different terms being used in different countries, but essentially it's based on the same set of concepts. Now, I mentioned that an open access book is available from our website. So if you've not downloaded the book yet, you can go to our website; it's free to download.
And it gives you a lot more detail on all the things that I've talked about today. Okay, thank you very much. And now we'll open up to questions via the comment system.