Hi, my name is Mark Elliott. I'm from Manchester University and part of the National Centre for Research Methods, and I lead the UK Anonymisation Network. Today I'm going to talk to you about some basic concepts which make up the idea of anonymisation.
Okay, so what is anonymisation? Anonymisation is a process by which personal data are rendered non-personal. It's a very precise definition. It ties it into the legal notion of personal data, so we need to know what personal data is. There's a definition of personal data within the Data Protection Act: it's data that relate to a living individual who can be identified either from those data, or from those data and other information. Now this definition is quite precise. There is some variance between the definitions in different pieces of legislation and across different jurisdictions, but the notion of identifiability remains central, and so that's the notion we're going to focus on when we're thinking about anonymisation.
Now I'm going to talk briefly about some terms, and the first pair of terms I want to talk about is anonymisation and de-identification, both of which you may have heard of. They deal with different parts of the Data Protection Act's definition of personal data. De-identification tackles the first part, "directly from those data". It's about preventing somebody being able to recognise somebody directly from the data, for example from their name and address. Anonymisation, on the other hand, tackles the other part of the definition, "indirectly, from those data and other information". So it's a more complex idea, and anonymisation is a deeper concept. De-identification is actually quite simple: you remove or obscure in some way those direct identifiers. With anonymisation, the issue is being able to decide whether a set of indirect identifiers is sufficient to enable somebody to be identified.
Okay, so there are four uses of the term anonymisation that you may have heard of.
The first is absolute anonymisation, which essentially means that there's zero risk of re-identification under any circumstances. That isn't a particularly useful use of the term, because in order to reach this state of zero risk of re-identification you effectively have data which is itself of no use. It's of very limited value.
Formal anonymisation is the other side of the coin. It's just de-identification. It's a stripping away of those direct identifiers, or possibly replacing them with pseudonyms. And this is not sufficient, because there remains the possibility that within the data there are indirect identifiers which enable somebody who wishes to do so to re-identify individuals.
Statistical anonymisation attempts to measure the risk of an identification happening and to control that risk. So here we're in the middle ground between the two extremes of absolute and formal anonymisation. We're allowing the possibility that a re-identification could occur, and we're measuring the risk of it. We're not insisting that our data are absolutely anonymised, but we're trying to do something in terms of reducing the risk. Now the disadvantage of statistical anonymisation is that it tends to be very focused on the properties of the data.
The fourth category, functional anonymisation, acknowledges the value of the statistical approach but also takes into account the environment in which the data exist. And we're now going to talk in more detail about this more holistic approach to anonymisation.
Okay, here are some principles. The assertion is that anonymisation is not primarily about the data. Anonymisation is about what we call data situations. Now data situations arise from data interacting with data environments. So now we're introducing a new term, the data environment. So here's a formal definition of the term data environment.
So it's a set of formal and informal structures, processes, mechanisms and agents that either act on data, provide interpretable context for data, or define, control and interact with those data. What does this mean in practice? Data environments consist of agents (normally people), infrastructure (particularly security infrastructure), governance processes and, most particularly, other data.
Data environments tend to be layered, so you can have data environments within data environments, and they are partitioned: the security infrastructure that exists, for example, on a server partitions the data on that server from data in the outside world. And we can think about that server existing inside a bigger data environment, the global data environment, which contains all the possible data in the world. So data environments are complex items.
In order to understand whether data are sufficiently anonymised to meet your legal requirements and any other ethical constraints you might be under, you need to understand that it's about the relationship between the data that you have and the data environment that you're considering putting those data in. So essentially you cannot decide whether data are safe to share or release by looking at the data alone. You have to consider the whole of the data situation.
So anonymisation is a process, the goal of which is to produce safe data. Now you could get overly focused on that. It only makes sense to be doing this at all if what you're trying to do is to produce useful data, and that was the point that I made right at the beginning of those four definitions about the problem with the absolute anonymisation definition. We're trying to produce useful data, so what we will be doing is balancing that notion of utility, of useful data, with the idea of trying to produce safe data. Now zero risk is not a realistic possibility if you're going to produce useful data. So that was my criticism of the absolute anonymisation definition.
The measures that you put in place to manage the risk should be proportionate to that risk and its likely impact. So you're going to do something either to the data's environment, in terms of where the data are placed, how you're going to keep them and what governance you have around them, or you're going to manipulate the data themselves, in order to reduce that risk, the risk of re-identification.
So here we've defined the basic concepts that you need to understand anonymisation. In the next video, the Anonymisation Decision Making Framework, we describe a practical approach for carrying that out. Thank you.