I'm going to be talking about amplification and summation in the shuffle model of differential privacy. I'll start by introducing differential privacy and the shuffle model before talking about exactly what we've done in this work.

In differential privacy we have a database containing people's data, and we want to perform some analysis on it. The property we want is that the output shouldn't tell you too much about any one person in the database. To formalise this, we imagine some other database which, instead of having me in it, has Elmo in it. We don't want the adversary to be able to tell whether it was me or Elmo in the dataset by looking at the output; the adversary shouldn't be able to distinguish these two outputs.

Now, if this were statistically secure with some negligible statistical distance, then by a triangle inequality argument you can see that even if every entry in the database were different, you still couldn't tell any difference in the outputs. But then the output wouldn't be telling you anything about the data, so it would be useless. So we have to relax our notion of what it means to distinguish two datasets, and that relaxation is the definition of differential privacy.

What is this saying? We have x and x', two databases that differ in one place, and we look at the probability of any event on the output of the analysis. You should think of the delta at the end as a cryptographically small parameter, like a statistical distance. If you think of it as zero for a moment, the condition just says that the likelihood ratio between the case where you're looking at database x and the case where you're looking at database x' is bounded by e to the epsilon. So, roughly, the adversary can't learn too much about any one person, or learn anything with too much confidence.
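For reference, the standard (epsilon, delta)-differential-privacy condition being described here in words (the formula itself is on a slide not reproduced in this transcript) is:

    % For all neighbouring databases x, x' (differing in one entry)
    % and all events S over the output of the mechanism M:
    \Pr[M(x) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(x') \in S] + \delta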
How are we going to apply this definition? Originally, the model that differential privacy was developed for is the trusted curator model, where we have a trusted third party. Everyone gives all of their data to this trusted third party, which computes some statistics on the data, with some randomness added, and releases them to an analyser, who is also our adversary. We want the analysis to be useful to the analyser, but without revealing too much about any one person, which holds because the mechanism M is differentially private.

Then there's the local model of differential privacy. Here we avoid having to trust the third party: everyone locally randomises their own data. Each person applies a local randomiser R to their data, and the message they send to the analyser is already differentially private. Clearly this limits how much information your message can convey about your data point, but across a large number of users the analyser might still be able to say something useful about the population as a whole. So there's a trade-off: the local model requires less trust, but we also can't get as much utility. Indeed, it's a classical result that in the curator model you can get error of order one when summing real numbers in [0, 1], but in the local model you can't hope for better than error of order √n. Roughly speaking, this is because each person has to add order-one noise to their own input, so when you add them all up you get variance n, and hence standard error √n.

The shuffle model is an intermediate model between these two. The idea is that we still have a trusted third party, but we only trust it to shuffle the inputs: it takes in all the inputs, applies a random permutation, and shows the result of that permutation to the analyser. How this shuffler is implemented we're agnostic about in this paper, but as some suggestions you could use MPC, or a third party, or a mix net; if you have anonymous channels, that kind of automatically gives you a shuffler.

So what are our actual contributions? So far everything I've talked about is other people's work. We look at the question of real summation of numbers in [0, 1], and we show that in the one-message shuffle model you can do better than in the local model, but not as well as in the curator model. We also prove a new amplification-by-shuffling result. This isn't the first time someone has proved an amplification-by-shuffling result, but we improve, at least in certain aspects, on previous results.

By real summation, all I mean is that each user has some number in [0, 1] and we want to estimate the sum of them. There's previous work by Cheu et al.: they get a slight improvement over the local model in the one-message model, but it isn't visible in the asymptotics. They do show, however, that if each person can put √n one-bit messages into the shuffler, and they all get shuffled together, then you can achieve the error of the curator model. We look at what happens when you can only put one message in, and in that case we show that with a single message of log n bits you can achieve standard error n^(1/6), i.e. variance n^(1/3), and that you can't do better — this is the best you can hope for.

I'm going to prove the upper bound; I won't prove the lower bound because I won't have time, but I'd like to explain how these protocols work and how you can get this accuracy in the one-message shuffle model. So this is our local randomiser. It's pretty simple; there are really only two things going on. You take your input and put it into fixed-point precision — this is randomised rounding of your number to a fixed-point value at some precision k. After that we do k-ary randomised response, which is to say that each person flips a biased coin and, with probability gamma, returns a uniformly random answer independently of their input, while with probability 1 − gamma they just tell the truth. At the end the analyser has to debias this, but that's straightforward — it's just a linear map.
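As a rough illustration, here is a minimal Python sketch of a randomiser of this shape — randomised rounding to precision k followed by k-ary randomised response — together with the debiasing step. The parameter names and the exact encoding (values in {0, ..., k}) are my own choices for the sketch, not the paper's pseudocode.

    import random

    def local_randomiser(x, k, gamma):
        # Randomised rounding: encode x in [0, 1] as an integer in {0, ..., k}
        # whose expectation is k * x.
        lo = int(x * k)
        rounded = lo + 1 if random.random() < x * k - lo else lo
        # k-ary randomised response: with probability gamma, ignore the input
        # and submit a uniformly random value; otherwise tell the truth.
        if random.random() < gamma:
            return random.randint(0, k)
        return rounded

    def analyser(shuffled_messages, n, k, gamma):
        # Debias: subtract the expected contribution of the random responses
        # (which average k/2 each) and rescale -- a linear map, as in the talk.
        total = sum(shuffled_messages) / k
        return (total - gamma * n * 0.5) / (1.0 - gamma)

With n users, the analyser receives the n shuffled messages and applies this linear map to get an unbiased estimate of the sum.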
So that's our local randomiser. Why does it have the accuracy I claim? There are two sources of error. One is the randomised rounding when we map to fixed point: this contributes variance n/k² — n because there are n parties, and 1/k² because that's the order of the variance of the randomised rounding for each party. The other is the error due to the people who are lying, i.e. submitting a uniformly random response. There are approximately gamma·n of them, and each one adds something with variance of order one, so that contributes variance gamma·n. I'll show in a couple of minutes that we have to take gamma to be k/n times some function of epsilon and delta, and if you substitute that in, the total variance comes out as n/k² plus k times that function. So we want k to be about n^(1/3); taking k to be n^(1/3) gives variance about n^(1/3), and therefore standard error about n^(1/6). That's where the n^(1/6) standard error comes from.
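Spelling out that balancing calculation (this just restates the arithmetic above, with f(epsilon, delta) standing for the as-yet-unspecified function of epsilon and delta):

    % Rounding noise plus the noise from the ~gamma*n random responses,
    % with gamma = (k/n) * f(epsilon, delta):
    \mathrm{Var} \;\approx\; \frac{n}{k^{2}} + \gamma n
                 \;=\; \frac{n}{k^{2}} + k \, f(\varepsilon, \delta)
    % Balancing the two terms gives k \approx n^{1/3}, hence variance
    % \approx n^{1/3} and standard error \approx n^{1/6}.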
Okay, so why do I want to take gamma to be this, and why does it suffice? To understand why it works, we look at what the adversary sees — the adversary's view. The adversary's view is equivalent to a histogram: they just learn how many times each message has been submitted. So that's a histogram over the possible messages. We know these messages come from some people lying and some people telling the truth. The people who put in random responses we can think of as the green histogram, and the people telling the truth as the red one. The adversary obviously can't see which is which, but we're going to tell them: we're going to give them, as a present, the set of parties that responded truthfully. We're also going to assume they know everyone else's input except yours — that's a standard assumption in differential privacy, since we want to protect against arbitrary side information.

With this present, the adversary can remove from the histogram the data of everyone who told the truth. They're left with the green histogram plus your entry, assuming you told the truth. If you lied, your message is independent of your data, so no information about you is leaked; so we only need to worry about the case where you told the truth.

This green histogram is the thing we call the privacy blanket. The idea is that your one piece of data is hiding amongst everyone else's random data — it provides a cover for you to hide in. These are the two situations the adversary needs to distinguish: whether one extra zero was submitted, or one extra one. So we look at the likelihood ratio. As I mentioned earlier, differential privacy roughly says the likelihood ratio isn't greater than e to the epsilon, so it suffices to show that the probability of it being greater than or equal to e to the epsilon is at most delta.

To do that, we need to work out this likelihood ratio. In the case where you submit a zero, the likelihood ratio is just the number of zeros submitted divided by the number of ones submitted, which is some binomial random variable plus one divided by some binomial random variable. This won't often be greater than e to the epsilon, because binomial random variables are quite well concentrated about their means, so as long as those means are large enough we'll be fine. How large the means have to be is just some function of epsilon and delta, and since we want the means to be at least that, we can take gamma to be k/n times some function of epsilon and delta. If you're familiar with differential privacy, you might recognise the log(1/delta) divided by epsilon squared in that function: it's the amount of variance you have to add in the Gaussian mechanism. That shouldn't be too surprising, because the binomial random variables you're hiding amongst are approximately Gaussian.

Okay, so that's the privacy proof. Now, what's this amplification thing? We've shown that we can do summation well; the question is whether we can extend this argument to other functionalities, other statistics we might want to compute. Erlingsson et al. recently proved that if you have a local randomiser with epsilon-zero differential privacy in the local model, where epsilon zero is at most some constant, then after shuffling you automatically get differential privacy with better parameters. You need to introduce a delta which is non-zero, but you get a square root of n on the bottom, so epsilon can come down by a lot if n is large.

That's a useful result if you're trying to prove very high privacy, but in a lot of situations you don't want very high privacy; what you want is moderate privacy and high utility. We're able to extend the regime of this result to include larger values of epsilon zero. You'll note that there's an e to the epsilon zero in our expression for epsilon, so it does blow up reasonably quickly in epsilon zero, but the point is that we can now take epsilon zero of order log n and still get a reasonable privacy guarantee of, say, epsilon equal to one. In fact, even in the case where epsilon zero is a half and delta is ten to the minus six, this graph shows that the constants we get are better than those that come out of Erlingsson et al.'s result, essentially because we're attacking things more directly than they were. I should mention, though, that their result is slightly more general in terms of the systems in which you can apply it: it works in models that are different from the one we're looking at here. So it's not quite a strictly stronger result.
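To give a rough sense of the shape of these amplification bounds — this is only qualitative, with constants, exact conditions on epsilon zero, and lower-order terms omitted, so consult the paper for the precise statement — shuffling n copies of an epsilon-zero-LDP randomiser yields roughly:

    % Qualitative shape only; constants, the exact conditions on epsilon_0,
    % and lower-order terms are omitted.
    \varepsilon \;=\; O\!\left( e^{\varepsilon_0}
        \sqrt{\tfrac{\log(1/\delta)}{n}} \right)
    % so epsilon_0 of order log n (with a suitable constant) can still
    % leave epsilon of constant size.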
So you might then ask: if we have this amplification by shuffling, can we do everything just by coming up with a local randomiser — some way of computing the statistic in the local model — and then applying amplification by shuffling to it? And kind of yes, kind of no. The randomiser I just showed you for use in the shuffle model is differentially private in the local model with some large value of epsilon, of order log n, and you can recover the result I showed you directly by applying amplification by shuffling to it. However, if you just take a local randomiser designed for use in the local model — such as randomised rounding to {0, 1} followed by randomised response, or adding Laplace noise — then you won't do particularly well: you'll only get the square-root-n error, or maybe a log-factor improvement. So you do need to choose your local randomiser so that it's optimised for accuracy in the shuffle model, rather than optimising it for the local model and then just applying amplification by shuffling.

There's another question here: I've said what we can do in the one-message model, but what about many messages? I mentioned earlier that with √n one-bit messages you can do pretty much as well as the curator model. In a recent note, which we've put on arXiv since submitting this paper, we show that you can in fact get away with order log n messages — so order log n messages of size order log n, rather than order √n messages. There's not a huge amount of new work in that note: it basically boils down to a result of Ishai et al. from 2006, which says that if you have anonymous channels then you can do secure addition with statistical security. So everyone adds random noise to their input and then applies this protocol of Ishai et al., and with only log n messages each you get a private form of addition. So if you have log n messages available, you can get the accuracy of the curator model.
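To make that secure-addition step a little more concrete, here is a toy sketch of the split-and-mix idea behind that style of protocol: each party splits its (already noised) value into additive shares modulo some q, sends each share as a separate anonymous message to the shuffler, and the analyser simply sums everything. The modulus q, the number of shares m, and the function names are illustrative choices for this sketch, not the actual parameters of Ishai et al.'s protocol or of our note.

    import random

    def split_into_shares(value, m, q):
        # Split an integer value (mod q) into m additive shares that sum to it.
        # Each share becomes one anonymous message to the shuffler.
        shares = [random.randrange(q) for _ in range(m - 1)]
        shares.append((value - sum(shares)) % q)
        return shares

    def aggregate(shuffled_messages, q):
        # The analyser just sums everything mod q: shuffling hides which share
        # came from whom, but the total is unaffected by the permutation.
        return sum(shuffled_messages) % q

    # Toy usage with three parties (in the real protocol each value would first
    # have differential-privacy noise added, and m would be of order log n).
    q, m = 2**20, 5
    values = [3, 10, 7]
    messages = [s for v in values for s in split_into_shares(v, m, q)]
    random.shuffle(messages)   # the shuffler's role
    assert aggregate(messages, q) == sum(values) % q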
So what questions are left open? Well, I've said that log n messages suffice to get the accuracy of the curator model, and I've said that one message doesn't. You might ask what happens if you have two messages, or three messages, or log log n messages. We're looking into this at the moment, but right now we're not entirely sure.

We should probably also be looking at things other than addition. The motivation for this model is that we may have a means of shuffling, or anonymous channels, and we can use that one primitive to compute a lot of different things very efficiently: we only have to implement the shuffler once. But that only makes sense if you actually can compute a lot of different things with it, so we need to be able to do something other than addition in this model if we're going to justify its use.

We also need to explain how the shuffler is going to be implemented. There's already work on implementing shuffling in trusted hardware, which came out before all this work on differential privacy in the shuffle model, simply because it seemed intuitive that shuffling people's data before looking at it would improve privacy. We can also think of doing it with MPC, or a mix net, or something along those lines. But these options need to be looked into, and the implementation has to be sufficiently cheap to justify using this as a model.

Another issue is that there's a trust assumption here. In my proof, you might have noticed that I assumed everyone was following the protocol. We don't actually need everyone to follow the protocol — it's sufficient for some positive fraction of people, known in advance, to follow it — but we need enough people following the protocol that enough of them respond randomly and don't tell the adversary what their response was. They can't be merely semi-honest; they have to be honest. That's what guarantees there's enough of a privacy blanket for you to hide amongst. You might imagine that an MPC means of shuffling would also allow you to verify that this noise is being added correctly: the noise, or the randomised response, could be done inside the MPC rather than outside it, and that would remove the need to trust people.

And then, of course, I've said this shuffle model will work if all of these things happen: if it can be implemented well, if we can do other functionalities in it, and if we can remove this trust assumption, then it's great. But maybe the shuffler isn't the best functionality for this purpose — maybe there are functionalities which allow you to do more and are easier to implement. So another question would be: what other single functionality could we implement that would allow us to do a great variety of things cheaply?

Okay, that's all I've got to say. Thank you.

Thank you very much, James. Do we have any questions? Okay, if there are no further questions, let's thank James and all the speakers of the session again.