Hi, I'm going to talk about our paper on the power of multiple anonymous messages: frequency estimation and selection in the shuffled model of differential privacy. This is joint work with Badih Ghazi, Ravi Kumar, Rasmus Pagh, and Ameya Velingker. And I'm Noah Golowich. I'll begin by reviewing the central and local models of differential privacy, and then I'll discuss the shuffled model of differential privacy, which is the main topic of this talk. After that, I'll talk about the main results in the paper. The first is a lower bound for the problem of frequency estimation in the single-message shuffled model, and what we show, roughly speaking, is that the phenomenon of privacy amplification by shuffling leads to optimal protocols for the frequency estimation problem. Then I'll talk about how we can in fact surpass these lower bounds using the multi-message shuffled model, and I'll discuss how we can make those protocols communication efficient. So we're going to assume that there's a universe U, which is simply a finite set, and there are n users, each of which provides an element of U as its data point. The collection of these n users' data points is called the data set, denoted by a capital X. Now, the goal of differential privacy is for the users to release certain statistics of the data set X in a way that respects each user's individual privacy. In the central model, the way they do this is that each user first sends their data point x_i to a trusted analyzer; this could be, for instance, a computer operated by a large company or a government. The trusted analyzer then adds noise to the desired statistics of the data set X. We denote the output of the analyzer by A(X), and in general this is going to be a random variable. Now, the algorithm A, which maps X to A(X), is defined to be (ε, δ)-differentially private, for positive real numbers ε and δ,
if, roughly speaking, for all neighboring data sets X and X′, which differ in a single element, the distribution of A(X) is similar to the distribution of A(X′). Formally, we require that the probability of any event T occurring under the data set X is at most e^ε times the probability of T occurring when the data set is X′, plus an additive term of δ. Now, one issue with the central model of differential privacy is that a trusted analyzer may not be available in many applications, and this motivates the local model of differential privacy. Here, the analyzer is untrusted, and therefore the users must add the privacy-preserving noise themselves. So in particular, in the local model, in addition to an analyzer A, there is also a local randomizer R for each user. Given an input x, the user first applies the local randomizer on its own end and sends the output of the local randomizer, R(x), to the untrusted analyzer, which then performs an analysis on R(x_1), ..., R(x_n); the output is denoted by A(R(x_1), ..., R(x_n)). Now, the protocol induced by the local randomizer R is defined to be (ε, δ)-differentially private in the local model if the function that takes x to R(x) is itself (ε, δ)-differentially private. By this, we mean that it is (ε, δ)-differentially private as a function on data sets consisting of a single element, namely the single element x. In particular, this requires the distribution of R(x) to be similar to the distribution of R(x′) for all x and x′ in the universe. This is a pretty strong condition that necessitates adding a lot of noise: even for a simple problem such as aggregation, the additive error in the local model of differential privacy has to be on the order of the square root of the number of users, and this is often undesirable in practice.
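To make the local model concrete, here is a minimal Python sketch (my own illustration, not a protocol from the paper) of ε-LDP randomized response on a single bit, together with the debiasing step an untrusted analyzer would perform to estimate the sum. The roughly √n-scale fluctuation of the debiased sum is exactly the aggregation error just mentioned.

```python
import math
import random

def randomize_bit(x: int, eps: float) -> int:
    """eps-LDP randomized response: keep the bit x w.p. e^eps/(1+e^eps),
    otherwise flip it. The output distribution satisfies (eps, 0)-DP."""
    p_keep = math.exp(eps) / (1.0 + math.exp(eps))
    return x if random.random() < p_keep else 1 - x

def debias_sum(reports, eps: float) -> float:
    """Unbiased estimate of sum(x_i) from the noisy reports.
    E[report] = p*x + (1-p)*(1-x), so we invert that linear map."""
    p = math.exp(eps) / (1.0 + math.exp(eps))
    n = len(reports)
    return (sum(reports) - n * (1 - p)) / (2 * p - 1)
```

The variance of each report is constant, so after debiasing the estimate of the sum is off by Θ(√n / ε) in expectation, matching the √n barrier for aggregation in the local model.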
So in this talk, I'll discuss one solution to this problem, which is the shuffled model of differential privacy. Here, there is a trusted shuffler, denoted by S, in between the n users and the untrusted analyzer. Given the users' messages R(x_1), ..., R(x_n), the trusted shuffler randomly permutes the outputs of the local randomizers and then sends the permuted messages to the untrusted analyzer, which outputs A applied to the shuffled messages. And here's how privacy is defined in the shuffled model: the protocol induced by the local randomizer R and the shuffler S is defined to be (ε, δ)-differentially private in the shuffled model if the function that takes the data set x_1, ..., x_n to the shuffled output of the concatenation of R(x_1), ..., R(x_n) is itself (ε, δ)-differentially private. Now, one important distinction I'll make in the shuffled model is the difference between the single-message and the multi-message shuffled model. In general, R(x_i) for any user i may output m distinct messages, in which case the shuffler S applies a random permutation to the m · n messages. In the special case where m equals one, we call it the single-message shuffled model; here the shuffler only permutes n messages. In the general case where m is larger than one, we call it the multi-message shuffled model. Now, one reason to care about the single-message shuffled model in particular is a phenomenon discovered in some recent papers, called privacy amplification by shuffling. What these recent papers showed is the following: suppose we have a local randomizer R which is (ε_L, 0)-differentially private in the local model for some ε_L larger than zero.
Then when we shuffle these messages (in the resulting single-message shuffled model protocol, each user contributes one message, which is simply the output of R(x)), the resulting shuffled model protocol S∘R is (ε_S, δ)-differentially private in the single-message shuffled model for some privacy parameter ε_S which is much less than ε_L, and for some δ which is not too large. These results are neat because they allow us to design single-message shuffled model protocols in a black-box manner: first take a locally differentially private protocol with a bad privacy parameter, and then amplify it using privacy amplification by shuffling. One application of this general principle is to frequency estimation protocols. Here the universe is the set of integers from 1 to B for some positive integer B, and the goal is to compute the frequency of each element j between 1 and B, in particular, the number of users i who hold element j, in other words, such that x_i is equal to j. We measure the error for frequency estimation by the additive error: the maximum over all j of the absolute value of the true frequency of j minus the estimated frequency of j. So this is some integer between 0 and n. Now, using amplification by shuffling, it's possible to show the following upper bound on the error of frequency estimation in the single-message shuffled model: you can perform frequency estimation on a domain of size B with error roughly equal to the minimum of n^(1/4) and B^(1/2), up to polylogarithmic factors in n and B. To get the upper bound of n^(1/4), you simply shuffle the output of the local randomizer given by the RAPPOR protocol of Erlingsson et al., and to get the upper bound of B^(1/2), you shuffle the output of the local randomizer given by Warner's randomized response.
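As an illustration of the second of these baselines, here is a hedged Python sketch of Warner-style randomized response over a domain of size B (0-indexed for convenience; the function names are my own), together with the analyzer's debiasing of the reported histogram. Shuffling these single-message reports is what yields the B^(1/2)-type bound mentioned above.

```python
import math
import random

def rr_randomizer(x: int, B: int, eps: float) -> int:
    """Warner-style randomized response on domain {0,...,B-1}: report the
    true value w.p. e^eps/(e^eps + B - 1), else a uniform *other* value."""
    p_true = math.exp(eps) / (math.exp(eps) + B - 1)
    if random.random() < p_true:
        return x
    y = random.randrange(B - 1)       # uniform over the B-1 other values
    return y if y < x else y + 1

def estimate_frequencies(reports, B: int, eps: float):
    """Debias the histogram of noisy reports into frequency estimates,
    using E[count_j] = p_true * f_j + p_other * (n - f_j)."""
    n = len(reports)
    p_true = math.exp(eps) / (math.exp(eps) + B - 1)
    p_other = (1 - p_true) / (B - 1)
    counts = [0] * B
    for r in reports:
        counts[r] += 1
    return [(c - n * p_other) / (p_true - p_other) for c in counts]
```

Note that the debiased estimates always sum to exactly n, since debiasing is a linear map applied to a histogram with total mass n.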
Our first main theorem shows that this is actually optimal in the single-message shuffled model, up to polylogarithmic factors. In particular, we show that for any differentially private single-message shuffled model protocol for frequency estimation, the additive error must be roughly at least the minimum of n^(1/4) and B^(1/2). Okay, so now we'll discuss the proof of our lower bound. The first idea of the proof is to use a reduction discovered by Cheu et al. a few years ago, which shows that any (ε_S, δ)-differentially private protocol in the single-message shuffled model implies the existence of an equivalent protocol in the local model which is (ε_S + ln n, δ)-differentially private. What this means is that if we can show a lower bound for locally DP protocols for frequency estimation where the ε parameter is roughly ln n, which is quite large, we get a corresponding lower bound for the single-message shuffled model. And this is exactly what we do. In particular, our main lower bound in the local model is the following: if the privacy parameter ε_L is roughly equal to a constant times ln n, and δ_L is sufficiently small, say roughly a little less than 1/n, then any (ε_L, δ_L) locally differentially private frequency estimation protocol must have error at least roughly 1/(√n · e^(ε_L/4)). Now, via the reduction of Cheu et al., this result implies our lower bound in the single-message shuffled model. Note also that this theorem improves upon previous work by Duchi et al. on locally differentially private frequency estimation: Duchi et al. showed a lower bound of Ω(1/√(n · e^(ε_L))).
Now, in the regime where ε_L is a constant times ln n, e^(ε_L) is a polynomial in n, and thus our lower bound improves upon the prior work by a poly(n) factor. Also note that the lower bound of Duchi et al. only applies to the case of pure differentially private frequency estimation, namely where δ_L is equal to zero, whereas our theorem applies in the case of approximate DP, where δ_L is allowed to be positive. Okay, so now we'll discuss the proof idea of our lower bound for locally differentially private frequency estimation. Roughly speaking, what we want to use is Fano's method. What this means is the following: let α be the desired lower bound on the error of frequency estimation. The goal is to show a certain upper bound on the mutual information between two quantities. To define these two quantities, let V be a uniformly random element of the universe of integers from 1 to B, and let X be a perturbed version of V. In particular, suppose that X is equal to V with probability α, and is a fresh draw from the universe with probability 1 − α. The main step in our lower bound is to show an upper bound on the mutual information between V and R(X), where, remember, R(X) denotes the output of the local randomizer given input X. We want the upper bound to be roughly α^4 · n · e^(ε_L). Now, what makes this proof a tad tricky is that this upper bound is actually false for general local randomizers R, even those which satisfy (ε_L, δ_L)-differential privacy. So in order to show the desired result, we additionally have to use the fact that R can be combined with an analyzer to obtain a protocol with error bounded above by α. In other words, we have to use the accuracy of the entire protocol in order to show this upper bound on the mutual information.
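The hard distribution in this Fano-style argument can be sketched in a few lines of Python (an illustration of the sampling process only, under my reading of the construction; the variable names are mine):

```python
import random

def sample_pair(B: int, alpha: float):
    """Hard instance for Fano's method: V is uniform on {0,...,B-1};
    X equals V w.p. alpha, and is a fresh uniform draw otherwise."""
    v = random.randrange(B)
    x = v if random.random() < alpha else random.randrange(B)
    return v, x
```

Note that P(X = V) = α + (1 − α)/B, so the correlation between V and X, and hence the information R(X) can carry about V, is controlled entirely by α.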
And this use of accuracy seems to be somewhat new in the line of work on lower bounds for differential privacy in the local model. Okay, so the takeaway from our lower bound for the single-message shuffled model is that we must have error which is polynomial in either n or B. This could be quite undesirable in practice, since n and B can both be quite large, and the resulting statistics could be unusable if the error is too large. So next we'll talk about how to get error which is polylogarithmic in n and B, and to do this we'll have to use the multi-message shuffled model, of course. But before doing so, I'm going to first review the measures of efficiency that we require our protocols to satisfy. The first is communication complexity, or cost, and it's very straightforward: it's just the total length of all messages output by a single user, measured in bits. The second measure of efficiency is computation. To define the computation cost of our algorithms, we're going to focus only on the computational complexity of the analyzer, because the users' algorithms are always going to be very efficient. Now, the issue with the computation cost of the analyzer is that the domain size B might be much larger than the number of users n, and it could be infeasible to compute the frequencies of all elements j between 1 and B. The solution is to require the analyzer to output a frequency oracle FO, which takes as input a query j from 1 to B and outputs a frequency f_j, an integer from 0 to n denoting the estimated frequency of item j in the data set. We measure computation by the query time, which is the amount of time taken by a single query j to the frequency oracle. Now, one more resource we'll allow our algorithms to use is public coins. Up to now, all the protocols I've talked about have had private coins,
meaning each local randomizer uses its own local randomness to output the randomized message R(x). For one of the protocols I'll present in the multi-message shuffled model, we'll allow all the local randomizers to access a common string of public random bits; importantly, these bits can also be viewed by the untrusted analyzer. Okay, so here's our main theorem for frequency estimation in the multi-message shuffled model. We show that there exist protocols which are (ε, δ)-differentially private and have the following properties. There are two protocols that we give: the first is based on the Count-Min sketch, and the second is based on the Hadamard response. Both of these protocols have error polylogarithmic in n and B, and communication polylogarithmic in n and B, but they differ in query time. The Count-Min-based protocol uses public coins, but it has smaller query time, namely polylogarithmic in n and B. The Hadamard response-based protocol does not use public coins, only private coins, but its query time is roughly of order n, linear in the number of users. Roughly speaking, the public coins that the Count-Min sketch uses are needed to perform hashing, whereas the Hadamard response achieves a similar functionality in a different way. Now, here's the idea for how to construct multi-message frequency estimation protocols. The first step is to view frequency estimation as B parallel aggregation problems. In the j-th problem, for each element j between 1 and B, we simply want to add up the number of users holding item j, and each user's input to that problem is simply a bit, either one or zero, indicating whether that user holds item j. It was observed by Cheu et al. in 2019 that for this very simple aggregation problem, there is a local randomizer, which we denote by R_add, with which we can perform aggregation in the shuffled model with polylogarithmic error.
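The reduction from frequency estimation to B parallel counting problems can be sketched as follows (a plain Python illustration, 0-indexed, with no privacy noise; it only shows the decomposition itself):

```python
def one_hot(x: int, B: int):
    """View frequency estimation as B parallel counting problems: a user
    holding item x contributes bit 1 to problem x and bit 0 to the rest."""
    return [1 if j == x else 0 for j in range(B)]

def exact_frequencies(data, B: int):
    """Summing the users' one-hot encodings coordinate-wise recovers the
    exact histogram; the private protocol replaces each coordinate's sum
    with a shuffled-model aggregation such as R_add."""
    freqs = [0] * B
    for x in data:
        for j, bit in enumerate(one_hot(x, B)):
            freqs[j] += bit
    return freqs
```

In the private protocol, each of the B coordinate sums is computed with the shuffled-model aggregator, which is exactly where the Θ(B) communication per user comes from.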
Now, let's concatenate B of these local randomizers, one for each element of the domain, and have each user output B different messages, basically a one-hot encoding of which of the B elements that user holds. What this gives us in the multi-message shuffled model is a protocol with polylogarithmic error for the frequency estimation problem. The issue is that because each user has to output B different messages, one corresponding to each element of the domain, the communication cost has to be at least on the order of B, and this might be very large, which could make it infeasible in practice. So the second idea of the proof is to use a trick, based on either the Count-Min sketch or the Hadamard response, to avoid having to send B separate messages. For the rest of this talk, I'll focus on our protocol based on the Hadamard response; you can see our paper for the protocol based on the Count-Min sketch. I'll begin by introducing the Hadamard response, which is actually a locally differentially private protocol for frequency estimation. For simplicity, let's assume that the domain size B is one less than a power of two, so we can consider the Hadamard matrix with B + 1 rows and columns, delete the first row, and denote the resulting matrix by H. Each of the B × (B + 1) entries of H is either −1 or +1. Here's how the Hadamard response is defined: for an element x between 1 and B, R(x) is an integer defined as follows. With probability e^ε / (1 + e^ε), R(x) is uniform over the columns k of the matrix H such that H_{x,k} is equal to +1, and with the remaining probability, 1 / (1 + e^ε), R(x) is uniform over the columns k of H such that H_{x,k} is equal to −1. It's known that this protocol achieves error roughly equal to √(n log B) / ε in the local model of differential privacy.
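The Hadamard response local randomizer just described can be sketched directly in Python; a minimal rendering under the stated assumptions (B + 1 a power of two, rows and columns of the Sylvester Hadamard matrix indexed 0 to B, with the all-ones row 0 deleted so x ranges over 1, ..., B):

```python
import math
import random

def had_entry(r: int, c: int) -> int:
    """Entry (r, c) of the (B+1)x(B+1) Sylvester Hadamard matrix:
    (-1) raised to the inner product of the bit representations."""
    return -1 if bin(r & c).count("1") % 2 else 1

def hadamard_response(x: int, B: int, eps: float) -> int:
    """Hadamard response randomizer for 1 <= x <= B (row 0 is deleted, so
    row x is balanced): w.p. e^eps/(1+e^eps) output a uniform column k
    with H[x][k] = +1, otherwise a uniform column with H[x][k] = -1."""
    plus = [k for k in range(B + 1) if had_entry(x, k) == 1]
    minus = [k for k in range(B + 1) if had_entry(x, k) == -1]
    if random.random() < math.exp(eps) / (1.0 + math.exp(eps)):
        return random.choice(plus)
    return random.choice(minus)
```

Since each nonzero row of H has exactly (B + 1)/2 entries of each sign, the output is a single column index, which is why each user only needs about log B bits of communication.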
It's also clearly (ε, 0)-differentially private, and it's communication efficient, because each user just has to send log B bits to the analyzer. Now here's how we can extend this protocol to the shuffled model in a way that allows it to remain communication efficient. Given an input x in the domain, the local randomizer is going to output several messages. One of these messages, which we denote in green here, is essentially a signal message: it gives information about what x is. We denote this message by M, and it's simply a concatenation of log n column indices of our Hadamard matrix. We denote these by M_1 through M_{log n}, and each of these indices is uniform over the columns k of H such that H_{x,k} is equal to +1, because those are the columns that were output with higher probability under the Hadamard response. Additionally, we're going to output a bunch of noise messages, denoted in red here. In particular, let ρ be a parameter roughly equal to log(1/δ) / ε²; it denotes the number of noise messages. We'll have noise messages denoted M̃_1 through M̃_ρ, and each of them is simply drawn uniformly from the (log n)-length tuples of integers from 1 to B + 1; that is, each of those log n integers is drawn uniformly from 1 to B + 1. Notice that each of these noise messages is completely independent of x; they're added purely to preserve differential privacy. The output of the local randomizer R(x) is the concatenation of these ρ + 1 messages. Now, roughly speaking, we have accuracy because we've concatenated enough column indices of H in that single signal message for the analyzer to determine frequency estimates for each of the elements in the domain.
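The signal-plus-noise structure of this multi-message randomizer can be sketched as follows (a hedged Python illustration under my reading of the construction: columns indexed 0 to B, ceil(log2 n) indices per message, and the function names are my own):

```python
import math
import random

def had_entry(r: int, c: int) -> int:
    """Entry (r, c) of the Sylvester Hadamard matrix."""
    return -1 if bin(r & c).count("1") % 2 else 1

def multi_message_randomizer(x: int, B: int, n: int, eps: float, delta: float):
    """One signal message plus rho noise messages, each a tuple of about
    log2(n) column indices in {0,...,B}. Signal indices are uniform over
    the +1 columns of row x; noise tuples are uniform and independent of x."""
    t = max(1, math.ceil(math.log2(n)))             # indices per message
    rho = math.ceil(math.log(1.0 / delta) / eps**2) # number of noise messages
    plus = [k for k in range(B + 1) if had_entry(x, k) == 1]
    signal = tuple(random.choice(plus) for _ in range(t))
    noise = [tuple(random.randrange(B + 1) for _ in range(t))
             for _ in range(rho)]
    return [signal] + noise
```

Each user thus sends about (log(1/δ)/ε²) · log n · log B bits in total, which is polylogarithmic in n and B, as claimed in the theorem.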
Privacy is a little trickier to show, but roughly speaking, it follows because the number of noise messages M̃_l that are consistent with any given row of our matrix is a binomial random variable, which is sufficiently smooth. By consistent with the row x, I mean that those are messages the user could have output if their true data point was x. Okay, in our paper we have several additional results. The first additional result is a lower bound on the sample complexity of the selection problem in the single-message shuffled model: we show a lower bound of Ω(D), where D is the number of elements in the selection problem. This is tight up to a logarithmic factor, because it's known that you can solve this problem with on the order of D log D samples in the single-message shuffled model. We also show a corollary of the Hadamard response-based protocol that I talked about, which shows how to efficiently implement certain families of non-adaptive statistical query algorithms in the shuffled model. In particular, we show this for statistical query algorithms which are sparse, in the sense that for any element of the universe, there is a bounded number of statistical queries that evaluate to one on that element. Finally, there are many applications of our upper bounds for frequency estimation that we won't discuss here, such as range queries and quantile estimation; you can find our paper at this link. Okay, I'll finally discuss some open problems. One open problem is whether or not it's possible to decrease the roughly linear query time achieved by the Hadamard response for private-coin frequency estimation in the multi-message shuffled model. We can do this with public coins using the Count-Min sketch. A second open problem is the problem of selection on D elements.
There's a multi-message shuffled model protocol with sample complexity on the order of √D, which beats the single-message shuffled model lower bound of Ω(D), but it's unknown if we can do better; in the central model, it's possible to get sample complexity log D. This seems to be a pretty challenging and very interesting open question. And that's it. Thank you for listening.