So I'm going to start talking about randomness and cryptography. We know that cryptographic algorithms require randomness: secret keys must have entropy, and certain primitives must be randomized. And it's common for us to assume perfect randomness in our analysis. But we know that real-world randomness isn't perfect. So we need to ask the question: can we base cryptography on realistic, imperfect randomness? And what do we mean by imperfect randomness? I'm going to define an imperfect randomness source, roughly, as a family of distributions satisfying some property, for example, that they have some entropy. And to tolerate an imperfect source means to have one scheme that works for any distribution in this family. So if we restate the question, what we really mean is: which imperfect sources are enough to do cryptography? For some sources this is easy to answer. If we have extractable sources, these are sources that allow the deterministic extraction of perfect randomness. And so if we have extractable sources, it's easy to do pretty much anything that we could do with perfect randomness. The bad news, however, is that a lot of sources are not extractable. So let's look at non-extractable sources. Well, it's obvious that if the source has no entropy, then we can't do crypto. But what about sources that have some entropy, they're just not perfect? Generally, they're not extractable, so that's bad news. And let me give you a very simple example. These are the gamma-Santha-Vazirani sources. They're parametrized by a parameter gamma, and what they do is output a sequence of bits. Each bit has a bias of at most gamma, and the bias of each bit can depend on the outcome of all previous bits. And it was shown by Santha and Vazirani that these sources are not extractable. Namely, for any function that takes any number of coins and outputs a bit, there exists a Santha-Vazirani distribution such that the function applied to this distribution has bias at least gamma.
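As a toy illustration of that non-extractability result (my own sketch, not from the talk, assuming each bit is 1 with conditional probability in [(1 - gamma)/2, (1 + gamma)/2]), here is a greedy SV adversary that tilts each bit toward whichever branch raises the expected output of a candidate extractor f:

```python
import itertools

def exp_f(f, prefix, n):
    """Expected value of f over uniform completions of prefix to n bits."""
    m = n - len(prefix)
    total = sum(f(prefix + list(tail))
                for tail in itertools.product([0, 1], repeat=m))
    return total / 2 ** m

def tilted_prob_one(f, prefix, n, gamma):
    """Greedy SV adversary: bias the next bit (within the allowed
    gamma) toward whichever value raises the expected output of f."""
    if exp_f(f, prefix + [1], n) >= exp_f(f, prefix + [0], n):
        return (1 + gamma) / 2
    return (1 - gamma) / 2

def prob_f_one(f, n, gamma, prefix=()):
    """Exact Pr[f = 1] when the n input bits come from the greedily
    tilted Santha-Vazirani distribution (exhaustive, so small n only)."""
    prefix = list(prefix)
    if len(prefix) == n:
        return f(prefix)
    p = tilted_prob_one(f, prefix, n, gamma)
    return (p * prob_f_one(f, n, gamma, prefix + [1])
            + (1 - p) * prob_f_one(f, n, gamma, prefix + [0]))
```

For example, against the parity extractor on 4 bits with gamma = 0.2, this strategy pushes Pr[f = 1] to 0.6, i.e., bias gamma, matching the theorem.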
So let's go back and look at what we can do in terms of cryptography, depending on the source. We know that it's possible to do cryptography if we have extractable sources, and it's impossible if our source has no entropy. But where do general weak entropy sources fall in this categorization? And the answer is that it's complicated; it depends on the application. For example, for BPP simulation, a series of works have shown that weak sources are enough to simulate BPP algorithms. And so we might ask, well, can we have the same good news for cryptography? In the case of authentication, the answer is yes and no, but mostly yes. We know that many, not all, but many weak sources are sufficient to build MACs, and also signature schemes under appropriate hardness assumptions. And the reason for this is that we only require that it's hard to guess or forge a long string, so having min-entropy in the source is good enough. However, for privacy, the results are much more negative. We know that Santha-Vazirani sources are not sufficient for unconditionally secure encryption. This was later strengthened by Dodis et al. to show that they're also not sufficient for computationally secure encryption, and also commitment, zero knowledge, and secret sharing. This was again strengthened by Bosley and Dodis, who showed that if you can generate a k-bit secret key from a distribution, then you can extract almost k almost-uniform bits from this distribution. So what this means is that privacy requires an extractable source. Now, let me go into the main lemma that allowed Dodis et al. to show this very negative result for encryption, commitments, and pretty much any privacy application. The lemma is as follows.
If we consider a weak source X and two functions f and g such that f(X) and g(X) are computationally indistinguishable, then it must be the case that f and g actually agree on all but a negligible fraction of inputs. So why does this lead to the very negative result? Well, in privacy applications, we normally require the adversary to have a negligible advantage in distinguishing distributions. For example, in the case of encryption, we require that encryptions of zero are indistinguishable from encryptions of one. So if we plug this into the lemma, we get that encryptions of zero and encryptions of one must agree on all but a negligible fraction of coins, and this just doesn't give us a very useful encryption scheme. So now we ask: can we base privacy on weaker sources if we naturally relax the definition of security? For example, if we consider differential privacy. If you were here for the tutorial, you've seen the definitions, also in the previous talk, but let me just go over them again. In differential privacy, we consider databases, which we model as arrays of rows. We say that two databases, D1 and D2, are neighboring if they differ in one entry. And we also care about queries. These are functions that take as input the database and give us an output, and for the purpose of this talk, I'm just going to assume that this output is an integer. Also for the purpose of this talk, let's limit ourselves to low-sensitivity queries. These are queries for which the answer doesn't change by much on neighboring databases. So we say that a mechanism is epsilon-differentially private with respect to a randomness source S if, for all neighboring databases, all distributions in the source, and all possible outcomes Z, the probability that the mechanism on input the first database gives us the output Z, over the probability that it gives us Z on the second database, is close to one; these probabilities are pretty much the same.
We model this by requiring the ratio to be less than or equal to e to the epsilon, which, when epsilon is small, is pretty much just one plus epsilon. And notice here that in the definition, epsilon cannot be negligible, because that would imply that the output of the mechanism is negligibly close for any two different databases, and this just doesn't give us anything useful. So because epsilon cannot be negligible, we can hope to overcome the impossibility results of Dodis et al. I said the mechanism would not be useful; let me formalize this and give you the definition of utility. We're going to say that a mechanism has rho utility with respect to a source if, for all databases and all distributions in the source, the expected value of the difference between the true answer of the query, f(D), and the output of the mechanism is less than rho. So privacy and utility are definitions that kind of seem to be at odds with each other, and so we ask: can we have a good trade-off between privacy and utility? This inspires the following definition. We say that a family of mechanisms is accurate and private with respect to a source if, for all epsilon greater than zero, there is a mechanism in this family that is epsilon-differentially private and has utility that is a function of epsilon. And from now on, I'm just going to call these mechanisms non-trivial, just because accurate and private is quite long. So let me show you an example of some non-trivial mechanisms. These are the so-called additive noise mechanisms. They follow the following template: they compute the true answer to the query, f(D), and then they add some noise from an appropriate distribution. This template has been followed by many works in the literature.
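To make the e-to-the-epsilon condition concrete, here is a small sketch (my own illustration, not code from the talk) that checks the differential privacy ratio for a mechanism given explicitly as output distributions over a finite outcome space:

```python
import math

def dp_ratio(p1, p2):
    """Worst-case ratio max_z p1(z) / p2(z) over all outcomes z.

    p1 and p2 map outcomes to probabilities; the ratio is infinite
    if p1 puts mass on an outcome that p2 never produces."""
    worst = 0.0
    for z in set(p1) | set(p2):
        a, b = p1.get(z, 0.0), p2.get(z, 0.0)
        if a > 0.0 and b == 0.0:
            return math.inf
        if b > 0.0:
            worst = max(worst, a / b)
    return worst

def is_eps_dp(p1, p2, eps):
    """epsilon-DP for one pair of neighboring databases: the ratio
    must be at most e^eps in both directions."""
    bound = math.exp(eps)
    return dp_ratio(p1, p2) <= bound and dp_ratio(p2, p1) <= bound
```

For instance, binary randomized response that tells the truth with probability e^eps / (1 + e^eps) sits exactly at the bound.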
But I'll give you just a very simple example, which is that the noise distribution from which we're going to sample is the Laplace distribution with mean zero and standard deviation on the order of one over epsilon. And the work of Dwork et al. in 2006 showed that this mechanism is epsilon-differentially private and has order one over epsilon utility with respect to the uniform distribution. Therefore, by our definition, these are non-trivial mechanisms with respect to uniform. So now our question becomes: are weak entropy sources sufficient to achieve non-trivial mechanisms? Let me show you our results. We only consider the case of Santha-Vazirani sources, and we show both negative and positive results. The negative result we show is that additive noise mechanisms cannot be non-trivial with respect to Santha-Vazirani sources. But most surprising is our positive result: we show a non-trivial SV-robust mechanism for low-sensitivity functions. And why is this surprising? Well, it gives us a separation between traditional and differential privacy. As I said before, for traditional privacy we know that you can't base it on Santha-Vazirani sources, but here we show that you can achieve differential privacy with Santha-Vazirani sources. I'll start with the negative results. First, let me show you a very simple lemma that only talks about sets and Santha-Vazirani distributions. Let's consider two sets, T1 and T2, such that the size of T1 is greater than the size of T2. And let's define sigma to be the ratio of the size of everything that's in T2 but not in T1, over the size of T2. You can look at this quantity sigma as a kind of degree of disjointness. So for example, if the sets are disjoint, sigma is going to be equal to one, whereas if T2 is contained in T1, sigma is going to be zero.
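Going back to the Laplace mechanism just described, here is a minimal sketch (my own illustration; it samples the noise by inverting the Laplace CDF, and the `sensitivity` parameter defaulting to 1 is my own addition for low-sensitivity queries):

```python
import math
import random

def laplace_mechanism(true_answer, epsilon, sensitivity=1.0):
    """Additive-noise template: return the true answer plus Laplace
    noise with mean 0 and scale sensitivity / epsilon, sampled by
    inverse-CDF transform of a uniform draw on (-0.5, 0.5)."""
    b = sensitivity / epsilon
    u = random.random() - 0.5
    while u <= -0.5:  # avoid log(0) at the boundary
        u = random.random() - 0.5
    noise = -b * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_answer + noise
```

With scale b = 1/epsilon, the expected absolute error is exactly b, i.e., order one over epsilon utility, in line with the result quoted above.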
And what we can show is that there always exists a Santha-Vazirani distribution such that, if you sample coins from this distribution, the probability that they land in T1 over the probability that they land in T2 is greater than or equal to one plus gamma sigma, times the ratio of the sizes of the sets. And if you look at the probabilities as taken from the uniform distribution instead of from the Santha-Vazirani distribution, the ratio of these probabilities is actually just the ratio of the sizes of the sets. So this factor of one plus gamma sigma is really just the factor by which the Santha-Vazirani distribution can increase this ratio of probabilities. Okay, so why do we care about this at all? Well, in differential privacy, let's fix neighboring databases D1 and D2, a query f, and an outcome Z. And let's define T1 and T2 as follows: TB is simply going to be the set of coins that make M output Z on database DB. And again, for differential privacy, we're concerned with upper bounding this ratio, which just turns out to be the ratio of the probability that, if we sample coins from the distribution, they land in T1, over the probability that they land in T2. And by our lemma, this is going to be greater than or equal to one plus gamma sigma. Now, for additive noise mechanisms, you can show that T1 and T2 are disjoint. So sigma, which is the degree of disjointness, is going to be equal to one, and the ratio of probabilities that we care about for differential privacy is going to be lower bounded by one plus gamma. And so if we want to upper bound it by one plus epsilon, then this means that epsilon must be greater than or equal to gamma. So this explains why we can't have epsilon-differential privacy for epsilon less than gamma with additive noise mechanisms.
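The degree of disjointness is a one-line computation; here is a tiny sketch (my own illustration with Python sets standing in for the coin sets T1 and T2):

```python
def degree_of_disjointness(T1, T2):
    """sigma = |T2 \\ T1| / |T2|: equals 1 when the sets are
    disjoint, and 0 when T2 is contained in T1."""
    return len(T2 - T1) / len(T2)
```

For a pure additive-noise mechanism with integer noise, the coin set for outcome z is {z} when the true answer is 0 and {z - 1} when it is 1; these are disjoint, so sigma is 1, as the lower bound requires.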
So we can conclude something a little bit more general, and this is that epsilon-differential privacy with respect to Santha-Vazirani distributions requires sigma to be order of epsilon, which means that the intersection of these two sets, T1 and T2, must be really big: it must be a one minus epsilon fraction of the size of T2. This motivates the following definition, the definition of consistent sampling, which is similar to a definition that has already appeared in the literature. We say that a mechanism has epsilon-consistent sampling if, for all neighboring databases, this degree of disjointness that I described before is less than epsilon. And it's very easy to show that if M is epsilon-consistent, then M is also epsilon-differentially private with respect to the uniform distribution. The proof is very simple; I won't go over it, but you can see it's only one line. Okay, so with this definition in mind, and knowing that we need to satisfy this definition to have any hope of SV-robust mechanisms, we're going to define a new mechanism. Our mechanism is going to compute the true answer to the query and add Laplace noise as before, but then it's going to round the outcome to the nearest multiple of one over epsilon. Utility is conserved: we still have order one over epsilon utility. But I want to convince you that we now have epsilon-consistent sampling. So if we look at the previous mechanism, the one that just added noise without rounding, and consider databases D1 and D2 where the query differs, so it's zero on D1 and one on D2, here I'm just showing the sets of coins that make the mechanism output, for example, zero on D1, and here it would output one on D2. And you can see that if we fix an outcome, for example zero, then these sets, here this would be T1 and this would be T2, are completely disjoint.
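To see the effect of the rounding step numerically, here is a toy sketch (my own illustration: integer noise values stand in for the coins, and `round_to_multiple` is a hypothetical helper):

```python
def round_to_multiple(x, k):
    """Round x to the nearest multiple of k (ties resolve to the
    even multiple, as with Python's built-in round)."""
    return k * round(x / k)

def coin_set(true_answer, z, k, noise_values):
    """The set of noise draws ('coins') that make the rounded
    mechanism output z on a database with the given true answer."""
    return {e for e in noise_values
            if round_to_multiple(true_answer + e, k) == z}
```

With true answers 0 and 1 on neighboring databases and rounding to multiples of 1/epsilon = 2, the coin sets for a fixed outcome overlap in all but one value, whereas without rounding they are completely disjoint.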
Now, when we round, what we're actually doing is taking one over epsilon of these intervals and merging them together. For example, if you have epsilon equal to one half, as I have here, what you're doing is rounding to the nearest even number, putting together two little intervals, and now your outcomes are only going to be multiples of one over epsilon, which in our case is two. And now you can see that these sets, T1 and T2, start overlapping, and this is what we want for epsilon-consistent sampling. So as I explained, this guarantees that T1 and T2 start intersecting, which helps us overcome our lower bound. Now the question becomes: can we implement this in an SV-robust manner? And the answer is yes, but it's highly non-trivial. In fact, doing this was pretty much the bulk of our technical work. It turns out that not every implementation is SV-robust, which means that epsilon-consistent sampling is necessary, but not sufficient, to handle Santha-Vazirani sources. This leads us to define epsilon-SV-consistent sampling. It's a natural definition that doesn't reference the Santha-Vazirani distributions, and we show that it's sufficient for SV robustness. To ensure SV consistency, we're going to use arithmetic coding, and here we need to be very careful with finite precision. So, just to summarize: we consider SV sources in differential privacy, and we show that differential privacy is possible with Santha-Vazirani sources. This gives a separation between traditional and differential privacy. But we know that in the real world, imperfect randomness doesn't follow the very structured definition that Santha-Vazirani sources have, where each bit is biased by at most gamma. So we leave open the question: can we achieve differential privacy with other weak sources?
Our paper also aims to motivate the use of consistent sampling as a design paradigm for future differentially private mechanisms. This has already found useful application in an upcoming CCS paper that deals with floating-point arithmetic in differentially private mechanisms. Thank you.