Hi everyone, my name is Dakshita and I'm going to talk about building low-error efficient computational extractors in the CRS model. This is joint work with Ankit Garg and Yael Kalai, who are both at Microsoft Research. In cryptography we nearly always need access to perfect sources of randomness, but such perfect randomness is really hard to come by in practice. Typically randomness is derived from physical sources which may be imperfect, or sometimes the adversary may obtain partial side information about the secret randomness used by honest players, thereby skewing this randomness. And we have a whole lot of evidence, including the recent work of Breitner and Heninger who computed several Bitcoin private keys, showing that imperfect sources of randomness can lead to explicit attacks on cryptosystems. So a very natural and meaningful question is: can we convert weak sources of randomness into strong ones? Let me first explain what I mean by a weak source. A weak source is represented by a distribution on n bits that has min entropy k, where k is much smaller than n. Min entropy k corresponds roughly to each point in the distribution being selected with probability at most 2 to the minus k. An example of a weak source with min entropy k is the uniform distribution on a subset of size 2 to the k of the space {0,1} to the n. Randomness extractors are algorithms that help convert these weak sources of randomness into nearly uniform ones. Say we have a weak random source with k bits of entropy on an n-bit input space. Ideally we would like a deterministic algorithm that converts this into a nearly uniformly random source on m bits, where m can be as large as k. Unfortunately, it turns out that this dream deterministic extractor simply cannot exist, even for really weak parameter settings.
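To make the min entropy condition concrete, here is a small Python sketch; the particular values of n and k and the biased-coin example are illustrative, not from the talk.

```python
import math

def min_entropy(dist):
    """H_min(X) = -log2(max_x Pr[X = x]) for a distribution given as a
    dict mapping outcomes to their probabilities."""
    return -math.log2(max(dist.values()))

# The flat source from the talk: uniform on a subset of size 2^k of
# {0,1}^n. Every point has probability exactly 2^-k, so H_min = k.
n, k = 8, 3
flat = {format(x, "0%db" % n): 2 ** -k for x in range(2 ** k)}
print(min_entropy(flat))  # 3.0

# A biased coin has Pr[0] = 0.9, so H_min = -log2(0.9), well below 1 bit.
print(min_entropy({"0": 0.9, "1": 0.1}))
```

Note that min entropy is determined by the single most likely outcome, which is why it is the right measure here: an extractor has to work even against the worst-case point of the distribution.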
For example, suppose we just want to extract one single bit of randomness, the source has min entropy n-1, and we only ask that the resulting distribution have statistical distance at most 0.25 from the uniform distribution on a single bit. Even in this really weak setting, it turns out that such extractors cannot exist. What turns out to actually be possible is a variant that takes the help of a short uniformly random seed to convert an n-bit source with min entropy k into an almost uniform distribution on nearly k bits. This is pictorially represented on this slide as an algorithm that, on input a source and a seed, outputs a distribution that is close to uniform. In fact, the size of the seed can be much smaller than the min entropy of the input source or the size of the output. Guruswami, Umans and Vadhan showed that the min entropy can be as small as polylogarithmic, the size of the seed can also be polylogarithmic, and in this setting the error can be a negligible function of the size of the source. Finally, the definition of a seeded extractor can be further strengthened to what is called a strong seeded extractor, where the requirement is that the output distribution of the extractor be statistically indistinguishable from uniform even given the seed. Now, one problem with seeded extractors is that you need this uniform, independent source of randomness called the seed, and if having access to uniform randomness is difficult in the first place, then getting access to such a seed may be hard. So can we relax things so that the source and the seed can both be imperfect? It turns out that the answer is yes, and this question motivated the setting of two-source extractors, where there are two weak sources of randomness that each have a certain amount of min entropy, and the only assumption is that the two sources are independent; neither of them needs to be uniform.
A two-source extractor is then a deterministic algorithm that, given two independent sources with sufficient min entropy, outputs a distribution that is close to uniformly random. For a long time we only knew how to extract randomness when at least one of the sources had min entropy about half of n, due to the results of Raz and Bourgain. A breakthrough work of Chattopadhyay and Zuckerman broke down this barrier and built two-source extractors in the setting where both sources can have polylogarithmic min entropy, and in fact these log factors were further improved in several subsequent works. But in all these works that go down to polylogarithmic min entropy, the running time of the extractor turns out to be proportional to the inverse of the desired error, and in particular this means that the error cannot be a negligible function of n, as this would lead to a running time that grows super-polynomially with the size of the sources, which would be an inefficient construction. So the dream, especially from a crypto point of view, would be to have information-theoretic two-source extractors where both sources have polylogarithmic min entropy and the error of the extractor is negligible in n, or equivalently, where the running time of the extractor is proportional to log of 1 over epsilon and not to 1 over epsilon itself. This as such appears to be hard to achieve, and it is unclear if it is even possible in the information-theoretic setting. So we ask: is this dream any easier to achieve if we make computational hardness assumptions and use cryptography? Prior work has indicated that if the assumptions are sufficiently strong then the answer is yes; in particular, assuming an optimally (exponentially) hard one-way permutation, prior work of Kalai et al. showed that one can obtain negligible error. But it is completely unclear whether such one-way permutations even exist, and in fact we would like to rely on well-studied cryptographic hardness assumptions.
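For the high-entropy regime that was known before these breakthroughs, the classic example is again the inner product, now applied to two independent weak sources; this works roughly when the min entropies of the two sources sum to sufficiently more than n (a construction usually attributed to Chor and Goldreich). The sketch below checks the bias exhaustively for toy parameters of my choosing; it is not one of the low-entropy constructions discussed in the talk.

```python
import random

def ip2(x, y, n):
    """Inner product over GF(2) as a two-source extractor: close to
    unbiased when the two independent sources' min entropies k1 + k2
    exceed n by enough."""
    return bin(x & y & ((1 << n) - 1)).count("1") % 2

# Two independent flat sources on n = 8 bits, each with min entropy 6,
# so k1 + k2 = 12 > 8. Enumerating all pairs gives the exact bias.
rng = random.Random(1)
n = 8
X = rng.sample(range(2 ** n), 2 ** 6)
Y = rng.sample(range(2 ** n), 2 ** 6)
ones = sum(ip2(x, y, n) for x in X for y in Y)
bias = abs(ones / (len(X) * len(Y)) - 0.5)
print(bias)  # provably at most 2^((n - k1 - k2)/2) / 2 = 0.125 here
```

The exponential-sum bound behind this (Lindsey's lemma) is exactly what breaks down at low entropy, which is why getting below min entropy n/2 was a barrier for so long.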
On the other hand, when trying to relax assumptions and get a construction from standard cryptographic assumptions, many known techniques seem to hit a barrier. The barrier is the following: any reduction to an assumption with an efficient challenger would need to embed the challenge somewhere in the view provided to the adversary. This view is basically the result of sampling from arbitrary source distributions with sufficient entropy and applying an extractor to the samples, and it is completely unclear how to embed an external challenge into these arbitrary source distributions. We therefore introduce another relaxation: the common random string (CRS) model. This involves a trusted setup phase where a random string is sampled uniformly and fixed once and for all. The source distributions can then be sampled arbitrarily, depending on the common random string. Note that this differs from the setting of seeded extractors, where crucially the source and seed are required to be independent of each other. So what this model basically does is reduce the need for true randomness to a single one-time requirement on the common random string, and crucially, sources are now allowed to depend on this string. In this model, our main theorem is the following: assuming sub-exponential hardness of the DDH assumption, there exists a constant c between 0 and 1 such that there exists a two-source extractor in the CRS model for two sources, X on n1 bits with min entropy k1 and Y on n2 bits with min entropy k2, with error epsilon, where the length n1 of the first source is Omega(n) with n viewed as a security parameter, the length of the second source is n to the c (recall that c is a constant between 0 and 1), the first source has min entropy n1 to the c, the second source has min entropy polylogarithmic in n2, and finally the error is negligible in n; that is, the output of the extractor is computationally indistinguishable from a uniformly chosen random bit.
Here I want to point out that in the computational setting we have the following restrictions: the sources are efficiently sampled, and we only care about computational indistinguishability of the output of the extractor from uniform. Also note that we focus here on extracting only a single bit of randomness, but our techniques can actually help obtain nearly k2 bits of randomness. Our construction of computational two-source extractors proceeds in two steps. First, we build a non-malleable extractor for high-entropy sources in the common random string model, and second, we compile this non-malleable extractor for high-entropy sources into a two-source extractor for low-entropy sources. Both steps give efficient constructions with negligible error. We will now define non-malleable extractors, a notion that was first considered by Dodis and Wichs. A non-malleable extractor can either be a seeded extractor, which means it involves an entropic source and a uniformly random seed, or it can be a two-source extractor, which means it involves just two independent entropic sources. For the purposes of this talk we will only consider strong seeded non-malleable extractors, for which in the computational setting we have the following definition. In terms of syntax, a non-malleable extractor is exactly like a seeded extractor: on input x sampled from a source distribution and a uniformly random seed, the non-malleable extractor outputs a sample from a distribution. But the security requirement is much stronger: the output of the non-malleable extractor must be indistinguishable from uniform even given the common random string, the seed, and access to a special oracle. This oracle can be queried on any string y that is not equal to the seed, and on such a query the oracle returns the output of the non-malleable extractor computed on x and y.
Note that a polynomial-time adversary can query this oracle an arbitrary polynomial number of times. Moving on to our construction: our construction of non-malleable extractors satisfying the definition we just discussed is inspired by the construction of leaky pseudo-entropy functions of Braverman, Hassidim and Kalai. Specifically, we will assume the existence of a family of collision-resistant hash functions and a family of lossy functions, such that each function in the family is either lossy or injective; lossy functions have image size much smaller than the size of the domain, whereas injective functions have image size equal to the size of the domain; and finally, the lossy function family has the additional requirement that lossy and injective functions be indistinguishable from each other. The CRS of the non-malleable extractor consists of a key for the collision-resistant hash function and a set of 2k injective functions drawn from the function family F. Here k also corresponds to the output length of the collision-resistant hash function.
The non-malleable extractor, on input a CRS, a source sample x and a seed s, first interprets the CRS as consisting of a hash key as well as 2k functions. It then computes a hash of the seed s using the hash key from the CRS; let's denote the output of this process by s'. Next, it uses s' to sample k out of the 2k functions in the CRS: specifically, if the first bit of s' is 1 it picks f_{1,1} and otherwise f_{1,0}, and more generally, if the i-th bit of s' is 1 it picks f_{i,1} and otherwise picks f_{i,0}. Next, it applies the composition of all the functions that were picked to the input x, and the output of this process is denoted by x'. Finally, the output of the non-malleable extractor is the output of an appropriate two-source extractor applied to x' and the original seed s. The required two-source extractor needs to have negligible error, but we can allow one of the sources, namely s, to have high entropy; since we don't require low entropy and low error at the same time, such a two-source extractor can be obtained in the high-entropy, low-error regime based, for example, on Raz's construction. I'm going to leave the construction there for your reference and move on to discussing the proof intuition. Recall that the adversary A has access to an oracle that, on input any y not equal to the seed s, generates the output of the non-malleable extractor on input x and y. Moreover, the adversary is given the CRS, the seed, and a value z, where z is either sampled as the output of the non-malleable extractor on input the CRS, x and the seed, or z is sampled uniformly at random; the adversary wins the game, or breaks security of the non-malleable extractor, if it can distinguish between these two cases. To argue that the adversary cannot win this game, we first observe that the adversary, given the seed s, cannot query its oracle on any input y such that s' (which, recall, is the hash of s) is also equal to the hash of y. So essentially, for any input y that the adversary queries the oracle with,
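The evaluation just described can be sketched end to end. Everything below is a toy stand-in: SHA-256 plays the keyed collision-resistant hash, XOR-by-a-fixed-mask permutations play the injective mode of the lossy function family, and a plain inner product plays the high-entropy low-error two-source extractor (Raz's in the actual construction). None of these toy instantiations carries the security properties the proof needs; the sketch only shows the data flow.

```python
import hashlib

K = 8  # digest length of the toy hash; the real CRH outputs k bits

def crh(hash_key, seed):
    """Toy keyed collision-resistant hash: first K bits of SHA-256(key || seed)."""
    d = hashlib.sha256(hash_key + seed).digest()
    return [(d[0] >> i) & 1 for i in range(K)]

def make_crs(setup=b"trusted-setup"):
    """Toy CRS: a hash key plus 2K functions f_{i,b} on 32-bit inputs.
    XOR by a fixed mask is trivially injective; the real CRS samples all
    2K functions from a lossy function family in injective mode."""
    hash_key = hashlib.sha256(setup + b"hk").digest()
    fs = []
    for i in range(K):
        masks = [int.from_bytes(
            hashlib.sha256(setup + bytes([i, b])).digest()[:4], "big")
            for b in (0, 1)]
        fs.append([lambda v, m=masks[0]: v ^ m, lambda v, m=masks[1]: v ^ m])
    return hash_key, fs

def ip_bit(a, b):
    """Stand-in for a high-entropy, low-error two-source extractor
    (Raz's construction in the talk): inner product over GF(2)."""
    return bin(a & b).count("1") % 2

def nm_extract(crs, x, s):
    hash_key, fs = crs
    s_prime = crh(hash_key, s)         # s' = hash of the seed s
    x_prime = x
    for i, bit in enumerate(s_prime):  # pick f_{i,1} if bit i of s' is 1,
        x_prime = fs[i][bit](x_prime)  # else f_{i,0}; compose them on x
    # finish with a two-source extractor applied to x' and the seed s
    return ip_bit(x_prime, int.from_bytes(s, "big"))

crs = make_crs()
print(nm_extract(crs, 0x1234ABCD, b"some seed"))  # a single bit, 0 or 1
```

In the proof, the point of routing x through functions selected by the hash of the seed is that every oracle query with a different seed must pass through at least one lossy function, which is what the security argument below exploits.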
the hash of y is going to be different from s', and this is guaranteed by collision resistance of the hash function family. Next, we note that we can actually change the CRS to ensure that all functions indexed by s' are injective and all the other functions are lossy. By the properties of the lossy function family, this means that the image size of all functions not indexed by s' becomes much smaller than 2 to the n1. What this means is that for every query of A to the oracle, at least one of the composed functions in the sequence derived from the hash of y will be lossy, and this is good news: because one of these functions is lossy, x' will come from a very small space in all of the adversary's queries, and this essentially means that the oracle can be simulated given only the CRS, the value s', and a small leakage g(x) on x. Since s' and g(x) are really small leakages, x and s actually have sufficiently high entropy left even conditioned on all the information needed to simulate this oracle. As a result, we can conclude that the output of the underlying two-source extractor, and therefore that of the non-malleable extractor, is going to be indistinguishable from uniform even given this additional oracle. Now, I've swept several technical details under the rug here due to lack of time, but we are essentially able to formalize these ideas to obtain seeded non-malleable extractors in the CRS model. Our next step is to compile the seeded non-malleable extractor into a two-source extractor for low-entropy sources, and I will now say a few words about how this is done. This follows a template due to Ben-Aroya, Chattopadhyay, Doron, Li and Ta-Shma, who take any non-malleable extractor for high-entropy sources, or a seeded non-malleable extractor that allows an adversary to make roughly t tampering queries such that the size of the seed does not grow too much with t, and combine this with dispersers to obtain a two-source extractor for low-entropy sources. Now what
we just described was a non-malleable extractor with high-entropy sources and t tampering queries, but where the seed length is essentially independent of t, which is great. But this is still insufficient to instantiate the compiler of Ben-Aroya et al., because what they needed for their compiler was an unconditionally secure non-malleable extractor, and what we got was only computationally secure. Additionally, our non-malleable extractor is in the CRS model, whereas the Ben-Aroya et al. transformation is for the plain model, and this leads to additional complications in our setting. Nevertheless, we are able to prove that a variant of their transformation also works in the computational setting in the CRS model, and proving this actually turns out to be highly non-trivial. A key barrier here is that the Ben-Aroya et al. technique relies on an inefficient reduction: that is, given any efficient adversary that breaks the two-source extractor, Ben-Aroya et al. build an unbounded adversary against the non-malleable extractor. Our technical contribution is in making this adversary against the non-malleable extractor computationally efficient. To do this, we rely on a leakage simulation lemma due to Gentry and Wichs, Jetchev and Pietrzak, and Chung, Lui and Pass to efficiently simulate the inefficient reduction of Ben-Aroya et al. This turns out to be technically very complex to implement and requires sub-exponential security of the non-malleable extractor, and this is also what ends up imposing the requirement that the sources be unbalanced. Unfortunately, I'm out of time, so I'm not going to be able to get into more details about this, but I will conclude by reiterating our main theorem, which is that under sub-exponential hardness of DDH, there exists a two-source extractor in the CRS model for unbalanced sources with low min entropy and low error, where the sources are required to be efficiently samplable and the security is only computational. We also obtain non-malleable extractors with
similar parameters in the computational setting, again in the CRS model. Here I want to point out that a follow-up work due to Aggarwal et al. obtains two-source non-malleable extractors that are incomparable to ours; in particular, their tampering model is stronger in that they allow both sources to be tampered with. Let me wrap up with some open problems. The first interesting problem is whether we can completely get rid of the CRS while at the same time relying on well-studied cryptographic hardness assumptions. This would of course involve getting around the barrier that I discussed towards the beginning of this talk. Another question is whether we can obtain two-source extractors from polynomial hardness assumptions, as opposed to the sub-exponential assumptions that we required in this work. And finally, I think it would be interesting to explore relationships between these new and improved non-malleable and two-source extractors and leakage-resilient codes, non-malleable codes, and other types of cryptography. That's the end of the talk. Thank you.