Okay. So the plan for today is to get through most of the ideas in a proof of the HILL result: from an arbitrary one-way function, you can construct a pseudorandom generator. But maybe it's worth recapping the high-level takeaway points from last time. What we saw yesterday was a version of this result for injective (one-to-one) one-way functions, and the proof went in three steps. First, we interpreted the one-wayness property, the hardness of inverting the function f, in terms of a kind of pseudoentropy, specifically unpredictability pseudoentropy: the fact that it's hard to predict x from f(x), we interpret as saying that in a computational sense, x has a significant (superlogarithmic, when we talk about polynomial security) amount of pseudoentropy given f(x). On the other hand, information-theoretically, x is completely determined by f(x), so we have a gap between this kind of pseudoentropy and the actual entropy in x. In the next step, we turned this unpredictability pseudoentropy of x into pseudorandomness, or HILL pseudoentropy. We saw that you can't use an arbitrary extractor for this, but if the extractor has the additional property of being reconstructive, then you can: given f(x) together with the seed s of the extractor, the output of a reconstructive extractor applied to something unpredictable is indistinguishable from a uniformly random bit. The specific instantiation we used was Goldreich-Levin.
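As a concrete sketch of this step (my own illustration, not code from the lecture), here is the one-bit Goldreich-Levin extractor, Ext(x, s) = ⟨x, s⟩ mod 2 with seed length d = n, and the resulting one-bit-stretch generator candidate for an injective one-way function f:

```python
# Sketch of the Goldreich-Levin reconstructive extractor with one output
# bit: Ext(x, s) = <x, s> mod 2, seed length d = |x| = n.

def gl_extract(x_bits, s_bits):
    """Inner product mod 2 of the source x with the seed s."""
    assert len(x_bits) == len(s_bits)
    return sum(a & b for a, b in zip(x_bits, s_bits)) % 2

def prg_candidate(f, x_bits, s_bits):
    """For an injective one-way f: output the triple (f(x), s, <x, s>),
    one bit longer than the input (x, s)."""
    return (f(tuple(x_bits)), tuple(s_bits), gl_extract(x_bits, s_bits))
```

Here `f` is a stand-in for any injective one-way function; the point is only that the output (f(x), s, ⟨x, s⟩) has n + d + 1 bits while the input (x, s) has n + d.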
So this triple (f(x), s, Ext(x, s)) is computationally indistinguishable from having n + d + 1 bits of entropy, but its actual entropy is at most the seed length, which here is the length of x plus the length of s, that is, n + d. (Last time I wrote just n; I forgot to include the seed of the extractor.) So we have a gap of one bit between HILL pseudoentropy and the seed length, and in particular the actual entropy. Then we said that by doing repetitions, we can make this gap bigger, and then we can use any polynomial-time extractor to convert HILL pseudoentropy into pseudorandomness, obtaining more pseudorandom bits than the seed length, because the HILL pseudoentropy is larger than the seed length. The repetitions are just to compensate for the entropy loss that happens when you do extraction. Now, from conversations at dinner last night, I thought it worth stepping back from the technical details of reconstructive extractors, which got a little involved last time, and asking at an intuitive level: what is a reconstructive extractor giving you? Recall the definition of a (k, ε) extractor: it takes a source and a seed and gives you some output. I'll use the average-case, or conditional, version: if I have a random variable X with min-entropy at least k given some other random variable Y, then the output of the extractor applied to X should be ε-close in statistical distance to uniform, given Y and the seed of the extractor. So let's look at the contrapositive of that.
The equivalent statement is: if I have a test T distinguishing these two distributions with advantage ε, possibly a computationally unbounded test, since we're talking about statistical distance, then X does not have min-entropy at least k given Y, which is equivalent to there being some, again possibly computationally unbounded, predictor of X from Y that succeeds with probability more than 2^(-k). That's just a restatement of what it means to be an extractor. And what does a reconstructive extractor do? It's meant to give you an efficient conversion of tests distinguishing the output into predictors. Not just to say there exists some predictor, but to actually get an efficient algorithm A if we start with an efficient test T. This is precisely what we want when converting unpredictability entropy, which talks about the hardness of predicting X from Y by computationally bounded algorithms, into pseudorandomness, like we did. So why think about it in this language? Many people have seen Goldreich-Levin before; what are the advantages? One is that it captures more directly what we want to achieve in the application, when we use Goldreich-Levin to convert unpredictability to pseudorandomness: rather than referring to predicting individual bits like Goldreich-Levin does, it says directly that we want a string that's indistinguishable from uniform.
Secondly, it opens up more of the machinery of randomness extractors that can be brought to bear, potentially giving better parameters in this conversion of unpredictability entropy to pseudorandomness. The only example of a reconstructive extractor we saw was Goldreich-Levin, or repeating Goldreich-Levin many times to get many bits out. But as I hinted, there are many other extractors in the literature that also have this reconstructive property, and they can, for example, give you a much shorter seed length: the amount of additional randomness you need to invest in the extraction can be much smaller than with Goldreich-Levin, where you use n seed bits just to get one output bit, and repeating it many times blows up the seed length. There are versions of Goldreich-Levin that try to save on that, but really the right way to think about those optimizations is in the language of reconstructive extractors. Are there any questions about this? One point: nothing I did was really using non-uniformity, and I think that caused a little confusion. I used the word "advice" in talking about reconstructive extractors, but when we did the reduction, we just chose the advice uniformly at random, so nothing was hardwired non-uniformly. Everything I've said so far works essentially equally well in the uniform setting; I just decided to work non-uniformly because some of the later definitions become a little more complicated in the uniform setting. Yeah.
The other thing that may have been confusing is why, instead of saying directly that in a reconstructive extractor you want to predict with probability more than 2^(-k), I said that you can predict with constant probability if someone gives you k bits of advice. That's a stronger statement, but it's one achieved by all the constructions of reconstructive extractors that I know, and it allows for additional optimizations, such as the trick I mentioned where you don't have to choose all k advice bits at random: you can invest some running time to enumerate over all the possibilities and improve the prediction a bit. Any other questions on that? So let's go back now to trying to get pseudorandom generators from general one-way functions. We start with a general one-way function f. It is still the case that x has unpredictability entropy at least superlogarithmic in n given f(x), while its actual entropy, conditioned on nothing, is determined. But the problem is that this statement about unpredictability entropy is not necessarily an interesting statement anymore: it may hold not for computational reasons but for purely information-theoretic reasons, because the actual entropy of x given f(x) can be very large. For example, imagine your one-way function depends on only the first half of the bits of x.
Say I take some one-way function g and apply it to the first half of the bits of the input, and that's how I define my one-way function f. Then f is at least 2^(n/2)-to-1: every output has at least 2^(n/2) preimages. So the entropy of the input given f(x) is at least n/2, and in particular the unpredictability entropy of x given f(x) is at least n/2; indeed this is true even for min-entropy. You can't predict x from f(x) except with probability 2^(-n/2), but that holds even for unbounded algorithms. So for this function, the unpredictability-entropy statement is completely useless: it does not capture anything about the computational hardness of inverting f, because an even stronger statement holds information-theoretically. So what are we going to do? We're going to find some other pseudoentropy notion that better captures the hardness of inverting f. In particular, the unpredictability entropy of x given f(x) only talks about finding the specific preimage that was used to compute f(x), whereas the definition of a one-way function says it should be hard to find any preimage. So we want a measure of pseudoentropy capturing the idea that it's hard to find any preimage of the output you're given, not just the particular one someone used to compute f(x). To do that, we'll need to introduce some Shannon-theoretic notions of entropy, which I believe have only been mentioned once so far. So let's quickly review Shannon entropy.
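Before the entropy review, the half-ignoring example above can be checked concretely. This is my own toy illustration (the inner function g here is just an arbitrary stand-in, certainly not one-way; the point is only to count preimages):

```python
# Toy version of the lecture's example f(x) = g(first half of x):
# every output has >= 2^(n/2) preimages, so H(X | f(X)) >= n/2
# holds information-theoretically, for unbounded algorithms.
import math
from collections import defaultdict
from itertools import product

n = 6

def g(half):
    # arbitrary stand-in function of the first n/2 bits (not one-way)
    return sum(half)

def f(x):
    return g(x[: n // 2])   # ignores the last n/2 bits entirely

# Group inputs by output and compute H(X | f(X)) for uniform X.
preimages = defaultdict(list)
for x in product((0, 1), repeat=n):
    preimages[f(x)].append(x)

N = 2 ** n
h_cond = sum(len(v) / N * math.log2(len(v)) for v in preimages.values())
print(h_cond >= n / 2)   # True
```

Within each preimage set, X is uniform, so the conditional entropy is the expected log of the preimage-set size, which is at least n/2 here.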
A note on notation: when I don't write a subscript of infinity, I'm referring to Shannon entropy. H(X) is defined to be the expectation, when you take a sample x from the random variable X, of log(1/Pr[X = x]). Compare this with min-entropy, where we take the minimum over x of log(1/Pr[X = x]). The difference is that Shannon entropy measures randomness on average, whereas min-entropy measures it in the worst case. You can think of the quantity log(1/Pr[X = x]) as measuring how much randomness is in that particular sample x; for Shannon entropy we take the average of that, while for the min-entropy to be large, every sample must have a lot of randomness in it. [In response to a question:] The other thing you can do is move the min inside and put a max on the probability in the reciprocal; it's equivalent, since log is a monotone function. Yes? [Another question:] I don't think so; I think this is still the best you can get from the one-wayness. I see what you're saying. It's hard to know what to compare against, I guess, is the issue, and that also relates to why we're moving to Shannon entropy. This particular case is a regular one-way function: if g is, say, a permutation, all the preimage sets have exactly the same size, and in that case one might be able to say something about the unpredictability being in some sense larger than n/2, though I'm not sure exactly how to do that.
But a bigger complication with general one-way functions is that the preimage sets don't all have to be the same size. Some may be very small, even of size one; some may be very large. So it's not even clear what benchmark we should be looking at. That's another reason to move to these more average-case measures of entropy: Shannon entropy helps us talk about what happens on average over the different possible preimage sizes of the one-way function. Thanks. All right, so that's Shannon entropy. The conditional Shannon entropy H(X | Y) is the expectation over y of the Shannon entropy of X conditioned on Y = y. One very nice property Shannon entropy satisfies is a chain rule, which we've already seen, but it's an exact chain rule, with no loss, in both directions: H(X, Y) = H(Y) + H(X | Y). The entropy of a pair is the entropy of one element plus the entropy of the other conditioned on the first, exactly, in both directions. Another quantity we will use is divergence: D(W ‖ X) is the expectation over w drawn from W of log(Pr[W = w] / Pr[X = w]). This is what's called relative entropy, or Kullback-Leibler divergence. It's a way of measuring how similar the distributions W and X are: if W and X are identically distributed, these ratios are always one and you get divergence zero. It turns out divergence is always non-negative, and it's zero if and only if W and X are identically distributed. So think of it as a measure of how similar W and X are, but note it's not a metric: it's not symmetric between W and X.
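Here is a quick numeric tour (my own, not from the lecture) of the notions just defined: Shannon entropy as an average of log(1/p(x)), min-entropy as the worst case, and the exact chain rule H(X, Y) = H(Y) + H(X | Y):

```python
import math

def shannon(p):
    """Shannon entropy of a distribution given as a dict of probabilities."""
    return sum(v * math.log2(1 / v) for v in p.values() if v > 0)

def min_entropy(p):
    """Min-entropy: log(1/p) at the most likely outcome."""
    return min(math.log2(1 / v) for v in p.values() if v > 0)

# Shannon entropy averages log(1/p(x)); min-entropy takes the worst case.
p = {'a': 0.5, 'b': 0.25, 'c': 0.25}
print(shannon(p))       # 1.5
print(min_entropy(p))   # 1.0

# Exact chain rule on a small joint distribution of (X, Y).
joint = {('a', 0): 0.3, ('a', 1): 0.2, ('b', 0): 0.1, ('b', 1): 0.4}
py = {}
for (x, y), v in joint.items():
    py[y] = py.get(y, 0.0) + v

h_x_given_y = 0.0
for y0, w in py.items():
    cond = {x: v / w for (x, yy), v in joint.items() if yy == y0}
    h_x_given_y += w * shannon(cond)

print(abs(shannon(joint) - (shannon(py) + h_x_given_y)) < 1e-9)  # True
```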
Divergence doesn't satisfy the triangle inequality either, but it's a measure of similarity of distributions that plays very nicely with Shannon entropy. For example, the divergence between a random variable W taking values in {0,1}^n and the uniform distribution U_n on n-bit strings is exactly n − H(W). The unique distribution with entropy n on n-bit strings is the uniform distribution, and as your entropy decreases, your divergence from uniform increases. And let's compare this with things you've seen in previous lectures, maybe not in this language. One can define a worst-case analog of divergence, sometimes called max-divergence, or relative min-entropy, where we take the maximum over w in the support of W of this log of the ratio of probabilities. In what way is this familiar? The max-divergence, which I'll write with a δ, from W to X is at most δ if and only if W is 2^(-δ)-dense in X. So, who covered dense model theorems? Leo, was it you? Does this terminology make sense, saying one distribution is 2^(-δ)-dense in another? It means you can obtain W from X by conditioning on an event of probability 2^(-δ). So small max-divergence means you can think of W as a portion of X, and if the divergence is small, it's a large portion. [Question:] Yes, divergence can be infinite, and I should have said the same holds for the Shannon-theoretic version.
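The identity D(W ‖ U_n) = n − H(W) just stated can be checked numerically. This is my own sanity check, not part of the lecture:

```python
# Check that D(W || U_n) = n - H(W) for a distribution W on n-bit strings.
import math
from itertools import product

def kl(p, q):
    """KL divergence D(p || q) in bits, distributions as dicts."""
    return sum(pw * math.log2(pw / q[w]) for w, pw in p.items() if pw > 0)

def shannon(p):
    return sum(v * math.log2(1 / v) for v in p.values() if v > 0)

n = 2
strings = list(product((0, 1), repeat=n))
uniform = {s: 1 / 2 ** n for s in strings}
W = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

print(kl(W, uniform))    # 0.25
print(n - shannon(W))    # 0.25
```

Note the divergence to uniform is always finite, since uniform assigns non-zero probability to everything; this is spelled out in the next paragraph.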
The divergence can't be infinite when the second distribution is uniform, because uniform assigns non-zero probability to everything. So, if you've talked about dense model theorems, think of this Shannon-type divergence as an average-case version of measuring density. To spell out density: there's an event E(X), possibly probabilistic, such that W is identically distributed to X conditioned on E(X), and the probability of the event is at least 2^(-δ). Leo, you look bothered. No? Okay, so I haven't made a mistake somewhere. All right. So how are we going to use these notions to fix the problem, that is, to get something that better captures the hardness of inverting a one-way function? The claim is that for every efficient, say poly-size, adversary A, the divergence from (f(X), X) to (f(X), A(f(X))) is at least superlogarithmic in n. Let's parse this. Think of A as a potential inverter for the one-way function: given an output, it tries to produce some kind of input. The game A is playing now is not the usual inversion game, but it's related: A is trying to minimize the divergence. What does it mean for A to achieve divergence zero? It means that given an output, A can sample a uniformly random preimage. Clearly, if f is one-way, no efficient algorithm can succeed in sampling a uniformly random preimage. And what this statement says is that if f is one-way, the failure of any such adversary is not only non-zero, meaning it can't achieve divergence zero, but must be significant.
Significant is measured in bits, as divergence is, and the number of bits measuring A's failure is at least superlogarithmic, for essentially the same reason we had superlogarithmic bounds on unpredictability entropy: this is roughly the log of the inversion probability. So let's actually prove that; it's a simple enough statement. The divergence from (f(X), X) to (f(X), A(f(X))) is at least the divergence we get by applying any function I to both random variables. In general, divergence has this property: if I apply the same function to both random variables, the divergence can't go up, only down. That's a natural property you would expect from a measure of difference between probability distributions: applying the same function to both can only make them more similar, never further apart. This holds for every function I, but now we pick a particular one: I(y, x) is the indicator of whether x is a preimage of y. What do we get? On (f(X), X), I is always 1, because X is always a preimage of f(X); so that side becomes a Bernoulli random variable that's always 1. On the other side, A(f(X)) is a preimage of f(X) exactly when A succeeds in inverting the one-way function, so that side is a Bernoulli random variable with a negligible probability of being 1. Now we just apply the definition of divergence to Bernoulli random variables.
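The Bernoulli calculation that finishes this argument can be checked numerically. My own check, not from the lecture: D(Bern(1) ‖ Bern(p)) = log2(1/p), so a negligible inversion probability p forces superlogarithmic divergence:

```python
# For Bernoulli variables, D(Bern(a) || Bern(b)) in bits; with a = 1 this
# collapses to log2(1/b), so if A inverts f with probability p = negl(n),
# the divergence is log2(1/negl) = omega(log n).
import math

def kl_bernoulli(a, b):
    """KL divergence D(Bern(a) || Bern(b)) in bits."""
    d = 0.0
    if a > 0:
        d += a * math.log2(a / b)
    if a < 1:
        d += (1 - a) * math.log2((1 - a) / (1 - b))
    return d

p = 2 ** -40          # stand-in for a negligible inversion probability
print(kl_bernoulli(1.0, p))   # 40.0, i.e. log2(1/p) bits
```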
Applying the definition: the divergence is the expectation over the first random variable, which is always 1, of log of the probability that the first variable is 1, namely 1, over the probability that the negligible Bernoulli is 1, which is negligible. And log of 1 over negligible is ω(log n). Questions? Now you might be wondering: other than the fact that this is called relative entropy and we have a constraint about poly-time algorithms in this statement, in what sense is this giving us the kind of pseudoentropy gap we were looking for? Divergence is a different and weirder notion. That's answered by the following theorem, which I think we won't prove, characterizing HILL Shannon entropy in terms of a hardness condition about divergence like the one we stated over there. This is from work with my former student Colin Jia Zheng. We have two jointly distributed random variables, say W and Y; there's again a security parameter n hiding in all of this. One thing the theorem requires is that W is short, say logarithmic in the security parameter. Then the following statements are equivalent. First: the HILL pseudoentropy of W given Y, and by this I mean the Shannon version of HILL entropy, so W is indistinguishable from some random variable that has large actual Shannon entropy given Y, Shannon entropy rather than min-entropy, is significantly larger than the actual entropy of W given Y.
This is getting at the question you had, Udi: we won't apply this exactly to X and f(X), but here we're comparing the pseudoentropy to the actual entropy, like you wanted to do. And yes, this is an if and only if, for every δ. The second statement, the one it's equivalent to, is in terms of divergence: for every poly-size adversary A, the divergence from (Y, W) to (Y, A(Y)) is at least δ. (There's some plus-or-minus negligible that should be in here; you can ignore it.) So there is a gap of δ between pseudoentropy and actual entropy if and only if the following task is hard: given Y, sample as well as you can from the conditional distribution of W given Y. Think of this as something like a prediction game: you're trying to predict W from Y, but your task is not to get the value of W exactly right, it's to get the right distribution of values. And that lets you say something meaningful even when W has a lot of entropy given Y. For example, with a one-way function where there might be many preimages of y, the task of producing an almost uniform preimage is more meaningful than just saying: find the particular preimage w. Good. Now, we have lost something here, meaning this does not capture the full power of the one-way function, but it will still be good enough for us. The point is that it still retains something computationally hard about the one-way function, unlike when we tried to work with unpredictability entropy, where we could be stuck with the hardness coming only from information-theoretic reasons.
This is a variant of what some of you may have heard called a distributional one-way function. In the past, Impagliazzo and Luby, Impagliazzo and Levin, and others looked at this notion, where the adversary's task is exactly to sample an almost uniform preimage; they measured the distance in statistical distance rather than divergence. And one of the fundamental results from that period, I don't remember which paper to attribute it to, is that the existence of one-way functions is equivalent to the existence of distributional one-way functions. So while we've lost something about this particular f, it feels like we haven't lost anything in our complexity assumption: this kind of object was already known to be closely related to ordinary one-way functions. All right, so let's go back. The first thing we might want to do is take W to be X and Y to be f(X) for our one-way function. But unfortunately, we can't apply the theorem, because in this case W is not short; it's n bits long. In fact, we know the divergence conclusion is true, we showed over there that the divergence for any efficient A is large, but the HILL-entropy statement is false: the HILL entropy of X given f(X) is no larger than the actual entropy of X given f(X), except perhaps for a negligible difference. The reason is the same thing we saw before with one-to-one one-way functions: you can efficiently test whether x is a preimage of f(x).
That test always succeeds on the pair (W, Y), and applying it distinguishes (W, Y) from any distribution of higher entropy: if you want to maximize entropy while W is constrained to be a preimage of f(X), the best you can do is for W to be a uniformly random preimage, and you can't get any higher entropy than that. So in this case the equivalence is actually false: the divergence is ω(log n), but the HILL pseudoentropy equals the actual entropy, up to the negligible slack coming from the indistinguishability in HILL. So what can we do? We have the divergence condition holding, but for a W that is long. Well, one thing we can try is to break W into small pieces. The pieces could be of logarithmic length, but for simplicity let's use single bits. So consider the following sequence of random variables, not just two: f(X) for a uniformly random input X, then the first bit of X, the second bit, the third bit, and so on, treating each as a separate random variable conditioned on the previous ones. The claim is that for every poly-size A, the sum over i of the divergences from (f(X), X_1, ..., X_i) to (f(X), X_1, ..., X_{i-1}, A(f(X), X_1, ..., X_{i-1})) is at least superlogarithmic in n. So what is our adversary trying to do here? Previously it was given f(X) and tried to sample all of X at once. Now I'm changing the game: it tries to sample one bit at a time. In the i = 1 term, we're comparing (f(X), X_1), just the first bit of X, to (f(X), A(f(X))).
So A is given f(X) and tries to produce the first bit of X, and we measure in divergence how far it is from the right conditional distribution. Then it's given f(X) and the first bit of X, tries to produce the second bit, and again we measure how close it is to the right conditional distribution. We sum this up from 1 to n, and the claim is that the sum of these divergences has to be as big as the divergence we saw when A was trying to produce the whole thing. Does anyone see how to show this, using what we showed over there as a black box? Suppose for contradiction that I have some poly-size algorithm that achieves small divergence here; it's useful to think divergence zero. Given f(X), it samples the first bit of X from exactly the right conditional distribution; then given f(X) and the first bit, it samples the next bit from exactly the right distribution, and so on. Yes, so you just repeat: you take the adversary A that wins this game and iterate it. Here's how we get an A′. It's given some y that it wants to invert. It applies A to get a candidate for the first bit of x. Now it has the pair, y and a first bit, and applies A again to get a second bit; then it applies A to these three values to sample the third bit, and eventually it has sampled n bits of a potential preimage. That's our A′. And one can show that the divergence achieved by this A′, constructed from A in this way, from the right distribution of preimages is exactly the sum of the divergences achieved by the A that predicts individual bits. That's by a chain rule for divergence; like the other Shannon-theoretic notions, divergence has a very nice and exact chain rule.
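The iteration just described can be sketched in a few lines. This is my own sketch with hypothetical interfaces (a bit sampler `A(y, prefix)` returning one bit), not code from the lecture:

```python
# From a bit-by-bit sampler A(y, prefix) -> next bit, build a full
# preimage sampler A_prime(y) by iterating A n times on its own output.

def make_full_sampler(A, n):
    def A_prime(y):
        prefix = []
        for _ in range(n):
            prefix.append(A(y, tuple(prefix)))  # sample the next bit
        return tuple(prefix)
    return A_prime

# Toy usage: a "perfect" (divergence-zero) bit sampler for the identity
# function f(x) = x, whose iteration recovers x exactly.
def perfect_bit_sampler(y, prefix):
    return y[len(prefix)]

A_prime = make_full_sampler(perfect_bit_sampler, 4)
print(A_prime((1, 0, 1, 1)))   # (1, 0, 1, 1)
```

For a real one-way function the bit sampler would be randomized and imperfect; the chain rule for divergence says A′'s total divergence is exactly the sum of the per-bit divergences.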
There's a chain rule for divergence that we can apply here to show that the divergence between (f(X), X) and (f(X), A′(f(X))) is exactly equal to the sum in the claim. I won't do the calculation; it's just a little messy writing out the notation for the divergence chain rule. But hopefully it makes sense intuitively: if you're getting approximately the right distribution in each step, the overall deviation should sum up. Questions? [Question.] So when we construct A′ in this way from A, we're using A as a bit-by-bit predictor, and the claim is there's an equality between the divergence A′ achieves on the full tuple of the preimage and the sum of the divergences achieved by the bit predictor A. This is good, because now the individual terms in the sum are talking about the adversary predicting things that are short, in this case one bit (you could do logarithmically many bits if you want), so the theorem applies. So what does the theorem give us? Summing over i from 1 to n, the HILL pseudoentropy of the i-th bit given f(X), X_1, ..., X_{i-1}, applying the theorem term by term, is at least the sum of the actual conditional entropies plus, outside the sum, ω(log n). I'm being a little informal here, but the way to think about it is that there's some δ_i associated with each term, measuring how much divergence an efficient adversary can achieve, and we know the sum of those δ_i is ω(log n). The theorem tells you that for each term, the HILL pseudoentropy of the i-th bit given the previous things is at least the actual entropy plus this δ_i, the divergence corresponding to the i-th term. Yes?
What do you mean, learn anything more? Oh, okay, good. So in this definition, the quantity inside the sum conditions on the actual first i−1 bits, not the ones the adversary itself produced: we give it the actual f(x) and the actual first i−1 bits. Nevertheless — I didn't go through the proof of the statement — when you apply the chain rule for divergence, you still get the sum of these quantities referring to the actual prefix, even though the way A' generates the prefix is by applying the single-bit predictor. Intuitively, that's because any difference between the distribution of X_1 produced by A and the actual distribution of X_1 has already been accounted for in the first term, so we shouldn't have to pay for it twice. If you were doing this with statistical distance, the way to think about it is a hybrid argument. This is like a hybrid argument, but with divergence — the hybrids are where the first i−1 bits come from the right distribution and the rest come from A. But that may just confuse things. All right, any questions? Yeah? [I want to raise the issue that maybe the sum is bigger than ω(log n), but for each term the A is different.] Yeah, okay. So Hugo asked why I'm working in the non-uniform model — part of it was to allow me to be sloppy on exactly such issues. In a non-uniform setting, having a single A, as I've talked about here, is the same as allowing a different A_i for each term. If I have, with the same size bound, an n²-size A for the first bit, one for the second bit, one for the third bit, and so on, I can combine those n A's together. So I could use different A's here as well.
And so in the non-uniform case, you can really make sense of each term separately — but I agree, I'm being a little sloppy on that. All right, good. So now let's rephrase what we have. Consider the following sequence of random variables: X_0 — I want to include f(X) there, so X_0 = f(X) — then X_1, ..., X_n. I have these n+1 random variables, and the claim is that this sequence has what I'll define as next-bit pseudoentropy at least n + ω(log n). So what's going on here? First, I should have pointed out that when we sum up the actual conditional entropies, we can apply the chain rule for Shannon entropy, and the sum is exactly the Shannon entropy of X given f(X). For a one-way function, we don't know what H(X | f(X)) is: it might be 0 if the function is one-to-one, or it could be n/2 as in the example we had over there. So one thing I'm doing in this next step, in addition to introducing the definition I'm about to write down, is adding f(X) to the sequence of random variables, so that I add H(f(X)) to the sum and get a quantity I actually know, namely n — because H(f(X)) + H(X | f(X)) is exactly n. So what is the definition? There is a sequence of random variables Y_0, Y_1, ..., Y_n, correlated with the X sequence, such that — and this is just a restatement of HILL pseudoentropy — first, there's a high-entropy random variable Y_i that's computationally indistinguishable from X_i given the previous bits (there should be an X_0 in there as well), and second, the sum for i from 0 to n of H(Y_i | X_0, ..., X_{i-1}) is at least n + ω(log n).
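The Shannon chain-rule fact being used — H(f(X)) + H(X | f(X)) = H(X) = n when X is uniform on n bits — can be checked directly on a small example. The function below is a toy stand-in (certainly not one-way), just to exercise the identity:

```python
import math
from collections import defaultdict
from itertools import product

def H(p):
    """Shannon entropy in bits of a distribution given as a dict."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

n = 3
f = lambda x: x[0] ^ x[1]          # toy stand-in function, not one-way

# X uniform on {0,1}^n
xs = list(product([0, 1], repeat=n))
px = 1 / len(xs)

# Marginal distribution of f(X)
pf = defaultdict(float)
for x in xs:
    pf[f(x)] += px

# H(X | f(X)) = sum_y Pr[f(X)=y] * H(X | f(X)=y); given y, X is uniform
# on the preimage set, so that conditional entropy is log2(|f^{-1}(y)|).
h_cond = sum(py * math.log2(len([x for x in xs if f(x) == y]))
             for y, py in pf.items())

assert abs(H(pf) + h_cond - n) < 1e-12   # chain rule: H(f(X)) + H(X|f(X)) = n
```

This is why adding f(X) as X_0 turns the unknown quantity H(X | f(X)) into the known total n.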
Okay, so this is really just expanding the definition of HILL pseudoentropy, in addition to throwing f(X) in there as X_0. The sum of the HILL pseudoentropies, now that I've added f(X) as X_0, is n + ω(log n), because that's what we had here, and HILL pseudoentropy says this random variable Y_i is indistinguishable from X_i given the prefix. Okay, so let's step back. This is a relaxation of HILL pseudoentropy. HILL pseudoentropy would normally talk about the random variable in its entirety being indistinguishable from having high entropy. Here, what we're saying is that the sequence of random variables is indistinguishable from having high entropy to an adversary that gets the bits — or blocks; the first one is a string — one at a time. When I have just X_0, X_1 looks like it has some high entropy, and I measure that. When I have X_0 and X_1, X_2 looks like it has possibly higher entropy than its actual conditional entropy, and so on. But this only holds forward in time: once I've seen future bits, the pseudoentropy may disappear. In particular, if I see the whole sequence at once — (f(X), X) — it does not have HILL pseudoentropy bigger than n, because everything is determined by the last n bits, and I can check that the first part is obtained by applying f to the last n bits. So this gap really comes from thinking of an adversary that only sees the bits in order. Yes? The X's — the X's. So the game being played is, again, analogous to what happened here.
We see a prefix of the actual X's and then ask: how much randomness does the next bit look like it has? [What happens if you put Y's in the conditioning?] I don't know; I haven't thought about it much. If we put the Y's there, it would be like talking about the HILL pseudoentropy of the whole thing, because the first condition would tell you that the sequence of Y's — no, I'm not sure. It's worth some thought, but this is the definition that follows from what we've shown, and it turns out to be useful in the next steps of the construction. All right. So why should we feel optimistic that this weaker notion of HILL pseudoentropy will still be useful for constructing pseudorandom generators, when the adversary only gets to look one bit at a time? One reason, if you're familiar with the equivalence of pseudorandomness and next-bit unpredictability — one of the basic facts about pseudorandomness — is that for our end goal it's equivalent to talk about pseudorandomness of the entire sequence and pseudorandomness against an adversary that sees the bits one at a time and tries to predict or distinguish them as it goes. So the fact that we only have HILL pseudoentropy in this next-bit sense fits well with the next-bit characterization of pseudorandomness. All right. So what are the problems in applying this? There are two difficulties — three, maybe. Here we do have a gap between the actual entropy, which is n, and this notion of pseudoentropy, which is good. One thing we did before was to say: when we have a gap between HILL entropy and actual entropy — and in fact n is the seed length here, so we have a gap between pseudoentropy in this next-bit sense and the seed length —
we used n bits to generate the sequence X_1, ..., X_n. So what we'd like to do is say: we have high HILL pseudoentropy, let's apply an extractor. All right, so the difficulties. The first is that the entropy gap in any individual bit is going to be small — certainly at most one bit, since any one bit Y_i can give you an entropy gap of at most one bit — but actually all we know is that the total entropy gap is something like log n, so on average the amount of pseudoentropy contributed by the individual bits is something like (log n)/n. But this problem we know how to handle; it's the same one we ran into before. We can increase entropy gaps by taking repetitions: instead of working with one sample, we take many samples and combine the blocks, and if we take more than n copies, we get a larger entropy gap on the individual blocks. The second difficulty is that extractors work for min-entropy, and they don't in general work for Shannon entropy, because Shannon entropy only measures how much randomness there is on average in the random variable. It could be that with probability half you have very little entropy and with probability half you have a lot; on average that gives you a significant amount of Shannon entropy, but you're not going to be able to extract something close to uniform from it, because with probability half there's no entropy to extract from. It turns out that repetitions also solve this problem. It's a standard fact used all over the place in information theory, where it's sometimes referred to as the asymptotic equipartition property.
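The gap between Shannon entropy and min-entropy in that kind of half-and-half example shows up immediately in numbers. A minimal sketch, with toy parameters of my own choosing:

```python
import math

n = 16
# Z equals the all-zero string with probability ~1/2, and is otherwise
# uniform on {0,1}^n — roughly the lecture's example: half the time
# essentially no entropy, half the time ~n bits.
p_zero = 0.5 + 0.5 / 2**n          # mass on the all-zero string
p_other = 0.5 / 2**n               # mass on each of the other 2^n - 1 strings

shannon = -(p_zero * math.log2(p_zero)
            + (2**n - 1) * p_other * math.log2(p_other))
min_entropy = -math.log2(p_zero)   # H_min(Z) = -log2(max probability)

print(shannon)       # roughly n/2 + 1 bits: plenty on average
print(min_entropy)   # roughly 1 bit: almost nothing in the worst case
```

An extractor keyed to the Shannon entropy would fail here about half the time, which is exactly why the repetition/flattening step is needed.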
But basically it says that taking repetitions turns average-case entropy — entropy on average, as in Shannon entropy — into worst-case measures of entropy like min-entropy, with some loss. I think I have time, so I'll go into more detail on this step. Again, think of this example. You can apply Markov's inequality to Shannon entropy, but it won't give you something strong enough for our goal — I guess it actually goes in the wrong direction. So again, think of a random variable that with probability half has n/2 bits of conditional entropy and with probability half has zero conditional entropy. With probability half you have nothing to extract pseudorandomness from — it could be completely determined and easily predictable — and we need to get something indistinguishable from uniform. And then the third difficulty is that we don't know how the entropy is divided among the different blocks. We know in total what they sum to and what their average is, but maybe the first half of the blocks have all the extra pseudoentropy, maybe the second half, maybe it's spread out evenly among all of them. When we apply an extractor, we need to know how much to extract — and we're going to do our extraction one block at a time, on each block, because our pseudoentropy only holds at the level of individual blocks; it doesn't hold globally, so we can't apply the extractor to everything at once. Okay. So let's solve the first two difficulties, which, as I said, are solved by repetitions.
I'm going to state a simpler version of what we need, but it should give you the idea. Say I have a random variable Z taking values in {0,1}^n, and let Z^t denote t independent samples of Z. Then the claim is that for every ε, Z^t is ε-close to a random variable Z̃ whose min-entropy is at least t·H(Z) − O(√(t·log(1/ε))·n). When I take t independent samples of a random variable, the Shannon entropy increases by exactly a factor of t — that follows from the chain rule: when X and Y are independent, H(X | Y) = H(X), so the entropies just add. But what I want to say is that not only is the Shannon entropy larger by a factor of t when I take t independent samples, but effectively the min-entropy is large. It won't be exactly the min-entropy of Z^t that I talk about, but the min-entropy of some random variable that's close to Z^t in statistical distance — up to an ε distinguishing advantage. And how much do we lose? The main point is that what we lose is only like √t, not linear in t, so if we take t big enough, the t·H(Z) term dominates and the loss becomes less significant. And it makes sense that you should lose something: intuitively, you might lose something based on how close you want the two random variables to be, and it turns out you also lose something based on the domain in which the random variables live.
For this term to dominate — we'll do the calculation later, but we're thinking of taking t to be a large polynomial in n, so this factor of n gets swamped by the relationship between t and √t once t is bigger than something like n². And for ε we just need it to be negligible, so we take log(1/ε) to be superlogarithmic in n. This is a basic enough fact that it's worth seeing why it's true, for those who haven't seen it. Look at a fixed tuple of outputs z_1, ..., z_t and consider the log of the reciprocal of its probability mass — this is a measure of how much entropy there is in this particular sample. By independence, the probability breaks into a product of the probabilities that Z = z_1, Z = z_2, and so on, and since we take a logarithm, the product becomes a sum. Now consider what happens when we take a random tuple z_1, ..., z_t and look at this as a random variable — a function of the sample we took. In expectation, it's the sum of the expectations, and the expectation of one term is exactly H(Z): the definition of the entropy of Z is the expectation of this log-reciprocal probability mass when z is sampled according to the distribution of Z. So the expectation is t·H(Z). But this is a sum of t independent random variables, so we can apply Chernoff-type bounds to say that not only do we understand its expectation, it's actually very tightly concentrated around its expectation. And what does that mean?
It means that when I take a random t-tuple, with very high probability the amount of randomness in that sample — as measured by this log of reciprocal probability mass — will be close to t·H(Z). That "very high probability" is where our ε comes in: it's the failure probability. There are some other things one needs to modify in order to really get min-entropy, which requires having high entropy always, with probability 1, whereas here we're only saying that with probability 1 − ε we have high entropy. As for the Chernoff-type bounds: I won't go through the calculation, but this √(t·log(1/ε)) is something you always pay in Chernoff-type bounds, and the factor of n comes from the range of the random variables. These are not random variables bounded between 0 and 1; generally speaking, they are going to be between 0 and n — things with probability mass much smaller than 2^{−n} are very unlikely to occur when the random variable takes values in {0,1}^n. In fact, in our application it's directly the case: our random variables come from evaluating the one-way function, which is determined by your n bits, so the probability masses are all at least 2^{−n}. When you scale a Chernoff bound up to random variables with a larger range, you pay the size of the range in your deviation. Yes? Ah, okay — so this is a real number, and I'm claiming this real number is always between 0 and n. Or — yeah, sorry, the notation is confusing.
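The concentration claim is easy to watch empirically: the "surprise" log₂(1/Pr[z_1...z_t]) of a random t-tuple is a sum of t i.i.d. terms with mean H(Z), and it stays within O(√t · n) of t·H(Z). A sketch with a toy biased-bit source (not the one-way-function distribution from the lecture):

```python
import math
import random

random.seed(0)

# Z is an n-bit source whose bits are independently 1 with probability 0.2,
# so log2(1/Pr[Z=z]) is itself a sum of n bounded per-bit terms.
n, t, trials = 8, 400, 200
p = 0.2
H_Z = n * (-(p * math.log2(p) + (1 - p) * math.log2(1 - p)))  # Shannon entropy of Z

def surprise():
    """Sample one z ~ Z and return log2(1 / Pr[Z = z])."""
    s = 0.0
    for _ in range(n):
        bit = random.random() < p
        s += -math.log2(p if bit else 1 - p)
    return s

# For Z^t, the surprise of a t-tuple is a sum of t i.i.d. terms with mean
# H(Z); Chernoff/Hoeffding-type bounds say it concentrates around t * H(Z).
devs = [abs(sum(surprise() for _ in range(t)) - t * H_Z) for _ in range(trials)]
frac_within = sum(d <= math.sqrt(t) * n for d in devs) / trials
```

With these parameters, essentially all sampled tuples land within √t · n of t·H(Z), which is the flattening phenomenon the claim packages up.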
So what happens here? All the entropy measures — min-entropy and Shannon entropy are two points on the spectrum of Rényi entropies; I don't know if anyone talked about H_2 in the other lectures — they did, okay. Min-entropy is the smallest of these Rényi entropy measures, and Shannon entropy sits in the middle; there's also max-entropy. Basically, all the Rényi entropies get concentrated around t times the entropy of Z. [What about the computational notions?] Ah, okay, good. We want to apply this to our computational notion of entropy. How does it apply? When we take many samples, what we're saying is that the entropy of the Y's — if we take many samples of these Y random variables — will be large as min-entropy, and then many copies of the Y's are going to be indistinguishable, in this next-bit sense, from many copies of the X's. So the reasoning applies to HILL-type entropies: for HILL-type entropies you're saying one sample of X is indistinguishable from a high-entropy Y, so many samples of X are indistinguishable from many samples of Y — and if many samples of Y have high min-entropy, we've seen that many samples of X are then indistinguishable from having high min-entropy. I don't know about the other entropy measures that were discussed; it certainly doesn't follow in as easy a black-box way as what we did here. Thanks. Okay. So that solves the first two problems.
For the third problem there's another simple trick. I'm not going to go into the calculations of how many copies you need to take so that the loss we get from this doesn't overwhelm the entropy gap we have — maybe I won't go through those calculations. Let me use this board; I'm going to run out of space. So let me summarize what we just did. We take many samples of our next-bit pseudoentropy generator — the thing that generates a sequence with next-bit pseudoentropy. What we just showed is that the pseudoentropy in the first block grows by roughly a factor of t and turns from Shannon entropy into min-entropy. Similarly, the pseudoentropy in the second block, conditioned on the sequence of first blocks — by a conditional version of what we showed here — grows by a factor of t and again becomes min-entropy rather than Shannon entropy, and so on for all the blocks. In total, the sum of the HILL pseudo-min-entropies becomes something like t·(n + ω(log n)) − O(√(t·log(1/ε))·n), since n is the sum of the block lengths there. If you take t big enough, the gain from the ω(log n) dominates the loss. And so the total amount of pseudo-min-entropy, summing over the blocks conditioned on what comes before, is larger than t·n, which is the seed length used to generate this. What you can do then is apply an extractor to each of these — and it turns out you can use the same seed for all the extractions, but that's a minor point — and get pseudorandom bits out. All right.
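The per-block extraction step can be sketched with a standard poly-time extractor choice — a random binary matrix as a pairwise-independent hash. All dimensions below are illustrative placeholders, not the real parameters of the construction:

```python
import random

random.seed(1)

n_block, m_out = 12, 4   # extract m_out bits from each n_block-bit block (toy sizes)

def extract(block, seed_matrix):
    """Matrix-vector product over GF(2): a pairwise-independent hash of the block."""
    return tuple(sum(a & b for a, b in zip(row, block)) % 2 for row in seed_matrix)

# One extractor seed, reused across all columns, as mentioned in the lecture.
seed = [[random.randint(0, 1) for _ in range(n_block)] for _ in range(m_out)]

# Stand-in "columns" of the construction (here just random bit blocks).
columns = [[random.randint(0, 1) for _ in range(n_block)] for _ in range(3)]
outputs = [extract(col, seed) for col in columns]
```

By the leftover hash lemma, if a column has min-entropy noticeably above m_out, its output is statistically close to uniform given the seed — which is why knowing how much min-entropy each column has (the third difficulty) determines how much to extract.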
So that's what the construction looks like, except again we don't know how much to extract from the first column, how much from the second column, how much from the third column. The trick there is to also do repetitions going horizontally. I take a sample, and another sample, and another sample, and I do that in each of these rows. And the other thing I do in the rows is randomly shift them — I'm just giving you the vague idea — so that in each column I'm not getting the same block every time: with probability 1/(n+1) I get f(x) in that column, with probability 1/(n+1) I get the first bit, with probability 1/(n+1) the second bit, and so on. That way, in each column, the Shannon entropy is the average of the Shannon entropies of the individual bits, as opposed to the column always being X_1. [Couldn't you use fixed shifts instead?] You could do that, actually. The reason we do — yeah, I actually don't know why we do a random shift; you're right, what you suggest should also work. That I don't know. I mean, you can reason about the entropies, and the conditional entropies are sensitive to the order in which you look at the bits. Okay. So I think we're out of time; I just wanted to give you a taste. As you can see, the final construction of pseudorandom generators from one-way functions is not so complicated — this is the whole construction. You evaluate your one-way function on many random inputs, you put them together by taking repetitions in rows and columns, and then you apply an extractor to each of the columns — and this can be any poly-time extractor; it doesn't need to be a reconstructive extractor, because through this sequence of steps we've already gotten to HILL pseudoentropy.
So we do know what happens with this random-shift trick, or with the cleaner thing Udi suggested, where we just do fixed shifts: shift over by one, shift over by two, and so on. You have to throw away what happens at the ends, because there you don't have everything represented, but in most columns you'll have the same number of first bits, second bits, third bits, and we know what the sum of the entropies is — so that tells you how much to extract: every column contributes the average amount of pseudoentropy. Yes, this is all done concretely in the paper. This construction comes from combining two things: a work with Iftach Haitner and Omer Reingold, and then the paper I mentioned with my student Colin. In the papers we do things in a concrete-security way, calculating the security loss and so on; I won't be able to reconstruct the exact bounds for you. But the big open problem: with all of this, the best known seed length from optimizing this approach is something like n³ — this is in the setting of polynomial security; more generally, it's something like n³ / log²(security of your one-way function), from a one-way function on n input bits. So if your one-way function is exponentially hard, the log²(security) is like n² and you actually get a nearly linear seed length; but if it's only polynomially hard, you get an n³ seed length. And the number of invocations of the one-way function is similar — the same kind of bounds hold for the number of times the generator needs to evaluate the one-way function.
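The shifting idea can be made concrete with a small sketch. For simplicity this uses cyclic shifts (a variant of the random or truncated fixed shifts described above — with cyclic shifts nothing needs to be thrown away at the ends); the block labels are placeholders:

```python
from collections import Counter

def cyclic_shift(row, s):
    """Rotate a row of blocks left by s positions."""
    return row[s:] + row[:s]

m = 4   # blocks per sample; think n + 1 (f(x) plus the n bits)
t = 8   # number of repetitions; a multiple of m for simplicity

rows = [[f"b{i}" for i in range(m)] for _ in range(t)]            # t copies
table = [cyclic_shift(row, j % m) for j, row in enumerate(rows)]  # shift row j by j

# Each column now contains every block position equally often, so its
# entropy is the average of the per-block entropies, rather than the
# entropy of one fixed block.
columns = [[table[j][c] for j in range(t)] for c in range(m)]
counts = [Counter(col) for col in columns]
```

Since every column carries the average pseudoentropy, a single extraction length works for all columns, resolving the "how much to extract from each column" problem.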
And it's an open problem whether these losses are necessary — whether it's necessary to have a seed length that's superlinear in n, or to invoke the one-way function many times. For invocations, there is a lower bound of Holenstein and Sinha saying that we need at least a linear number of invocations of the one-way function in the case of polynomial security — in general, n / log(security). But that's still very far; it's the only lower bound of that type. We have no non-trivial lower bounds on the seed length that a black-box construction of pseudorandom generators from one-way functions can achieve, and nothing better than that linear lower bound on the number of invocations. I think these are very interesting open questions, certainly from a foundational point of view. This is already a lot better than the classic construction of HILL, which had something like an n⁸ seed length, but we have no reason to believe there couldn't be a really efficient construction of pseudorandom generators from one-way functions, where the seed length is, say, 2n — a one-way function on n bits giving a pseudorandom generator with a 2n seed length. I don't believe that's the case, but I have no idea how to rule it out. [Is there an application that needs this?] I mean, as a cryptographic construction it's of interest; I don't know any one killer application that really needs it. [A question about Goldreich–Levin.] Yeah — in the original paper with Iftach and Omer, we used Goldreich–Levin to get this next-bit pseudoentropy.
And then later, in the paper with Colin, we realized that the one-way function already has the next-bit pseudoentropy, without doing any Goldreich–Levin or reconstructive extractors in here. So it's a good question whether this can benefit from more powerful tools like that — maybe. Any other questions? All right, I think we have a break now. What I plan to do in the second lecture is less technical, so maybe we should give people the full half hour. So was it 11:38? We will resume then.