So, a couple of things I wanted to mention that I realized I didn't say before the break. I defined this notion of conditional computational entropy, conditioning on a random variable Z in general, and then I did the application to Diffie-Hellman, which was again conditioning on a random variable. What does it mean that X changes and not Z? Let's see, where's my eraser; we're not going to need this stuff anymore, so let's erase. Yes, it should be Y: I already corrected it on one slide, but not on the second copy. Thanks. See, on this copy it got corrected. Computers make it really easy to multiply mistakes. So, okay. How do you think of this X comma Z versus Y comma Z? You have a joint distribution of X and Z, and you can view it as follows: for every value little z, you have a conditional distribution X_z, one X_z in this column, another X_z in that column, and so on. When I say that X changes but Z does not, I mean that you keep the distribution of Z, and the indistinguishable version still has the same z in each column, but each column now has its own Y_z: for every value of z you have a different Y_z. That's the way to think about it. And I'm not saying they're indistinguishable column-wise; I'm not saying that every column is indistinguishable from the corresponding column. That would be really, really weak, because a column in this case corresponds to a fixed g^a, g^b. Once you fix a column, the adversary knows a and b, because they're fixed; there's no randomness anymore, and you're not going to get anything once the adversary knows a and b. So this indistinguishability is really of the joint distributions (X, Z) and (Y, Z). Makes sense? It's important to notice this difference. Okay. So that's one thing I wanted to mention. The other thing is this: I led you through the definition, and I talked about how it's really convenient for something like Diffie-Hellman, where we don't know the subgroup, because we don't need Y to be constructive. But then I gave a theorem statement which, let's see, both for the min-entropy version and for the computational version, is for now not about leaking a random variable; it's about conditioning on an event. So you didn't actually need the definition from the previous slide in order to understand this slide. When we condition on an event, we just have a distribution X that happens to be conditioned on the event Z = z; think of it as X_z for one fixed z. So there's no averaging going on here; it's a fixed event. We will get to averaging eventually, but for now we stay with a fixed event. So this X_z needs to be indistinguishable from some high-entropy Y, and that's it; there's no averaging over z here. That's the statement I'm going to prove first, and then averaging over z is a silly technical step we're not even going to bother with, because it's just writing out some equations. This is where the meat of the argument is. So I want to prove this theorem for computational entropy. Notice that, as I said, it's worse in a couple of ways: the quality of the entropy degrades, not just the quantity. The delta goes up, so it's easier to distinguish now.
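In symbols, the conditional notion discussed in the first remark above is roughly the following (a sketch in my notation, not a verbatim slide):

\[
H^{\mathrm{HILL}}_{s,\delta}(X \mid Z) \ge k
\quad\text{iff}\quad
\exists\, Y \text{ jointly distributed with } Z \text{ such that } \tilde H_\infty(Y \mid Z) \ge k
\ \text{ and }\ (X, Z) \approx_{s,\delta} (Y, Z).
\]

That is, for every value z the column X_z gets replaced by some column Y_z, but the indistinguishability requirement is on the joint pairs (X, Z) versus (Y, Z), not column by column.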
Also, I don't get to have the conditioning on both sides. There's the H; I guess I have a laser, even. Wow, okay. See the Z here? Not here. I don't know how to put it in; I don't know how to prove it with that extra Z. That would be nice, because the way we analyze the fuzzy extractor construction is that you put in some randomness and you take out some randomness, so there you want to start from entropy that is already conditional, and I don't know how to do that here in general. I have some specific cases where I can, but not in general. It would be nice to do; we don't know how. All right, so that's two ways in which it's worse. Now comes an even more annoying way in which it's worse, but before I explain it, let me demystify the term "dense model theorem," because people may have heard this term without knowing what's dense and what's the model. You know what a theorem is, but not what's dense and what's the model. Okay, so here's the idea. Imagine that the probability of Z is relatively big; it's not a very unlikely event. Then we can view X conditioned on Z as relatively dense in the entire space of X. Say you had a pseudorandom generator, and Z is the event that the first bit of the seed is one. It's a relatively likely event; the density is one half. What's a good model for a pseudorandom string? A truly random string is a good model for a pseudorandom one. So that's the word "model." And the word "dense": we now condition on the event that the first bit of the seed is equal to one. Our coin flipper got stuck when generating the first bit of the seed, it gave us a one, and everybody knows it gave us a one. That's the event on which we're conditioning. So now we have a dense subset of a set whose good model is a random set: the pseudorandom set, the set of all output strings, has a dense subset, namely the strings that can be generated with the first seed bit equal to one. The good model for the pseudorandom set was a truly random set. What about a good model for this dense subset? Is there one? The theorem says yes: if the pseudorandom set had a really good model, namely a truly random one, then this dense set has a good model inside that truly random set as well. You just lose one bit of entropy: half of a random set is going to be a good model for this half of the pseudorandom set. That's what it's saying. So if the probability of Z is one half, we have half of a pseudorandom set, and half of a random set is going to be a good model for it. It's not going to be quite as good a model, because you lose in delta: the distinguisher is more likely to tell that it's the model and not the real thing. So that's the term "dense model theorem." It originated in a totally different application, in number theory, concerning the density of primes in arithmetic progressions, but I'm not prepared to talk about that at all; I don't know it. There, too, there is an issue of a model, an issue of a distinguisher, and an issue of density. Okay. So there are many, many variants of this theorem with slightly different proofs. I'm going to do my favorite proof, of course, because I think it's modular and simple, but it's probably not the best proof at this point in terms of parameters. Okay.
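To make the PRG example concrete, here is roughly what the theorem gives in that case (indicative parameters that follow from the statement proved below; a reconstruction, not the slide):

\[
\text{If } G(S) \approx_{s,\delta} U_m \text{ (so } G(S) \text{ has HILL, hence Metric}^*\text{, entropy } m\text{) and } Z = \{\text{first seed bit} = 1\},\ \Pr[Z] = \tfrac12,
\]
\[
\text{then } G(S) \mid Z \text{ has Metric}^*\text{ entropy at least } m - 1 \text{ against size-}s\text{ distinguishers, with advantage } \delta/\Pr[Z] = 2\delta .
\]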
And as I said, this very clean statement is actually not for HILL entropy. So now I have to define more notions of entropy. Part of the discussion before this event was how many notions of entropy we want to define in a week; we're going to define quite a few, but not all the ones we know. So this statement is for a notion of entropy called metric star, and my goal for the next five minutes is to explain what the heck that is, okay? And then the second theorem we're going to prove is that you can go from metric star to HILL. HILL is the one we know and love: indistinguishability from a high-entropy distribution. So we can try to prove these two theorems. Okay. So what is this metric star thing? Let's remember HILL. You have HILL up there and now you have HILL down here; it's the same definition, I'm just writing it a little more explicitly. It says: there exists a distribution of high min-entropy such that for all distinguishing circuits, the expected value of the circuit on a sample from X is within plus or minus delta of its expected value on a sample from that distribution. That's just unwrapping the definition of indistinguishability. Okay, so what is this metric thing? Metric switches the quantifiers. It's not that there exists a single awesome distribution that nobody can tell apart from X; rather, if you give me a distinguisher, I will fool that distinguisher with a distribution. Give me another distinguisher, I'll fool that one, maybe with a different distribution. It's not a single one that fools everybody. Notice that this is actually a good enough definition, because think about an extraction argument: say you have an adversary against, I don't know, a signature scheme. (Yes, same question: I haven't put the star on yet; the star is coming next. This is just metric.) So remember we had the argument that HILL entropy is good enough: whenever you want to use an extractor, you can extract from HILL entropy instead of from true min-entropy and still be okay, because indistinguishability will save you. The same argument actually works for metric, so you don't really need HILL; metric is good enough. Suppose you have an adversary against your signature scheme, and you generated your signature key by applying an extractor to something that had metric entropy. The process was (let me just draw it; I'm going to use worse colors progressively, there's the blue, okay): source with metric entropy, extractor, key, signature. Say you're trying to prove the security of this chain of events. The signature was designed for a uniform key, and now you're doing this crazy thing; why is it good enough? By essentially the same argument. Suppose you have an adversary for the signature scheme. Fix it. For that adversary, there exists a Y that fools it, yes? Because there exists a Y that the adversary cannot tell apart from X, we could just as well be extracting from Y here. If we're extracting from Y, we get an almost uniform key; therefore the signature is secure. So this quantifier switch does not take away from the application: for every fixed adversary, you will be able to extract, and that's good enough.
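Written out as a hybrid, the argument just sketched looks roughly like this (my notation: Ext is a strong extractor with seed S and error eps_ext, and Y_A is the distribution, promised by the metric guarantee, that fools the particular distinguisher "extract, then run the signature experiment with adversary A"):

\[
\Pr[\mathcal{A} \text{ breaks the scheme with key } \mathrm{Ext}(X,S)]
\;\le\; \Pr[\mathcal{A} \text{ breaks it with key } \mathrm{Ext}(Y_{\mathcal{A}},S)] + \delta
\;\le\; \Pr[\mathcal{A} \text{ breaks it with key } U] + \delta + \varepsilon_{\mathrm{ext}},
\]

where the first step uses that the whole experiment is one distinguisher, which Y_A fools, and the second uses that extracting from the high-min-entropy Y_A gives an almost uniform key.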
Even though for different adversaries you're extracting from different Ys, that's okay: every adversary has something that fools it. From every adversary's point of view, you actually have a lot of entropy, for different reasons, kind of: for some adversaries it's one Y, for another adversary it's a different Y, but for every adversary there is some Y that fools it. That Y has high entropy, and from that Y we can extract; therefore no particular adversary can win. So metric is good enough for many, many applications, actually. Let's see: "it feels non-black-box." Right, the extractor works for all Ys, so I'm trying to figure out whether it is. I think whether it's black box or not depends on how you actually prove that something has metric entropy; this argument only needs the existence of Y. Once you've proven metric entropy, I don't think there's any non-black-boxness here, but the definition does feel non-black-box. So I think it depends on how you prove that something has metric entropy; once it has it, I think the argument is black box. That's my intuition, but I'd have to think about it more. Okay, so that's just what we said: this thing is still good enough for extraction, because for every adversary we can fool it with some distribution. Now here is the annoying thing about metric star, the thing that makes our life a lot worse. It's not the "metric," it's the star. We're only going to allow deterministic distinguishers. Normal distinguishers can flip coins, right? Now we only allow deterministic ones, and to compensate we allow them to output a value in the range [0, 1] instead of just 0 or 1, which is what your classical distinguisher does. So our distinguishers can now say: I can't flip coins, but if I could, I would output 1 with probability 0.3; and the way it says that is by just outputting 0.3. Is that equivalent? That's a very good question, and it's not. If you have a distinguisher that outputs 0 with some probability and 1 with some probability, you can't force it to tell you what that probability is. This new kind of distinguisher is only allowed to output the probability, so it has to kind of know it; if you know the probability, you can output it. So it's a smaller class of distinguishers, and the entropy notion is not as strong, because it's only secure against that smaller class, the distinguishers that know their own output probabilities. Make sense? If you have a distinguisher that sometimes outputs 0 and sometimes 1 but doesn't know how it works internally, you can't convert it. On the other hand, if you have a distinguisher that knows its own probability and you suddenly allow it to be probabilistic, of course it will just flip the coins at the end: I want to output 1 with probability 0.3; look, I was just given the freedom to flip a coin, so I'll flip it and output 1 with probability 0.3. So this is a smaller class of distinguishers. Let's see, do I even... no, I think we're not going to do slides for a while, so let's turn this off, draw some pictures, and use the board. Yes, thanks, I was going to mention it: thanks to Pietrzak and Skorski, there is a separation between these two notions. Okay, I'm going to turn off the slides. Can we turn off the lights? I'm going to be using the boards for a while, thanks.
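Since the next stretch is board work, here are the three definitions side by side, in symbols (a reconstruction in my notation, with the size and advantage parameters made explicit):

\[
\begin{aligned}
H^{\mathrm{HILL}}_{s,\delta}(X) \ge k:\quad & \exists\, Y \text{ with } H_\infty(Y) \ge k \ \text{ such that } \ \forall D \text{ of size } s:\ \bigl|\mathbb{E}[D(X)] - \mathbb{E}[D(Y)]\bigr| \le \delta;\\[2pt]
H^{\mathrm{Metric}}_{s,\delta}(X) \ge k:\quad & \forall D \text{ of size } s\ \ \exists\, Y_D \text{ with } H_\infty(Y_D) \ge k \ \text{ such that } \ \bigl|\mathbb{E}[D(X)] - \mathbb{E}[D(Y_D)]\bigr| \le \delta;\\[2pt]
H^{\mathrm{Metric}^*}_{s,\delta}(X) \ge k:\quad & \text{same as Metric, but } D \text{ ranges only over deterministic circuits with output in } [0,1].
\end{aligned}
\]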
So there are suddenly a lot of definitions of entropy floating around; let's try to make it a little more understandable. We have distinguishers, and they can be Boolean or, I don't know, continuous, whatever you want to call [0,1]-valued; and they can be deterministic or randomized. That's four distinguisher classes. And then we also have to decide whether there is a single distribution that works for all distinguishers, or whether for every distinguisher there exists a Y that works for that specific distinguisher. That's eight entropy notions. You can see that we could spend a long time on every one of them, and that would be incredibly boring, I think, so we're not going to. The observation is: this quantifier order is called metric, and this one is the HILL notion. All four of these quadrants are equivalent for HILL, so we collapse those four notions at least, and that's a nice feature. For HILL it doesn't matter whether the distinguisher is deterministic or randomized, intuitively for the following reason: there exists a Y that fools all distinguishers; if it fools the deterministic ones, it also fools the randomized ones, because it fools every single hardwiring of the randomness. Make sense? Hardwire the randomness, and it's still fooled. This trick does not work over here, because for every value of the randomness you may have a different distribution. So there it matters: you can't hardwire the randomness, you'd get a different distribution for every value of the coins. But for HILL, randomized versus deterministic doesn't matter. And because it doesn't matter, if somebody outputs a value in [0, 1], you can make it randomized by simply outputting 1 with probability equal to that value, and the expectation stays the same; what we're always measuring is the expectation of D(X) versus the expectation of D(Y). So if you allow randomization, you can always convert from this quadrant to that one by flipping a coin with the right probability, and once you're there, you can convert back to deterministic in HILL. So in HILL, all four are equivalent. I'm not stating and proving these as formal theorems, because we would just fill up boards with symbols; I can give you a reference if you want to see the symbols. But the intuition is there: you can convert this to randomized, and from randomized to deterministic. So for HILL these are all the same. For the other one; for metric, you mean? Okay, good, let's see. For HILL there is this implication, or rather the other way around: if you're secure against deterministic distinguishers, you're also secure against randomized ones, so you really don't need randomized distinguishers. What you do is first convert from here to randomized, and then observe that you don't actually need randomized distinguishers; you could just as well be deterministic. No, I'm losing you somewhere. Okay, let's understand the question again. The question is: a class of distinguishers that don't know their probability of outputting 0 or 1, they just output the bit, seems like a stronger class than the class that outputs its own probability. It's a bigger class of distinguishers.
But in the HILL case, it's actually not any bigger than simply all deterministic distinguishers, because in the HILL case there's a single distribution that fools every single hardwiring of the randomness. So we might as well only worry about deterministic distinguishers and ignore the randomness, and then there's no probability to know. Why does coin fixing work in HILL? Because there's a single distribution that works for all distinguishers, in particular for every fixing of the coins. So you don't need to know your own probability: every single one of those fixed-coin distinguishers fails to distinguish. So let's see, in HILL, okay, drawing implications is actually tricky, because you have to keep straight what the implications are between. So here's the argument. Take HILL entropy that is secure against these guys, the weakest class: they're only deterministic, they're only Boolean. HILL against these implies HILL against these, is that okay? And HILL against randomized implies HILL against these guys, because at the end they can always flip a coin with the stated probability: these guys output a value, say 0.37, and then by flipping a coin they become these guys, okay? And then this is just a smaller class, so you get HILL against these; and this is a smaller class, so you get HILL against these. So this implication is done by coin fixing, this implication is done by coin flipping, and the other two implications hold because they're smaller classes. What you don't have in the metric case is this implication. Sorry, which one; for HILL or for metric? For HILL it's just a coin-fixing argument: for every value of the coins of D, Y will fool it. Fix the coins, that's a distinguisher, Y fools it; therefore for the averaged coins it also fools it. It's an averaging argument with no work at all. While for metric, this implication is going to be terrible and lossy, and the reason is that we don't have the luxury of fixing coins. So for metric there's no such simple implication. So, did this answer your question, or do you want to compare this to metric and see where it fails? Yeah, okay. All right, so first of all: if you have a smaller distinguisher class; let's go with the implication here that goes up, okay? Because it's a smaller class of distinguishers, if you can fool the bigger class, you can definitely fool the smaller one. And here also, a smaller class of distinguishers. This one is "fix the coins"; it's true for every fixing. And this one is "convert to Boolean": this is continuous and this is Boolean, and you convert to Boolean by flipping coins. Okay, so now we can draw the same matrix for metric and try to see where the implications fail. Same picture, but for metric: again 0/1 versus continuous, they're looking awfully similar now, and deterministic versus randomized. These two implications are still there: this is a smaller class of distinguishers, so metric against this gives you metric against this; and this is a smaller class of distinguishers. It's this implication that fails: there's no coin-fixing argument, because once you fix the coins, you have a different Y.
For every fixing of the coins, you have a different Y. So for a randomized distinguisher, I don't know with which Y to fool it; I have all these Ys, what am I going to do? This implication is also okay, by the same argument: once you're already randomized, you can convert to Boolean. The implication that does not work, where I asked how you are going to go back, is this one, from lower right to upper right. If you know the probability with which you output 0 or 1, how do you deterministically then output 0 or 1? You don't. Or, your other question: you could also try to do an implication directly back here, and you're also going to fail. You have a randomized distinguisher, and you want to say that randomized circuits can just be represented by continuous ones, but you don't know how to output the probability. So when I said an implication fails, I was talking about this one, while in HILL you can go around the circle to get it. Why should that be? Do I actually have a clean example? So you have a randomized distinguisher; no, the point is that you would need to sample many times. In order to know your probability of outputting something, you need many samples. You get a string, you don't know if it's random or pseudorandom or whatever; you do some computation on the string and output 0.57. That's one version. In the other, you do some computation on the string and somehow a 1 comes out at the end. How do you know what the probability of that 1 was? It's hard to tell; you would have to run it multiple times. So basically the moral of the story is that for HILL these are all equivalent with relatively straightforward, one-line arguments; you just have to think about them, and after somebody tells you they're equivalent, they're very easy to prove. And for metric they are not all equivalent, because of this one missing implication. "You're confusing me." So, we're trying to say: if you have metric against these guys, then you have metric against these guys. Oh, yeah, yeah, okay, you do. Yes. There's no diagonal; thank you, yes, it's the other diagonal that's missing. Wait, hold on. If you're in the upper left; yeah, okay, good, I was messing up. The upper left is where the diagonal is missing, because you're stuck: you cannot prove this implication. Is that better? Let's talk through it again: if you have a deterministic distinguisher that is Boolean, yeah; and the other diagonal also goes down. Let's just remind ourselves where we are with metric star. Metric star is here, and the metric we really want, the one we need for extraction, is this one. Why is that the one you want in order for extractors to work? Because extractors are randomized things: they have a random seed. So your distinguisher, or rather the distinguishing experiment in the extractor case, is a randomized algorithm: it's an extractor that flips coins. So this is the one you want in order for extractors to work, and we argued hand-wavingly that it does work, right?
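Since the boards aren't visible in the transcript, here is one way to reconstruct the picture being discussed (my reconstruction, not the literal board). The four classes are {deterministic, randomized} x {Boolean, [0,1]-valued}, and the three kinds of arrows, where "A implies B" means entropy against distinguishers in class A implies entropy against class B, are:

\[
\text{(inclusion)}\ \ \text{bigger class} \Rightarrow \text{subclass};\qquad
\text{(coin flipping)}\ \ \{\mathrm{rand},\{0,1\}\} \Rightarrow \{\mathrm{det},[0,1]\};\qquad
\text{(coin fixing)}\ \ \mathrm{det} \Rightarrow \mathrm{rand}.
\]

All three are available for HILL, which makes the four quadrants equivalent. For Metric, inclusion and coin flipping survive, but coin fixing fails (each fixing of the coins may need a different Y), so Metric star, the deterministic [0,1]-valued quadrant, is not known to imply Metric against randomized distinguishers, which is the quadrant extraction needs.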
We said: for every distinguisher there will be a distribution, and the distinguisher here is the randomized thing, the extractor itself plus whatever signature scheme or something comes after it. While the one we can prove is this one. And we don't have that implication, okay? Instead of listing which implications we don't have, let's look at the ones we have; I think otherwise it's super confusing. We have some; we don't have all. In particular, we don't have the one we really want, from here to here. Whereas over here we have a full circle of implications that are basically free. Okay, did I manage to confuse everyone, or anything? Sort of. All right, let's just step back for a second. We can consider distinguishers that are randomized or not, that are Boolean or continuous; that gives us four versions. Nobody bothered with this stuff for HILL entropy because it's all equivalent anyway, and it's equivalent with one-line proofs. Sit down and do these implications carefully, which a board is not well suited for; it's really better with pen and paper on your own, and you will see that they're almost immediate, because you can just convert distinguishers from one class to another. But this implication is not there. And the chain rule we can prove is for this, the stupid entropy, while the one we really want is this much better entropy, because that's the one we need for extractors. So then we're going to have to convert. So perhaps another picture, which may be cleaner because it has three notions instead of eight, is this one; let me draw it without messing up. Yeah, okay. We have metric entropy here, which is good for extraction, as we already mentioned. We have metric star entropy here. If you have k bits of metric, you also have k bits of metric star, but not the other way around: k bits of metric star doesn't necessarily give you k bits of metric, because it's a smaller distinguisher class. This one is good enough for extraction, and this conversion, going this way, is lossless; the amount of entropy carries over. Going back incurs a loss that is necessary in some sense: there's a black-box separation proving that it's necessary, due to Krzysztof and Maciej, and you can probably find it on ePrint. So from here to here, the loss is necessary. And then there is HILL. Of course, if you have HILL entropy, you also have metric, because there's one distribution that fools all distinguishers, so in particular it fools every single one. And then you'd like to go back up here; in fact, we will prove this implication, that's our second theorem for the day, and it costs you a loss in circuit size s or in distinguishing advantage delta. This picture is nicer because instead of eight notions it has three: we got rid of all the equivalent ones and the ones that don't matter too much. If you don't want to squint, it's also up there on the slide. Good. Okay, so which of these things are we going to prove? We're going to prove a chain rule for metric star, and then a conversion to HILL. So you start with some HILL entropy and you start conditioning it. You say: okay, HILL implies metric star; for metric star I have a chain rule; then I can go back to HILL with some loss. That's how you get a HILL-to-HILL rule: I start with HILL, that means I have metric star, and if I have metric star, then I can condition by the chain rule, which is what we're going to prove right now.
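To summarize the three-notion picture and the plan in one place (a reconstruction; the exact losses are only stated qualitatively, as in the lecture):

\[
\mathrm{HILL} \;\Rightarrow\; \mathrm{Metric} \;\Rightarrow\; \mathrm{Metric}^* \quad\text{(losslessly)};\qquad
\mathrm{Metric}^* \;\Rightarrow\; \mathrm{Metric} \quad\text{(requires a loss; black-box separation)};\qquad
\mathrm{Metric}^* \;\Rightarrow\; \mathrm{HILL} \quad\text{(provable, paying in } s \text{ or } \delta\text{)},
\]

and the chain-rule plan:

\[
H^{\mathrm{HILL}}(X)\ \Rightarrow\ H^{\mathrm{Metric}^*}(X)\ \xrightarrow{\ \text{chain rule: condition on } Z=z\ }\ H^{\mathrm{Metric}^*}(X \mid Z=z)\ \xrightarrow{\ \text{conversion: pay in } s,\ \delta\ }\ H^{\mathrm{HILL}}(X \mid Z=z).
\]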
And then I can go back to HILL with a loss in circuit size. That's the whole process, right? So we are going to prove two theorems: the chain rule here, and this conversion. Good. All right, so the next stop is proving the chain rule. I'm still not going to use the slides, so you can keep the lights up, thanks. Any questions before I start proving the chain rule? More about the 75 entropy notions? They kind of divide into the ones that are useful to work with and the ones we can prove theorems about. The useful-to-work-with ones are HILL and sometimes metric; the one we can prove theorems about is metric star. Okay. So, why is it called metric? Because it defines a metric on, let's see, on distinguishers or on distributions. The name is, I think, due to Barak, Shaltiel, and Wigderson, but I'm not sure. Every distinguisher gives you a distance between distributions, I think; that's, I think, the reason they called it that. Sorry? Yeah, I was never really convinced by the name, so I can't give you the justification; maybe their paper has a better explanation. Sorry, I don't know the right answer. I had an eraser; I don't anymore. Ah, thank you. Thanks, good. Okay, so here we go. We want to prove that if an event has some probability p, then the amount of metric entropy goes down by log(1/p). So I'm going to introduce the letter p: if the probability that Z equals little z is p, then

H^{Metric*}_{s, delta/p}(X | Z = z)  >=  H^{Metric*}_{s, delta}(X) - log(1/p).

There's another loss hiding there: if the original entropy was good against size s and advantage delta, then for the new entropy we can only guarantee that distinguishers don't distinguish with advantage delta over p, okay? So how are we going to prove this? Let's define X_z to be this conditional distribution, just so we have a shorter name for it. All right, we're going to prove it by contradiction: if you can distinguish this thing, then you can distinguish that thing. So suppose not; let's take the contrapositive. Then there exists a distinguisher D that works against all Y: it tells X_z apart from every Y of high enough entropy. So let's look at the space of all points. Here's my distribution X_z; here is a distribution Y. It's on the same space, I'm just drawing it separately, and Y has many, many points. This holds for all Y of high enough min-entropy, which here means at least k minus log(1/p). What does it mean that D works against all Y? I throw a distribution at it, the distinguisher tells it apart; I throw another distribution at it, the distinguisher tells that one apart too. Here's the first claim. We can compute the expectation of this distinguisher on X_z, and the expectation of this distinguisher on Y, and they have to differ by more than delta over p, in one direction or the other. First, I want to say that you have to commit to which direction: either for every Y this expectation is above that one by that amount, or for every Y it's below.
Why can't it be above for some and below for others? Imagine there's a distribution Y1 for which this expectation is above, and another distribution Y2 for which it's below. Then you can mix these distributions with the right weights and get a third distribution Y that still has high min-entropy, because you're averaging two high-min-entropy distributions, and that is now indistinguishable from X_z, because the difference averages out to zero: one is above, one is below, mix them so they average to exactly this expectation, and that mixture is a good distribution for fooling this distinguisher. The difference is zero. So you can mix a distribution from above with one from below and get a distribution that sits exactly at X_z and still has high entropy. Okay, so we have essentially removed the absolute value: either all the Ys are above or all the Ys are below. Without loss of generality, since we're only going to do this proof once, let's assume all the Ys are below. So for every high-min-entropy distribution Y, this difference, without any absolute values, is at least delta over p. That's our assumption. Okay, now think about it: if you wanted to get as close to X_z as possible, which points would you pick? Remember, this distinguisher is deterministic, so it's just a mapping from points to values: for this point it says 0.5, for this point 0.1, for this point 0.2. You can actually write down these values: 0.1, 0.3, 0.7, 0.4, 0.2, whatever, to some precision. And remember, every Y ends up below. Which Y would you pick to get as close as possible, if you were designing a distribution to fool this distinguisher as well as possible? You have exponential time; don't worry about being constructive. Bigger numbers are better, right? You're trying to get as close as possible from below; we know you have to come from below, because every admissible distribution is lower. So just pick the biggest numbers you can: take the point on which the distinguisher's output is the biggest and throw it into the distribution; take another point on which the output is next biggest; and so on. When do you stop? You can't stop too early, because Y has to have high min-entropy; you stop once you've thrown in enough points to get the right min-entropy, okay? What forces you to be below X_z is that you have to throw in a lot more points than X_z has: X_z is small, Y is big, so you have to include a lot of points, some of which won't have such big values, and so on average you end up smaller, right? Okay. Now our trick is to convert this distinguisher, which is currently not Boolean, its output being a real value in [0, 1], into a Boolean one. We would like a distinguisher that simply outputs zeros and ones, and the way we're going to do that is by putting in a threshold.
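As a side note, here is a toy numeric illustration of that greedy choice of Y (my own sketch, not from the lecture; points are just indices 0..N-1 and D_vals[i] stands for the deterministic distinguisher's output D(i)):

```python
import numpy as np

def greedy_Y(D_vals: np.ndarray, k: float, p: float) -> np.ndarray:
    """Best min-entropy-(k - log2(1/p)) distribution 'from below':
    uniform over the p * 2**k points where D outputs the largest values.
    (A uniform distribution over m points has min-entropy log2(m).)"""
    m = int(round(p * 2 ** k))                # support size => min-entropy k - log2(1/p)
    best = np.argsort(D_vals)[::-1][:m]       # indices of the m largest D-values
    Y = np.zeros_like(D_vals, dtype=float)
    Y[best] = 1.0 / m                         # uniform over those points
    return Y

# E[D(Y)] for this Y is as large as any admissible Y can make it, so if even
# this Y sits delta/p below E[D(X_z)], then every admissible Y does.
rng = np.random.default_rng(0)
D_vals = rng.random(2 ** 10)                  # toy distinguisher outputs in [0, 1]
Y = greedy_Y(D_vals, k=8, p=0.25)
print("E[D(Y)] =", float(D_vals @ Y))
```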
We're going to say: if the value you were going to output is above some threshold, say 0.5 (an arbitrary number right now; we'll figure out what the right threshold is), output 1; if you're below the threshold, output 0. So we're changing the distinguisher. That adds essentially nothing to the circuit size, one comparison, so we're going to ignore that change; we'll just hardwire the threshold. Why Boolean? Simply because I don't know how to prove things the other way. I will actually end up with metric entropy against Boolean distinguishers, which is even better, but I'm starting with a general distinguisher and converting it to a Boolean one; it's just a proof technique that I need. So, first of all, take the distribution Y that is the closest you can get to X_z: the one that almost fools the distinguisher, not quite well enough, but the one built from all the highest points. We know these two expectations differ by delta over p. We want to put in a threshold, converting the distinguisher to a Boolean one, but maintain this property. How are we going to do that? Where do we put the threshold? Some points will go up to 1 and some will go down to 0, but we want the difference of expectations to be preserved. It's still going to be deterministic; it just outputs 0 or 1 instead of a real value. It turns out that all we need to know is that a suitable threshold exists; we'll just hardwire it into the circuit. And a threshold exists by a simple argument about expectations. Here's a nice fact you probably knew and forgot: the expectation of the distinguisher on X_z is the integral, over rho from 0 to 1, of the probability that D(X_z) is greater than rho. (The Greek letter rho; sorry, but a bunch of the other useful letters are taken.) This was in some probability class some time ago: you're integrating the complementary cumulative distribution function. If you don't remember it, work it out; it's a nice fact. So now if we take a difference of expectations, subtracting the expectation of D on Y, where Y is the one closest to X_z, I can subtract inside the integral: keep the d-rho on the outside and subtract the probability that D(Y) is greater than rho. Is that okay? So, assuming you believe this fact about expectations, which you probably forgot but knew at some point, this is just how we subtract the two things. This integral is over the interval from 0 to 1 and it's at least delta over p, therefore at at least one point the integrand must be at least delta over p; otherwise the area couldn't be that big, since the bounds of integration are 0 and 1. The integrand is at least delta over p on average, and therefore at at least one point rho. Sorry, my rhos look like this, but: at one point rho. That's going to be my threshold.
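Written out, the fact being used and the conclusion drawn from it are:

\[
\mathbb{E}[D(X_z)] - \mathbb{E}[D(Y)]
= \int_0^1 \Bigl(\Pr[D(X_z) > \rho] - \Pr[D(Y) > \rho]\Bigr)\, d\rho \;\ge\; \frac{\delta}{p},
\]

and since the integral is over an interval of length 1, there must be some threshold \(\rho^* \in [0,1]\) with

\[
\Pr[D(X_z) > \rho^*] - \Pr[D(Y) > \rho^*] \;\ge\; \frac{\delta}{p}.
\]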
If this probability minus this probability is at least delta over p at rho, then I'll just say: if D's output is greater than rho, send it to 1; if it's less, send it to 0. So now I have a distinguisher D prime that achieves the same exact advantage while being Boolean. Let D prime be the same as D, but output 1 if D's output is greater than rho (greater than or equal to, let's say; it doesn't matter) and 0 otherwise. The difference between the expectations was this much, but those were expectations of a real-valued distinguisher; how do I know I can convert it to a Boolean distinguisher with the same difference of expectations? That's exactly what this argument gives me: I have to convert a real-valued distinguisher to a Boolean one, and I don't see another way; maybe there is one. What do I mean by a point? Not a point in the space of samples. I mean a threshold. So these outputs are 0.4, 0.1, 0.3, 0.7, and I'm saying there is a value, say 0.675, such that if I send everything above 0.675 to one (let me redraw this picture with zeros and ones according to that), then the expectation of D prime on this minus the expectation of D prime on that is still the same delta over p. So it's not a point in the space of sample points; it's a point in the output space. By the way, a sum would have been good enough, because these are discrete outputs; it's a finite-size circuit, so you could do a sum instead of an integral. But either way it tells me that this point exists: I have one function, I have another function, I know the integral of their difference is at least something, and therefore at at least one point the difference is big. That's the point at which I cut: above it, things go to one; below it, things go to zero. I don't see how else to conclude it; I'm using a very specific statement about a function, namely that if the integral of a function over [0, 1] is at least delta over p, then at at least one point the function is at least delta over p. Okay, so the goal was really to figure out at which point to send things up or down, and we know such a point exists. So now we can convert this circuit into another one. There is a tiny loss in circuit size that I did not write on the slide because it's so trivial it's not worth mentioning: one comparison. Okay, now let's think about the new circuit. We're done, actually. This is going to be the contradiction circuit: it's going to give us a distinguisher between X and every high-entropy distribution, which is a contradiction, because X is supposed to be indistinguishable from some of them. Let's just try to understand why it works. I took a distinguisher that told X_z apart from every distribution of entropy at least k minus log(1/p), I put a threshold on it, and now I claim I have a distinguisher between X and every distribution of entropy k. Why is that? Okay, let's draw a slightly more careful picture. This is going to be my Y. What is the entropy of Y? It's k minus log(1/p).
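For the record, here is the Boolean distinguisher just built and the property it inherits (rho star is the threshold found above; whether the comparison is strict or not doesn't matter, since the circuit has finitely many output values):

\[
D'(w) =
\begin{cases}
1 & \text{if } D(w) \ge \rho^*,\\
0 & \text{otherwise,}
\end{cases}
\qquad
\mathbb{E}[D'(X_z)] - \mathbb{E}[D'(Y)]
= \Pr[D(X_z) \ge \rho^*] - \Pr[D(Y) \ge \rho^*]
\;\ge\; \frac{\delta}{p}.
\]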
If I want a higher-entropy distribution, which is the stuff I have out here, an entropy-k distribution, it needs to be one over p times bigger than Y. And what I claim is that everywhere outside of Y, this new distinguisher D prime is zero. Why is it zero everywhere outside of Y? Because remember, we put in a threshold, and D prime is not identically one on Y: this difference being positive forces it to be zero somewhere on Y. And Y contains the highest-valued points; all the other points are even lower. So D prime is definitely zero outside of Y. That means the expectation of D prime on any distribution Y prime of high enough entropy, of entropy k, is at most what? Well, you took this Y and added a bunch of zeros, a factor of one over p more points, so it's at most p times the expectation of D prime on Y. You just diluted it with zeros, that's all, because D prime is zero everywhere outside Y. That's why we needed the threshold, actually. Somebody asked me earlier why put in a threshold at all, why you need the zero-one thing: because I want to say that when I expand to a bigger distribution, I'm not increasing the value of the distinguisher on it. And now the expectation of this distinguisher on X: at the very least, X contains X_z, and everything else contributes non-negatively, so it's at least p times the expectation of D prime on X_z. That's simply because X_z is a p-th portion of the entire space X, by the definition of this event, and the distinguisher is non-negative. So now if you subtract the two, you get that the difference is at least delta. So this new distinguisher can actually tell X apart from any distribution of high entropy. Which one? This one. The inner circle is the Y that we took, the one that maximized the expectation and came as close as possible to X_z while staying below; the outermost circle is the whole space; and Y prime is any distribution of high enough entropy, entropy k, which may or may not actually contain Y. The important thing is that we now have ones and zeros, and everywhere outside of Y, D prime must be zero. What is D prime on this big Y prime? It outputs zero in some places and one in others, and the only places it's allowed to output ones are inside Y, because Y had all the highest-output points. We sent some of them to zero and some to one, and D prime is not identically one on Y, because otherwise this difference couldn't be positive. Therefore there are zeros inside Y, and everything outside of Y is even lower, so outside of Y it's all zeros. So inside Y there are some ones and some zeros; outside of Y, all zeros. So if you have a high-entropy distribution Y prime, those are the most ones you can possibly hit, and they make up only a p-th fraction of the whole space. This is where we actually use the thresholding and the very careful construction of Y as the best distribution you can get: we defined the threshold specifically for this Y, which I guess maybe also answers Hamlet's question.
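In symbols, the two dilution steps and the conclusion (Y prime is any distribution with min-entropy at least k):

\[
\mathbb{E}[D'(Y')] \;\le\; p \cdot \mathbb{E}[D'(Y)]
\quad\text{(}D'\text{ is zero outside } Y\text{, and } Y' \text{ spreads its weight over a support } 1/p \text{ times larger),}
\]
\[
\mathbb{E}[D'(X)] \;\ge\; \Pr[Z=z]\cdot \mathbb{E}[D'(X_z)] \;=\; p \cdot \mathbb{E}[D'(X_z)]
\quad\text{(}D'\text{ is non-negative),}
\]
so
\[
\mathbb{E}[D'(X)] - \mathbb{E}[D'(Y')] \;\ge\; p\Bigl(\mathbb{E}[D'(X_z)] - \mathbb{E}[D'(Y)]\Bigr) \;\ge\; p\cdot\frac{\delta}{p} \;=\; \delta,
\]
contradicting the assumed Metric star entropy of X.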
To Hamlet's question: we defined the threshold specifically for this Y, so that everywhere outside of Y we send things to zero. Now this new distinguisher is really, really low everywhere on a high-entropy distribution, which is what we want, and on X it's still reasonably high, because we're not diluting it by that much: it's non-negative, so it's never going to get that low. We're diluting by p on both sides, here by p with a less-than-or-equal sign and here by p with a greater-than-or-equal sign, so overall the whole gap gets diluted by p. The new distinguisher, on the original X versus any other high-entropy distribution, is going to perform with advantage at least delta, which is right, because it's delta over p times p. And that contradicts the metric entropy of X. I know proofs take a while to sink in; this is normal, so let's just do the high-level picture. We said: suppose there's a distinguisher that can tell this conditional distribution apart from anything of high entropy. Take the best high-entropy distribution you can get, the least distinguishable one. Put a threshold according to that distribution, so that some of it is sent to one and some to zero. Now this new thresholded distinguisher works too well for the unconditional distribution, "too well" meaning it distinguishes with advantage delta, and that's our contradiction. Sorry, this step? Okay, this one actually has nothing to do with the specific construction of the distinguisher at all. It's just the observation that if this is little x_z right here and this is the big X, then x_z is a p-fraction of it, simply because it's the smaller set of points we've conditioned on. Even if the distinguisher were zero on all the other points, you'd still get at least p times this expectation, and it might not even be zero on those points, so you might get even more. The reason this is the easy direction is that we want a greater-than-or-equal sign and the distinguisher is non-negative: if your expectation over the small space is pretty big, the expectation over the bigger space doesn't shrink by more than the factor p, because the worst the points outside the small space can contribute is zero. The weight of every point of x_z, viewed inside X, is at least p times what it was before. Does that make sense? So this step does not use thresholding at all, just non-negativity of the distinguisher; it's straight from the definition of expectation, the value at a point times the probability of the point, and no point's probability shrinks by more than the factor p. Yeah, and I see: if we set the leakage to nothing, then, let's see for a second, yes, I think so. This says that if you have a distinguisher that is continuous, with values in [0, 1], then by thresholding I can make it a Boolean distinguisher. Yes, I think you're right, and perhaps that should actually be a separate step somewhere, because that would make life easier. Somehow I hadn't thought of it that way; I don't know why. I think you're right, because that's essentially what we just did. For deterministic distinguishers, that is, because the thresholding really relied on the distinguisher having a definite value at every point, right?
So that implication would be by thresholding; I want to verify it offline before claiming it, in case we're missing something stupid, but I think you're right, we essentially just did it here. Okay, so let's review the four steps again, because each step really isn't terribly complicated. First observation: all the Ys together should be below, or all above, the distinguisher's behavior on X_z; if it's a mix, you can average the Ys. Assuming without loss of generality they're all below, take the highest possible Y, the one that comes as close as possible to X_z for this distinguisher. Put a threshold according to that Y, sending some things to zero and some to one. Then just look at what happens on bigger distributions: when you go to a bigger distribution and all you're adding is zeros, the expectation doesn't grow; and when you go to a bigger distribution and you're adding maybe zeros and maybe bigger values, it doesn't shrink too much. One thing doesn't grow, the other doesn't shrink too much, and overall you get the result, okay? Good. There is a reference in my slides to the proof, so it's written down fairly cleanly; I'm sure you're missing details the first time through, but that's okay. Good. Let me point out a couple of interesting problems that come up when we try to extend this to entropy that was already conditional. Since I have the proof on the board, even though I haven't shown you the conditional chain rule... wait, how are we doing on time? We need to end at one, right? Somebody, yeah, okay, so we're okay. Then I have time to explain what I wanted to explain. Wouldn't it be nice: remember, in the information-theoretic case, we could say that if we already start with conditional entropy and then leak some more bits, it's still going to be okay. We'd like to be able to say the same thing here. Let me try to explain where things get stuck. Imagine that what we wanted to prove is: you already had X conditioned on Z1, and then you condition further on Z2. Here are two places where you get stuck. One place is the very first argument, the one that says either all the Ys are below or all the Ys are above. How did we argue that? We said: if they're not, you can mix them. But now, let me maybe write down the theorem I would want, so you can see where things get stuck. This is a wish:

H(X | Z1, Z2)  >=  H(X | Z1)  -  (something like H_0(Z2), the number of bits leaked),

and of course there will be some loss in parameters and so on, but let's not write those down on the board because it's a mess. That's what I would like to be able to prove. And let's fix Z2, that's easier: write Z2 = z2, so instead of H_0(Z2) I can just use log of one over the probability that Z2 equals z2. A good example to keep in mind for this sort of thing: you're starting with Diffie-Hellman, your Z1 is g^a and g^b, your X is g^{ab}, and your Z2 is some leaked bit; somebody finds out that the first bit of g^{ab} is one, or that the first bit of a is one, or some other useful property. And then you'd like to be able to say something there. So, for example, let's just write this down.
E.g., Z1 is (g^a, g^b), Z2 is a bit of a, and the X here is g^{ab}. Where will I get stuck in this argument? How would I have to carry it out? Well, now there isn't a single X: for every value of Z1 there's an X, and, by the conditional definition, for every value of Z1 there's a Y. For one value of Z1 there's an X and a Y; for another value of Z1 there's another X and another Y. It's not clear why all the Ys have to make the distinguisher sit below all the Xs, or all above. You can't mix them: for every fixed Z1 you can of course say that all the Ys for that fixed Z1 have to be lower, or all higher, but you can't mix across different Z1s. And the way we put a threshold on the distinguisher: we said either they're all below or all above, assumed without loss of generality that they're all below, and put a "greater than" threshold on the distinguisher; otherwise we'd have flipped the distinguisher and done one minus. But now for some Z1s we have to flip it and for other Z1s we don't, and we can't come up with a single D prime anymore, because it has to depend on whether we flip or not. This "without loss of generality" is now very much with loss of generality, because for every Z1 you have to decide whether it's above or below. And moreover, even if all the Ys were, say, below for every Z1, they could have different thresholds: for every Z1 you have a different picture of numbers here, and you need a threshold sending some numbers to zero and some to one, so you get different thresholds for different Z1s. So those are the two points where you get stuck. This D, remember, was independent of Z1; it had to work for all of them. And you get stuck trying to convert it for different Z1s, because D prime has to behave differently. There are different ways around it; I'm going to give a couple of references, I think. One way, proposed by Maciej Skorski, is to first define the entropy so that it's always below, or always above; pick one, say always below, so that the absolute value always has to open the same way. A different definition of entropy, yet another one: just define it so that the absolute value always opens the same way, and work only with Boolean distinguishers, so the thresholding isn't needed. Maciej called that notion modulus computational entropy; if you Google the words "modulus computational entropy" you'll find the paper with the definition, and basically this proof goes through because you don't have the two steps that fail. Then, let's see how far I want to go down this rabbit hole, because there are many definitions under which you can try to prove something. You could also say: well, if for every fixed Z1 you have computational entropy, then of course this also works; just do it for every Z1 and then average. That doesn't make sense for Diffie-Hellman, because for every fixed value of a and b you don't have any entropy: once you fix a and b, there is no entropy. But that notion is called decomposable entropy, and that's what Ben Fuller and I wrote down at some point. So there are these two weakened notions of entropy for which you can do this.
Either decompose it for every Z1, or, slightly better, have these absolute values always open the same way. A much better way to deal with this problem, I think, is to use a very different notion of entropy, which I'm not going to define now but will define in the third third of the after-lunch part; for that one we will actually try to outline a proof that gives us this, and I think it's a nicer notion than either modulus entropy or decomposable entropy. So let's see. I have all of this on a slide, so I don't want to write it on the board and then have you watch the slides again. But there are several notions for which you can try to get this chain rule. For metric star, we cannot, and I think Krzysztof is going to prove to you tomorrow that we cannot, in fact; that this really fails for inherent reasons, not just because the proof breaks. But there are some for which we can. One is modulus entropy. Remember, I said we were wondering how many definitions to give; I'm not going to define all of these, because it would be insane to give all these definitions, but you can Google them. That's, roughly, this definition but with all the absolute values opening the same way. You can do decomposable entropy, which is even weaker, where for every z1 you have entropy. You can do the case where Z2, or Z1, I forget which one, is efficiently computable; do you also need the other one to be efficiently computable? Oh, wait: X sampleable given Z1? This is the Chan-Kalai-Raz-Huelff result. Ah, okay, yes, I want to write it down: sampleable. Y sampleable given Z, right; Y sampleable given Z. I will have references to these when I bring the slides back up. At that point you can try to get these. Or there's the fourth way, which is to completely redefine HILL, and that's the one we're going to do later. So for now, this proof fails. And are you going to show the impossibility result tomorrow? At the very end; okay. So the proof for metric star really does fail for a good reason, but you can redefine all these things in various ways and try to do it, and I'm going to do the fourth way, which I think is the best. Okay. So we have the metric-star-to-metric-star statement; it's this one, which I guess was on the slides, yes? And now, as I said, what we actually care about is typically HILL entropy, or at least metric, but not metric star, because we want to be able to extract, and the extractor is randomized, while metric star is only secure against deterministic algorithms. So what do we do? We need a conversion. What I want to prove to you is basically a quantifier switch. HILL entropy says there exists one distribution that fools all distinguishers; what we currently have is that for every distinguisher there is a distribution. So now we need to come up with a single distribution that fools all distinguishers. Our goal: given that for every distinguisher there is a distribution, how do you come up with a single distribution that works for all? This is the metric-to-HILL conversion, or metric-star-to-HILL conversion, yeah? Makes sense? So this is the next thing I want to do. Maybe we'll set it up, then go to lunch, and then do it after lunch. So I want to say: for all distinguishers D there exists Y_D; I'm subscripting it just to emphasize that it fools that particular distinguisher, right?
Then I want to conclude that there exists a single Y that works for all D. Of course, somewhere you're going to have to pay something. What you pay, and I didn't actually write down the parameters, is in the circuit size and the distinguishing advantage: the result is only going to fool smaller distinguishers, and not fool them quite as well. Smaller circuit size, bigger delta; that's what we're going to get here. So how can we do this? Here's the idea. Suppose not, as always. What does that mean? It means that for all Y there exists a D such that D of X minus D of Y is greater than epsilon in absolute value, right? Actually, we can get rid of the absolute value, because the class of circuits we consider is closed under complement: if you have a circuit that works one way, just take one minus it, and you have a circuit that works the other way. So let's get rid of the absolute value. Claim 1: there exists a distribution D-hat over circuits (since circuits are already capitalized, all I can do is put a hat on them) of whatever the right size is, and we'll figure out what the right size is, such that for all Y, the expectation over a randomly chosen D from D-hat of D(X) minus D(Y) is at least epsilon. This is actually a huge claim, right? Before, for every Y we had some circuit, non-constructively; now we're basically saying there is a single distribution over circuits. I don't know how to build it, but there is one: just sample a circuit from that distribution, and you will likely tell that specific Y apart from X. It's a huge step. If we can do this step, the rest becomes easy. But we should do the step after lunch, not on an empty stomach. Yeah. So, good. These distinguishers are okay. So here is what we want: the hypothesis is metric star, because these distinguishers are deterministic and continuous, and the goal is HILL; and for HILL it doesn't matter which distinguisher class we use, because they're all equivalent, so whatever we come up with will be good enough, right? So we're trying to convert metric star to HILL. And I said: suppose no HILL, then this. And then I'm going to show that this happens, which is already very close to what we need to contradict metric star; we just now have a distribution over circuits instead of a single circuit. Once we get there, we'll be almost done. This is the big step that we need before the next thing. Good. So, was I wrong about lunch? Nobody's keeping me on time, so I'm confused. Yes, until when? Wait, it is one. Okay, good. And lunch is until 2:30. Okay, great. Thanks.
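For reference, here is the setup just written on the board, in symbols (my reconstruction of the two displayed statements; the parameters after the loss are deliberately left unspecified, as in the lecture). Suppose X has Metric star entropy at least k but not HILL entropy at least k. Then

\[
\forall\, Y \text{ with } H_\infty(Y) \ge k\ \ \exists\, D:\ \ \mathbb{E}[D(X)] - \mathbb{E}[D(Y)] > \varepsilon
\]

(the absolute value dropped because the circuit class is closed under complementation), and Claim 1 asserts that the quantifiers can be pushed further:

\[
\exists\, \hat{D} \text{ (a distribution over circuits)}\ \ \forall\, Y \text{ with } H_\infty(Y) \ge k:\ \ \mathbb{E}_{D \leftarrow \hat{D}}\bigl[D(X) - D(Y)\bigr] \;\ge\; \varepsilon,
\]

which is the step to be proved after lunch; from there the aim is a single (somewhat larger, somewhat worse) distinguisher contradicting the Metric star entropy of X, which is where the loss in s and delta comes in.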