The chain rule is super clean. You lose exactly the right number of bits: if the event has probability p, you lose log(1/p) bits of entropy, and you lose exactly the right amount in circuit size — 1/p, not log(1/p), because circuits are not measured in logs. There are no other losses at all. It's very, very clean, and the proof is four elementary steps put together. Now comes the messier part, but also the technically more interesting part: converting this notion of entropy into a notion you can actually use, which is HILL. What we said is that we need to do a quantifier switch: we need to go from "for all D, there exists Y" to "there exists Y that fools all D." We're going to do this by contradiction. Suppose that for every high-entropy distribution — this is a distribution of entropy k — there exists a distinguisher that distinguishes it with probability better than delta. (Let's use delta, because I was using epsilon for extractors: epsilon is going to be my extractor parameter and delta my distinguisher parameter.) The claim I want to make is that there's a distribution over circuits that is a universal distinguisher — not a single circuit being a universal distinguisher, but a distribution over circuits. I'm going to prove that claim on the other board, so that we can just plug it in later on this board. And this is really the meat of the argument, because once you have this distribution over distinguishers, getting a single one out of it is not that hard. So let's consider a game of circuits against distributions — distinguishers against distributions. It's a two-player game; I'm not going to formally define two-player games, I don't need to. But I can say: here are all my circuits.
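The information-theoretic version of this conditioning step is worth writing down; the computational statement the talk describes then keeps delta and pays the 1/p in the size parameter. A sketch (the second line, in particular where exactly the 1/p factor lands, is my reading of the talk rather than the paper's exact statement):

```latex
% Conditioning on an event E with Pr[E] = p can inflate any point's
% probability by at most a factor of 1/p:
\Pr[X = x \mid E] \;=\; \frac{\Pr[X = x \wedge E]}{p} \;\le\; \frac{\Pr[X = x]}{p},
\qquad\text{so}\qquad
H_\infty(X \mid E) \;\ge\; H_\infty(X) - \log\frac{1}{p}.

% Computational (Metric*) version as described in the talk: the same
% log(1/p) bits of entropy are lost, the advantage \delta is unchanged,
% and the corresponding 1/p is paid in the circuit-size parameter:
H^{\mathrm{Metric}^*}_{\,p\cdot s,\;\delta}(X \mid E)
\;\ge\; H^{\mathrm{Metric}^*}_{\,s,\;\delta}(X) - \log\frac{1}{p}.
```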
And here are all my distributions. There's a slight annoyance in that this axis, the horizontal axis, is infinite, and I want it to be finite. There are several ways to handle that: discretize, if you don't want to worry — only allow certain things — or only consider flat distributions. What are flat distributions? Distributions that are uniform on their support. Any min-entropy distribution is just a convex combination of flat ones: any distribution of min-entropy k is a convex combination of flat distributions of min-entropy k. It's not an obvious statement, but it's a good exercise to think about. So it's usually enough to consider just the flat ones. We're going to ignore the fact that this axis is infinite; technically you make it finite either by considering only flat distributions or by discretizing somehow. Let's ignore this — it's a silly technical issue that's not important. So now you have a circuit and a distribution: there's a Y here and a D here, and what we're going to put in the cell is how well this distinguisher does: the expectation of D(X) minus D(Y). Intuitively, what's going on in this game? Circuits really want to distinguish, so the circuit player is trying to maximize this value — you choose a good distinguisher by trying to maximize it. This is called the max player. The distribution is trying to be indistinguishable — it's trying to pretend that it's X — so it's trying to minimize. This is the min player, this is the max player. How many of you have heard of the min-max theorem? Okay, good — about half. So now we're going to state the theorem and use it here. The theorem isn't about circuits or distributions; it's about any such two-player game.
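The easy direction of that exercise can be checked numerically: a convex combination of flat distributions of min-entropy k itself has min-entropy at least k (the converse is the direction the exercise asks for). A toy sketch in Python; the domain size, number of flats, and mixing weights are arbitrary choices:

```python
import math
import random

def min_entropy(p):
    """Min-entropy of a distribution given as a dict point -> probability."""
    return -math.log2(max(p.values()))

# Toy domain {0,...,15} and k = 2, so a flat distribution of min-entropy k
# is uniform on a 4-point subset.
random.seed(0)
n_points, k = 16, 2
flats = []
for _ in range(5):
    support = random.sample(range(n_points), 2 ** k)
    flats.append({x: 2 ** -k for x in support})

# Mix the flat distributions with arbitrary convex weights.
weights = [0.1, 0.2, 0.3, 0.25, 0.15]
mix = {}
for w, flat in zip(weights, flats):
    for x, px in flat.items():
        mix[x] = mix.get(x, 0.0) + w * px

# Each point's probability is a convex combination of values <= 2^-k,
# so the mixture is a distribution with min-entropy at least k.
assert abs(sum(mix.values()) - 1.0) < 1e-9
assert min_entropy(mix) >= k - 1e-6
```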
There's a player who is trying to do what? Trying to minimize — think of the player choosing a distribution — and the other player is trying to maximize. And what's the value right here? At every intersection, every row-column pair, we can write down a value. The min-max theorem basically says the following. In game theory, you have what are called pure strategies, where you pick a single move, and mixed strategies, where you pick a distribution over moves. The theorem says it doesn't matter who goes first, as long as the first player is allowed to pick a distribution over moves. We'll see how this relates to the quantifier switch, but intuitively it should, because with quantifiers, one goes first too. So say the circuits go first. The circuits pick a distribution: they want to pick a distribution that maximizes after the second player tries to minimize — they're trying to maximize the minimum — but they're not going to tell the distributions which circuit they're using. They're going to say: here's my move, it's a distribution over circuits. Or the other way around: the distributions go first. They pick a distribution over distributions — which sounds messed up, but that only happens because this player's moves are themselves distributions; a distribution over min-entropy distributions is itself just a min-entropy distribution, so it kind of doesn't matter. They pick a distribution over distributions, and then I try to maximize the distinguishing advantage by picking a circuit. The result is the same. What does "the result is the same" mean? That's the theorem, due to von Neumann in — I don't know, nineteen-thirty-something.
So, leaving the axes labeled the way they are: the max over D-hat — remember, hats denote distributions now — of the min over Y of this value. Let's give the value a name so we don't have to rewrite it: call it g(D, Y) — D and Y, not X and Y; X is fixed. Then:

max over D-hat of min over Y of g(D-hat, Y) = min over Y-hat of max over D of g(D, Y-hat).

What do I mean by putting a hat inside g? The expected value, of course: when you apply a function to a distribution, you take the expectation — that's what our notation means. And g is the payoff, how much you get. So the expected payoff is the same regardless of who goes first, as long as the first player doesn't have to commit to a move, only to a distribution over moves. This is the non-constructive version of the theorem: it doesn't tell you how to find these distributions over moves. There's also a constructive version, which goes by various names — boosting in learning theory, the regularity lemma — stuff I don't really understand and am not going to talk about. We're only going to use the non-constructive version; I'll just mention that the constructive version is also useful and gives you various other forms of the same metric-to-HILL result. So we just know the two sides are equal, not how to find either one. Okay — 1930? Actually, I should have the date: 1928. There we go. It's a theorem; we're not going to prove it. So now, once you know that this is the theorem to use — and the whole magic of this proof is knowing that this is the right theorem to use — what does it say here? The theorem was not about circuits and distributions; it was about two-player games in general. But applied here: max over D-hat means that there exists a distribution over circuits.
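On a toy payoff matrix you can see both halves of the statement numerically: with pure strategies the two orders of play disagree, but once the first mover is allowed a mixed strategy, max-min equals min-max. A small sketch; the 2x2 payoffs are arbitrary, and the key simplification is that against a known mixture the second player's best response is always pure:

```python
# Brute-force check of von Neumann's min-max theorem on a 2x2 zero-sum
# game.  G[i][j] is the payoff to the max player when the max player
# plays row i and the min player plays column j.
G = [[3.0, -1.0],
     [-2.0, 4.0]]

# Pure strategies: the order of play matters.
pure_maximin = max(min(row) for row in G)                               # -1
pure_minimax = min(max(G[i][j] for i in range(2)) for j in range(2))    # 3
assert pure_maximin < pure_minimax

def row_value(p):
    # Max player mixes rows with weights (p, 1-p); the min player
    # best-responds with a pure column.
    return min(p * G[0][j] + (1 - p) * G[1][j] for j in range(2))

def col_value(q):
    # Min player mixes columns with weights (q, 1-q); the max player
    # best-responds with a pure row.
    return max(q * G[i][0] + (1 - q) * G[i][1] for i in range(2))

# Grid search over mixed strategies for the first mover.
grid = [t / 10000 for t in range(10001)]
maximin = max(row_value(p) for p in grid)   # first mover: max player
minimax = min(col_value(q) for q in grid)   # first mover: min player

# Allowing the first mover a *mixed* strategy makes the orders agree.
assert abs(maximin - minimax) < 1e-3
```

For this matrix both sides come out to the game value 1.0, attained at the mixtures p = 0.6 and q = 0.5.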
There exists a distribution over circuits such that no matter how hard you try to pick a Y, this value will be big — the min over Y will still be big. That's what the top line of the von Neumann theorem says, and that's the result we want. How do we get it? We use the bottom line. We know — that's our assumption — that even the best distribution over distributions can be distinguished. Remember, a distribution over min-entropy distributions is just a min-entropy distribution, so it doesn't matter: you sample a distribution, then you sample a point from that distribution, and each point still has probability at most 2^(-k). So it's a min-entropy distribution, and even the best one can be distinguished: some circuit distinguishes it with advantage delta. That's our assumption — suppose not, our assumption is that for every Y there is a good distinguisher. Yes — I take the expectation over the randomness of D-hat. And note that D-hat is a distribution over circuits, not a single randomized circuit, so it's not a meaningful distinguisher in our sense yet. It's a distribution over all possible size-s circuits — there are roughly 2^s of them, and we have a distribution on this huge space. So it's not the end of the argument: we actually want a single circuit that distinguishes everything. For now, it's just a distribution over circuits. So that's my claim one: there is a distribution D-hat over circuits that distinguishes well. And that's what the expectation means: I sample a circuit from the distribution and then apply it. So claim one is now proven — the left board proves claim one, and all I did was draw a table and invoke the magic of von Neumann. As I said, this is not constructive.
I'm not actually telling you how to construct this distribution — it's just existential — but there are constructive versions. If you know the machine-learning version of this problem: you have a bunch of bad experts, and you're trying to boost them into one good expert by weighting them. That weighting is the distribution over circuits, and machine learning gives a constructive way of coming up with a distribution over these bad experts. For now, I'm just saying it exists. I'm not doing the constructive one, but I want you to know that you can. Do we believe the proof of claim one? Any questions? Claim one says: suppose every single distribution Y can be distinguished by someone — that's not-HILL; we're doing the contrapositive. So suppose not-HILL, meaning every distribution can be distinguished. By the von Neumann quantifier switch, that means there's a single distinguisher for all of them. It's just that this single distinguisher is no longer a distinguisher in our sense — it's a distribution over circuits, and to write it down would take a huge number of bits, because there are around 2^s possible circuits. But the advantage is that we did not lose anything: this is the same delta. The quantifier switch costs us nothing; it's just that what we get is not a circuit. I'm writing an expectation in claim one — it's the expectation of this whole thing — because when I put a hat above something, what does it mean? g is a function evaluated on a pair: a circuit and a Y, an input from the left and an input from the top.
If I have a distribution on the inputs from the left, then what I'm really measuring is the average — in two-player games you care about the expected payoff. So there is an expectation implicit in the notation g(D, Y-hat) or g(D-hat, Y). Good. Okay, so far this actually cost me nothing, except that what I got is something really unwieldy: not a circuit, but a distribution over circuits. Now we're just going to do a Chernoff bound: we're going to sample a bunch of circuits from this distribution and hope that we approximate things well. How well do we need to approximate? This is where it gets a little tricky, because we need to approximate well on every possible input point — the ultimate circuit has to distinguish every Y. So we're going to apply the Chernoff bound at every possible input point. Let's think about this for a moment. So, step two: we can approximate D-hat to within — let's call the parameter gamma, since I'm running out of parameters — to within gamma on every Y (this is important: on every distribution Y) by a single circuit. And this is nice: it's a single circuit of size about (n/γ²)·s — I have to cheat a little because I don't remember the constant — where s is the size of an individual circuit in each row of the matrix. So now I just need to show you what that single circuit is, and then I'm done: once I have a single circuit, I get the resulting theorem, with just a drop in circuit size.
Say I set gamma equal to delta, so that I lose at most a factor of two in the distinguishing advantage; then I get the resulting theorem with an n/δ² loss in the circuit size. So this metric-to-HILL conversion is lossy: worse distinguishing advantage and much bigger circuits. And the proof is a funny probabilistic method. Usually when you prove something exists by the probabilistic method, you show its probability is greater than zero; here the thing exists because its probability is almost one. If you take a random i.i.d. sample of circuits from D-hat and average them, you're going to get very close to the expectation of D-hat — that's Chernoff, or Hoeffding's inequality, whichever you want to use. Take enough samples and average them. You get a circuit that outputs a continuous value in [0, 1] — remember we allowed the continuous [0, 1] output? This is where you need it: you take a bunch of circuits from D-hat, average them all, and output that one value. You've approximated the expectation very well. If you do this at random, it's going to work; but we don't want a random circuit, we want a fixed one — so fix one that works. It's a probabilistic-method argument: such a thing exists. So if you simply choose the right number of circuits at random — n/γ² of them — from D-hat and average their results, you will very likely be close to D-hat at every point. Let me not guess the constants from Hoeffding here.
Let's do this right — we have enough board, we can do this right. I just want to apply Hoeffding cleanly, with all the appropriate details. What does Hoeffding say? The probability that the average of T samples differs from its expectation by at least alpha is at most e^(-2Tα²). If you don't remember this by heart, that's okay, but that's what it is. So call the averaged circuit D' — we already have a D-hat, so now we have a D'. Sorry for the camera. Fix one specific point x in the space — this is per point, not per distribution, and there are 2^n points in the space. For every point, the probability that |E[D-hat(x)] − D'(x)| is at least γ/2 is at most e^(-2T(γ/2)²). And what is my T? T is the number of circuits we sample. Why γ/2 rather than γ? Because I don't just want to be good in expectation — I want to be so close at every point that I can take the union bound over all points and still be close, and because the errors on X and on Y will each contribute γ/2 at the end. Since the deviation is γ/2 rather than γ, I need T to be four times bigger: take T = 4n/γ². Then the γ² cancels, the exponent is 2T(γ/2)² = 2n, and the bound is e^(-2n), which we can crudely bound by 2^(-2n) — you could improve the constants here if you wanted to. So we take enough circuits that we're γ/2-close at every point — or rather, close with very high probability — and then take the union bound over all points. There will be at least one averaged circuit that is γ/2-close at every point: therefore, there exists a D' that is within ±γ/2 of D-hat at every point in the space. Note that so far I'm not using the delta fact at all — this is a statement about points, not about distributions.
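Cleanly typeset, the calculation just done is:

```latex
\Pr\!\left[\;\Bigl|\tfrac{1}{T}\textstyle\sum_{i=1}^{T} D_i(x)
  \;-\; \mathbb{E}_{D \sim \hat D}\bigl[D(x)\bigr]\Bigr|
  \;\ge\; \tfrac{\gamma}{2}\;\right]
\;\le\; e^{-2T(\gamma/2)^2}
\;=\; e^{-2n}
\;<\; 2^{-2n}
\qquad \text{for } T = \frac{4n}{\gamma^2}.
```

A union bound over all 2^n points x then leaves failure probability at most 2^n · 2^(-2n) = 2^(-n), so some fixed choice of the T circuits is γ/2-close to D-hat everywhere. (The two-sided form of Hoeffding carries an extra factor of 2, which the crude step e^(-2n) < 2^(-2n) absorbs for n large enough.)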
I'm just saying: given any distribution over circuits, there is a single circuit that is close to it on every single point in the space. I'm not using the delta fact, right? So actually, let's not even say "on every Y" — say "on every x." We can approximate D-hat to within ±γ/2 on every point in our domain. These claims are independent: you can approximate any distribution over circuits to within ±γ/2 at every point in the space. And the reason we need the pointwise version is that the approximation should be correct on every distribution Y — and if it's correct pointwise, then it's also correct on every whole distribution. So any D-hat can be approximated by an actual circuit so well that it's correct at every single point. Remember, after all, it's a distinguisher of points: we can talk about distributions, but what it actually receives is a point. So now I have a D' that is almost like D-hat. D-hat is a weird beast that we don't know how to handle, while D' is an actual circuit — and I can build that circuit. Even my poor circuit-design skills permit me to take this many circuits, put an adder at the end, and divide by T. Should the output be zero or one? No — remember we have metric-star: we allow ourselves a real-valued output whose precision has denominator T, where T is the number of circuits we chose; we average them all. So we really do have a circuit — this isn't some fictional object. You take this many samples, you get a circuit; almost every sample works with high probability, so there exists one that works, and it approximates D-hat at every point. So if we run it on the actual distribution X, it's within γ/2.
On the actual distribution Y, it's also within γ/2. So the worst you lose is γ/2 here and γ/2 there — γ in total — and the advantage drops from δ to δ − γ. That's why I needed the halves: I'll be slightly off here and slightly off there, possibly in opposite directions. So, putting the two claims together, claim one plus claim two: there exists a single circuit D' of size about (4n/γ²)·s — I may be slightly wrong about the constant four, but that doesn't matter — that distinguishes every Y with advantage at least δ − γ. Now it's your call how to set gamma: the smaller you make gamma, the less you lose in the advantage, but the more you pay in circuit size. So what we've done is: we supposed HILL does not hold, and we derived that metric-star does not hold either. We have a full reduction — a non-constructive reduction, as I said, because it uses the magic of min-max — but that's it: we've converted metric-star to HILL. Question? Sorry, yes — metric star to HILL, thank you. Good. Okay. So maybe — I'll certainly pause for questions — but maybe we should do an application before doing more proofs. I feel people are falling asleep; the lunch was probably good. Is an application good? I could do another proof of something, another chain rule. Because what we ultimately proved is a chain rule: remember, our goal was to condition, and now we know what happens when we condition. You lose something in circuit size, and you lose something in the delta, and you can trade them off. Right — you had a good distinguisher; now you have a distinguisher that's much bigger and not as good. That's the point.
A small distinguisher was good; now we've got a bigger one that's not as good — that's exactly our loss, and we have to keep track of the losses. So here's what we can say now: suppose you have a pseudorandom generator and you leak five bits of its seed. We can actually talk about the HILL entropy of the output: we know it dropped by five bits, and we can also say against what kind of distinguishers and with what epsilon. Yes — go ahead. It's a fixed event, good, a fixed event. I haven't said what happens when you leak five bits, but it's a very simple averaging argument, nothing sophisticated. If you then want to talk about Hamming-weight leakage, like I did at the beginning, you lose exactly the length of the Hamming weight. Why don't I just put the slides back on and show you the theorem instead — yeah, we'll do it that way. Good, where are my slides? Okay, switching back to slides for a moment. There we go. So this is what I wanted to say: this is the corollary you were asking about. The bottom line — and it's a simple corollary that I do not want to prove, because it's just formula-pushing with no insight at all — the top line, in the rectangle, is about conditioning on a single event. But now we can also condition on leakage of some number of bits, the average case, and it still works. And what do you lose? You lose the number of bits you leaked — that's H₀ of Z, the bottom line. And where you used to lose the probability of the event, you now lose the size of the support of Z — again, for metric-star. My pointer is not working — that's this loss, the size of the support; it's just putting the log back in and taking it back out. That's what you lose in the circuit size.
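In symbols, the shape of the corollary as described is roughly the following; the exact subscripts are in the Fuller-Reyzin paper, and how the support size enters the size parameter is my reading of the talk, not a statement from it:

```latex
% Average-case leakage chain rule (Metric*), schematically:
\widetilde{H}^{\mathrm{Metric}^*}_{\,s',\;\delta}(X \mid Z)
\;\ge\; H^{\mathrm{Metric}^*}_{\,s,\;\delta}(X) \;-\; H_0(Z),
\qquad
H_0(Z) \;=\; \log\bigl|\operatorname{supp}(Z)\bigr|,
```

with the circuit-size parameter s' degraded by (roughly) the factor |supp(Z)| that replaces the 1/p of the single-event case.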
This is again metric-star to metric-star; then you can apply the HILL conversion and lose some more. As I said, there's no sophistication in this argument, so I don't want to do it. This whole proof is from my paper with Benjamin Fuller — "Computational Entropy and Information Leakage," or something like that; I can put up the title. But the first line is what matters. The second just follows from the definitions: the way we define conditional entropy is by averaging, and it follows immediately — you just have to think about what the definition says. There's nothing tricky like min-max there; all the trickiness is in proving that first line. What we do not have is this: if the right-hand side were already conditional — already conditioned on Z1 — and then you wanted to leak Z2, what would you get on the left-hand side? That we don't have. So there are two points I want to make, and we'll see whether I manage both. One: even though we don't have this conditional version — it would be nice — we can still work without it, which is kind of surprising, because in the public-key setting you almost always have to condition on the public key; the adversary knows it. So your starting point is always conditional — how are you going to start without conditioning? It doesn't even seem to make sense. Turns out you can. Two: there are situations where you can start conditional and condition again; you just have to redefine your entropy in five different ways, and I was going to try to do one of those. But I don't know if we'll have time for both. So what I want to do is try an application of this chain rule. Right — we already did this part, I'm not going to redo it. Okay, let's do an application.
And I'm going to do the application mostly on the slides and somewhat on the board, in the interest of time. The application is something many of you probably have not heard of. You've probably all heard of the Goldwasser-Micali 1984 paper saying that encryption should be probabilistic — that deterministic public-key encryption doesn't make sense. In fact, the paper is called — anyone? — Probabilistic Encryption, right? That was a huge deal, because the idea was completely foreign: people thought RSA is encryption, and RSA is deterministic; and these people said no, it should be probabilistic. But actually, if you have sufficient randomness in your plaintexts that no two plaintexts are ever going to repeat, then maybe deterministic encryption is meaningful after all. So people have studied deterministic public-key encryption. This is on the topic of randomness in cryptography, but from a totally different angle: you don't need randomness for encryption if your plaintexts themselves are a good source of entropy. That intuitively makes some sense, right? So this idea goes back a few years, and there are a bunch of different constructions. I want to show you a simple construction, and we're going to analyze it — prove it secure — using this chain rule. So our goal is deterministic encryption. This is my notation for encryption: message, randomness, and public key go in, and out comes the ciphertext; this should be familiar. And sometimes you want deterministic encryption to make, for example, encrypted search easier: you encrypt your photographs and put them in the cloud.
Your photographs have a lot of entropy; the adversary isn't going to check whether a ciphertext matches your photograph unless they already know what the photograph is. Your JPEG is not going to accidentally repeat. So it makes some sense to try this. It's possible if the message — lowercase m is my message, uppercase M is my distribution of messages — comes from somewhere with entropy. The security notion, which I'm not going to define formally — I'll zoom in on one particular question that comes up and define that piece formally — is basically semantic security: you can do as much with the ciphertext as you can do without the ciphertext, as long as the message came from a distribution of high enough min-entropy. So: semantic security for high-entropy message distributions. Does the security goal make sense? Okay. Now, how are we going to do this? For now, let's assume that our high-entropy distribution is actually uniform. It makes life easier for the purposes of this application — I'm only doing the application superficially, because I want to show the application more than the details. So, how do we build this from normal encryption? If M has entropy, why don't we just get the randomness from M? We have a standard public-key encryption scheme — whatever your favorite is. It uses random bits. You don't have random bits. What are you going to do? Get them from the message. But how? Intuitively, the randomness should be independent of the message — the encryption scheme was designed to use randomness that's independent of m. You can't just set r = m; that's not going to work. How do you get independent randomness?
Well, you push something through a trapdoor permutation and you take the hardcore bits. How many people know what hardcore bits are? Okay, good. For the few who didn't raise their hands, I'll just remind you: hardcore bits look uniform even given f(x). f is a trapdoor permutation, and r looks uniform given f(x). So this is our idea. These are the properties we want: x can be recovered given the secret key — f-inverse is the secret key, f is the public key, of the trapdoor permutation, not of the encryption scheme — and r is the hardcore bits. And imagine a trapdoor permutation with a lot of hardcore bits — it's reasonable to conjecture, for example, that half the bits of RSA are simultaneously hardcore — so you have a long enough string of hardcore bits to use as encryption randomness. So again: I set the trapdoor permutation's input x equal to m, I push it through, and now I suddenly have two things, f(x) and r, and I feed them both into my encryption scheme. f(x) plays the role of the message, the hardcore bits play the role formerly known as randomness, and the public key is public. I've just repurposed any encryption scheme to be deterministic. That's my construction. The problem is, it doesn't work. Can anybody see why? It's not secure, and there's a sort of generic attack. It looks good, because the randomness came from the message, and the message has entropy — so where am I losing? What's that? No — f goes into the public key and f-inverse into the secret key; I add them to the keys. How does decryption work? Decryption recovers f(x), then inverts the trapdoor. Right: encryption was never designed to hide r. Encryption hides m. A general encryption scheme will not hide the randomness that was used to encrypt — that's not its goal.
It may or may not hide it. Big deal, you say, r is just hardcore bits. Ah, but hardcore bits can be correlated with the message. They're uncorrelated with f(x), but they can be correlated with x; they could literally be the first half of the bits of x. So you could actually be leaking very useful information: encryption doesn't hide r, and r is correlated with m, and not only in a computational sense; r could simply be bits of m. So this is not going to work. Okay, so what do we do? Here is a sufficient property for this construction to actually work; this is the part you'll just have to believe me on, because I don't have time to prove it. The sufficient property is that the hardcore bits hide the input. What does "hide the input" mean? It means the hardcore function looks uniform even if the input distribution was somewhat deficient: it should look uniform for the input distribution M even conditioned on any reasonably high-probability event. That's a weird condition, so here's an example of hardcore bits that are not robust: suppose a hardcore bit reveals the first bit of m. Why is that not robust? Condition M on the first bit being one; then the hardcore bits are no longer uniform, because the first hardcore bit equals the first bit of m, and that bit is now fixed to one. So the condition is saying: this conditioning can fix some property of m, and your hardcore bits should not correlate with it. That's the intuition; I don't have time to give a proof of the fact.
So the fact I want you to believe is that it suffices to have something called robust hardcore bits, meaning bits that look uniform even when you condition the input distribution on some event, and an event of probability one quarter is enough, which is a constant that falls out of the proof somewhere. Intuitively, the bits should look uniform even if you leak up to two bits of m; that's where the quarter comes from. I don't have time to prove it, but that's what I need from my hardcore function: if I have one like this, the whole construction actually works. So my goal is now to build a hardcore function that stays secure even when the input distribution is not quite uniform. (Do such functions exist? It depends on your assumptions, but it's reasonable: for example, under the RSA assumption itself, with 2048-bit RSA you can get, say, 1024 hardcore bits out; and then you can stretch them however you want, you just need enough bits to run a pseudorandom generator. So under reasonable assumptions you have enough hardcore bits.) So now we're trying to build robust hardcore functions. The first half of the RSA bits is no good, because those reveal bits of the message. We want something that doesn't, and the trick, which is not terribly surprising, is this: we want something secure even after conditioning on an event. How do you kick out an event? You use an extractor. We've talked about this: the way you kick out adversarial knowledge, which is the same as saying kick out some weird event that someone knows about, is to run an extractor on it. And we'll just put the extractor seed into the public key; there's space for the seed, that's not the issue.
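A minimal sketch of the "extractor seed in the public key" idea, under illustrative assumptions: raw_hc below is a placeholder for whatever hardcore bits the trapdoor permutation gives you, and the extractor is a pairwise-independent hash in the leftover-hash-lemma style. None of these names or parameters come from the talk itself.

```python
P = 2**61 - 1  # a Mersenne prime, for a pairwise-independent hash

def ext(x, seed, out_bits=32):
    # Leftover-hash-lemma-style seeded extractor:
    # h_{a,b}(x) = (a*x + b) mod P, truncated to out_bits.
    a, b = seed
    return ((a * x + b) % P) & ((1 << out_bits) - 1)

def raw_hc(x):
    # Placeholder for the trapdoor permutation's hardcore bits; in the
    # lecture's bad example these could literally be bits of the message.
    return x & 0xFFFFFFFF

def robust_hc(x, seed):
    # Candidate robust hardcore function: extract from the raw hardcore
    # bits.  The extractor seed lives in the public key, fixed once and
    # for all.
    return ext(raw_hc(x), seed)
```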
So now, as I said, we're going to take on faith that being robust is enough, and what I want to prove to you is that this beast on the bottom right, extracting from the hardcore bits, is actually a robust hardcore function, where robust means hardcore even after you condition the input distribution on an event. That's what I want to prove, and I'm going to do the proof on the board, because doing it on slides just doesn't convey much. (To answer the question: the seed lives in the public key; it's a public-key scheme, so I can fix a seed once and for all and not worry about it.) So I'm going to erase the specific application of min-max and try to prove to you that this deterministic encryption scheme is actually secure. Intuitively: we're applying an extractor to something, so how do we prove that works? You prove that the string R you're applying the extractor to has entropy; that's the only thing we can do with extractors, apply them to strings with entropy. So we'll try to prove that R has entropy, which will mean that R prime is uniform. How do we prove R has entropy? Well, R was uniform before leakage; it's hardcore bits, after all. Now we leaked the event Z, but its probability is at least a quarter, so we should lose only two bits of entropy. So it still has entropy, and therefore we can extract. Does the high-level logic make sense? R was uniform; we conditioned on Z; that leaves uniform minus two bits; now extraction works. The reason this doesn't work straight out of the box, as I just stated it, is that all of this entropy is conditional, and we have to be careful about what we're conditioning on.
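The "uniform minus two bits" step can be checked numerically: conditioning a distribution on an event of probability p costs at most log2(1/p) bits of min-entropy, so an event of probability 1/4 costs at most 2 bits. A small sketch with made-up numbers:

```python
import math

def min_entropy(dist):
    # H_infinity(X) = -log2 of the largest point probability
    return -math.log2(max(dist.values()))

# Uniform distribution over 8-bit strings: H_infinity = 8 bits.
n = 8
U = {x: 2.0**-n for x in range(2**n)}

# Condition on an event Z of probability 1/4 (here: the first quarter of values).
Z = set(range(2**n // 4))
pZ = sum(U[x] for x in Z)                  # = 0.25
cond = {x: U[x] / pZ for x in Z}

# Conditioning on an event of probability p costs at most log2(1/p) bits;
# here the loss is exactly 2 bits (8 down to 6).
loss = min_entropy(U) - min_entropy(cond)
```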
Is the contrast high enough? You can see in the back? Okay. So we can say: the HILL entropy of the hardcore bits of M, conditioned on what? On the description of f (this is the RSA public key, say, if it's an RSA trapdoor permutation) and on f(M) itself, is high; in fact equal to the output length. That's essentially the definition of hardcore bits: given the public key and f(x), the bits still look uniform. So that's great. Unfortunately, now I want to condition this further on the event Z, and I don't have that conditional chain rule. I have a chain rule only if I start from an unconditional statement, but the unconditional statement isn't very meaningful here: of course the adversary also knows f and f(M). So I'm starting in the conditional world and want to condition further on Z, and I don't have that chain rule. Okay, maybe the other way: start with the hardcore bits of M and condition on Z. I know what that does, because that's starting from something unconditional and then conditioning, so I can apply the chain rule we just proved. Z is an event, not a distribution, so HC(M) conditioned on Z is still a single distribution, the same as HC(M_Z); it's not a conditional distribution. Great, now I can further condition on f and f(M). The problem is that those things are way too long: I have fewer hardcore bits than the length of M, so I'm conditioning on strings longer than the entropy I have, and the chain rule gives me nothing. So for a while we thought: we need a conditional chain rule, let's go back and prove one. But we couldn't, and for good reason.
So it turns out you don't need a conditional chain rule. I'm going to try to show you that neither of those routes is necessary: you don't have to start conditional, and you don't have to condition on the long strings. You can do it all at once with a few simple transformations, one of which is the chain rule. The trick: don't do anything conditional until you absolutely have to. We know that (f, f(M), HC(M)) is indistinguishable from (f, f(M), U_L); say the hardcore bits are L bits long, so we use the letter L. That's the very definition of being hardcore. Therefore the computational entropy of this whole string, no conditioning or anything, is H_HILL(f, f(M), HC(M)) = H_infinity(f) + H_infinity(f(M)) + L (it's meaningful to talk about the min-entropy of public keys), and H_infinity(f(M)) = H_infinity(M) because f is a permutation. Now I condition on Z. By the chain rule, the entropy drops by at most two bits, because Z has probability at least a quarter. What does that actually mean? Writing M_Z for M conditioned on Z, so I can omit a line, it means that the triple (f, f(M_Z), HC(M_Z)) is indistinguishable from some triple (A, B, C). I don't know what that triple looks like anymore, but I know it has high entropy all together: H_infinity(A, B, C) >= H_infinity(f) + H_infinity(M) + L - 2. I'm omitting the indistinguishability parameters.
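Writing out the board computation (with $L$ the number of hardcore bits):

```latex
\begin{align*}
H^{\mathrm{HILL}}\bigl(f,\ f(M),\ \mathrm{HC}(M)\bigr)
  &= H_\infty(f) + H_\infty(f(M)) + L
   = H_\infty(f) + H_\infty(M) + L,
\intertext{since $f$ is a permutation; and after conditioning on the event $Z$
with $\Pr[Z] \ge \tfrac14$, the triple $(f,\ f(M_Z),\ \mathrm{HC}(M_Z))$ is
indistinguishable from some $(A, B, C)$ with}
H_\infty(A, B, C) &\ge H_\infty(f) + H_\infty(M) + L - 2.
\end{align*}
```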
There's some delta, some epsilon; we're going to ignore them. Certainly, if you append an extractor seed to both sides, they're still indistinguishable: I'm just adding uniform bits, which can't help a distinguisher. And if two things are indistinguishable, you can apply an efficient function to both sides and the results are still indistinguishable; otherwise you'd have a reduction. So let's apply an efficient function called an extractor. We get that (f, f(M_Z), Ext(HC(M_Z); seed), seed), where the extracted string is our R prime, that's what happens after you extract, is indistinguishable from (A, B, Ext(C; seed), seed). This is almost what we want. Our goal is to show that R prime is close to uniform given all the other stuff, that R prime is hardcore even when you start from M_Z instead of M. Remember? That was the robustness we were trying to show. But what is Ext(C; seed)? How much entropy does C have? You'd better have entropy to be allowed to extract. If C has entropy, we're done, because then the right-hand side has a uniform component. But how much entropy does it have? We know the joint triple (A, B, C) has entropy, but that was after the chain rule, and the reason we don't know that C individually has entropy conditioned on (A, B) is exactly that we couldn't apply the conditional chain rule: we couldn't condition on (A, B) first. We only know the joint triple has entropy, because we deliberately applied the chain rule to the unconditional distribution. This is where you remember the chain rule from the very beginning of today, the information-theoretic one. How much entropy does C conditioned on (A, B) have?
Well, this step is purely information-theoretic; the left side was computational, but on the right-hand side there's nothing computational. We get H_infinity(C | A, B) >= H_infinity(A, B, C) - H0(A) - H0(B). That by itself is not the interesting part; that's just our first chain rule. But now notice something funny: at the top we were adding H_infinity(f), and at the bottom we're subtracting H0(A), where A is kind of a substitute for f. H0 is the log of the support size; H_infinity is the min-entropy. If your public keys are uniform over their support, they're the same thing; if the public keys are weirdly distributed, they're not. So this works cleanly for systems where the public key is uniform over its support. Make sense? You also have to argue, and this uses the leakage lemma, that when we substituted (A, B, C) for (f, f(M), U_L), some (A, B, C) we don't know, the supports did not grow. If you go back through the proofs we did earlier today, the support never grows, so H0(A) is actually no bigger than H0(f), and you're not subtracting too much. What you get in the end is H_infinity(C | A, B) >= L - 2 - (H0(A) - H_infinity(f)) - (H0(B) - H_infinity(M)). L is the number of hardcore bits you have, which is what you want; the 2 is the loss from conditioning on Z; and the funny terms are the price you pay because we gained min-entropy at the start and lost max-entropy at the end. At the cost of paying those terms, you get away with not having a conditional chain rule.
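Putting the two chain-rule steps together, the final bound from the board is:

```latex
\begin{align*}
H_\infty(C \mid A, B)
  &\ge H_\infty(A, B, C) - H_0(A) - H_0(B)\\
  &\ge L - 2 - \bigl(H_0(A) - H_\infty(f)\bigr)
             - \bigl(H_0(B) - H_\infty(M)\bigr),
\end{align*}
```

where the two parenthesized terms vanish when the public key and the message distribution are uniform over their supports.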
You do conditioning only after you get to the information-theoretic world, because there a conditional chain rule exists, so you can condition all you want. Before you get there, you bank the min-entropy; after you get there, you pay the max-entropy, the support size. If those two are the same, you lose nothing; if your public-key distribution is weird, you lose something. Did I lose people? I think I lost people. (To the question: because f is a permutation, those are the same; you can talk about M and f(M) interchangeably.) I have to say this is strange, and I don't know if there's a generalization of it; I haven't thought about it enough. You have conditional entropy, you want to condition further, you can't, because we don't have that result. So we work in the unconditional world as long as we want and just accumulate entropy. You really care about one component, but I'm going to measure all three things together, precisely so I don't have to condition on two of them. I accumulate all my entropy, save up all my wealth, and then pay for it at the end when I get information-theoretic. Of course, as with converting currencies, you lose: you pay more at the end than you would have paid at the beginning. But we don't have a theorem for the beginning. We can't condition in the HILL world; we can condition in the information-theoretic world; so we switch to information-theoretic and condition there. So the moral of the story is: if you don't have a conditional chain rule, you can get around it and still prove something that works. (At the time we didn't know that, but since the gap here is only two, maybe you could do something differently now; that's a good point, I haven't thought about it.)
So the gap to uniform is two bits, a small loss, so maybe you could also go that route. I wonder which one would get you better parameters; it's not clear. That's a good point. (As for the question of when a conditional chain rule is known to hold: it's when the conditioning you start with is very close to uniform, or the leakage is very small.) I don't know how much I can say in the next fifteen minutes, but let me talk about the simulatability approach in a slightly different way. So this was the application: now you know that you can actually encrypt deterministically if you believe your message spaces have entropy, this is how you do it, and you can actually prove it. One last thing I want to try; it's a shame to waste a board already filled with the min-max theorem, so I want to milk the min-max theorem as much as I can. I want to prove yet another chain rule that actually works, for a slightly different notion of entropy, where the min-max theorem is again going to be useful. That's the last effort we'll make. (On the question of where you get robust hardcore bits for a given distribution in the first place: it depends; whatever distribution you have hardcore bits for. We never know anything; we always make assumptions. So this is where you can say: RSA is secure on this distribution and the hardcore bits are still hardcore. There are reasonable settings where these assumptions are serious. For instance, you can say that on the message space of all the photographs Leo ever takes, which is just as good as uniform, RSA has hardcore bits, and then you can say something about it.)
But it's an assumption, right? It's not entirely non-falsifiable: you get an adversary to walk around with a camera and then try to break the hardcore bits. (Do you have to quantify over every source, making it non-falsifiable? No, I don't quantify over every source; I work with one specific message distribution M. I want to say that the deterministic encryption works for, say, photographs taken with an iPhone; then M is the distribution of all photographs taken with an iPhone, and I condition it on an event of probability at least a quarter. That's the for-all, but that's the only for-all. I'm not a huge fan of that, to be honest; I'd like to be able to say something stronger, but I don't know how.) Okay, in the last fifteen minutes we have, I want to give another definition of HILL entropy. Remember I said at some point that when we define conditional entropy we only permit you to change the Y, not the Z? Because we thought of Z as given to the adversary, fixed; the adversary knows it, how can you change it? But think of Diffie-Hellman: g^a and g^b are sent, while g^{ab} is something you have and pretend is random. So actually it's not entirely clear why Z must stay fixed. A lot of our crypto proofs pretend the adversary got something else and never noticed the difference; this is not an uncommon technique. So you could define a different notion of HILL entropy, which I'm not sure why it wasn't defined earlier; people just didn't think of it. You permit changing both. We'll call it HILL-relaxed, where "rlx" stands for relaxed. We say X conditioned on Z has entropy k if there is an indistinguishable pair (Y, T) with true conditional entropy k. And that's it.
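Side by side, the two definitions differ only in whether the conditioning part may be swapped (here $\delta, s$ are the usual indistinguishability parameters, which the talk is suppressing):

```latex
% Standard conditional HILL entropy: only Y may be swapped, Z stays fixed.
H^{\mathrm{HILL}}_{\delta,s}(X \mid Z) \ge k
  \iff \exists\, Y:\ H_\infty(Y \mid Z) \ge k
  \ \text{and}\ (X, Z) \approx_{\delta,s} (Y, Z).

% HILL-relaxed: both components may be swapped.
H^{\mathrm{HILL\text{-}rlx}}_{\delta,s}(X \mid Z) \ge k
  \iff \exists\, (Y, T):\ H_\infty(Y \mid T) \ge k
  \ \text{and}\ (X, Z) \approx_{\delta,s} (Y, T).
```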
And there's still a debate, specifically between me and Krzysztof, about whether one can find an application that really needs the more restrictive definition. Okay, so now what about the conditional chain rule? As I mentioned, it does not work in general, and there's going to be a proof of that tomorrow, but it does work in special cases, and in particular it works for HILL-relaxed. There are several versions showing that it works for HILL-relaxed, because you have more freedom: you can now choose the conditioning distribution too. The way the proof goes, and I'm going to give a high-level approximation of the proof from Chung, Lui, and Pass '15, is to argue that the second leakage can be simulated. What does simulated mean? Let's reason through it. Suppose we have some simulator that can generate Z2 given X and Z1; not precisely Z2, but something close to Z2. Then we have a chain of approximations, and it goes like this: (X, Z1, Z2) is close to (X, Z1, S(X, Z1)). That's what simulation means. We'll have to show this can be done; that's the big if, and I'm going to show it in a very hand-wavy way. If it can be simulated, then of course the simulated value has the right length; if your simulator outputs something of the wrong length, it's not a very good simulator. So suppose it can be simulated; then what? We know that (X, Z1) is close to (Y, T): that's our assumption, that X has entropy conditioned on Z1. If you know that, you can apply the same function to both sides, as long as the function is reasonably efficient. That's a legitimate thing to do.
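The chain of approximations, written out (with $S$ the assumed simulator):

```latex
(X,\ Z_1,\ Z_2)
  \;\approx\; \bigl(X,\ Z_1,\ S(X, Z_1)\bigr)   % the simulation step
  \;\approx\; \bigl(Y,\ T,\ S(Y, T)\bigr),      % (X, Z_1) \approx (Y, T),
                                                % apply the function S to both sides
```

and the information-theoretic leakage chain rule is then applied to the right-hand triple, which is not computational at all.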
So then (X, Z1, S(X, Z1)) is indistinguishable from (Y, T, S(Y, T)), because if it weren't, you could distinguish the first pair by applying S and then running the distinguisher. What do you lose in this transformation? The size of the simulator. And our simulator is going to be pretty big, so it's a big loss. That's not surprising: we know there has to be an exponential loss. When we conditioned on Z1 before, we lost the support size. By the way, I forgot to mention something important about the metric-to-metric-star step from earlier: in the version I showed, the circuit was preserved, the same circuit with just a threshold cutoff, but the delta changed. There's another version where the delta stays the same but the circuit blows up, and there may even be versions in between; we don't fully understand all the variants. But you have to lose somewhere, and you're going to lose two to the length of the leakage, either in the circuit size or in the epsilon. There's no avoiding it; there's a lower bound by Krzysztof and Maciej saying you have to lose it somewhere. Good. So here, too, you lose the size of the simulator, and the size of the simulator is exponential in the length of Z2, in H0(Z2), among other things; that's a preview. So the reduction from one end to the other is fairly lossy, but that loss is necessary: either the circuit sizes or the epsilons have to absorb it. Here we take the loss in circuit size: indistinguishability holds with a loss of |S| in circuit size. And now we have a triangle inequality across the three steps.
Therefore the two ends of the chain are close, and finally we can apply the information-theoretic leakage chain rule, because we have one, and the simulated leakage has the same length as the real leakage; simulators preserve length, otherwise they're really bad simulators. So by applying the info-theoretic chain rule, you get a conditional chain rule for HILL-relaxed. The nice thing is there's no metric-star or anything going on here; this is all HILL, no detour through multiple notions. The only thing left to show is that the simulation can actually happen: that you can simulate the leakage in time roughly exponential in the leakage length. So what I claim is that the only remaining piece is the top step, that the leakage can actually be simulated, and the simulator is going to be much bigger than the leakage. Good, so how am I going to simulate it? Suppose first that I'm trying to build a simulator of the leakage against one particular distinguisher. This is a different distinguisher than before: not computational-versus-information-theoretic, but real-versus-simulated. Since I've already used D, and E is a bad letter because it reads as expectation, let's call this distinguisher F. I'm going to build a simulator against the distinguisher F, and you can guess how I'm going to do it: apply the min-max theorem we already have, with circuits on one side and, this time, not distributions on the other side but simulators.
Who votes for A? Who votes for F? All right, F it is; this is to make sure that for the last five minutes you stay awake. So we have a distinguisher F on one side, and on the other side we have simulators: not distributions, but simulators trying to simulate the leakage Z2. The distinguisher is trying to maximize what? The expectation of F(X, Z1, Z2) minus F(X, Z1, S(X, Z1)). The simulator of course wants to simulate as well as possible, so it's trying to minimize that same quantity. The min-max theorem is very useful here. What is it going to say? Compared to before, I'm replacing the distribution over distinguishers G-hat with F-hat, and the distributions Y with simulators S; call the payoff G. Let's think about what this means for a moment. Starting point: I'm going to claim that for every distribution over distinguishers, I can simulate well against it, and then I'm going to min-max it. Let me first convince you that I can simulate against one distinguisher; we'll get to distributions eventually. How do you simulate against one distinguisher? Remember, our simulator is allowed to be pretty big here. I'm just going to run the distinguisher on every possible value of Z2: there's a list of possible values, two to the length of them, and we run the distinguisher on all of them and output whichever one fools it best. That's it. The simulator is bigger than the distinguisher it's simulating against, but that's okay: we lose in the distinguisher size anyway, so we're allowed to do that here. So now I can simulate against one distinguisher. How do I simulate against a distribution over distinguishers?
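A sketch of the brute-force simulator against one fixed distinguisher, under the assumption that the leakage Z2 is ell bits long and the distinguisher outputs a real-valued score:

```python
def simulator_for(F, ell):
    # Brute-force simulator against one fixed distinguisher F:
    # on input (x, z1), try every candidate leakage z2 in {0, ..., 2^ell - 1}
    # and output the one F scores highest.  Then
    #   E[F(x, z1, S(x, z1))] >= E[F(x, z1, Z2)]
    # for any real leakage Z2, so F's signed advantage is at most zero.
    # The simulator costs ~2^ell evaluations of F: exponential in the
    # leakage length, which is exactly the loss the lecture says is necessary.
    def S(x, z1):
        return max(range(2**ell), key=lambda z2: F(x, z1, z2))
    return S
```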
I sample from the distribution of distinguishers, take the average of the samples to get one bigger distinguisher, and simulate against that average. That's pretty good by the Chernoff bound, the same Chernoff bound that I'm not going to redo now because it's late and people are tired. But it makes sense: a distribution over distinguishers can be approximated by enough samples. (To the question: the simulator is trying to output Z2, and ideally the best distribution over Z2. So for every distribution over distinguishers, I'm going to sample and pick the output that's as close as possible on every point. I think you'll probably need a union bound over all points; I honestly haven't worked out the details.) So: distributions over distinguishers are weird objects, but the average of a sample of distinguishers is just a bigger distinguisher, and I know how to simulate against one specific distinguisher. I'll just say: hi, distinguisher, how well do you distinguish if I answer zero? Okay, thanks. How about answer one? Answer two? And so on through all two-to-the-length possible answers, and I see which one you're worst at. At some point you're not going to be able to distinguish at all, because the answer will match. That's how I simulate. The simulation is essentially perfect against the sample, and therefore close against the distribution, because the sample is just like an average. So for every distribution of distinguishers there is a simulator that does well, built by sampling that distribution and incorporating the sample into the description of the simulator itself.
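The sampling step can be sketched the same way; draw_distinguisher below is a hypothetical sampler for a distribution over distinguishers, and the 1/sqrt(k) rate is the standard Chernoff-style estimate:

```python
import random

def average_of_samples(draw_distinguisher, k, rng=random.Random(0)):
    # Approximate a distribution over distinguishers by k samples and fold
    # them into one (bigger) averaged distinguisher; by a Chernoff bound the
    # average tracks the true expectation up to ~1/sqrt(k) on each input.
    sample = [draw_distinguisher(rng) for _ in range(k)]
    def F_avg(x, z1, z2):
        return sum(F(x, z1, z2) for F in sample) / k
    return F_avg
```

The simulator from before can then be run against F_avg, since an average of circuits is just a bigger circuit.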
Now the min-max switch tells me that there is a single distribution over simulators that is good against every single distinguisher. Okay, it's a distribution over simulators, and I'm trying to build one simulator. Guess what: sample and average again. By applying the sampling trick twice you get pretty terrible bounds, actually: first you had a really big distinguisher because it was an average of many, then a simulator bigger than that, and now an even bigger simulator because I'm sampling from the distribution of simulators. But nevertheless, with all these losses, you end up with a simulator. (How do you average simulators? That's a valid question to which I don't remember the answer. Do you just take a random one with the right probability? It wasn't anything terribly complicated; it's a four-line paragraph in the proof, but I forgot what it was. I'll have to look it up.) The point is that this simulator is pretty big, but we were prepared to lose that in the circuit size, so that's okay. Intuitively, the min-max theorem strikes again in a different way: before it was distributions against distinguishers, and now it's simulators against distinguishers, but it helps the same way. As I said, I think HILL-relaxed is often the right notion, so this chain rule helps; there's a variety of parameter losses, possibility and impossibility results, and I'll put some references online so you can look if you're interested. And I think on this note I'm done. What's that? Oh, the application, okay.
So the first time this chain rule, and HILL-relaxed itself, came up, without ever being formally defined but already implicitly there, was in a Gentry-Wichs proof about non-falsifiable assumptions, of all things; it was a black-box impossibility result. And I think leakage-resilient crypto is the application where you often want a conditional chain rule, and it's very convenient to have it instead of the monkey business I had to do here. Other questions? Oh, one more thing I want to mention: all these chain rules can sometimes be avoided. That's what Yevgeniy said at the end of his talk yesterday. One way to extract from a deficient distribution is to apply a PRG, use a chain rule, then apply an extractor: you have a deficient seed, you apply a PRG to it to get a longer seed, and then you extract. We could use a chain rule to prove things about this, but if you use a specific extractor, the square-friendly tricks Yevgeniy mentioned yesterday don't require the chain rule and can give you better bounds. So sometimes you don't even need it, although for leakage-resilient crypto it's not always clear how to avoid it. More questions? All right, thank you.
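The "PRG then extract" recipe just mentioned can be sketched as follows; this is a hedged illustration, not the specific construction from the talk: SHA-256 in counter mode is only a heuristic stand-in for a PRG, and the universal hash below is just one possible seeded extractor.

```python
import hashlib

def prg(seed: bytes, out_len: int) -> bytes:
    # Stand-in PRG: SHA-256 in counter mode (heuristic, illustration only).
    out = b""
    ctr = 0
    while len(out) < out_len:
        out += hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:out_len]

P = 2**127 - 1  # Mersenne prime for the universal-hash extractor

def extract(x: bytes, ext_seed: int, out_bits: int = 64) -> int:
    # Seeded extractor: multiply by the seed mod P, then truncate.
    return (ext_seed * int.from_bytes(x, "big") % P) & ((1 << out_bits) - 1)

def prg_then_extract(deficient_seed: bytes, ext_seed: int) -> int:
    # Deficient seed -> PRG -> longer pseudorandom string -> extractor.
    return extract(prg(deficient_seed, 32), ext_seed)
```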