So, you know, you have a password and I want to guess it, and I have one guess. Which one am I going to go for? I'm going to go for 1, 2, 3, 4, 5, because that's your most likely password, right? So, if you have a distribution (X is almost always going to be our distribution, capital X; lowercase x is going to be the sample), and you worry about the adversary guessing one sample from your distribution, then you worry about the highest probability. We'll call that predictability; Yvgeny already introduced that notion last time. And because we like to measure things in bits, we're just going to take the log of that, or rather the negative log, so that it's a positive number of bits. But really, the right thing to focus on is this probability of being predicted, okay? And so the definition of min entropy, in case you didn't catch it yesterday, is just the negative log of the highest probability. Another way to put it: it's the smallest surprise. What's the least surprising password? One, two, three, four, five; that's the amount of surprise that you have. And if you're used to Shannon entropy, if you're coming from an information theory background, Shannon really talks about the average surprise. How much are you surprised on average? Not in the least surprising case. And of course, the average can be a lot, a lot higher than the least. So it's often not the right notion for crypto. The average can be much too high to talk about security. So, H without any subscript will denote Shannon entropy, although I probably won't use it much. So, let's think of min entropy, okay? And so, what's min entropy good for? As I just said, passwords, right? If you're worried about people guessing your password, then you want it to have high min entropy, low predictability, right? And Yvgeny yesterday mentioned message authentication. In fact, it's good for any cryptographic application that worries about predictability, right?
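To make the gap between least surprise and average surprise concrete, here is a minimal Python sketch. The password distribution is made up for illustration (it is not from the lecture): half of all users pick "12345" and the rest pick uniformly among 1024 other passwords.

```python
import math

def min_entropy(dist):
    """H_inf(X) = -log2(max_x Pr[X = x]): the surprise of the likeliest outcome."""
    return -math.log2(max(dist.values()))

def shannon_entropy(dist):
    """H(X) = sum_x Pr[x] * log2(1 / Pr[x]): the average surprise."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

# Hypothetical distribution: "12345" has probability 1/2, and the
# remaining 1/2 is spread evenly over N other (made-up) passwords.
N = 2 ** 10
passwords = {"12345": 0.5}
passwords.update({f"pw{i}": 0.5 / N for i in range(N)})

h_min = min_entropy(passwords)          # driven entirely by "12345"
h_shannon = shannon_entropy(passwords)  # pulled up by the long tail
```

With these numbers the min entropy is 1 bit (one guess succeeds half the time), while the Shannon entropy is 6 bits, which would badly overstate how hard the password is to guess.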
If the adversary is trying to predict your randomness, having min entropy is a good thing because it makes you hard to predict. And that, through Yvgeny's lectures yesterday, extends to a whole bunch of applications where predictability is your concern, like message authentication, signatures, and so on. And another thing that we talked about yesterday is that it's also good for randomness extraction. So, what's an extractor? This will be the picture of an extractor. If you put in something that has min entropy, and you put in a uniform seed, then you get an almost uniform output, right? In fact, there's a cool thing that you get from what's called strong extractors, although the adjective strong was kind of popular a few years ago and is now being dropped because pretty much every extractor is strong, so nobody uses strong anymore. In a strong extractor, the seed is jointly uniform with the output, which means the seed is reusable. It's like a catalyst in a chemical reaction, right? It gives you a uniform output, and then you can reuse it again to get another uniform output, right? Joint uniformity gives you that. I'm gonna throw in references; I'm not gonna read them off, they're there just in case you wanna read up more. This is sort of where that starts. And here is the simplest computation you can have to build an extractor. There are lots and lots of others, but this is the simplest: just multiply the seed by x, viewing them as elements of GF(2^n), where n is the length of x, right? Multiply the seed by x and then truncate. To how much? Well, you cannot hope to extract more than k uniform-looking bits, because all you have is k bits of entropy to begin with. But you truncate a little bit more, and then you get close to uniform, right? So if you want to get epsilon close to uniform, then you should get rid of 2 log 1 over epsilon bits.
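The multiply-and-truncate construction can be sketched in a few lines of Python. This is a toy under stated assumptions, not the lecture's code: n = 8 with the AES reduction polynomial (any irreducible polynomial of degree n would do), and keeping the top bits rather than the bottom ones is an arbitrary choice. The names `gf_mul`, `ext` and `output_length` are made up here.

```python
import math

AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)

def gf_mul(a, b, n=8, poly=AES_POLY):
    """Multiply a and b as elements of GF(2^n), reducing modulo `poly`."""
    result = 0
    for _ in range(n):
        if b & 1:           # this power of x appears in b: add a * x^i
            result ^= a
        b >>= 1
        a <<= 1             # a becomes a * x
        if a >> n:          # degree reached n: reduce modulo poly
            a ^= poly
    return result

def ext(x, seed, out_bits, n=8, poly=AES_POLY):
    """Multiply-and-truncate: keep the top `out_bits` bits of seed * x."""
    return gf_mul(seed, x, n, poly) >> (n - out_bits)

def output_length(k, eps):
    """Bits to keep: k - 2*log2(1/eps), rounded down and floored at zero."""
    return max(0, math.floor(k - 2 * math.log2(1 / eps)))
```

For instance, `output_length(300, 2**-100)` gives 100: with 300 bits of min entropy and a target of 2 to the negative 100, 200 bits are thrown away. A real n would be in the hundreds, with a correspondingly larger field.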
So 2 log 1 over epsilon. So think about what this means, right? If you worry about security of 2 to the negative 100, because 2 to the negative 100 is your security parameter, right? You should subtract 200 bits from the entropy. And then the rest can be your output, and then you're 2 to the negative 100 close to uniform, yeah? So let's just remind ourselves what close means. How am I gonna use the boards? Let's do left to right. So, reminder: what does it mean for two distributions to be close? We're going to denote it by A is epsilon close to B. Those are two distributions. Can people there see this? Yeah? Okay. If you cannot see, it's okay to move up or raise your hand, one or the other. Better move up, but okay. So we're gonna put probabilities on this axis and points in the distribution on this one. And you can draw the distribution A, and you can draw the distribution B, and you can compute the area between the two curves, something like that. And that area should be less than or equal to 2 epsilon. Why 2 epsilon? Because the area where A is bigger than B is the same as the area where B is bigger than A, so we're really counting things twice, right? If that area is less than or equal to 2 epsilon, then A is epsilon close to B. And there's a general result that whenever you want to use uniform, but you're using something that is epsilon close to uniform, then you lose at most epsilon in your security. It's a good exercise to figure out. So being epsilon close to uniform is good. So let's actually define it; I will eventually probably erase it. The definition is that a (k, epsilon) extractor is some function Ext. And let's say it takes n-bit strings, together with d-bit seeds, and outputs, let's say, l-bit strings, right? And what's the goal of such an extractor? If the min entropy of X is at least k, then look at the extractor applied to what? To X and to uniform on d bits; U_d denotes uniform on d bits.
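The notion of closeness described on the board (half the area between the two curves) is the statistical distance. A minimal Python sketch, with distributions represented as dictionaries from points to probabilities (a representation chosen here for illustration):

```python
def statistical_distance(a, b):
    """Half the L1 distance between two distributions given as dicts.

    The full area between the curves counts every discrepancy twice
    (once where a is above b, once where b is above a), hence the 1/2.
    """
    points = set(a) | set(b)
    return 0.5 * sum(abs(a.get(x, 0.0) - b.get(x, 0.0)) for x in points)

def is_eps_close(a, b, eps):
    """True if a is eps-close to b in statistical distance."""
    return statistical_distance(a, b) <= eps

# Made-up example: a slightly biased distribution on four points.
uniform = {x: 0.25 for x in range(4)}
biased = {0: 0.35, 1: 0.25, 2: 0.25, 3: 0.15}
distance = statistical_distance(uniform, biased)  # about 0.1
```

Here the area between the curves is about 0.2, so the two distributions are 0.1 close.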
The distribution of the extractor output together with the seed itself (and having the seed in there is what makes the seed reusable), that distribution is epsilon close to uniform on the output space. I have an eraser somewhere; there we go. Uniform on the output space, Cartesian product with the seed space. How many of you have seen this definition, by the way? I wanna try to understand where I'm targeting a little bit. How many of you have not seen this definition? There's a lot of abstentions. This is sort of a binary question: you have seen it or you haven't. Let's try this again. How many of you have seen this definition? How many of you have not seen this definition? Okay, good, that's only a few abstentions; thank you for voting. All right, good. So this device is super useful, right? Because it gets you something close to uniform from min entropy. And as I said, the simplest way to build it is just to multiply. There are many, many ways to build it, but this is the simplest. And an observation is that if you have any cryptographic construction that requires truly random bits, but you use these extracted bits instead, you lose a little bit in security, right? What do you lose? You lose epsilon in security because you're not quite uniform. But you lose one more thing. What else do you lose? You lose in the length, right? So if you have a 200-bit thing and you're getting a 100-bit thing out, you lose in length. But there's also a third parameter where you lose. So you lose an epsilon, you lose in length, and you lose in a third parameter that people often forget, but it actually will be important for this talk. What's the third parameter? Efficiency of the extractor, right? Because if you think about the game, the adversary is suddenly getting a little bit of help from the extractor running some code, right? Let's say you were trying to break encryption. Before, the adversary interacted with the encryption scheme.
Now the adversary interacts with the extractor plus the encryption scheme. If the extractor is super inefficient, it could, for example, be doing some ridiculous computation for the adversary, right? So when you do a reduction very carefully, you have to account for that extra computation that's happening, right? And that extra computation is the extractor. Because it's usually so simple, we forget it and we ignore it. It's usually a very efficient computation, much, much smaller than the sub-exponential time adversaries that we worry about. But until you say that the extractor is efficient, this is actually something to worry about, because now your computer is also doing some extra computation that the adversary may find useful. So you lose in the running time, or in the size of the circuit, right? And the amount you lose is equal to the extractor size. Does that make sense? Again, as I said, usually we ignore it, but there'll be a time, not for the extractor size here, but for some other things, when we won't ignore it. Okay, so good. Actually, I'm not sure I have that slide. Let's see if I have the right slide. So sorry, the slides were prepared at the very last minute. Okay, I think I forgot to put that in because I was not expecting to use slides. But we'll work without it; we don't really need the slide. So there are sort of two views of extractors in the literature, if you go back far enough; by now they have merged. But if you read old papers, you realize that there were two separate views, and people didn't realize that they were the same thing. One view is that you just don't have a good source of randomness. There's no true randomness in nature. You only have min entropy strings from nature, and then you have to do something about them.
And then you need a truly random seed that you got once by flipping coins, and you can use it forever, right? That's sort of one view of extractors. The other view is that there is lots of randomness in nature, but there are also nasty adversaries who eavesdrop on your randomness. So maybe you flipped some random coins and they were truly uniform. But then the adversary got, I don't know, the Hamming weight, because your circuit consumes electricity proportional to the Hamming weight of the string, right? And so the string was uniform, but then something leaked out to the adversary. And the whole field of leakage-resilient crypto is about that, right? But intuitively, it's sort of the same situation: either you don't have good randomness, or you have it but then the adversary learned something about it. They're the same thing, even though the literature for a while, in the 80s, considered these two problems separately. And then sort of by the 90s, people realized it's the same thing. But one thing to think about in the second scenario, when you have true randomness and then it leaks, is what happens to the entropy after the information leaks to the adversary, right? Because to apply an extractor, you need to measure the amount of entropy. So if you're going to measure it, then here's one nice lemma that is sort of immediate from the definition, right? If the adversary learned something that is not very surprising, then you didn't lose a lot of entropy. And if the adversary learned something that's very surprising, you lost a lot of entropy. And this is the mathematical version of that same thing, right? So if the adversary learned the event Z, then you subtract the log of 1 over the probability of Z. Remember, we're working in log space, right? Because it's just more convenient. But if we didn't work in log space, then you would divide the predictability by the probability of Z.
And that's the right thing to do, because you just have a conditional probability. It jumps by at most a factor of 1 over the probability of Z, okay? So this is a nice lemma. I forgot my clicker now. Bonus points to those who find my clicker. I realize I have no power in this class, because there are no bonus points I can award, okay? All right, so let's actually do an example of this conditional thing. Let's say that X is uniform, again, over {0,1}^n, and then you leak the Hamming weight. That's the example I had to begin with. What are the chances that the Hamming weight is exactly n over 2? It's just a fact from combinatorics that these chances are roughly 1 over 2 root n. It's the central binomial coefficient. Okay, so the chances are about 1 over 2 root n that this is the Hamming weight. So how surprised is the adversary that the Hamming weight is exactly n over 2? Not terribly surprised. That is the most likely Hamming weight, okay? So if we take the log of that 1 over 2 root n, then we end up having to subtract half log n and another one, okay? So how much information leaked to the adversary when the adversary learned that the Hamming weight is n over 2? Half log n, roughly, is what leaked. Remember, n is the length of the string, so half log n is a lot less. So this is okay. Of course, there's also some chance that your Hamming weight was full, that the string was all ones. And then the adversary is very, very surprised. And once the Hamming weight leaks, it's so surprising that there's nothing left, right? So again, applying the same lemma, I'm having to subtract n. I know, there's z equals n and there's z equals 0 on the slide. They're the same thing, but I should probably be consistent. Either you have full Hamming weight or you have no Hamming weight; both are equally surprising.
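The two Hamming weight cases can be checked numerically. This is a small Python sketch with n = 200 chosen for illustration; `bits_lost` is a made-up name for the quantity log of 1 over the probability of the leaked event, which is the most the min entropy can drop.

```python
import math

def hamming_weight_prob(n, w):
    """Pr[weight(X) = w] for X uniform on n-bit strings."""
    return math.comb(n, w) / 2 ** n

def bits_lost(n, w):
    """log2(1 / Pr[Z = w]): upper bound on min entropy lost when weight w leaks."""
    return -math.log2(hamming_weight_prob(n, w))

n = 200
p_half = hamming_weight_prob(n, n // 2)   # most likely weight
loss_half = bits_lost(n, n // 2)          # a handful of bits
loss_full = bits_lost(n, n)               # all n bits: the string is all ones
```

For n = 200 the probability of weight exactly 100 is a bit above the 1 over 2 root n figure from the lecture, so the loss stays below half log n plus one, that is, under 5 bits out of 200. Leaking weight 200 (or 0) costs the full 200 bits.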
And then you have no entropy left, which makes sense, because the adversary knows exactly what your string is. So this formula makes sense, but often, if you're trying to design a cryptosystem of some kind that is leakage resilient, you don't know what your Hamming weight is going to be. You want to condition not on a specific event, but on a random variable Z, where Z is the Hamming weight of X. And it's not exactly very deep, but it's nice to have a convenient definition for that. And the right way to talk about it, and I think, again, Yvgeny mentioned this definition yesterday, is this. Okay, so what's the goal of min entropy? The goal of min entropy is to measure predictability. I think somebody up front, one of you guys, actually asked a question about this. Okay, good, so you want to measure predictability, right? And now, what does the adversary control? The adversary controls the prediction, but not the leakage. The leakage is: I decided on x, z leaked out; the adversary didn't get to choose z, it's just a function of x. But now the adversary can make the best prediction based on z. So the right thing to do is to take the expectation over all possible z's, because I flipped the random x and z is whatever leaked, of the maximum probability over x. Did I just... is it backwards on the slide? Yeah, sorry, that's what happens when you do slides at the last moment. Let's fix it. Improved? Magic. Okay, good, so the adversary doesn't control the leakage, right? The leakage is what happens. I take the average over the leakage, but then for each leakage, the adversary gets to pick the best prediction. So we average all the predictabilities and then we get the right thing. And this is an old, old definition by now. And that's the average min entropy, okay? Often in papers it comes with a tilde, but putting tildes on slides is a fricking pain, so no tilde here.
Notice that this is called average min entropy, but it's not the average of min entropy. This is sort of a silly point to repeat, but people sometimes confuse it, so it's good to say. Imagine that half the time everything leaks and half the time nothing leaks. What's the chance that the adversary predicts your value? Well, half the time everything leaks, so the chances are a half. So you really don't have a lot of entropy. You have one bit, because a one-half chance of being guessed is one bit. So you shouldn't average zero and a thousand. You should average two to the zero and two to the negative thousand, and then take the log. Average before taking the log. The log is really kind of in the way; it's just there because we like to measure entropy in bits. So, the definition, before I take it off: does it make sense? You take the average predictability, that's all. And then, because you want to work in bits, you take the log just to measure things in bits instead of really small values. Yeah? So the expectation would need to be outside of the log? So for Shannon, the expectation would be outside of the log. But here we don't want to do that, exactly because if half the time your string has zero bits of entropy and half the time it has a thousand, and you put the expectation outside of the log, you average zero and a thousand and you get 500 bits of entropy. That's bad, right? I mean, in the Shannon setting, that's exactly what you would get. You would get 500 bits of entropy. That's because Shannon measures average surprise, right? He measures compressibility. So for him, this is the right thing. For me, this is not the right thing. For me, you want the expectation inside the log. Good, more questions? Let me keep this definition on the board for just a little bit so that we don't forget it. The marker disappeared too.
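The half-leaks, half-doesn't example is easy to reproduce. In this Python sketch each case is a pair (probability of that leakage, best guessing probability given that leakage); the representation and function names are chosen here for illustration.

```python
import math

def avg_min_entropy(cases):
    """-log2 of the average predictability: expectation INSIDE the log."""
    predictability = sum(pz * guess for pz, guess in cases)
    return -math.log2(predictability)

def average_of_min_entropies(cases):
    """Average of per-case min entropies: expectation OUTSIDE the log."""
    return sum(pz * -math.log2(guess) for pz, guess in cases)

# Half the time everything leaks (guess succeeds with probability 1),
# half the time nothing leaks from a 1000-bit uniform string.
cases = [(0.5, 1.0), (0.5, 2.0 ** -1000)]

inside = avg_min_entropy(cases)             # about 1 bit
outside = average_of_min_entropies(cases)   # 500 bits
```

The first number says the adversary wins about half the time, which is the truth. The second says 500 bits, which is what averaging zero and a thousand gives and is the wrong measure for guessing.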
All right, we'll use blue. Okay, so let's see. I'll try to be consistent: Z will be my condition and X will be my distribution. But I may forget at some point. So it's the negative log, again, because we measure in bits. But the real thing is the expectation over z of the maximum over x of the probability that X equals x given Z equals z. Or, as people usually write it: probability that capital X equals lowercase x, conditioned on capital Z equals lowercase z. So that's the definition we'll be working with. So now we can meaningfully say something about the leakage of the Hamming weight of the string, on average. We'll just average all these predictabilities that you get from there, and then you'll get the right thing. Good. So now it's nice to have a chain rule. You know Shannon entropy has a chain rule; we would like to have a chain rule also, because we're not any worse. So this is our chain rule. It says that after you condition on something, you're going to have to subtract something. But how much? The amount you're going to have to subtract is the log of the support size of whatever you're conditioning on. So H0 measures how many nonzero-probability points there are. How many people have seen H0? Ah, not enough people have seen H0. So this is a definition that is kind of not even worth defining, because the definition is bigger than the thing it's defining. H0 of Z is equal, by definition, to the size of the set of little z such that the probability that Z equals little z is greater than zero. Yeah? Log, yes, always log, thank you, yes, good. And this time it's a positive log, because we're again measuring things in bits. So it's just the log of the support size. If you want to think of it simply, it's the bit length, right? If you leaked out a five-bit string, you lost at most five bits of entropy. That's what that lemma says.
Because the support size of a five-bit string is two to the fifth, okay? So this is not quite as nice as the Shannon case, because in Shannon all three of these entropies are the same kind. Here we're subtracting support size, which can be annoying. Support size can be very big, because you could have some really tiny, tiny probability events, and they all count toward the support size. You can have many of them. So you'd like not to have to do this, but that's the best we can prove. So this is known as the chain rule. I guess let's prove it. Okay, we're going to try to do a proof on the board and see how that goes. Could you switch back to this board? Thanks. Can you guys see the lower part of the board? No? Okay. How about I erase the definition of extractor and go up? No? Yes? Or why don't we use that board? Sorry, we're going to use that board just because people can't see this one. Okay, here we go. So what do we want to prove? We want to prove this thing. All right, let's just go back to the definition. Okay, what's this thing? Well, it's the negative log of the expectation, right? Let's actually not do the logs, because logs are a pain. Let's just do 2 to the minus of it. Now I get rid of the log. So now I have the expectation over z of the max over x of the probability that X equals x conditioned on Z equals z. Go back to the definition of expectation. It's just a sum over all z of the max over x of the probability that X equals x conditioned on Z equals z, times the probability that Z equals z. Yeah? Okay? So now remember the definition of conditional probability and realize that this probability that Z equals z just cancels. The definition of conditional divides by it, and this multiplies by it. So we can rewrite this whole thing with the probability of X equals x and Z equals z. All right, I mean, yeah, this whole expectation is only over z's that are in the support. Conditional probability doesn't make sense when you're outside the support of Z.
So this expectation is only over the support, yeah. Okay, so this is all right? And now just get rid of the and, because I'm not making this any smaller by doing this, right? Let's write out another line, so it's clean. This is greater than or equal to the sum over z of the max over x of the probability that X equals x, which is equal to what? Well, it's just the size of the support of Z; that's how many times I'm adding the same thing up. So it's the size of the support of Z times two to the minus min entropy of X. I'm trying to fit it on that line. Where did we mess up? Yeah, of course, sorry: the inequality goes the other way. It should be less than or equal, because dropping the and can only make each probability bigger. This negative log is always a pain, because you want the entropy to be bigger, but you want the probability to be smaller; thank you. All right, okay, good. So what I just proved would be H infinity of X minus H0 of Z. But we can actually do better; we can put the Z in there. Yeah, okay, good, thanks. So let's not get rid of the and. We could actually just stop one step earlier and say that the number of addends is the size of the support of Z, and each addend is at most two to the minus min entropy of the pair X, Z. And this is again a less-than-or-equal. Good, thank you guys. Okay, this was more painful than it should have been, I'm sorry. But the point is, for every z you add up at most the same predictability. And once you are done adding it up over every z, you've multiplied by the size of the support. Multiplying by the size of the support means that, in log space, you lose the log of the size of the support, additively. So it's not quite the worst case over z. Let's do the Hamming weight example. You have a string of 200 bits and you leak its Hamming weight. How many bits does it take to write down the Hamming weight of a 200-bit string? Like eight bits, roughly, right? Because it fits in a byte. Okay, so you're gonna lose at most eight bits of entropy for leaking the Hamming weight.
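The chain rule can be checked by brute force on a small case. This Python sketch uses n = 10 so that all strings can be enumerated (the 200-bit example is out of reach for enumeration); the joint distribution is a dictionary from (x, z) pairs to probabilities, a representation chosen here for illustration.

```python
import math
from collections import defaultdict
from itertools import product

def avg_min_entropy(joint):
    """joint maps (x, z) -> Pr[X = x, Z = z]; returns -log2 sum_z max_x Pr[x, z]."""
    best = defaultdict(float)
    for (x, z), p in joint.items():
        best[z] = max(best[z], p)
    return -math.log2(sum(best.values()))

def h0(joint):
    """H_0(Z): log2 of the number of z values with nonzero probability."""
    return math.log2(len({z for (_, z), p in joint.items() if p > 0}))

n = 10
# X uniform on n-bit strings, Z = Hamming weight of X.
joint = {(x, sum(x)): 2.0 ** -n for x in product((0, 1), repeat=n)}

lhs = avg_min_entropy(joint)   # entropy left after the weight leaks
rhs = n - h0(joint)            # chain rule bound: H_inf(X) - H_0(Z)
```

For a uniform string the bound happens to be tight: the weight takes n + 1 values, and exactly log of n + 1 bits are lost. For n = 200 that is log2(201), a little under the eight bits mentioned in the lecture.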
In the worst case, you could lose all the entropy by leaking the Hamming weight. This bound is not quite as bad as the worst case, yeah? It is the worst case in some ridiculous situations where what you leak is not the Hamming weight, but maybe with some tiny probability you leak the whole string. Then your support size is really big, because of that tiny probability. And then this lemma doesn't give you anything useful. But if your leakage is always limited in size, then it is useful. Makes sense? So only eight bits for the Hamming weight is not so bad. So far so good? Yes? Okay, so then the next thing: a comparison to Shannon we already did. Shannon is nicer in the sense that it's all the same H, and ours is not all the same H, but Shannon is not very useful for us. And you can even do this again. I'm not gonna prove it on the board, because one more line and we'll get lost somewhere; it's one of those exercises that is better done in the privacy of your own home. But you can just work this out, it's the same thing. Yeah, good point, thank you. Copy-paste is not as good as... yes, thank you. You just plug in the same thing and you work out the formula, and it works. So we can leak again and again and again, right? So I'm going to call the top line the chain rule and the bottom line the conditional chain rule, to distinguish the two things. One starts with unconditional and gets a conditional. The other starts with conditional and continues being even more conditional, okay? Good, so basically, if you started with being conditioned on Z1 and you also leaked Z2, then you'd have to subtract H0 of Z2. So if you first leak the Hamming weight and then you leak, I don't know, the number of switches from zero to one or something, then you subtract these things twice, okay? So now it's reasonable to ask what this conditional notion is good for. And we sort of designed it to be good for passwords, because it really measures the probability of the password being guessed.
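Written out cleanly, the board proof and the two chain rules look roughly as follows. The notation is an assumption on my part: the tilde marks average min entropy (the slides drop it), and supp(Z) is the set of values Z takes with nonzero probability.

```latex
\begin{align*}
2^{-\tilde{H}_\infty(X \mid Z)}
  &= \sum_{z \in \mathrm{supp}(Z)} \Pr[Z = z] \cdot \max_x \Pr[X = x \mid Z = z] \\
  &= \sum_{z \in \mathrm{supp}(Z)} \max_x \Pr[X = x \wedge Z = z] \\
  &\le \left|\mathrm{supp}(Z)\right| \cdot \max_{x,\,z} \Pr[X = x \wedge Z = z]
   = 2^{H_0(Z)} \cdot 2^{-H_\infty(X, Z)} .
\end{align*}

% Taking negative logs flips the inequality:
\begin{align*}
\text{chain rule:} \quad
  & \tilde{H}_\infty(X \mid Z) \;\ge\; H_\infty(X, Z) - H_0(Z) \;\ge\; H_\infty(X) - H_0(Z), \\
\text{conditional chain rule:} \quad
  & \tilde{H}_\infty(X \mid Z_1, Z_2) \;\ge\; \tilde{H}_\infty(X, Z_2 \mid Z_1) - H_0(Z_2)
    \;\ge\; \tilde{H}_\infty(X \mid Z_1) - H_0(Z_2).
\end{align*}
```

The second inequality in each line holds because guessing the pair is never easier than guessing X alone. The conditional version follows by running the same three steps with everything additionally averaged over Z1.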
So now if you picked a password and you leaked the Hamming weight of the password out to the adversary, the chances that the adversary guesses your password are two to the minus that entropy. That was sort of by definition; that's how we designed it. It's the predictability. It turns out it's also good for the applications that involve predictability that you already talked about yesterday. And the security you're going to get is just like before: before, you had to lose n over two. Here again, you're gonna have to lose n over two for message authentication. I'm not gonna go into that space at all today, but just so you know, whatever worked for unconditional min entropy will also work there for conditional min entropy. And it's not very hard to prove. And the pretty cool thing is that it also works for extraction. So if you have, again, a string that had some entropy and then you leaked some information about it, you can still apply a randomness extractor. Salil Vadhan proved that if the randomness extractor is good for unconditional min entropy, in the sense of the definition we have here, it's also going to work for conditional min entropy. You're just going to lose a factor of three in epsilon, which is nothing, because epsilon is something like 2 to the negative 100, so a factor of three is easy to ignore. And in fact, for the specific extractor I showed you, the one that multiplies two things and then truncates, you don't lose anything at all. It works just as well. For any extractor that comes from the leftover hash lemma, you don't lose anything at all. For a general extractor (again, I haven't shown you any other extractor constructions, and we probably won't show them this week) you may lose up to a factor of three in the epsilon, which is nothing to worry about. And the proof, we're not going to do this proof, but it goes roughly like this. The extractor was designed for entropy k.
Now we don't have entropy k. We have average entropy k. What does that mean? It means some of the time we have a little bit less, right? So the first thing you prove is that when you have a little bit less, an extractor that's designed for entropy k cannot get that much worse. I mean, it's not like you have one bit less of entropy and suddenly the extractor is just a constant. It's still going to be reasonably close to uniform; it's going to be two epsilon close. And then you average over all the cases and you lose a third epsilon. That's it. Yeah? This is Salil Vadhan. It's actually an exercise in his book at this point. But if you want to know the solution to the exercise, it's in one of his papers. I can forward you the reference, or you can email him. I don't think he'll prove it when he gets here. But as I said, the strategy for the proof is: prove that an extractor, if it's designed for entropy k, will still work for entropy k minus one, or k minus two, or whatever, without too much loss, and then average those things in the sense of average entropy. So it's basically lossless. And if you want to prove this for the leftover hash lemma specifically, for the type of extractor that I showed on the previous slide, you just work through the lemma and you'll see that it still works. Just redo the proof. So basically, by considering conditional entropy, we didn't lose anything. We can do anything we've done before, which is very useful, because in real life entropy is almost always conditional. It's almost always silly to talk about distributions that are not conditioned on anything, because the adversary almost always has some correlated information. So almost always, Z is the information the adversary has and X is the secret you have. But everything that we developed for unconditional entropy we can still do here. So I'm gonna show one application. How much time do we have?
Okay, good, so we certainly have time for this application. Let's go back for a moment before I advertise it. This is probably the last information-theoretic thing I'm going to do, which is one application of this conditional chain rule that makes life a lot easier. And then we will go to the computational rather than the information-theoretic side. Okay, so the application is to something called fuzzy extractors. Let's see, this is a popularity contest. How many people have seen fuzzy extractors? Okay, so at least it won't be a waste of time for most of you, all right. Okay, so this is an application of the thing we just did, which is a very, very simple thing, but it's a useful application. So we're trying to do key derivation from noisy data. I want to blink into a camera and have my iris become a key, okay? Or I want to take a physical circuit in my machine that is unique because of the unique silicon defects that happen at manufacture time, and I want to derive a key that is unique to my machine from that circuit. And these measurements are noisy. So let's call the initial reading W0 and the later reading W1. And let's assume that they have some small distance between them. We're just gonna work in Hamming distance; the stuff I say will generalize to some extent, but Hamming distance is the easiest. That's not real life. It's not like your iris changes in Hamming distance the next time you blink, but there are smart people who can transform irises into something that does change in Hamming distance. So let's assume there's some bound on the distance. Okay, and the goal is to derive cryptographically strong output. So there are sort of two things, right? You want W0 and W1 to give the same output, even though they're different inputs.
And you want the output to look uniform to the adversary in the sense of statistical distance, just the sense that we have here. Right, and then, again, the motivation is to have authentication that is self-enforced. Without this iris or without this chip, you really cannot decrypt the information, because it's all been encrypted and the server doesn't even know what it is. That's sort of the intuition, right? Of course, the server could just look at my iris and compare and give me the data if I match, but then the server knows the data, and I don't want the server to know the data. That's the intuitive idea. Okay, so we define something called fuzzy extractors. We're not really the first to consider this problem. There is lots and lots of prior work, and the references are in the paper if you want to see them. So the way you define these things: they're not just extractors, they're fuzzy. Why are they fuzzy? Because they're noise tolerant. So you have some W0, that's your initial reading, right? That's a point in some space, in Hamming space. And it has high entropy. That's the assumption. If it doesn't have high entropy, we're stuck anyway. And in this case it's min entropy. And a generation algorithm will produce two things. It'll produce the output R. That's the string we actually want to encrypt with. That's the secret key that we derive. And then some kind of helper value P, which will help us get R back later when we blink into the camera a second time. So R should look uniform given P. What does it mean to look uniform given P, right? Let's just write it out here: R comma P is epsilon close to uniform on the right space comma P. That's what looks uniform always means to me: closeness in the sense of statistical distance, okay? And then later... so we can think of it as Alice and Bob doing key agreement.
We can think of it as me being in the past and in the future, or in the present and in the future; these are equivalent ways of looking at it. Later, somebody comes along with a similar string W1. Maybe it's me again, blinking a second time into the camera. And with the help of P, I can reproduce R back. That's the functionality. So it's like an extractor, but with two algorithms instead of one. An extractor will just take W and give me R. And now I want to take either W0 or a later reading and still get the same R. And to do that, I will use this helper string P. Good. Oh, hey, I actually defined the security goal up there. So the security goal is this, that R comma P looks like uniform comma P. Good, so how are we gonna do this? Well, we're gonna do this the way you would think to do this. You're gonna apply an extractor to W0. That's how we get uniform bits. We had something with min entropy, we want uniform bits, so apply an extractor. That's the first step. Okay, well, then out here, when you do it the second time, you're also gonna apply an extractor. The only problem is you don't have W0 to apply an extractor to, right? You have W1, and that's not exactly the same. So the extractor's not gonna give you the same value. So you need to reproduce W0. That's the way we're gonna build this thing. So we're going to apply an algorithm called sketch, and I'll tell you what sketch does in a moment. But for now it's an abstract algorithm. I guess capital Sketch is the algorithm and lowercase sketch is its output. Is this viewable? Is the contrast too low? It's okay? Yeah, all right. So there's a lowercase sketch, which is the output of the algorithm Sketch. Okay, and then when we put W1 back in, we can recover W0 and then run an extractor. So that's the high-level approach of how we're going to build this thing. We're going to give some information inside this P that allows you to recover W0 back.
So good, questions about the goal or the first attempt at a construction? I haven't told you how to do sketch, obviously, but I will. So it depends on your starting W0. So how many keys can you derive from your iris? I'm only thinking of one for now. So let's say your iris has, and this is probably reasonable to assume according to some people, 200 bits of entropy. Then I'm going to maybe derive a 100-bit key once. Okay, that's my Gen. The second time you blink into the camera, you're gonna get the same 100-bit key. The third time, the same 100-bit key. The fourth time, that's it. I cannot derive any more because the entropy of the iris is already exhausted. Good, so the way to think of this entropy, this k, what does this k mean? It's not really the entropy of a specific reading. It's the entropy of choosing you from the set of all possible humans, choosing your iris from the set of all possible irises. That's kind of the randomness. The randomness is in the fact that you are a unique person. And so that's our k. Plus also the noise in the measurement, that also is included in k. The noise is useless, kind of. We want to get rid of it. The fact that you're a unique person is useful. That's what we want to extract. P being short is not necessarily, I mean, it's a nice goal, but it's not super important because P is public. I can write it on the wall, right? Because R is close to uniform even given P. So in terms of efficiency, it's nice, but as a first approximation, the more important thing is to get any security at all, because it's not obvious. So I'm not gonna worry about the length of P for now. So what are we gonna put in P? Remember, extractors need seeds, this public randomness. So we're gonna put the seed into P and we're going to put the sketch into P. So the sketch is the stuff that allows us to remove noise and, kind of intuitively, only leave the part that's important because you're a unique person.
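Collecting the pieces described so far, the goal and the first-attempt construction can be written compactly. This is only a notational sketch: the name Rep for the second algorithm, the seed symbol i, the output length \ell, and the error bound t are assumptions introduced here for readability, and Sketch and Rec are still abstract at this point.

```latex
\begin{align*}
&\text{Functionality:} && \mathrm{Gen}(W_0) \to (R, P), \qquad
   \mathrm{Rep}(W_1, P) = R \ \text{ whenever } \mathrm{dist}(W_0, W_1) \le t \\
&\text{Security:} && (R, P) \approx_{\varepsilon} (U_\ell, P) \\[4pt]
&\mathrm{Gen}(W_0): && \text{pick a uniform seed } i, \quad
   \mathit{sketch} = \mathrm{Sketch}(W_0), \quad
   R = \mathrm{Ext}(W_0; i), \quad
   P = (i, \mathit{sketch}) \\
&\mathrm{Rep}(W_1, P): && W_0 = \mathrm{Rec}(W_1, \mathit{sketch}), \quad
   R = \mathrm{Ext}(W_0; i)
\end{align*}
```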
So the seed is not shown simply because I didn't wanna overcrowd the picture. The extractor has to get a seed. We don't have seedless extractors for general distributions. I guess it's another one of those useful exercises to do at home: prove that if you wanna have an extractor that works for any min entropy distribution, or, as Yevgeniy called it, any weak source, then it has to have a seed. You cannot have one extractor for all weak sources that is seedless. So because it's a seeded extractor, there's a seed in here and it should be the same seed in here. Otherwise, yeah, it's not gonna be the same result, but the seed can be public. That's kind of the beauty of the extractor definition. So it's okay to put it into P. More questions? So what I wanna show you now, and I'm gonna do this on the board to do it at the right pace, is how to build this. So I'm not gonna show you how to build extractors, because I already alluded to it and that's gonna be good enough. I'm gonna show you how to build Sketch and Rec. And the more interesting thing from the point of view of today is to show you how to then analyze the whole thing, how to know how many bits of R you can get and how good the epsilon is. Yes, of course, we're gonna use some kind of error-correcting codes. Yes, so I'm going to show an ancient construction known as the code-offset construction. The construction kind of goes back to the 80s, although it wasn't written down explicitly until the late 90s, but it was implicit in many works. So what we're going to do is, okay, so here's the intuition, right? We want to kind of hide W, but still allow you to recover W in some sense, right? We can't give W in the clear because that's my secret. But I want you to be able to kind of decrypt it if your reading is reasonably close. So why don't we just, like, one-time pad W? Well, it's not gonna work because the one-time pad is too error sensitive, right?
What if we one-time pad it with a pad that itself is encoded in an error-correcting code? An error-correcting code will allow us to catch Hamming errors. So the idea is to take a random R, oh wait, R's already used, no R. I need another letter, P is also used. All the good letters are used. S, okay? We'll figure out what length in a moment. We don't know the length yet. Apply an error-correcting code to it. So I'll use ECC to denote an error-correcting code, to get something called T, which is of the same length as W. So now T is S encoded in an error-correcting code. So with a few errors, you'll still be able to decode, right? And then sketch is equal to W0 plus T. Let's think about how to recover, right? So okay, I know how to sketch, but how do I recover? I need to get W0 back. This is not the optimal construction. If your gears are turning about how can I improve this, we're not going for the optimal. We're going for a simple analysis for now. There's lots of ways to improve, okay? But we're going for simple analysis. So how can we recover? I'm blocking the slide that has recover on it. Yeah, XOR. XOR W1 into this sketch, and you get some kind of T prime that's close to T because W0 was close to W1. You decode it, you get S back. Once you have S, you can get T, and then W0. Okay, so to recover, get T prime that is close to T by sketch XOR W1. And then decode T prime to get T and therefore W0. So no, ECC is any, for now ECC is any error-correcting code, a deterministic error-correcting code, fixed. Pick your favorite error-correcting code, that's it. Binary, right? Since we want to work on binary Hamming space for now. There is some really cool use for a randomized ECC, which is in a paper by Yevgeniy and Adam Smith, that turns out to hide this W0 even better than this will hide it. So this doesn't hide W0 terribly well, enough to extract from but not terribly well.
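The code-offset sketch and recover steps described above can be tried out with a toy code. This is a minimal sketch under stated assumptions: the talk leaves the ECC as "pick your favorite", so the 5-fold repetition code below (and the names `REP`, `ecc_encode`, `ecc_decode`, `sketch`, `recover`) are hypothetical stand-ins chosen only because they fit in a few lines.

```python
import secrets

REP = 5  # assumption: toy repetition code, each bit of S is repeated 5 times

def ecc_encode(s_bits):
    # T = ECC(S): repeat every bit REP times.
    return [b for b in s_bits for _ in range(REP)]

def ecc_decode(t_bits):
    # Majority vote inside each REP-bit block (tolerates 2 flips per block).
    return [int(sum(t_bits[i:i + REP]) > REP // 2)
            for i in range(0, len(t_bits), REP)]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def sketch(w0):
    # Pick random S, encode it to T, and publish W0 xor T.
    s = [secrets.randbelow(2) for _ in range(len(w0) // REP)]
    return xor(w0, ecc_encode(s))

def recover(w1, sk):
    t_prime = xor(sk, w1)                 # T xor (W0 xor W1), close to T
    t = ecc_encode(ecc_decode(t_prime))   # decode to S, re-encode to get T
    return xor(sk, t)                     # sketch xor T = W0

w0 = [secrets.randbelow(2) for _ in range(20)]   # initial reading
sk = sketch(w0)
w1 = list(w0)                                    # later, noisy reading
for i in (3, 4, 17):                             # flip three bits
    w1[i] ^= 1
assert recover(w1, sk) == w0
```

With this toy code, recovery succeeds as long as no 5-bit block of the reading has more than two flipped bits; a real instantiation would use a code matched to the actual distance bound.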
Yevgeniy and Adam Smith have a paper about how to hide W0 even better, where using a randomized ECC helps, but probably not today. Right, exactly. The error-correcting code bound will be related to the distance bound, right? So if you want to correct t errors, then your error-correcting code better also correct t errors. I mean, if you want to do distance t, your code should have distance t, there's no getting around that. So now let's actually do the analysis using the conditional entropy stuff. Now what are we extracting from? I'm going to abuse notation and use lowercase for distributions, because otherwise I have to introduce new variables. We're kind of extracting from the distribution conditioned on what? What does the adversary see? The adversary sees the sketch, right? So we're extracting from, sorry, not extracting from R, I know what I'm saying. Extracting from W0 conditioned on W0 XOR T, that's what we're extracting from. So how do we know how long an R we can get? Well, we have to know the entropy of this thing in order to know how many bits we can extract, right? So what is the entropy of this thing? We have the chain rule. So the entropy of this thing is greater than or equal to, right, the min entropy, H infinity, of W0 comma W0 XOR T, minus the length of this thing. So this length, let's call it n. My sources are always of length n. What about the first term? From W0 and W0 XOR T you can get S, and S was the random thing, so we can rewrite it as H infinity of W0 comma S, minus n. What is S? Well, S relates to how good your code is. The length of S is the dimension of your code. It's how many bits you can encode into an n-bit string and still have distance t. So let's say that S has length m, and assume, right, that the ECC, the code, is an (m, n, t) code.
So if you know error-correcting codes, this is notation for dimension, output size, minimum distance. Then this thing is, right, H infinity of W0, plus, we added m random bits, so plus m, minus n. We threw in m random bits when we generated S. And when you throw in m random bits, the entropy behaves the way you expect it to behave. There's no surprise there. So now we immediately know how good our input to the extractor is, and then we can just apply the general extractor result for conditional min entropy and extract the right number of bits. If you want to be epsilon close, then you extract this many minus two log one over epsilon, okay? So the point of this example is maybe less to teach you about fuzzy extractors and more to show you how simple the analysis is. Once you have this construction, knowing how many bits you can extract is one line, basically. Two, because I have a narrow board, right? Because you have conditional min entropy as a tool. If you looked at previous ways to analyze this construction, they were a lot more painful because people just didn't have this notion of conditional min entropy. And this construction appeared in the literature many, many times under different names and so on. You can definitely improve this construction. I think Vanessa, did you ask about the length, right? Oh, Karlie, yes, about the length, okay. So yeah, the length of P here is gonna be n plus the seed. You can actually make it n minus m. You can send only the error-correcting bits of this code by doing something else, but not here. And you can do constructions for non-Hamming distances and so on. I mean, there's a whole lot of stuff you can do with this. But the point is that the analysis is usually this simple. And you get the simple analysis also for a lot of interactive protocols in the information-theoretic setting.
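Written out, the board calculation is short. The following is a reconstruction from the spoken steps, with k denoting H_\infty(W_0) and \ell the number of extracted bits (both symbols are labels added here), and \tilde{H}_\infty denoting average conditional min entropy.

```latex
\begin{align*}
\tilde{H}_\infty(W_0 \mid W_0 \oplus T)
  &\ge H_\infty(W_0,\, W_0 \oplus T) - n
     && \text{(chain rule; the sketch is $n$ bits long)} \\
  &= H_\infty(W_0,\, S) - n
     && \text{($(W_0, W_0 \oplus T) \leftrightarrow (W_0, S)$ is one-to-one)} \\
  &= H_\infty(W_0) + m - n
     && \text{($S$ is $m$ uniform bits independent of $W_0$)} \\[4pt]
\ell &= k + m - n - 2\log\frac{1}{\varepsilon}
     && \text{(extractable bits at distance $\varepsilon$)}
\end{align*}
```

The middle step uses that ECC is injective, so T determines S and vice versa.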
So if you worry about an active adversary who is gonna mess with the string P on the way from the past to the future, or from Alice to Bob, then there are more sophisticated protocols that will protect it. Coming up with the protocol is non-trivial. The analysis is: this is how many bits Eve sees, we're gonna subtract those bits. This is how much randomness we put in, we're gonna add that. It's just really, really basic arithmetic. Add the bits you put in that are random, subtract the bits that Eve sees, and you get how many bits you can extract from this lemma. Good, questions about this? Because I'm done with this application. I'm gonna do something else now. I think I'm probably not gonna talk about fuzzy extractors again. So the goal of the application was kind of to convince you that, you know, average min entropy is a useful thing because it makes analysis simpler. Was that a hand? A question? Somewhere? Yeah, okay. All right, I'm gonna just remind you of something Yevgeniy said yesterday, right? You don't have to always use extractors. We already said that extractors are powerful in general and they will always give you something epsilon close to uniform, but you have a loss. The amount of randomness you get is less than the amount of entropy you started with by this two log one over epsilon. And what Yevgeniy talked about yesterday, and I don't know if I even referenced all the right papers, I referenced at least a subset of the right papers, is that for specific applications, like the square-friendly applications and so on, it's possible to lose fewer bits than this two log one over epsilon. Okay. So min entropy is useful in information-theoretic crypto. Now we're gonna try to move to computational crypto and ask, can it be useful there also? Okay. So this is a reminder of what min entropy is.
Here's a definition of the computational version of min entropy, which is called HILL entropy for its authors, Håstad, Impagliazzo, Levin, and Luby. How many people have seen this definition actually? This is a good test for how fast I can go. You have seen HILL? Okay. All right, so feel free to stop me at any point, okay? We're going to say that a random variable X has computational min entropy, and we're going to call it HILL entropy simply because of the initials of the people who first defined it, if it is indistinguishable from something that really does have that entropy, okay? So I have this new notation that I didn't have yet, that X is delta, s close to Y. We had X epsilon close to Y, that was statistical distance. What is X delta, s close to Y? Now there are two parameters. Delta is a measure of how close. And s is there because this is against a computationally bounded distinguisher, so s tells you the distinguisher size: the running time, if you're in the uniform model, or the circuit size, if you're thinking of circuits as your distinguishers, written down in some canonical language. We're going to think of circuits for today. So we're going to say that X maybe doesn't have true min entropy, but I cannot tell it apart from something that does. What do I mean, I cannot tell it apart? If I give myself running time s, or if I look at every circuit of size up to s, and I tell it, here's a sample from X or from Y, tell me which one it is, it doesn't know. The circuit won't be able to tell them apart with a difference of more than delta. So this is a standard indistinguishability definition, but now we're parameterizing it very precisely by the probability of distinguishing, or rather the probability difference, right? The difference between how often you say yes here versus yes there, that's your delta, and s is the circuit size, okay? And we're going to have to keep track of these parameters, which is a little bit of a pain, but it's kind of important to do.
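In symbols, the two definitions just described read as follows. The distinguisher name D and the superscript HILL are notational choices made here; the content is what was stated above.

```latex
\begin{align*}
X \approx_{\delta, s} Y
  &\iff \text{for every circuit } D \text{ of size at most } s:\quad
     \bigl|\Pr[D(X) = 1] - \Pr[D(Y) = 1]\bigr| \le \delta \\[4pt]
H^{\mathrm{HILL}}_{\delta, s}(X) \ge k
  &\iff \exists\, Y \text{ such that } H_\infty(Y) \ge k
     \ \text{ and } \ X \approx_{\delta, s} Y
\end{align*}
```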
So that's the definition again. The variable X has computational entropy if it's indistinguishable from something that actually has min entropy. Let's see, yeah, I have an example, so let's work on an example. Right, so the basic idea is that this is a very useful notion, because if you have an adversary whose running time is bounded by s, that adversary can't tell apart the two things anyway with probability better than delta. So if you have a proof and you really need high min entropy, but you substitute something indistinguishable from high min entropy, the proof still goes through as long as your adversary is not more than size s. So let's kind of do a very, very basic example. You start with a uniform seed A and you run it through a pseudorandom generator to get an output B. You can't create min entropy out of nothing, so the min entropy of B is still the length of A, but the HILL entropy of B is now the length of B. So the HILL entropy can be much, much, much bigger than min entropy, simply by the existence of pseudorandom generators. Remember extractors, they work on min entropy, but now what if you have HILL entropy? So you have something of low min entropy. You took a seed, you ran it through a pseudorandom generator, you have a long string. It has very, very low min entropy, just the seed length. It has very, very high HILL entropy, the output length. You have that thing, and then maybe some information from that string is now given to the adversary, so now it's not really uniform, but it still has entropy, intuitively at least. You run an extractor on it, and the result is going to look epsilon plus delta close to uniform, right? So epsilon comes from the extractor. We already had to pay epsilon, even if it was true min entropy. Now we also have to pay delta, because the circuit of size s will not view Y and X the same, it will view Y and X almost the same, within delta of each other. And there's something else: why am I writing approximately s there?
Why didn't I just say s? Because now the adversary circuit that's trying to distinguish X from Y consists of the circuit plus the extractor. That is the actual distinguisher. It's the circuit plus the extractor. So now you have to pay the price for running the extractor. So it's gonna look uniform to circuits of size s minus the extractor size. However long it takes you to run the extractor program, however many bits it takes you to write down the extractor program, you have to subtract in the reduction, right? So this is a standard indistinguishability argument. When two strings are indistinguishable, you could as well work on the string that's better for you. If X is indistinguishable from high min entropy, we're going to work on the high min entropy thing and pretend X is high min entropy, and we're gonna have to pay the cost of the indistinguishability. So, maintaining these deltas, what do we want from delta? We want delta to be very small, right? We want the circuits to really not be able to distinguish, and we want s to be very big. Even big circuits shouldn't be able to distinguish. Keep this in mind, right? So anytime we can't make delta small or we can't make s big, that's a bad thing. We're gonna try to have less of a bad thing. So that's gonna be our goal, okay? And now of course you can also say, hey, we really never care about min entropy, we always care about conditional min entropy, and it's the same thing here. Let's just observe that in the computational setting, conditional is super common. There are a million examples. I'm actually gonna work on the first one for a little bit. If you have Diffie-Hellman, the secret g to the ab intuitively has some entropy. Like, if it didn't have any entropy, it wouldn't give you a good ElGamal encryption or a good key agreement protocol or whatever. So it has some entropy, but of course the adversary has correlated information, g to the a and g to the b.
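Going back to the pseudorandom generator example from a moment ago, the min entropy half of the claim can be checked by brute force at toy scale. This is a minimal sketch, assuming an 8-bit seed and using SHAKE-128 as a stand-in for a PRG (the names `toy_prg`, `SEED_BITS`, and `OUT_BYTES` are hypothetical); it says nothing about HILL entropy, which cannot be measured by enumeration.

```python
import hashlib
import math
from collections import Counter

SEED_BITS = 8    # assumption: tiny seed so every seed can be enumerated
OUT_BYTES = 4    # 32-bit output B

def toy_prg(seed: int) -> bytes:
    # Stand-in for a PRG: stretch the one-byte seed with SHAKE-128.
    return hashlib.shake_128(seed.to_bytes(1, "big")).digest(OUT_BYTES)

def min_entropy(dist):
    # H_inf = -log2 of the probability of the most likely outcome.
    return -math.log2(max(dist.values()))

# Exact output distribution of B when the seed A is uniform.
counts = Counter(toy_prg(a) for a in range(2 ** SEED_BITS))
dist = {b: c / 2 ** SEED_BITS for b, c in counts.items()}

h_min = min_entropy(dist)
print(f"output length: {8 * OUT_BYTES} bits, min entropy of B: {h_min} bits")
```

Whatever the stretch function is, the most likely output has probability at least 2^-8, so the printed min entropy can never exceed the 8-bit seed length even though B is 32 bits long.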
So it's meaningless to talk about the unconditional entropy of g to the ab. Unconditionally it's uniform. If you're not given g to the a and g to the b, there's nothing interesting to say. It's uniform. The point is that it maybe has some entropy, even given g to the a and g to the b. So we have to talk about conditional entropy, right? If you have a secret key and some information like the Hamming weight leaks, then of course you'd have to condition on the leakage. If you have a signature, again, intuitively signatures ought to have entropy, because if they don't, the adversary can forge them. But of course there's lots of correlated information. The message and the public key are correlated with the signature. So again, it's conditional. You know, actually the last example doesn't make sense in this context, but replace the thing with the plaintext. The plaintext has entropy, intuitively, conditioned on the ciphertext. Again, it's a conditional entropy notion. So we don't want to talk about specific values. We often want to talk about the average case also, right? Because here it's even harder: if you have a specific public key, then that corresponds to a specific secret key. Once you fix the public key, there's only one secret key, and then there is no entropy in the plaintext. You know, you really want to talk about the average public key, whose secret key the adversary doesn't actually know. Okay. So here's a definition that kind of implicitly goes back to, is Hugo here? Hugo is here. It goes back to your work, though I know you didn't write it down explicitly: Gennaro, Krawczyk, and Rabin, as far as I can tell. Unless you can tell me that it was somewhere even earlier, but I don't know of anywhere before that. Okay. Yeah, no, I did not see it anywhere earlier. So the idea is this, and actually it was done in the context of Diffie-Hellman, the first example I have, that the entropy of X conditioned on Z is going to be k.
If you have some indistinguishable variable Y that has that entropy, also conditioned on Z, so that X comma Z is going to be indistinguishable from Y comma Z. Sorry, let me fix this. Voilà. We're not clicking anymore. Okay. Good. So we're going to change X, right? Let's think of the context of Diffie-Hellman. This is actually the easiest example to think about. I have g to the a, g to the b, g to the ab. My Z is g to the a and g to the b, right? And my Diffie-Hellman secret is g to the ab. So my X is g to the ab. What is the decisional Diffie-Hellman assumption? What does it say? It says this g to the ab, this X, is indistinguishable from a uniform Y, even given g to the a and g to the b. So even if you have Z, right? Let's write this down in case I'm speaking too fast. So in the DDH example, you have X equals g to the ab. You have Y equals uniform. And you have Z equals g to the a, comma g to the b. And DDH says exactly this, that X comma Z is indistinguishable from Y comma Z. That's the decisional Diffie-Hellman assumption. So here's what Gennaro, Krawczyk, and Rabin used it for, which I think is a very nice use and a good illustration of why you might want this definition. With the decisional Diffie-Hellman assumption itself, you wouldn't bother defining this entropy notion if you're only worried about DDH. In DDH, you're indistinguishable from uniform. There's no entropy question that's interesting. It's full entropy, all right? It's indistinguishable from uniform over the whole group. But maybe you're working in some group in which decisional Diffie-Hellman definitely doesn't hold. Because if the order of the group is divisible by two, then g to the ab is gonna be a square a lot more frequently than g to the a and g to the b. g to the ab is gonna be a square three-quarters of the time, while g to the a and g to the b are gonna be squares half of the time, so g to the ab is distinguishable from uniform. DDH is just false.
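The square-counting argument can be verified exhaustively in a small group, and the same count for cubes applies when the group order is divisible by three. This is a toy check, assuming the multiplicative group modulo 13 (order 12, generated by 2); the modulus, generator, and helper name `prob` are choices made here for illustration.

```python
from fractions import Fraction

p, g = 13, 2          # assumption: toy group Z_13^*, order 12, generator 2
order = p - 1

squares = {pow(x, 2, p) for x in range(1, p)}
cubes = {pow(x, 3, p) for x in range(1, p)}

def prob(event):
    # Exact probability over uniform exponents a, b in {0, ..., order-1}.
    hits = sum(1 for a in range(order) for b in range(order) if event(a, b))
    return Fraction(hits, order * order)

sq_ga = prob(lambda a, b: pow(g, a, p) in squares)        # g^a is a square
sq_gab = prob(lambda a, b: pow(g, a * b, p) in squares)   # g^(ab) is a square
cu_ga = prob(lambda a, b: pow(g, a, p) in cubes)          # g^a is a cube
cu_gab = prob(lambda a, b: pow(g, a * b, p) in cubes)     # g^(ab) is a cube

print(sq_ga, sq_gab, cu_ga, cu_gab)   # prints: 1/2 3/4 1/3 5/9
```

A uniform group element would be a square half the time and a cube a third of the time, so testing g to the ab for squareness or cubeness already distinguishes it from uniform in such a group.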
If the order of the group is divisible by three, then DDH is, again, false, if you can test for cubeness, cubicity, right? g to the a is gonna be a cube with probability a third, g to the b is gonna be a cube with probability a third, and g to the ab is gonna be a cube with probability something a lot more, two thirds or five ninths or something, right? It's gonna be a lot more likely than uniform. So DDH is just false, but if the group has a large prime order, then maybe DDH is true. So if you work in a group and you don't know the prime factorization of the group order, you can't actually pinpoint the group in which DDH is true. Maybe it's true in some subgroup, but you don't know which one. If there is a large prime-order subgroup, you're, like, good, and maybe there are several large prime-order subgroups, and it's true in every one of them. Make sense? So this is a good moment to measure entropy. And to say it's indistinguishable from some distribution, I can't even tell you what the distribution is, but it's uniform over some subgroup of prime size. So I can say existentially, not constructively, but existentially, there is a Y from which this g to the ab is indistinguishable. It's indistinguishable from uniform in some subgroup, times the coset part, which is distinguishable. So if we work over the sine group, or if we work over the subgroup, then Y exists, and therefore we can apply an extractor. Notice the nice feature here. We only need Y to exist. We don't need to know what it is. We don't need to construct it. The extractor argument still works, because if X is indistinguishable from Y, then we can substitute Y in the proof. The extractor will output uniform bits. Therefore the bits are indistinguishable from uniform as long as Y exists. Some people don't believe me, I see some skepticism. No chain rule yet?
I'm not, so the chain rule is a lot more problematic, and we can talk about the conditional chain rule in like an hour or so. But there's no conditional chain rule here. I'm not subtracting anything. I'm just saying there is a subgroup in which DDH holds, hopefully. If such a subgroup exists, I don't even know what it is. And there exists a Y such that g to the ab is indistinguishable from that Y, even conditioned on Z. So a chain rule would require me to subtract some length of Z or whatever. I'm not subtracting anything, I'm just saying. Good, so let's actually do the proof since it causes some skepticism, right? Yeah? So when you prove security this way, you often need Y to be efficiently samplable. So let's just prove security of the following construction. Unauthenticated Diffie-Hellman, where at the end, Alice and Bob both apply an extractor to the Diffie-Hellman value. And we don't worry about active attacks. So it's very, very basic. I send you g to the a, you send me g to the b. We both compute g to the ab and extract. And we want to prove the security of that against passive attacks. And the assumption isn't DDH in the entire group, but that there exists a subgroup in which DDH holds. And we don't even know what the subgroup is. Okay? So if DDH holds in a subgroup, then it's reasonable to say that, I guess this requires some algebra that I may or may not wanna do. So if DDH, okay, so let's see. First of all, I probably need another board. What can we erase? Oh, we can erase this stuff. Okay, good. I prefer to keep that board for the definitions if we can. Let's erase this. So capital G is our group in which we work. Capital H is a subgroup of G in which DDH holds. So we can view g to the a as, let's see. So we need to kind of represent it as an element of H times an element of the coset of H in order to do this right, which I'm not fully prepared to do. Sorry, let's see. g to the a is, right, so let h be the, good, let's do it this way.
So lowercase h is a generator of capital H. So we can write g to the a as h to the a1 times g to the a2, and g to the b as h to the b1 times g to the b2. And then when we compute g to the ab, it's going to be, what? I want to say something like this, but I'm not sure if it's true. Let's see, how does it go? Yeah, it almost works, times something, and I don't know what that something is. I don't actually care what that something is, okay? So implicitly, we're computing Diffie-Hellman in the subgroup, and then multiplying it by something else. As long as you don't care what that something else is, as long as this thing is indistinguishable from uniform in the subgroup, we're okay. That's the intuition. No, I'm just going to say, assume there's a big enough subgroup. So assume the size of H is greater than two to the k, and DDH holds in the subgroup. Then g to the ab is indistinguishable from, because DDH holds here, uniform over the space H, times something, which may very well be deterministic and fully computable by the adversary, I don't care. The point is that this thing, by being in the subgroup, is indistinguishable from a uniform element in the subgroup, and therefore has itself high entropy, right? So in fact, what I should write more carefully is that g to the ab, comma g to the a, comma g to the b is indistinguishable from uniform on the group H times some g to the c, comma g to the a, comma g to the b. So this is the Diffie-Hellman assumption in the subgroup, and this is my X, this is my Z, this is my Y, and this is again my Z. So that definition on the board, X comma Z is indistinguishable from Y comma Z, follows from Diffie-Hellman on the subgroup.
So I'm not doing any kind of chain rule to get anything, I'm just saying that the Diffie-Hellman assumption on the subgroup, even if I don't know the subgroup, gives me some high entropy Y. This is not uniform on the entire group. This may be more likely to be a square than a non-square or whatever, but it's uniform on the subgroup because of the Diffie-Hellman assumption on the subgroup. So then, because it exists, I can apply an extractor to g to the ab, and that's indistinguishable from applying an extractor here, and this is a high entropy distribution. I don't actually know what it is, because I don't know the subgroup, but it is big. So I can get k bits of entropy out. So that's kind of what the paper was about, that you don't really need to know the subgroup over which Diffie-Hellman holds. And in fact, if it holds over several subgroups of relatively prime orders, then you can even add up all the entropies by algebraic tricks. So if you have a subgroup of large prime size p and another subgroup of large prime size q, you actually get Diffie-Hellman out of both, rather than just one of them. Okay, so that's kind of the first application of this conditional min entropy, that there has to exist some Y, and we don't have to be able to construct Y in order to run this proof that says you can extract from this g to the ab value and you get something that's close to uniform. Because if you were extracting from Y instead, you would get something that's statistically close to uniform, so if the output here were not close to uniform, you could distinguish the two things, and you shouldn't be able to distinguish them by the Diffie-Hellman assumption. Okay, good. More questions about this definition? It's gonna be important later, but we're going to sort of ignore this issue for now. In this definition, only X changes, but Z does not. The intuition is that the adversary kind of saw Z, so you can't switch it out. X is the hidden thing.
Turns out that intuition, I think, is actually wrong, and you can switch it out, and if you allow yourself to change Z, you get a more permissive definition, right? But for a while, nobody did it, so we're not gonna do it for now either. So then, you can ask the reasonable question, do you have a chain rule of some kind? Does conditioning actually reduce HILL entropy by, well, certainly it reduces it, but by how much? So for min entropy, we have this fact, actually the fact is slightly nicer. There's a comma Z, I should put it in there because otherwise I'm cheating. We have this, and then we could average over all Z and write out that one-line proof that says, you know, if you want to condition not on a specific value, but on an average Z, kind of, then you can just subtract the length of that. Oh shoot, it went back to what it was. Okay, so here's a theorem that we're actually gonna prove. This is gonna be a long proof without slides, and this is gonna be important: if you condition on an event, in a computational entropy setting, you also have a chain rule. You lose two things. One is you lose the amount of surprise. It's not surprising that you lose the surprise, right? So if you had 100 bits of entropy and the event is two to the negative 50 surprising, you lose 50 bits of entropy. Okay, that makes sense. You also lose in the delta, and this is in some sense more painful. Remember what delta is? Delta is the probability that the adversary can distinguish. That distinguishing probability also gets worse. Or rather, it's easier to distinguish now. Here's the intuition for why both losses are necessary. Let's think for a moment. Imagine that we have a pseudorandom string. Okay, so X is a pseudorandom string. It's indistinguishable from Y, which is a truly random string. What is Z? Let's imagine that Z is the first 10 bits of the pseudorandom string.
Of course, if you give the adversary these 10 bits, your entropy has to go down by 10. There's no surprise there, right? But now let's take a different Z. Let's take a Z that is the first 10 bits of the seed. So your X is a pseudo-random string. Your Y is a truly random string. Your Z is the first 10 bits of the seed that was used to produce the pseudo-random string. What happens? Well, if the adversary was doing, let's say, exhaustive search or random guessing for the seed to test random versus pseudo-random, this random guessing got a lot easier. Because you know 10 bits already. You only have to guess the remaining bits. Your guessing got two to the 10th easier because you don't have to guess those 10 bits. So you have to multiply your probability by two to the 10th, or divide it by two to the negative 10, right? Because your chances of success as an adversary are just much higher now. And you know Z, at least, right? You know part of the seed. So the loss in length comes when you leak the output of the PRG. The loss in quality comes when you leak the seed of the PRG. Actually, maybe a question for Krzysztof: is there a proof that you need both losses simultaneously? No proof. I think at some point, somebody tried to prove that you need both, that it's not an either or, that you really lose both at the same time. But I don't, yeah? I'm sorry? You don't have to lose both at once, okay? All right, so, okay, good. For now, we're gonna prove the thing with two losses at once. And I don't know what's gonna happen later, okay? Good. It's almost break time. Let me do three more minutes so we're well set up for after the next half hour. This theorem appears in many, many guises, and these references are there so that, if you feel like reading lots of papers, you can. It's also known as the dense model theorem, and I'll explain why it's known as the dense model theorem.
But before I do that, I just want to warn you that this theorem is not about the entropy we've been talking about this whole time. It's not about HILL entropy. So I'm kind of cheating by showing it here. It's about a much weaker notion of entropy that we can then convert to HILL, okay? So in the next hour and a half, hopefully, we will first explain what notion of entropy this is for, then prove the theorem for that notion of entropy, and then show that that notion of entropy is convertible to HILL. And all of these things will incur losses. Here we lose in the delta, and then we're gonna lose in the S. And so it's gonna be a fairly expensive conversion. This is not as nice a chain rule as the one you have in the information theoretic setting. So the plan for the next hour and a half is to define the right notion of entropy for which we can prove this chain rule, prove it, and then convert it back to the HILL entropy that I defined, good? All right, let's take a break.
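The seed-leakage intuition from the pseudo-random string example can be checked on a toy scale. The sketch below is only an illustration under loud assumptions: `prg` is a hypothetical stand-in built from SHA-256 with a 16-bit seed (not a real PRG, and far too small to be secure), and the "adversary" is the simple one-guess tester from the lecture, which picks a seed from its candidate set and checks whether it reproduces the string X. The code computes that tester's exact success probability with and without Z, the first 10 bits of the seed.

```python
import hashlib
from fractions import Fraction

SEED_BITS = 16   # toy seed length (assumption, chosen so we can enumerate)
LEAK_BITS = 10   # Z = first 10 bits of the seed, as in the lecture's example


def prg(seed: int) -> bytes:
    """Hypothetical toy 'PRG': stretch a 16-bit seed to 8 bytes via SHA-256."""
    return hashlib.sha256(seed.to_bytes(2, "big")).digest()[:8]


def guess_success(x: bytes, candidates) -> Fraction:
    """Exact probability that one uniform guess from `candidates` reproduces x."""
    candidates = list(candidates)
    hits = sum(1 for s in candidates if prg(s) == x)
    return Fraction(hits, len(candidates))


true_seed = 0xBEEF           # arbitrary fixed seed for the demonstration
x = prg(true_seed)           # X: the pseudo-random string the adversary sees

all_seeds = range(2 ** SEED_BITS)

# Without Z, the adversary guesses among all 2^16 seeds.
p_no_leak = guess_success(x, all_seeds)

# With Z, the adversary only guesses among seeds consistent with the leak.
z = true_seed >> (SEED_BITS - LEAK_BITS)
consistent = (s for s in all_seeds if s >> (SEED_BITS - LEAK_BITS) == z)
p_leak = guess_success(x, consistent)

print("success without Z:", p_no_leak)
print("success with Z:   ", p_leak)
print("improvement factor:", p_leak / p_no_leak)
```

On a truly random Y this tester essentially never finds a matching seed, so its success probability on X is roughly its distinguishing advantage. Leaking 10 seed bits multiplies that advantage by two to the 10th while the length of X is untouched, which is the loss in delta rather than the loss in entropy.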