Go ahead and get started. For those of you who haven't attended the series before, this is a colloquium series in computer science that we started last year in honor of our dear late colleague, Rajeev Motwani. This quarter I'm very pleased to introduce our speaker, Salil Vadhan from Harvard University, where he is a full professor. We're lucky enough to have him out here in the Bay Area this year: he's on sabbatical, spending time both here at Stanford and at Microsoft Research. Salil has been a leader in computational complexity and pseudorandomness, in randomness and computation, pretty much ever since he was a PhD student at MIT. He won the ACM Doctoral Dissertation Award back in 2000, and in 2009 he won the Gödel Prize, basically the main prize for papers in theoretical computer science, for his work on expanders. Today he'll be talking about computational entropy.

Thanks, Tim, for the generous introduction. It's a real honor to be speaking in this series in memory of Rajeev Motwani, and it's been wonderful to be at Stanford this year, having lots of great interactions. I look forward to interacting with many of you in the five months that are left; the time is going by too quickly.

I'll be talking about computational entropy: joint work with Iftach Haitner, Thomas Holenstein, Omer Reingold, Hoeteck Wee, and Colin Jia Zheng. I'll be mentioning a few different papers involving different subsets of these authors, but the main thing I'll focus on is joint work with Colin, who is sitting over there, and who will be giving a theory lunch talk next week on some things related to this work that I won't be able to cover today.

The context for this work, or its motivation, comes from the foundations of cryptography, and in particular the interaction between the foundations of cryptography and information theory. Let me recall the history. In his seminal work in the 1940s, Shannon gave the first mathematically rigorous treatment of cryptography, saying what one might mean by security and what one wants to achieve. But he came mostly to negative conclusions: in the standard setting of an insecure communication channel, information-theoretic cryptography is basically impractical. If two people want to communicate securely in the information-theoretic sense, the total amount of key they need to share has to be at least as long as all the data they ever want to communicate. So that's a negative conclusion. But ways of getting around it were hinted at already in Shannon's paper, and certainly there were similar ideas in other people's minds around the same time, as anyone who saw the recently found letter from John Nash to the NSA knows. The idea was to base cryptography not on systems that are impossible to break, but on systems that are computationally infeasible to break. Even though this idea was around earlier, it really didn't take off until the work of Diffie and Hellman in the mid-70s, which gave a serious start to basing cryptography on the emerging field of computational complexity. So the first idea is to assume your adversary has limited computational resources: a lot of computational resources, but some reasonable bound on them.
And the second idea is to base cryptography on problems that are believed to be computationally hard: design your cryptosystem so that breaking it involves solving some problem we believe to be computationally hard. And computational complexity was starting to generate examples of such problems. In addition to turning cryptography from an art into a science, this also enabled thinking about things that people hadn't even conceived of before, like public-key cryptography, digital signatures, and many other things.

Okay, so what is the most basic kind of computationally hard problem we can base cryptography on? What Diffie and Hellman suggested was the concept of a one-way function: a function that's easy to compute in the forward direction but hard to invert. A candidate example is the multiplication function. Multiplying two numbers is easy using the grade-school algorithm; you can multiply very large numbers very quickly. But inverting this function is the problem of integer factorization, for which we don't know any fast algorithms, and whose complexity seems to grow very quickly as the numbers get bigger. (A toy code sketch of this example appears below.)

Formally, a one-way function is a function from bit strings to bit strings, say from n bits to n bits for simplicity. Think of n as a growing parameter: we really have a function for every length n, but it's useful to pin down a particular length, and we're interested in asymptotics as n goes to infinity. First, f should be computable in polynomial time, some fixed power of n; that's the easiness of computing f. The hardness of inversion is in the following strong average-case sense: if I pick an n-bit input uniformly at random and evaluate the function, then for any feasible algorithm, say one running in time polynomial in n, the probability that it inverts the function should go to zero faster than 1 over any polynomial in n. And inverting the function, I want to stress, means finding any preimage of the point f(x). We aren't going to assume the function is one-to-one, except when I say so in the talk; it's important for the results I'll be talking about to be interesting that we don't assume that.

So the existence of one-way functions is a very simply stated assumption, and a very plausible one. We don't know how to prove it: proving that one-way functions exist would, in particular, involve resolving the P versus NP problem. But one-way functions seem to be everywhere. Factorization is just one example, and it seems that if you throw together enough simple but random-looking operations, it's almost hard not to construct a one-way function, because all you need is a process that's easy to carry forward but hard to reverse.

So this is a very simple and plausible assumption, but surprisingly, a huge amount of cryptography can be based on it. From the assumption that one-way functions exist, you can build all kinds of basic cryptographic primitives: pseudorandom generators, which I'll talk about in some depth, certain kinds of collision-resistant hash functions, things called commitment schemes, and pseudorandom functions.
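To make the multiplication example concrete, here is a minimal Python sketch, not from the talk. The function and variable names are mine, and for the one-wayness conjecture the inputs should really be uniformly random n-bit primes; random odd numbers stand in here only to show that the forward direction is cheap.

```python
import random

def f(p: int, q: int) -> int:
    """Candidate one-way function from the talk: multiplication.
    The forward direction is grade-school easy; inverting it on
    f(p, q) for random primes p, q is integer factorization."""
    return p * q

# Forward direction: fast even for very large numbers.
n = 1024
p = random.getrandbits(n) | 1   # stand-ins; the hard case uses random n-bit primes
q = random.getrandbits(n) | 1
y = f(p, q)

# The naive inverter, trial division, takes time exponential in n;
# no polynomial-time inverter is known.
def invert_by_trial_division(y: int) -> tuple[int, int]:
    d = 3
    while y % d:
        d += 2
    return d, y // d
```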
And then with these basic primitives, all through a series of works in the mid-to-late 80s, you can solve a lot of cryptographic problems that are interesting from the applications point of view. You can solve basically every problem of private-key cryptography. You can construct amazing things called zero-knowledge proofs, where I can prove something to you with you learning nothing other than the fact that what I'm proving is true. You can even construct public-key sorts of objects like digital signatures, and build secure protocols for very complicated tasks: for any efficiently computable function, a number of parties can get together and securely compute a joint function of their inputs, such that as long as a majority of them are honest and follow the protocol, no one learns anything about anyone else's input except what's implied by the output of the function. Really amazing things you can do, all just assuming that there exists a one-way function.

It's important to remark that some things are not in this picture. There is a fair amount of cryptography that we don't know how to do from one-way functions, and actually there's good evidence that we can't. Public-key encryption is one example, and there are a number of others. But it is quite surprising how much you can do from one-way functions.

Okay, so the work I'll be talking about is motivated by trying to understand how this is possible. We're really interested in understanding this first layer here: how we take the raw hardness present in a one-way function, which may be very unstructured, and turn it into this first level of basic cryptographic primitives with their clean, qualitative guarantees. In some sense this is the hardest part of the picture, going from one-way functions to the first level of primitives: you take the unstructured hardness of a one-way function and turn it into something more structured, from which you can do much more modular reasoning to build sophisticated cryptographic protocols.

So that's the question we're interested in, and the answer that seems to be emerging, which I want to give you a sense of in this talk, involves computational entropy. First, the security guarantee of every cryptographic primitive can be understood in terms of some notion of computational entropy (I'll say what I mean by computational entropy shortly). Second, we can show that a small amount of this computational entropy is already present in a one-way function, directly from the one-wayness property. Constructing complicated cryptographic primitives from one-way functions then amounts to finding ways to amplify and manipulate that little bit of computational entropy.

Before talking about computational entropy, I should review entropy itself. Shannon's entropy of a discrete random variable X is given by this formula: take a random sample from X and take the expectation of the log of one over the probability mass of that sample. Nothing in this talk will require getting into the details of this formula.
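Just to make the formula concrete, here is a toy Python sketch, not something from the talk; the dictionary representation of a distribution is my own choice for illustration.

```python
import math

def shannon_entropy(dist: dict) -> float:
    """H(X) = E_{x ~ X}[ log2(1 / Pr[X = x]) ], for a distribution
    given as a dict mapping outcomes to probabilities."""
    return sum(p * math.log2(1.0 / p) for p in dist.values() if p > 0)

# n fair coin tosses: 2^n outcomes, each with probability 2^-n,
# giving n bits of entropy, matching the intuition in the talk.
n = 4
coin_tosses = {i: 2.0 ** -n for i in range(2 ** n)}
print(shannon_entropy(coin_tosses))       # 4.0

# The extremes: a point mass has entropy 0, and a distribution
# uniform on its support has entropy log2(support size).
print(shannon_entropy({"heads": 1.0}))    # 0.0
print(shannon_entropy({"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}))  # 2.0
```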
You should think of the entropy H(X) as measuring the number of bits of randomness in the random variable X, on average. If I toss n coins, intuitively they have n bits of randomness, and indeed the entropy of n coin tosses is n. One useful fact: the entropy is between zero and the log of the support size, the number of elements with nonzero probability. Entropy is zero if and only if the random variable puts all its mass on a single point, and it equals the log of the support size if and only if the random variable is uniform on its support. All logs in this talk are base two.

I'll also refer to Shannon's notion of conditional entropy: if I have two jointly distributed random variables X and Y, the entropy of X given Y is the average, over samples y from Y, of the entropy of X conditioned on Y = y. And I'll briefly mention at times some more worst-case analogues of entropy. Shannon entropy, as I said, measures the amount of randomness in X on average. The worst-case notions are min-entropy, which replaces the expectation with a minimum, and max-entropy, which is just that upper bound I mentioned earlier, the log of the support size. Shannon entropy is sandwiched between the min-entropy and the max-entropy.

All right, so now: what do we mean by computational entropy? As we'll see in the talk, there are a number of different notions of computational entropy, and part of the work here is to understand the relationships between them. But generally, we use the term to mean that an algorithm with bounded computational resources, for example a polynomial-time algorithm, may "perceive" the entropy in a random variable very differently from its Shannon entropy. ("Perceive" is in quotes because, again, we have different notions of computational entropy.)

The first and simplest example is a random variable X that's the output of a pseudorandom generator. For those who haven't seen pseudorandom generators, let me quickly review. It's a beautiful concept from Blum, Micali, and Yao, from 1982. As I was preparing this talk, I was amazed at how many of the fundamental citations were from 1982. What? I'm getting old: 30 years ago. It was just an amazing year for theoretical computer science, in particular for these areas.

Okay, so what is a pseudorandom generator? It's an efficiently computable, say polynomial-time computable, function that stretches a short, truly random seed of m bits to a longer string of n bits. The output can't be truly random, because you're deterministically generating many bits out of a small number of random bits, but it should be pseudorandom: it should look random in some sense. What does pseudorandom mean? It means that the output of the generator on a uniformly random seed (U_m denotes a uniformly random m-bit string) should be what's called computationally indistinguishable from a uniformly random n-bit string. And what does computationally indistinguishable mean? It means that no feasible, say polynomial-time, statistical test should be able to tell them apart.
Okay, that is, for every polynomial-time algorithm T: whether you give it a random output of the pseudorandom generator, evaluated on a truly random seed, or you give it truly random bits, the probability that it accepts, say outputs one, should be approximately the same in both cases. The thing that's really powerful here is that we're not writing down a fixed finite list of tests we want to try, which is the traditional approach to pseudorandomness; we're requiring this for every computationally feasible test, even ones that take much more time than the running time of the generator itself. So intuitively, the output of the pseudorandom generator on a truly random seed is as good as truly random bits for any computationally feasible purpose.

So this is a very strong definition of pseudorandomness. The question is whether it's achievable, and the answer seems to be yes, but as I said before, we can't prove it unconditionally because that would involve resolving the P versus NP question. To make things concrete, since this is all very abstract, here's a concrete example: the Blum-Blum-Shub generator (a toy code sketch follows below). Its seed is a random composite number N, say obtained by choosing two large random primes and multiplying them together (actually, the primes should be congruent to 3 mod 4, but don't worry about that detail), and a random element X modulo N, relatively prime to N. All you do is take X and output its least significant bit, whether it's even or odd as an integer; then square it modulo N and output the least significant bit; square again, take the least significant bit; and keep going, producing as long a sequence of bits as you want. This is known to be a pseudorandom generator if, and in fact probably if and only if, factoring integers is hard. These are results of Alexi, Chor, Goldreich, and Schnorr, and those papers are also from 1982.

So this is great, a very simple construction, and what's amazing is you can prove that if you can tell these bits apart from truly random bits by any efficient test, then you can use that test to factor large integers. This is the kind of result we're interested in, but we don't want to just assume that factoring integers is hard, as I said earlier. We want to assume only that we have a one-way function, so that if factoring turns out to be easy, we can replace it with some other conjectured one-way function. That means we can't exploit number-theoretic structure in a function, the way this generator exploits the structure of factoring.

So pseudorandom generators: this definition came out of considering what you want from a pseudorandom generator in cryptography, but they've turned out to have impact in lots of other areas of theoretical computer science: understanding the complexity-theoretic power of randomized algorithms versus deterministic algorithms, understanding what problems are hard for machine learning, understanding why proving circuit lower bounds is hard.
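Here is the promised toy sketch of the Blum-Blum-Shub generator in Python. The tiny hard-coded primes and the particular seed are mine, chosen only for readability; real security needs large random primes, both congruent to 3 mod 4, with N hard to factor.

```python
import math

def blum_blum_shub(x: int, N: int, num_bits: int) -> list[int]:
    """Toy Blum-Blum-Shub generator: output the least significant
    bit of x, square modulo N, and repeat."""
    assert math.gcd(x, N) == 1, "seed must be relatively prime to N"
    bits = []
    for _ in range(num_bits):
        bits.append(x & 1)    # least significant bit of x
        x = (x * x) % N       # square modulo N
    return bits

# Illustration only: p and q must really be large random primes.
p, q = 499, 547               # 499 % 4 == 3 and 547 % 4 == 3
N = p * q
print(blum_blum_shub(x=2024, N=N, num_bits=20))
```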
And this broad impact is, for me, much of the reason for studying these problems in the foundations of cryptography: they seem to provide insight into lots of other things we're interested in across theoretical computer science.

Okay, so now back to computational entropy. Why does a pseudorandom generator give us some kind of computational entropy? Consider the output of the generator on a random seed. By definition, you can't tell it apart from n random bits, so it's indistinguishable from having entropy n. But its Shannon entropy, in the information-theoretic sense, is at most m, because it's generated by applying a function to an m-bit seed; in particular, its support size is at most 2 to the m. So this is a first, qualitative sense of computational entropy. Here we're really talking about the extremes of entropy: indistinguishable from n uniformly random bits on one side, small support on the other, so you don't need a general quantitative definition of entropy.

But it turns out to be useful to have a more general quantitative formulation. This was given by Håstad, Impagliazzo, Levin, and Luby (hence "HILL"). Their more general definition: a random variable X has pseudoentropy at least k if there exists a random variable Y that's indistinguishable from X, in the same sense as before (no polynomial-time algorithm can tell them apart except with small probability), and the entropy of Y is at least k. So your pseudoentropy is at least k if you're indistinguishable from something with entropy k. For technical reasons that I won't get into in the talk, this definition turns out to be interesting only as a lower bound on pseudoentropy; that's why I say pseudoentropy at least k. You somehow end up with an uninteresting notion if you talk about being indistinguishable from random variables of low entropy: in fact, it turns out that every random variable is indistinguishable from something of very low entropy, and don't worry about why that's true. So this definition is interesting when k is bigger than the entropy of X, so that we have a gap: the pseudoentropy is larger than the actual Shannon entropy of X.

Okay, so this is a very nice definition, but what is it good for? Well, it's used in the seminal result of Håstad et al. showing that from any one-way function you can construct a pseudorandom generator. A very high-level and very oversimplified picture of how their proof goes: from a one-way function, they show how to construct an efficiently samplable random variable X, meaning one where you can efficiently generate samples, whose pseudoentropy is slightly larger than its Shannon entropy. Noticeably, but just by a small amount. Then, by replacing X with many independent samples of X, concatenating them together, these average-case notions of entropy (Shannon entropy, and indistinguishability from Shannon entropy) turn into the worst-case notions I mentioned earlier. The concatenation ends up indistinguishable from something of high min-entropy, even when you compare it to essentially the support size of the new random variable. Moreover, the gap in entropies goes from small to very large as you take many copies.
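Since this definition gets used throughout the rest of the talk, here is my rendering of it in symbols; the notation for computational indistinguishability is mine, not from the slides.

```latex
% HILL pseudoentropy: X has pseudoentropy at least k if it is
% computationally indistinguishable from some Y of entropy >= k.
\[
  H^{\mathrm{HILL}}(X) \;\ge\; k
  \quad\Longleftrightarrow\quad
  \exists\, Y :\; H(Y) \ge k \ \text{ and }\ X \approx_c Y,
\]
% where X \approx_c Y means that for every probabilistic
% polynomial-time test T,
\[
  \bigl|\Pr[T(X) = 1] - \Pr[T(Y) = 1]\bigr| \;\le\; \mathrm{negl}(n).
\]
```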
Continuing the outline: by an application of universal hashing, you can turn this pseudo-min-entropy into the pseudorandomness you want in the output of a pseudorandom generator, and turn having small support, intuitively, into having a small seed for your pseudorandom generator. So this is the intuition behind the Håstad et al. result. There turn out to be a number of technical complications that make the construction more involved than this picture indicates; but using the kind of results that I'll be talking about, we now have a construction that fits this explanation much more directly.

Okay, so what I want to do now is give some intuition for the first part here, and that's really where I'll focus my attention: how, from a one-way function, where we just have the property of being hard to invert, do we get this gap between entropy and pseudoentropy? For now, let's restrict our attention to one-to-one one-way functions; it simplifies things, and later I'll mention how to generalize to functions that aren't one-to-one.

So we have a one-to-one one-way function f, and consider a uniformly random input X. Since the function is one-to-one, the entropy of X given f(X) is zero, because X is determined by f(X). On the other hand, the function is hard to invert: given f(X), it's hard to produce X; you can't do it except with negligible probability (that's notation meaning vanishing faster than 1 over any polynomial in n). So intuitively, this unpredictability of X from f(X) should correspond to some kind of entropy in a computational sense. X is determined by f(X), so information-theoretically there's no entropy; but it's hard to predict X from f(X), so in some computational sense there is entropy.

So let's try to see what the right information-theoretic analogue of this unpredictability is. Suppose we had two jointly distributed random variables X and Y with the property that for every function A, the probability that A predicts X from Y, that is, the probability that A(Y) = X, is at most p. Here, since I've moved to the information-theoretic context, I'm putting no computational constraint on the function A; I'm considering all functions. This does turn out to correspond to a very natural entropy notion. It was given a name by Dodis, Ostrovsky, Reyzin, and Smith: they defined a notion called average min-entropy, which happens to be equivalent. It can be written in a way that looks more like min-entropy, just putting the expectations, minimums, and logs in the right places, and the condition above corresponds to X having average min-entropy at least log(1/p) given Y. And like min-entropy, this turns out to be stronger than Shannon entropy: it implies that the conditional Shannon entropy of X given Y is at least log(1/p).

Okay, so now we want to draw the analogy, taking Y to be f(X). It turns out one can think of the unpredictability condition as a computational form of entropy, and it was given a name, unpredictability entropy, by Hsiao, Lu, and Reyzin.
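For reference, here is the Dodis et al. definition written out; this is my rendering, and the equivalence with the prediction-probability condition is the point made on the slide.

```latex
% Average min-entropy (Dodis, Ostrovsky, Reyzin, Smith):
\[
  \widetilde{H}_\infty(X \mid Y)
  \;=\;
  -\log\, \mathbb{E}_{y \leftarrow Y}
    \Bigl[\max_{x} \Pr[X = x \mid Y = y]\Bigr].
\]
% The optimal (computationally unbounded) predictor outputs the
% likeliest x for each y, succeeding with exactly the expectation
% above.  Hence
%   Pr[A(Y) = X] <= p for every function A
%     iff  \widetilde{H}_\infty(X | Y) >= log(1/p),
% which in turn implies  H(X | Y) >= log(1/p).
```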
And just by analogy with what's on the right-hand side, take the logarithm of this prediction probability: since the prediction probability is negligible, its log is something growing faster than log n. So X has unpredictability entropy that's super-logarithmic in n, given f(X). And now the question is, by analogy with that last implication, whether we can conclude that X has pseudoentropy given f(X): that it's indistinguishable from something of entropy super-logarithmic in n, given f(X). It would be great if the answer were yes.

Unfortunately... what? n? Oh, what is n? n is the input length we're looking at, and when I say poly time, it's all polynomial in n. Yeah, please stop me if anything's unclear.

Okay, unfortunately this turns out to be just false. Let's not read the text on the slide; at a more intuitive level: once I'm given f(X), I can easily tell X apart from anything with any kind of randomness in it, because f is an efficiently computable function. If you give me a candidate and I'm trying to tell whether it's X or something else, all I do is apply f to it and see if it equals the value f(X) that I was given (this test appears as a code sketch below). So given f(X), X is very distinguishable from everything of nonzero entropy. This shows something very interesting about computational notions of entropy: relationships that hold in the information-theoretic setting, where you don't have computational constraints, can disappear when you move to their computationally bounded analogues. Here is one implication that holds in the information-theoretic setting but doesn't hold in the computational setting.

So our challenges are these. First, X doesn't have pseudoentropy given f(X), but maybe we have some other way, perhaps by doing some extra work, to convert this unpredictability into pseudoentropy. Second, what do we do when the function is not one-to-one? I talked about the unpredictability of X given f(X), and when f is not one-to-one this can really be trivial: it can be hard to predict X from f(X) just because there are lots of strings that map to f(X), and I can't predict the one you actually used, for purely information-theoretic reasons, since given f(X) it could be any of them with equal probability. So we can't just talk about unpredictability in this sense when we want to handle functions that aren't one-to-one; we need to reason in some better way.

Okay, so how did Håstad et al. deal with this? They showed that by a certain kind of hashing, you can turn this unpredictability into a gap between pseudoentropy and Shannon entropy. The details aren't important, but basically they choose a certain kind of hash function and a random number J from 1 to n, and take f(X) together with a hash of some random number of bits of X. Intuitively, hashing out some random number of bits of X accomplishes two things. First, it deals with the problem that f may not be one-to-one: the different preimages of f(X) will hash to different places. Second, you get some additional pseudoentropy from the remaining unpredictability, the hardness of inverting f. So, not worrying about the details: they show that this random variable, obtained from f and some hashing, has pseudoentropy larger than its Shannon entropy by a small amount.
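Before moving on, here is the counterexample's distinguisher in code form, a minimal sketch; the name `preimage_test` is mine, and `f` stands for any efficiently computable function.

```python
def preimage_test(f, y, candidate) -> bool:
    """The distinguisher from the counterexample above: given
    y = f(x), accept a candidate only if it maps to y.  Since f is
    polynomial-time computable, so is this test.  It accepts the
    real X with probability 1, while (for one-to-one f) anything
    with nonzero entropy given y must sometimes output a value
    other than x, and so is accepted with noticeably smaller
    probability.  Hence X has no pseudoentropy given f(X), even
    though it is hard to predict."""
    return f(candidate) == y
```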
How small? A little bit more than 1/n; in fact, a little bit more than log n over n. And not just any hash function works here; for the experts, you need the hash functions to basically be hardcore functions in the Goldreich-Levin sense, but if you don't know that terminology, it's not needed for the rest of the talk.

Okay, so the new result, coming from work with Haitner and Reingold and, as stated here, from the new work with Colin, says: actually, we're not going to do any hashing at all. We'll just look at f(X) together with the bits of X. And this has what we call next-bit pseudoentropy, which I'll define on the next slide, at least n plus something super-logarithmic in n. Notice that the Shannon entropy of this is exactly n, because X is a uniformly random n-bit string, so it contributes n bits of entropy, and f(X) is a function of X. So we get exactly the gap we were hoping for and expecting from the unpredictability.

Compared with the Håstad et al. result, the useful features are these. First, there's no hashing involved: directly in the one-way function, looked at the right way, we're getting some form of pseudoentropy. Second, in the Håstad et al. construction, you compare the pseudoentropy to the Shannon entropy, but you don't actually know, and it may be hard to compute, how much Shannon entropy is in their random variable W; they give you no guarantee that you can compute that quantity, and that leads to many of the technical difficulties in their work. Here, we can say exactly how much pseudoentropy to expect, because we know exactly how much Shannon entropy there is. And finally, the gap is bigger: log n instead of log n over n, which is really what we were hoping for.

So what is this notion of next-bit pseudoentropy? Note that this is exactly the example I gave before, f(X) together with X, and it does not have more than n bits of pseudoentropy in the original sense, for the same reason I said before: given the pair, I can tell whether it's of the form (f(x), x) by just applying the function f. So what we do is relax the notion of pseudoentropy, and the notion we get is next-bit pseudoentropy. Imagine giving an algorithm the elements of this tuple one at a time. You give it f(X), and ask how much entropy X_1 appears to have given f(X). Now consider the next prefix, f(X) and X_1, and ask how much entropy X_2 appears to have, and so on: always asking how much pseudoentropy the next bit has given the prefix. Formally, we want there to be random variables Y_1 through Y_n on the same probability space, such that for every i, Y_i is indistinguishable from X_i even given f(X), X_1 up to X_{i-1}; and if I sum up the conditional entropies of the Y's, along with the entropy of the first component f(X), I get this n plus something super-logarithmic again.

So if Y_1 were a truly random bit, that would mean X_1 is a hardcore bit? That's right. Well, it could also be for information-theoretic reasons, in the case that f is not one-to-one. But in the case it is one-to-one? In that case, that's exactly what it means: it's the generalization of the hardcore bit concept.
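Here is my attempt at writing the definition from the slide in one display; the Y_i are jointly distributed with X on a common probability space, and the indistinguishability notation is as before.

```latex
% Next-bit pseudoentropy of (f(X), X_1, ..., X_n): there exist
% random variables Y_1, ..., Y_n such that for every i,
\[
  \bigl(f(X),\, X_1, \ldots, X_{i-1},\, X_i\bigr)
  \;\approx_c\;
  \bigl(f(X),\, X_1, \ldots, X_{i-1},\, Y_i\bigr),
\]
% and the conditional entropies, together with the entropy of the
% first component f(X), sum to
\[
  H\bigl(f(X)\bigr) + \sum_{i=1}^{n}
    H\bigl(Y_i \,\big|\, f(X), X_1, \ldots, X_{i-1}\bigr)
  \;\ge\; n + \omega(\log n).
\]
```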
And it's also a generalization of the Blum-Micali notion of next-bit pseudorandomness. So what happens here that's different from the case Dan was mentioning? One can think of the extreme case of this, where you ask that the bits, one at a time, be indistinguishable from uniformly random bits, not just indistinguishable from something with a significant amount of entropy. In that case, it does turn out to be equivalent to pseudorandomness, the entire thing being indistinguishable from truly random. What we gain here is that for pseudoentropy, the two notions are different from each other, and this is an example.

In the bottom line, that is true entropy? Yes, that's true entropy, and X_i is indistinguishable from Y_i, which has true entropy given the prefix, right. So the Y_i's are allowed to be correlated with X? Yes, they probably have to be; certainly for the result, they have to be correlated, except in the extreme case. Are they samplable? That's a more technical question. The short answer is no, but they're not too far from samplable, in some sense; maybe we can talk about it afterwards. All right, any other questions? Okay.

Okay, so what are some consequences of this? This way of getting next-bit pseudoentropy has led to a significantly simpler and more efficient construction of pseudorandom generators from one-way functions, and that translates to a quantitative improvement. In the original result of Håstad et al., a one-way function on n-bit inputs gives you a pseudorandom generator that requires a seed of length something like n to the eighth. So while it's a really important theoretical result for understanding what's possible in principle, it's much too complicated and inefficient to have any hope of being used in practice. With this simpler construction, we can bring the seed length down to n cubed, which is still not efficient enough for practice, but at least brings things into the realm of reasonability. Only two more n's to go.

What about the output length? The output length will be proportional to the number of times you invoke the one-way function, times log n or something. So it won't be a large multiple of the seed length, but there will be a large additive stretch; I don't want to get too much into that. If the Y_i's were samplable, would it reduce to a hardcore-bit kind of thing, because you could get at the true randomness? I think there would still be this gap, yeah.

Okay, let me check: is there a clock in this room? Do we have time? 5:13? Yes, 15 minutes. Okay, great. So maybe I'll just state the key lemma, or theorem, that goes into the result I just mentioned. As the examples from before suggest, what we need is some new way of relating pseudoentropy to some kind of unpredictability, and it turns out we can give a really tight characterization. This is in the work with Colin. So I have two jointly distributed random variables: Y, think of it as n bits, and Z, which should be short, one bit or a logarithmic number of bits, but not a lot of bits. We're interested in comparing the entropy of Z given Y to the predictability of Z from Y. And the result says that Z has delta bits more pseudoentropy given Y than its real entropy...
...that is, the gap between its pseudoentropy and its Shannon entropy is at least delta, if and only if no efficient algorithm A can predict Z from Y to within distance at most delta, where distance is measured in KL divergence. So think of it this way: algorithm A is trying to predict Z from Y, and we measure its error by some kind of distance between the distributions (Y, Z) and (Y, A(Y)). The particular one is KL divergence, but just imagine some reasonable notion of measuring similarity between distributions.

Maybe the simplest case for intuition is where Z is determined by Y, so Z is a function of Y. Then we're saying: pseudoentropy gap at least delta if and only if it's hard to predict Z with divergence at most delta. And this is a theorem for HILL pseudoentropy, conditional pseudoentropy, for the next bit: I'm asking that (Y, Z) be indistinguishable from some (Y, Z'), where Z' has a lot of entropy given Y. So it's an exact characterization.

One note: this makes a lot of sense even when Z is not determined by Y, which is the case we're interested in when the one-way function is not one-to-one. In that case we look at the gap with the actual conditional entropy, and the task of A is not just to predict Z from Y but to try to sample the right distribution of Z given Y; we measure how well it does that, again in KL divergence.

Given this equivalence, you might wonder how this gets around the bad example I described before, where unpredictability doesn't imply pseudoentropy in the HILL sense. It turns out this comes from the restriction that Z is short, from a small alphabet, without a lot of information in it. In that previous example, Y was f(X) and Z was the preimage X. Here we require Z to be short, and that's what somehow makes the bad example go away.

It's exactly delta in both cases, not something like delta over two? Exactly delta in both cases. Well, this is a definition we made: we came up with a notion of unpredictability that we could relate to pseudoentropy in an exact way. Okay, there are some plus-or-minus negligible terms, going to zero faster than 1 over any polynomial in n, but the deltas are the same, and we think of the delta we're interested in as a fixed constant or something much larger. In other words, equality up to those terms. Right, right.

Okay, another remark, maybe for experts: this can be viewed as an analogue of the Impagliazzo hardcore theorem for Shannon entropy rather than min-entropy. This may even be cryptic to the experts, but if anyone's interested in seeing what this has to do with the hardcore theorem, Colin's theory lunch talk next week will say something about that.

Okay, so I'm going to skip how we use this to argue that f(X), X_1 up to X_n has this notion of next-bit pseudoentropy. But hopefully it was clear from before that to prove something like that, what we want is to relate pseudoentropy to some kind of unpredictability, which is exactly what the theorem does for us.
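As I understand the statement, it can be written like this; this is my formalization, with the negligible slack suppressed.

```latex
% Characterization (work with Colin): for jointly distributed
% (Y, Z) with Z short (say O(log n) bits),
\[
  H^{\mathrm{HILL}}(Z \mid Y) \;\ge\; H(Z \mid Y) + \delta
  \quad\Longleftrightarrow\quad
  \mathrm{KL}\bigl((Y, Z)\,\big\|\,(Y, A(Y))\bigr) \;\ge\; \delta
  \ \text{ for every efficient } A,
\]
% up to additive negligible terms, where H^{HILL}(Z | Y) >= k
% means (Y, Z) is computationally indistinguishable from some
% (Y, Z') with H(Z' | Y) >= k.
```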
And this fact that I mentioned, that we have to restrict to short Z's, looking at pseudoentropy and unpredictability of short things, explains the move from pseudoentropy to next-bit pseudoentropy, where we look at individual bits.

How do we go from KL divergence back to the usual notion of unpredictability? Basically, if the divergence is O(log n), it's just an information-theoretic fact that you must be finding an inverse with probability 2 to the minus that.

Okay, so what I talked about is this part of the picture: how, by a new understanding of the right form of computational entropy that's already present in a one-way function, we get a much simpler construction, and a better understanding, of pseudorandom generators from one-way functions. Carrying that out now really follows the outline I mentioned earlier from the Håstad et al. work, but I don't want to say more about that. What I do want to say more about is the right side of the picture: what about the other things here? It turns out that for the rest of this picture, which intuitively corresponds to the security conditions in cryptography that don't have to do with secrecy, which is what pseudorandomness relates to, but with unforgeability, not being able to generate things you're not supposed to, finding collisions in hash functions or forging digital signatures and so on, we have to identify a different form of computational entropy present in a one-way function. And I'd like to illustrate that notion. This actually came first, so let me say a little about what happened historically: we had a paper with Haitner, Nguyen, Ong, and Reingold from a few years ago, which closed one of these...