You can hear me better now. So I'll move to Spanish. Actually, I had this plan of saying, OK, I'll give the talk in Spanish, and actually start giving two, three minutes in Spanish and see how the audience reacts. But these are the jokes that only I appreciate. Anyway, the subject is key derivation and extraction. Specifically, there is this function that we call HKDF, which is a key derivation function that is becoming quite popular in use; in particular, the upcoming revision of the TLS protocol, TLS 1.3, is going to use it. Google has been using it. Snowden has been using it, because it's part of the Signal protocol that he likes, so I'm going to ask for a recommendation letter from him. Anyway, the idea really is to stress the conceptual points and the relation between theory and practice in cryptography, which I think is an important topic, and, very importantly, how much the theoretical concepts and sometimes the theoretical techniques actually allow us to build things in practice. As I said, you had enough theory so far, even though in some sense there is never too much theory, especially this beautiful, beautiful stuff. So we're talking about KDFs, key derivation functions, as a central tool in cryptography. And you can ask how much theory you need to build these kinds of things. The truth is that if you want to build bad ones, then you don't need the theory. But if you want to build them well, sound and on a good foundation, you do. And actually, we have a good track record of designing things where the previous stuff got broken and our stuff is not broken. So I think even the practitioners are getting a little bit more respect for the theoreticians, definitely much more today than they had 20 years ago. And we are using theory not necessarily at this pure, full theoretical level, but we use theory as our advisor when we build this stuff. Even when we are making some idealized assumptions, the theory still allows us to understand better what we are doing. So we are focused on the needs of the applications, the advice of theory, and the schemes that we build. Now, several semi-philosophical things. We are used to reading and writing papers with an introduction that always explains how important what we do is for practice, or how motivated the problems are by practice. That is nice, and it's one of the interesting aspects of cryptography. But to really be practical in what you do, it's not enough that the motivation is practical. The product, the algorithm, the techniques that you build have to be practical and deployable themselves. Now, it would be very bad for the field if everyone working in the field were only thinking about what is needed now in practice or in the next five years. But I think we are going too much in the direction of only thinking about the far-away future, and we need more people interested and involved in developing solutions. And by the way, a solution that you build today is not just for today. Cryptography is there for the next 20 years, because once things get into standards and so on, they will be there for a long time. But of course, the main message I always give to the practical people is that a cryptographic scheme without a proof is nothing. Absolutely nothing. There is nothing to say about a cryptographic scheme that doesn't have a proof, because what can you say?
I mean, with a compression algorithm you can say: I'm 10% better than everyone else, I have no idea why, but that's it; that's what we care about at the end of the day. Here there is nothing like that you can say. So proofs are very important. I like to call this proof-driven design, in which we design protocols, functions, and algorithms in a way that, together with the design, you are at the same time trying to prove the thing, and this interaction between proofs and design actually gives us much better results than designing something and then sitting down and proving it, even though in my own experience I've done things the other way, because I was developing things before we had the formalisms to judge them. But today we have a very rich theory, including theory that is applicable to practice. Anyway, I'll definitely stress the conceptual side of things and the high level; the technical details you can read in the papers. Okay, so what are key derivation functions? A key derivation function is a really fundamental primitive in applied cryptography. It is a process that is just a transformation of some given source of key material, presented in some form, where the output is supposed to be a good, strong pseudorandom key. Usually, or many times, you need more than one key, but once you have one strong key you can use it with a pseudorandom generator or a pseudorandom function to generate more bits, so really the core of the thing is to extract, I'm using the word extract here, a first strong key. As for uses: there is key expansion, which is starting from one key and producing many keys, and key extraction, which is starting from a source of imperfect randomness and producing a first good key. There are key hierarchies. They are used in key exchange protocols where, for example with Diffie-Hellman, you have to extract the key from the Diffie-Hellman value; hybrid encryption is something similar; and key wrapping techniques, in which you have to transport keys, where sometimes you transport something which is less good than just a key and then you need to derive more keys. We use them to extract keys from imperfect sources of randomness, for random number generators or pseudorandom number generators, and for passwords: there is a lot of stuff that takes a password and derives a key. So there are many uses. Okay, so these are quite intuitive applications, but how do we formalize this? And one thing that was central to this work is how we build a single mechanism that will be good for all these very varied and heterogeneous applications. Most of the slides here are from talks I was giving in the last few years explaining the design of this HKDF, in part also trying to push this into use and into standards. Now, even though it is such a central primitive, there is very little work in the literature about key derivation functions. Of course, as you'll see, there are many elements that we know about and can use, but as an object, or even as something that was defined, when I started looking into this there really were no definitions and little formal treatment. Basically what people were doing is just taking hash functions, assuming they are random oracles, and applying them to anything, right? You have a Diffie-Hellman element from which you need to extract a key, just hash it; or you have a random number generator with some imperfection, just hash it.
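To make that "just hash it" approach concrete before it gets criticized, here is a minimal sketch of the folklore construction, assuming SHA-256 as the hash and a 4-byte big-endian counter when more output is needed; both are illustrative choices, not part of any standard.

```python
import hashlib

def folklore_kdf(skm: bytes, out_len: int) -> bytes:
    """The pre-theory KDF: hash the raw key material with a running counter
    and concatenate the outputs. Defensible only if SHA-256 is modeled as a
    random oracle; the talk explains why that modeling is too optimistic."""
    out = b""
    counter = 1
    while len(out) < out_len:
        out += hashlib.sha256(skm + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:out_len]

# e.g. deriving a 32-byte key directly from a (hypothetical) Diffie-Hellman value
key = folklore_kdf(b"g^xy encoded as bytes -- placeholder", 32)
```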
And hashing, if the hash is a random oracle, sometimes works well, but not always, as you'll see. In our analysis, part of the analysis also uses random oracles, but our purpose is really to always use the minimal possible assumptions or idealizations of these primitives. So we want a practical but theoretically well-founded key derivation function, and we will need to develop some definitions for that. As I said, we will still use in some cases idealized modeling of hash functions, but with the purpose of minimizing these assumptions. Again, this is a typical bullet from some presentation to bodies like the IETF or NIST at the time that I was trying to get them to standardize this; in the meantime that actually happened, and now it is being used quite widely. Let me skip this slide. So we can identify two main functionalities in a key derivation function: key expansion and key extraction. Key expansion is the easy part: you already have a good cryptographic key, a key for AES, a key for HMAC-SHA-2 or something like that, and you produce more bits that you can use, through a pseudorandom generator or a PRF. That's the key expansion part. Then there is key extraction, the more complicated one, in which you have to derive, given some imperfect source of key material, the first cryptographic key; and we will see examples of sources of imperfect key material. So these are two fundamentally different functionalities. Extraction and pseudorandom functions are very different things. The way we will treat them, you'll see, is that we combine them in some ways, but in principle they are quite different in functionality and in requirements. Now, what you see here is a typical use of a KDF in the pre-theory way, which is: take your source and hash it with a counter or something like that, and concatenate the values. Maybe at the end I will spend some time showing how that compares to what we are doing and all the elements that are wrong in such a construction. Even though, if you look at the hash as an ideal random oracle, then this is fine, even that is already wrong, because the real hash functions, the regular SHA-256 or MD5, are not random oracles. For example, there are the extension attacks, which a random oracle shouldn't have. So, for example, if you have something here that can be of variable length, that by itself can already give you an attack. But anyway, that is what people were doing in practice. Now, I said that we start with a source of key material, so here are examples of sources of key material. In some cases your key material is already a strong pseudorandom value. In that case you only need to either use that key directly or use it as a seed to a pseudorandom generator or pseudorandom function and create more bits. So the first case is easy. Then you have a very typical case, random number generators, which many times work with some underlying hardware that hopefully has enough variability inside to have a good amount of entropy, but that entropy doesn't necessarily come as a uniform string. For example, the zero bit can be biased away from 50%. So you need to run this through an extractor to get uniform output. That's more of the physical ones; the software ones are those that measure all kinds of system events and timings, again hoping that the variability of these events has enough entropy, that they have enough surprise inside, like Leo was using entropy as a measurement of surprise.
Another type of imperfect source of randomness is Diffie-Hellman. In the case of Diffie-Hellman, we have this picture: assume that you are working modulo a prime of 2048 bits, but you work in a subgroup, say a cyclic subgroup of prime order, where the prime is 256 bits. So the elements that you are computing lie in a very, very small subset of the set of all strings of 2048 bits, and they are definitely very, very far from uniform over 2048 bits. On the other hand, if we assume DDH, then DDH tells us that any Diffie-Hellman value you produce in that group is indistinguishable from a uniform element of that subgroup, which means you have about 256 bits of entropy. What kind of entropy? It's HILL entropy, because when I ask what the entropy of a Diffie-Hellman value is, the observer sees g to the x and g to the y, and the uncertainty is about the value of g to the xy. Now, from g to the x and g to the y, if you are an unbounded adversary, you can learn g to the xy, right? g to the x and g to the y determine g to the xy uniquely. Therefore the statistical entropy is zero, but the HILL entropy is 256 bits, because these values are taken from a distribution that is indistinguishable from a distribution with 256 bits of entropy. So g to the xy has 256 bits of entropy trapped inside a 2048-bit long number. It's very non-uniform in the total set, a very, very tiny subset, and yet it has those 256 bits of entropy that we would like to convert into a pseudorandom output through some extraction. So, as I said, the statistical entropy is zero; computationally, by DDH, the attacker has no information on g to the xy, in the sense that it is indistinguishable from any element in the subgroup (summarized in the formula below). Now, this is something I think Leo talked about: how do we know that we can actually get computational entropy from DDH? There is a paper from some 12 years ago in which we analyzed how much entropy there really is in DDH groups, and we characterized which groups and how much entropy. Let me see. Now, when you have DDH, you have HILL entropy and it's enough for us to extract. But what happens if you are working in a group where you cannot assume DDH, for example a bilinear group? Then how do you extract bits from g to the xy? Now you know that it's hard to compute g to the xy, but there is no entropy in the sense of HILL entropy. So in these cases we will need to extract not from HILL entropy but from unpredictability entropy. And this is a tougher case, because we need to build some form of hard-core bits. All of these things have hard-core bits; Goldreich-Levin is a hard-core bit for any function, including this one, and there are also hard-core functions built for specific groups. But since we want something generic, where we use the same thing whether or not DDH holds, we will have a problem dealing with this case. Another issue where we will encounter challenges in these practical settings is that when we talk about extractors, you can extract from one sample of a source, or from two samples, or three samples, or whatever. But what you are assuming is that all these samples come from the source and are independent of each other.
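The entropy situation for the Diffie-Hellman example can be restated compactly as follows, with q the 256-bit order of the subgroup:

```latex
% Conditioned on the attacker's view (g^x, g^y), the value g^{xy} is uniquely
% determined, so its statistical (min-)entropy is zero; under DDH it is
% indistinguishable from a uniform subgroup element, so its HILL entropy is
% about log2(q) = 256 bits.
\[
  H_\infty\!\left(g^{xy} \mid g^x, g^y\right) = 0,
  \qquad
  H^{\mathrm{HILL}}\!\left(g^{xy} \mid g^x, g^y\right) \approx \log_2 q \approx 256 .
\]
```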
In practice, though, coming back to the independence of samples: sometimes you may find samples that are related to one another. For example, in IKE, which is the IPsec key exchange protocol: one party chooses x, the other party chooses y. Now the party that chooses y is lazy, so next time, instead of choosing a random y, they use y plus one, because with y plus one you get from g to the xy to g to the x times (y plus one) just by multiplying by g to the x, instead of spending an exponentiation, which is more costly. So this is an example: you are now going to extract from g to the xy and from g to the x(y+1), and clearly these two elements are not independent of each other. So that's another place where we will have to deal with an imperfect application of extractors. Okay, going back to the source of randomness, our source of key material. We will assume imperfection in the sense of being non-uniform, computationally or statistically, but also maybe even things that have enough entropy, maybe even uniform, where the attacker nevertheless has some information about them, okay? The Diffie-Hellman case again is one in which the attacker doesn't know g to the xy, but it knows g to the x and g to the y, and that reduces the entropy; in the case of statistical entropy, it reduces it to zero. But we will usually assume that there is enough entropy in our sources, where the entropy is conditioned on the knowledge of the attacker. We will be working with min-entropy, sometimes computational min-entropy, for the same reason that extractors use min-entropy. The reason is that if your source contains an element of high probability, say probability one half, then no matter what your extractor does, it will map that element to some output value of probability at least one half, and then the output will be far from uniform. This is different from Shannon entropy, where you can have an element of probability one half and still have a good amount of Shannon entropy. So we'll be using min-entropy. As I said, the entropy can be statistical or computational: either HILL entropy, which is indistinguishability from a high-entropy source, like in the DDH case, or unpredictability entropy, which comes from uncertainty in the sense of unpredictability or uninvertibility. So, let's see. I don't think I have to go over the definition, we've seen enough of it this week, but in any case let me say it once: min-entropy is the negative log base 2 of the highest probability in the distribution, in the source that we have as an input (written out below). And most of the time, in our examples at least, we will be talking about things that have computational min-entropy. Okay, so we said that we have two modules. One is key extraction, which takes some key material from a given source: we sample the source and give the sample as input to the extractor, which produces for us a key. Again, that source can be the output of a random number generator, the output of a Diffie-Hellman operation, or some output of some lattice-based key exchange. We are going to define one function that is supposed to work in all these cases. And the key output by the extractor will be the key used to bootstrap the key expansion module, which is the simpler one.
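Writing out the min-entropy definition just stated, together with the conditional (average-case) variant that is the standard way in the literature to account for auxiliary information alpha held by the attacker (the conditional form is not spelled out in the talk, it is added here for concreteness):

```latex
\[
  H_\infty(X) \;=\; -\log_2 \max_x \Pr[X = x],
  \qquad
  \widetilde{H}_\infty(X \mid A) \;=\; -\log_2 \,
    \mathbb{E}_{a \leftarrow A}\!\left[\max_x \Pr[X = x \mid A = a]\right].
\]
```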
The key expansion module is the one that takes an already good key and uses it as a seed to a pseudorandom generator or a pseudorandom function. Actually, we will not use the output from the extractor to key a pseudorandom generator, but rather a pseudorandom function. Why? Because in the expansion part, we will want to have additional inputs. And the additional inputs come in the form of a context. Now, in practical applications this context is very important, because what happens is that you sample the thing, you extract the key, and from that key you sometimes derive just one key and that's it, but many times you derive multiple keys. In protocols like IKE or TLS, you'll see, you have derivation of multiple keys, and you want to bind the derivation of a key to some context, where the context is some specific protocol, maybe the identities of the parties that are producing that key, the algorithm for which that key will be used. The specific instantiation of that context we don't care about at the level of what we are talking about here, but in these applications it is a fundamental element, and that's the reason that after we extract the first key, instead of using it to key a generator, we'll feed the context as input to a pseudorandom function keyed with the output of the extractor (see the sketch below). Anyway, you can stop me and ask questions whenever you want. Okay, so I call this approach extract-then-expand, which is the natural thing to call it. And by separating these two things, we can concentrate on the analysis, the design, and the instantiation of these modules independently, okay? Before, when you had a hash that hashed the sample with some counter, basically the source, the extraction, and the expansion were all mixed into one thing; we are separating them, and that proves good not only for analysis, but also in actual applications and implementations. And I will describe a particular instantiation of the extract-then-expand approach, this function called HKDF, where sometimes people think the H is for Hugo, but no, it's for hash, okay? So don't get confused about that. Now, I want to know how many people know what HMAC is? How many people can write down the formula of HMAC? Okay, one person, too bad he's not a student, otherwise I would give him something. All right, so today you will have to learn more about HMAC than you ever wanted to know, and you'll try to forget it as fast as you can. Okay, but not yet; first we need some definitions. So what is a key derivation function? The informal definition is: a transformation from a weak source of key material to a pseudorandom key. But we are going to run this process in front of an attacker, and the attacker is one that has some knowledge about the source distribution. He knows what type of source it is and actually has some particular knowledge: not only that this is a Diffie-Hellman value, but he knows which group it is, what the order of the group is, and so on. Actually, in some cases the attacker can influence the source; for example, if you are using a random number generator that measures operating system events, the attacker may be able, either because he is on the computer or because he interacts with the computer, to actually influence these events. So we're also giving some level of freedom to the attacker to do all kinds of bad things.
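Going back to the context-binding point above: a minimal sketch of deriving several context-bound keys from one extracted key, assuming HMAC-SHA-256 as the PRF; the context strings are hypothetical, only there to illustrate key separation.

```python
import hashlib
import hmac

def derive_subkey(prk: bytes, context: bytes) -> bytes:
    """Key a PRF (HMAC-SHA-256) with the extracted key PRK and feed it the
    context string, so each derived key is bound to the protocol, the parties,
    and the algorithm it is intended for."""
    return hmac.new(prk, context, hashlib.sha256).digest()

prk = bytes(32)  # placeholder: in practice, the output of the extract step
k_enc = derive_subkey(prk, b"example-protocol v1; client->server; encryption key")
k_mac = derive_subkey(prk, b"example-protocol v1; client->server; MAC key")
```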
The formal definition is quite natural; more or less, if I gave it to you as an exercise, you would end up writing something the same or similar. As usual in definitions, there are always some subtleties that you have to take care of. I am not going to discuss those in detail here, but if you look at the paper, you'll see that there are some comments in which the author says, well, actually I have no idea how to deal with this problem, maybe someone else will. But basically the definition is the natural one. So first of all, we define the source of key material, and we define it as a two-valued probability distribution. There will be a sigma and an alpha, generated by an efficient algorithm. That's what a source of randomness, or of key material, is for us: a two-valued distribution, sigma and alpha. The reason we have sigma and alpha is that sigma will represent the sample that we feed into the extractor, and alpha will be auxiliary input, auxiliary information that the attacker has about how that sample was produced. For example, in the Diffie-Hellman application, alpha will be p, q, g, g to the x, g to the y, which is all the public information the attacker has, while sigma will be g to the xy, okay? By the way, sigma I call a sample, or a sample of key material. There is some mixed notation in these talks because these are slides that I took from different places; most of the time I don't think it will be too confusing, but if you feel confused, let me know. In some places I will use SKM, source of key material, for the input source. And as I said, alpha represents auxiliary input; it's the Diffie-Hellman public values in this example. Or think of a software random number generator: some of the events that are used as input, as part of the source, may be known to the attacker. Okay, and as I said, you can look at this paper. What is BST? Well, I'll try not to give you the names of the authors of papers, because it will always be embarrassing for me to do that. Anyway, there is a paper whose authors' initials are BST and it was written in 2002. Thank you very much: Barak, Shaltiel and Tromer, thank you. And there are a couple of papers by Yevgeniy Dodis; if you Google Dodis RNG, you will find them. Actually, good papers of the last years dealing with this kind of stuff, but I will not touch on it. Okay, so that was about the source. By the way, one thing that I didn't stress is that we are assuming the sources to be efficient algorithms. So we are not talking about arbitrary sources, but sources that are generated by an efficient algorithm. And by the way, there is this little sigma, which is the sample, and there is the capital sigma, which I use as the name of the source, but also as the algorithm that produces the source. So what is a key derivation function? A key derivation function is a function that accepts as input four values: sigma, which is a sample from a source of key material as we defined before; L, which is how many bits of output I'm asking the key derivation function to produce; and r and c, which are optional values, where r is a salt value chosen from a specified set, I'll say more about that now, and c is a context string. Okay, what I said before: the things that you want to add about what the purpose of this function is, in what context you are deriving this stuff; that's less important.
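In symbols, the objects just introduced are as follows, with the Diffie-Hellman instantiation as the running example:

```latex
% A source Sigma outputs a pair (sigma, alpha): the secret sample and the
% attacker's auxiliary information. A KDF maps the sample, the requested
% length, the salt, and the context to an L-bit key.
\[
  \Sigma \;\longrightarrow\; (\sigma, \alpha),
  \qquad
  \mathrm{KDF}(\sigma, L, r, c) \in \{0,1\}^{L},
\]
\[
  \text{e.g. } \sigma = g^{xy}, \qquad \alpha = (p, q, g, g^{x}, g^{y}).
\]
```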
The salt: salt, in the security world, stands for a random but non-secret value. And in our case, the salt will basically be the seed for the extractor. The seed for an extractor is exactly that: a random value which is not kept secret, okay? So the reason there is salt here is that we are going to use it as a seed in the extractors that we will be building. Now, this is one of the subtleties that I was talking about. In extraction, it is fundamental that the seed is chosen independently from the source, okay? And it has to be the same in our applications. In some cases, we will not have the luxury of having these enforceably independent seeds, and in those cases, for example, we'll do some crude things like: ah, we don't have a seed that is independent, then we will define the seed to be zero, okay? So that's where we start moving away from the purity of the nice, well-founded stuff. But even in these cases, we will try to justify and understand under what assumptions on our underlying functions, in this case hash functions, doing that actually makes sense, or is obviously broken. So we will do that too. Now to the actual definition of security. We say that the KDF is (t, q, epsilon)-secure with respect to a source Sigma if no attacker that runs in time t and queries the function q times (I'll show what that means now) can distinguish the output of the KDF from random with advantage more than epsilon, or probability more than one half plus epsilon, in the regular semantic security sense. So this is a game-based definition. The attacker can invoke the source of randomness to produce sigma and alpha. The salt is chosen from some set of values and fixed, and we give alpha and r to the attacker. Yes, the attacker doesn't learn sigma; that's the secret part. But the salt and the auxiliary information about the sample are given to the attacker. Now, remember that there is this context value; as I said, one of the reasons we want context is that you may want to derive more than one key from the same extracted key. And about this context we don't assume anything. In particular, we allow the attacker to control it: it can choose the context as it wants. And that's where the q queries come from: the attacker can query the KDF adaptively. The sigma it doesn't see and doesn't change; the r it sees but doesn't change; the context c and the length the attacker can choose. Then it chooses some values L and c, where c was not queried before, and the attacker gets either the output of the KDF or a random string, and it is supposed to decide which of the two cases it is. So that's a simple semantic-security-type definition (written out below). The subtle issues here are that we have auxiliary information, that the salt has to be independent of the source, which in theory is always easy to say but in practice is not always easy to enforce, and the issue of the adversary influencing the algorithm that creates the source.
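In symbols, the requirement of the game just described is that for every attacker A running in time t and making at most q queries,

```latex
\[
  \mathbf{Adv}^{\mathrm{kdf}}_{\Sigma}(\mathcal{A})
  \;=\;
  \Bigl|\,\Pr[\mathcal{A} \Rightarrow 1 \mid \text{real KDF output}]
        \;-\; \Pr[\mathcal{A} \Rightarrow 1 \mid \text{random } L\text{-bit string}]\,\Bigr|
  \;\le\; \varepsilon .
\]
```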
So it is definitely not all trivial; everything looks very simple until you really need to deal with all these issues. But overall, at the end of the day, the formalization is quite intuitive and simple. Notice that I defined a key derivation function as secure with respect to a given source, okay? That's one way we are going to talk about cases in which we have a specific source and we may want to build a KDF which is specific to that source. For example, in Diffie-Hellman, you can have some hard-core functions that are good for extracting pseudorandom bits from the particular Diffie-Hellman value, and they are good for some types of groups but not for others. In this case, we will be using the notion that we are interested in building extractors even for specific sources. But the most common case, the default case in our discussion today, is this one, in which the key derivation function is a generic function that is able to extract random bits from any min-entropy source, okay? And we will say that it is (t, q, epsilon) m-entropy secure if it is secure in the sense defined here for any source with min-entropy at least m, okay? Any questions? So, yes, this experiment you can repeat, right? You are supposed to be secure against an attacker that has these abilities, against independent samples, okay? And independent samples even when you keep the salt, or the seed, fixed. So you may have a fixed value of the salt or seed and use it with repeated samples, which is the typical use with cryptographic extractors, right? An extractor is supposed to work with a given seed even with multiple samples from the source. In our applications, in some cases we will have fresh salt for different samples, but there are definitely cases in which you keep it fixed; for example, if you have a physical random number generator, you could choose a random seed when you manufacture the chip, put it there, and keep it fixed, random but fixed and not necessarily secret, and use it to extract from many, many values produced by the random number generator. Okay, so a generic extract-then-expand KDF will have this format, with the inputs as we defined before. When I write these things down in the form that can go into a standard, I give them, instead of sigma, L, r, c, actual names: the source of key material, the key length, the salt, and the context information. You first apply the extract part to the salt and the sample from the source of key material. I wrote the salt as optional because in some cases, unfortunately, we will not have enough randomness or enough independence, and then instead of having salt, we will fix it to some value; and then, after you produce the key from the extract part, you use it with a pseudorandom function to expand it to as many keys as you want. Now, the expand part can be done in different ways.
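In the names used for standards, the generic structure just described is simply the composition below; the expand part itself can then be instantiated in more than one way, as discussed next.

```latex
\[
  \mathrm{KDF}(\mathrm{SKM},\, L,\, \mathrm{salt},\, \mathrm{CTXinfo})
  \;=\;
  \mathrm{EXPAND}\bigl(\,\underbrace{\mathrm{EXTRACT}(\mathrm{salt},\, \mathrm{SKM})}_{\mathrm{PRK}},\ \mathrm{CTXinfo},\ L\,\bigr).
\]
```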
For example, if you started with, let's say, a 256-bit SHA-2 output and you want to use it as a PRF key with HMAC, and you want to produce 1000 bits, then you can repeat this with a counter four times, or you can do what is called feedback mode (sketched below). HKDF is actually defined in feedback mode, which means that the way you produce many keys is by producing a block, and then the next block is produced by applying the PRF to the previous block. There are some potential advantages to the feedback mode, but if the PRF is secure, then either counter or feedback mode will work, and I will not get into the considerations of counter versus feedback mode now; that's not too interesting. So really, the part which is interesting to analyze is the extract part, which is the computational-extractor part of the thing. So we will be talking about computational extractors. In most of our applications we start with computational entropy, so even if you applied a statistical extractor, you would only get a computational guarantee, okay. So we define the notion of computational extractors. These are strong extractors. Again, a strong extractor is one where the seed is made public; a regular extractor does not require the seed to be public. The truth is that most of the treatment of extractors nowadays is for strong extractors, so the word strong is often not made explicit, but in any case we'll always talk about strong extractors, where the seed, or salt, is made public, okay. So what is a computational extractor with respect to a source S? As with a regular extractor, you take the seed, you take the sample, and you are supposed to output something which is close to uniform. Here, close to uniform is replaced by epsilon-indistinguishability in the sense of semantic security. So that's quite natural: we replace statistical closeness with computational indistinguishability. Now, since this definition is with respect to a specific source, again one can have a computational extractor which is specific to a source. As I said, there are some functions, for example for the Diffie-Hellman case, that work with some types of groups but not with others. That would be an example of a source-specific extractor. But again, we are more interested in generic ones that work with anything that has enough entropy. So an extractor will be called (m, epsilon, t)-computational if it is (epsilon, t)-computational with respect to all sources that have computational min-entropy m. And in parallel, we will say that an extractor is (m, delta)-statistical if its output is statistically delta-close to uniform, okay? There are some results that I may or may not show later in which we use this notion. Actually, the previous definition that I gave of computational extractors needs to be amended by adding the auxiliary information that the attacker has to the distributions that need to be compared. So now, not only is the seed known to the attacker, but also some auxiliary information alpha, which in our formalism the source outputs together with the sample. And with the same r on both sides and the same alpha on both sides, the output from the computational extractor has to be indistinguishable from the uniform distribution. So, okay. As I said, and as we know, the power of extractors comes from their randomization. Again, you can have deterministic extractors for particular sources, okay?
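Referring back to the feedback mode described at the start of this passage: a minimal sketch of the expand step, assuming HMAC-SHA-256 as the PRF (this matches the feedback-mode formulation HKDF uses in RFC 5869; a counter-mode variant would feed only a counter instead of the previous block).

```python
import hashlib
import hmac

def expand_feedback(prk: bytes, info: bytes, out_len: int) -> bytes:
    """Feedback-mode expansion: each output block is the PRF applied to the
    previous block, the context info, and a one-byte block index."""
    out, block = b"", b""
    i = 1
    while len(out) < out_len:
        block = hmac.new(prk, block + info + bytes([i]), hashlib.sha256).digest()
        out += block
        i += 1
    return out[:out_len]

# e.g. 1000 bits = 125 bytes of keying material from a 32-byte extracted key
okm = expand_feedback(bytes(32), b"example context", 125)
```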
By the way, on deterministic extractors: I don't think anyone talked about them this week, right? If you have a restricted set of sources with enough entropy, sometimes you can use deterministic extraction. But anyway, for most applications, and for the power of extractors, we want to randomize them through the seed, or the salt. But then the question is: if you have a good randomness source for the salt, why do you need an extractor? Right, we want extractors to produce randomness. So if you have a good source of randomness for the seed, for the salt, that's it, we are done, right? Why don't we stop there? Why aren't we happy with having that source of randomness and using it for whatever purpose we were interested in? Any ideas? Actually, it's written here; I was supposed to hide this before. The point is public versus secret randomness, okay? It's much easier to have public randomness than to have secret randomness. By the way, sometimes we don't have access to public randomness either, and then we have to live with some deterministic extraction. But here are examples where you do have public randomness. I said before that in a random number generator, you can choose the salt and fix it as part of the generator, and you don't have to hide it. You could also use a secret key there, but then you have to assume that the attacker cannot look inside the chip and learn that value. With a seeded extractor, only at the time of manufacturing the chip, or however this is done, do you have to choose something random and put it there, and that's it, okay? Another issue is reusability. The reusability of seeds is very important: you don't have to choose one for each application. Another example is in key exchange protocols. When you are going to derive your keys from a Diffie-Hellman value, the way you can produce salt is from the nonces: in key exchange protocols the parties usually exchange nonces, which are exactly, by definition, random but non-secret values, and you can use these nonces to seed your extractor. Unfortunately, you can use nonces as extractor seeds only if the parties have authenticated the nonces, because if the nonces, that is the seed, can be chosen by the attacker, what's the problem with that? The problem is that the attacker can choose the seed depending on the source, okay? And we know that in order to get the benefits of an extractor, we need the seed to be independent of the source. So if the nonces are authenticated, then the attacker cannot choose them. In the ongoing design of TLS 1.3, which is the next-generation TLS, we are using HKDF; unfortunately, we cannot use the nonces as salt, because the protocol needs the salt before the parties can authenticate the nonces. So that is an unfortunate thing about that protocol. In IKE, the IPsec key exchange protocol, we actually do use the nonces as seed for the extractor. So when you don't have salt, you have cases in which you have to settle for deterministic extraction. Well, yes, so the question is: an attack at what level? Definitely theoretical attacks are there. Let me answer this question in a little bit, because there is something I want to say that includes something like an example. I'll say it there.
Okay, so for the security of the extract-then-expand KDF, you can actually prove that the composition of an extractor and a PRF gives you the security that you want. It's a straightforward thing, so I'll skip this part. Okay, but what's written here is actually the heart of at least the rest of what I'm going to show, which is: we know how to build pseudorandom functions from hash functions, okay? HMAC, for example, is such a construction. We want to reuse these same hash functions, or even HMAC itself, to build the key derivation function. And the reason we want to do that is because practitioners love that, okay? They love to have the same primitive used as much as possible for anything they want. Whether it's right or not right, just give us something that our hardware has, that our libraries have, et cetera. So that is what we are going to focus on from now on. Now, there is a question. We are going to do some crazy stuff with SHA-2 or MD5 or SHA-1, these crazy, dirty, who-knows-why-they-are-secure hash functions, when we could actually use provably secure extractors, right? We could use universal hash functions, which are nice and simple and quite efficient. Now, there are several reasons why not. One is that they require large salts (seeds). We do have public salt in many cases, but there are cases where we don't, and cases where we have it but not many bits of it. So that's one problem. I was hoping very much to show you something towards the end about how you can use regular extractors with less salt, but I'm not sure I will get there; more on this later, maybe. The other reason is that they require a large gap between the entropy that you start with and the number of uniform bits that you can output (see the bound below). And in many cases, particularly in the Diffie-Hellman case, you don't have enough entropy: you start with something with 256 bits of entropy, you want to output a 160-bit key, and the gap between these is not sufficiently large to give you a good statistical difference, or computational difference, at the end. If you don't have salt at all, these nice universal hash functions don't give you anything, so you cannot use them. Then there is a trivial, mundane reason, but a strong one: these things are not in the hardware and the libraries. And there is another reason, which is that in some cases we want to reuse these extractors also as random oracles, either because we don't have enough randomness or because the analysis of a protocol is based on the random oracle. We want all of these things, extractors with salt, without salt, deterministic, randomized, random oracles: we would like one thing that can provide all of these functionalities. Now, Diffie-Hellman is an example in which statistical extractors will not be sufficient. Oh, another thing that I didn't mention about why statistical extractors are not good enough for this case is also the issue of non-independence between samples, which a regular extractor does require. Well, they are more complicated, they're more complicated. But if salt, for example, were the only issue, we could actually go revisit these things, maybe have more motivation to find more practical ones. But there are so many other things that somehow this becomes one of the many reasons, but not the killer reason.
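To make the entropy-gap argument above concrete, the standard leftover-hash-lemma bound for universal hashing says that extracting k bits from a source with min-entropy m gives statistical distance at most:

```latex
\[
  \delta \;\le\; 2^{-(m-k)/2}.
\]
% With m = 256 (the DDH example) and k = 160, this gives only delta <= 2^{-48};
% pushing delta down to, say, 2^{-80} would require m >= k + 160 = 320 bits of
% min-entropy, which the Diffie-Hellman source does not have.
```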
I still encourage everyone to go and look into building these extractors. I mean, there are lower bounds, okay? There are lower bounds, and there are lower bounds that are met by upper bounds, so there are tight bounds there, but it would be nice to have more practical implementations of the tight, or close to tight, extractors. So that's one of the research problems that it would be interesting to have you guys looking at. Okay, so I have this notion of cryptographic extractors, which I don't define, but it covers all these applications that I was talking about; they are computational. Anyway, they have all these requirements and limitations that I was talking about, and we want a single design that serves all of these different scenarios, under suitable assumptions. The way we are going to build these things using hash functions is based on HMAC. And we will show that the construction has all these properties that we are interested in, under suitable assumptions. In some cases the assumptions will be completely combinatorial; in some cases the assumption will be built on random oracles. But in any case, we will try to minimize how much we abuse idealized models like random oracles, random functions, and so on. Okay. So now I have a problem, because I don't really want to spend time on what I'm going to spend time on now. But since you said that you cannot write down how HMAC looks, I'll punish you and show it to you, okay? Now, the reason I need to show you this is that I'm going to make some mathematical statements about properties these things have, and you need to have some idea of the internal structure to see whether those statements make sense or not. So what is Merkle-Damgård? How many people know how Merkle-Damgård works? Oh, excellent. So Merkle-Damgård starts from a compression function. The reason I'm happy to show these slides is that they are the only slides that have these beautiful pictures, which is the most I can ever do, you know what I mean? So there is this compression function, which is a function with a finite input and finite output, with two input values. One is called the block, typically 512 bits; these are the SHA-1 numbers. So the block is 512 bits, and there is a key of 160 bits. The output from these two values is another value of 160 bits. It's important that the output here has the same length as the key input, because in the cascade, what we call the cascade or Merkle-Damgård construction, what we are going to do is chain these things one after the other, where the output from one application goes into the next one, and so on; the input is broken into blocks of, say, 512 bits that are fed into the function one after the other. So if you have an input of 1024 bits, then you will need to run two of these. The truth is that you'll need to run three, because there is padding: these functions define that in the last block you pad to the end of a 512-bit block, using some encoding of the length of the message. Now, here I'm showing this as a keyed function. For the compression function, I think of this chaining value as a key, okay? And I also think of the cascade, the Merkle-Damgård construction, as keyed, where the key could be variable. However, as a hash function, the constructions fix the key to some IV value. You can think of it as a key that was chosen at random once and fixed in the function.
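A minimal sketch of the keyed cascade just described; the compression function f is left abstract (it stands in for a SHA-1/SHA-2-style compression function, not a real one), and message padding is omitted for clarity.

```python
def cascade(f, key: bytes, padded_message: bytes, block_len: int = 64) -> bytes:
    """Merkle-Damgard / cascade construction F_key(x): split the (already padded)
    message into block_len-byte blocks and chain the compression function
    f(chaining_value, block), starting from 'key' (the fixed IV in the plain
    hash, a random key in the keyed view used for the PRF claims)."""
    chaining = key
    for i in range(0, len(padded_message), block_len):
        chaining = f(chaining, padded_message[i:i + block_len])
    return chaining
```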
And when we talk about collision resistance of SHA-2, we are talking about collision resistance of that fixed, specific function that has the fixed IV as the key. Now, instead of looking at the IV as a fixed key, if you let this construction be defined over a set of keys, so that it becomes a family rather than a single hash, then it can be proven that if the compression function you started with is a pseudorandom function, in the regular sense of pseudorandom functions, then the cascaded thing is also pseudorandom, okay? However, only on prefix-free inputs. Why prefix-free inputs? It is a PRF on prefix-free inputs, and it is not a PRF on general inputs. And as usual, I wrote it here: there is this extension attack. Since this thing is computed iteratively, block by block, if you tell me the value of the function on an input X, and let's say X ends at the boundary of a full block, and now you ask me for the output of the function on X concatenated with a value Y, then even if I don't know the key I can compute that, because the value is just the compression function, with the previous output as the key, applied to the block Y. So for this cascade, if I tell you the value of the function on an input, you can calculate the value of the function on any input that is an extension of what I gave you. Sorry, say that again? No, well, it doesn't save you. It complicates things, but doesn't save you: you just need to put the padding into the extension. There are some cases where that helps, but not in general, and certainly not as a general proof that the thing is a PRF. Okay, so the cascade is nice, it preserves the PRF property of the compression function, but only on prefix-free inputs. So now we go to something called NMAC. How many people have heard about NMAC? Okay, so NMAC is what I'm defining now. NMAC is the function on which HMAC is built; N is for nested. And the definition of NMAC is what is written here: you take two keys. The first key you use to hash the input; this is the cascade construction where the IV has been replaced by a random value K1, okay? You compute that cascade, and then you apply an external application of the compression function, keyed with K2, to the value that you calculated, okay? What is this external application doing? It's removing the extension attack, because now, if I append a Y here, you have not seen F_K1 of X directly; you've seen F_K1 of X only after applying this outer pseudorandom function, if you want to think of it in terms of pseudorandom functions. So that's the nested construction. Lowercase f is the compression function, uppercase F is the cascade, keyed via the IV, meaning the IV is replaced by the key K1. So pictorially, this is what happens here: you have the cascade, the output goes through the compression function under a different key, and that is the output. And it can be proven that if the compression function is a PRF, then this construction is also a PRF, okay? Right, right. So yes, these hash functions allow any input, and if the input does not end at the boundary of a block, then it will be padded; let's say this value is 160 bits, it will put many zeros and then some encoding of the length. Okay, but the important point is: if the compression function is pseudorandom, then this thing is pseudorandom, okay? And this is the type of claim that we want to prove.
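Written out, the extension property of the cascade that breaks the general PRF claim, and the nested construction that removes it (x taken to be a whole number of blocks, padding ignored for clarity):

```latex
\[
  F_K(x \,\|\, y) \;=\; f_{F_K(x)}(y),
  \qquad
  \mathrm{NMAC}_{K_1,K_2}(x) \;=\; f_{K_2}\bigl(F_{K_1}(x)\bigr),
\]
% where f is the compression function (keyed via its chaining input) and
% F is the cascade keyed via the IV.
```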
That is, claims saying that if the compression function has a property, then the iterated construction has the same property, or a related one; in this case, if it's a PRF, then the result is a PRF. Now we come to HMAC. You'd better memorize this thing, because next time I'm here I will test you on it. So this is HMAC. What is HMAC? HMAC is NMAC, but with a cheat. Okay, so it's this beautiful thing where you take these two values; this is a byte, okay? Now we are becoming practical people. This is a byte, and this is a byte, and you take this byte and repeat it 64 times, which gives you a string of 512 bits. You take your key, your HMAC key, and you XOR it here with opad, which is the outer pad, for the outer application, and here you XOR it with ipad. Unfortunately, Apple doesn't pay me any royalties for this; we should have registered the ipad name. And what this does, basically, is an NMAC where K1 and K2 are computed out of the same key K, XORed with these pads, okay? So K1 and K2 of NMAC are produced by applying the compression function with the regular IV to the inputs K XOR ipad and K XOR opad (sketched in code below). Why did we do this? NMAC is the natural thing to do to avoid extension attacks; but why did we move from that natural thing to this very unnatural thing? What's the advantage of this compared to NMAC? The answer is not written here. One thing is that there is only one key: instead of K1 and K2, there is a single key. That's one thing. And you can use the standard implementation, exactly. So you see the hash box here and here: there are many implementations of these hash functions that will not allow you to change the IV from whatever SHA-256 defines to a random key, either because that's the way it is programmed in the library, or, if it's a hardware implementation, forget about it. So the idea was to build an NMAC without really requiring that you key these functions through the IV. It's something that should be quite easy, at least in software, but even the software people didn't like that, and definitely the hardware people didn't like it. Actually, you first. Oh, I was hoping that someone would ask that question. This is the best thing in HMAC, because this is what makes people think cryptographers have some deep knowledge of why it is these bits and not other bits. I was coming back from some Crypto '95 or something like that, and I decided that I needed to choose two bytes, two eight-bit values, at random. How do you do that? Some people would use a coin; I just wrote them down. However, it's a bit more intelligent than that. The constraint I had in choosing them was: one would be eight bits chosen at random, and then I would choose four positions at random and flip the bits there to get the other, okay? The reason is that at the time there were some attacks on MD5 that worked best if the Hamming distance between two values was small, or if one value and the complement of the other had small Hamming distance. So I maximized the Hamming distance in both directions by doing this. So now opad and ipad are demystified. By the way, Ron Rivest was complaining that I should have chosen something more random than taking a byte and repeating it 64 times.
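Pinning down the formula described above (and promised as a test question): a minimal sketch of HMAC built directly from a hash function, assuming SHA-256 with its 64-byte block size; the 0x36 and 0x5c bytes are the ipad and opad values just discussed.

```python
import hashlib

def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC as described: one key K, padded to the block size, XORed with the
    repeated ipad/opad bytes; the inner hash plays the role of the keyed
    cascade, the outer hash the role of NMAC's external compression call."""
    block = 64                                   # SHA-256 block size in bytes
    if len(key) > block:
        key = hashlib.sha256(key).digest()       # long keys are hashed first
    key = key.ljust(block, b"\x00")              # then padded with zeros
    ipad = bytes(b ^ 0x36 for b in key)          # K XOR (0x36 repeated)
    opad = bytes(b ^ 0x5c for b in key)          # K XOR (0x5c repeated)
    inner = hashlib.sha256(ipad + message).digest()
    return hashlib.sha256(opad + inner).digest()

# sanity check against the standard library implementation
import hmac
assert hmac_sha256(b"key", b"msg") == hmac.new(b"key", b"msg", hashlib.sha256).digest()
```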
And my answer to that complaint was: you know, if this is going to be broken because of this choice, then it will also be broken with the sine of 37, these kinds of random constants that people choose. So we'd better know as early as possible that it's broken. Anyway, I guess this is the end of the first part, and I did quite well; I think I'm more or less halfway through what I planned. Give me one second. Right, right. So let me explain this line exactly. NMAC has a real proof: if the compression function is a pseudorandom function family, then the iterated thing is also a pseudorandom function on arbitrary inputs. Going from NMAC to HMAC, that is no longer the case, because K1 and K2 are not independent, okay; they are derived from the same key through these different pads. So in order to say that K1 and K2 are good keys for NMAC, you have to claim that they are computationally independent, okay. And you know, when things are not true, you force the truth. Actually, there is one way of arguing this, which is that you can think of these hash functions as being keyed through the input. There are two ways of converting these hash functions into keyed functions: one is to key them via the IV, the other is to key them via the input, and there are justifications for both. In NMAC we are keying through the IV; in HMAC the key effectively comes in through the input block. And there is a very good justification for the key coming into the block and not into the IV: at least for the MD and SHA families, the way you build the compression function is through a block cipher that is keyed via the block. So anyway, to go from NMAC to HMAC as a PRF, you need to assume that the compression function is a PRF when keyed through the input, and since these keys are related, you also need an assumption about related keys. In other words, going from NMAC to HMAC is secure because it is secure, okay. There are these ways of explaining it, but at the end of the day we hope, and there have been no attacks so far exploiting this, even though it is a bit surprising that there are not. But that's the situation. Yeah, so this business of people really wanting a single key. Yes, it was important; people really thought that having a single key mattered, even though the point is that we could have taken a single key and doubled it with a pseudorandom generator. But that is one more operation that you have to do. No, because the XOR of the two keys will always equal the XOR of these pads, so it will always be a fixed value; the point is that if you XOR K1 and K2 in this case, you get a fixed value, so that will not do, okay. All right, so we will continue next with the actual definition of HKDF, and I will try to get as far as possible in the analysis of the properties of these things.