So, sad news: the homework is going to be delayed until Thursday. I know. I came up with a really cool way to do a homework assignment, but it means that I have to implement stuff, so it's going to take a little bit longer than I thought. It's going to be really cool, I promise. I don't want to spoil anything yet. I'll tell you about it on Thursday, I promise. A teaser? I don't want to over-promise anything that I can't actually deliver. This is a good tip for your future careers. I'll just tell you that it's really cool, with no details. That way, as long as it's cool, that's all you're going to care about. If I tell you exactly what it's going to be and then it turns out to be something else, it won't be fun. It's going to reinforce the concepts that we've been learning about in this cryptography section, so lots of fun.

Okay, so where we left off last Thursday: we're talking about cryptography, and what's so special about public key cryptography? There are two keys. Two keys. Why is that important? I can have two keys in symmetric cryptography too. There's one that everyone knows, that's public, and one that's secret, that only you have, and we showed how the combination of the two lets you pass information securely, and also in a directed way, so the party you sent it to knows it came from you.

Right. So the key difference, and this is the way to think about it: with symmetric key encryption, yes, you can always generate new keys, but you still have to securely transmit that key to the other party. Here we have two keys that are linked, and they're linked in such a way that if you encrypt something with the public key, then only the private key can decrypt it.
And vice versa: if you encrypt something with the private key, only the public key can decrypt it. So you can actually say, oh yeah, I actually sent this to you, or you can make it so that no one else knows what I sent to you. Right, exactly. So you get confidentiality, so you can hide messages just like with symmetric cryptography, but you also get non-repudiation, where you can actually show that something came from you.

When we left off, I believe we had talked about the history here, which is a really interesting case of parallel development, one in the classified world and one in the unclassified world. Even though the unclassified people technically developed it later, they were the only ones that we knew about for a long time, for about 20 years, so their algorithm, RSA, is named after them. I always like this kind of peek behind the curtain, where you can see the crazy stuff that's being done.

Cool. So let's take ourselves back to 1976. Even back then, people understood the problem with symmetric cryptography, which, as we've always said, is that you have to exchange keys securely. If you can't do that, then the entire symmetric cryptography system just breaks and falls apart. Diffie-Hellman was a way to get around that. The important thing here is the key exchange: this protocol is only about exchanging keys. It doesn't quite get us to what we want from a general-purpose public key cryptography system, but I feel like it's a little bit easier to understand and digest than RSA. We're going to talk about the details of RSA, but I like Diffie-Hellman for this. So the idea is that Alice and Bob want to agree on a key. What they're going to do is generate and communicate some shared information.
Then they each have some secret information, they mix the public information with their secret information, and they share that result. Then each uses their own secret information to derive what ends up being the same secret key.

I really like this Wikipedia image of how this looks with paint. Obviously paint doesn't necessarily work like this, but assume that Alice and Bob share some yellow paint. This is shared in the clear. Then, privately, they each generate their own secret color. You can see that whoever created this was British, right? So Alice's is red and Bob's is, I'm going to go with teal. I'm not a big color person. It's not quite blue, so we'll go with teal. Is Alice's orange or more red? It's hard to tell on my screen. Okay, it's definitely orange. So Alice generates orange and Bob generates teal, and these are secret. They never share them with each other.

Then they each mix the yellow paint with some of their secret color, to create something that's light orange and something that's more bluish. I'd say periwinkle, but I don't actually know what that looks like. And then they exchange these mixtures. So anybody can see the yellow, the public color from the start, and anyone can see the two mixtures, the tan one and the bluish one. The idea is that it's difficult for an adversary to work backward and subtract the yellow from a mixture to get back to the original secret colors. I don't actually know enough about paint and colors to know if that's feasible or not, but this is the general idea: you use something that is mathematically easy to combine but hard to separate. So all of this is in the clear. Does that make sense? Cool.
Then they each take the other's publicly exchanged mixture and mix it with their own secret color, and because of the properties of the math that we're using here, they both end up with the same secret key. They both end up with brown. The way this is done guarantees that they both end up with the same secret, and that you cannot derive it if you only see the yellow and the mixtures that went over the public transport. You can't derive the common secret without one of the secret colors.

So now at this point, what happens? It seems strange, but with a paint sample, if someone gets hold of the bluish public paint, can't they just start throwing random colors at it to try to hit on what it takes to get the common secret? So again, take this bluish paint. Do they know what the common secret is? They know the common paint, the yellow. They know what went over the public transport. But the final mixing is done in secret, so they don't know the common secret. So they know neither the common secret nor the secret colors? Correct. The common secret is derived from the public values but also from each of their secret colors. You can see it if you follow one flow: Bob mixed yellow with teal to get blue, and then Alice mixed that with orange to get brown. The other way, Alice took yellow, mixed it with orange to get the tan orange, and then Bob mixed that with teal to get brown. So they both went through the same transformations, just in a different order. But just by having yellow and either of the exchanged colors, you can't derive the secret colors or the common secret.

So now what can Alice and Bob do? What do they know that Eve, who's eavesdropping on this, does not know? Alice and Bob each have their own secret color, but they also have this shared common secret. And "common" here does not mean everybody; it means the two parties. Both Alice and Bob have this exact same common secret. So what can they do now?
They could say, okay, this is now going to be the key that encrypts our AES communication, and they never sent this common secret key in public. They were each able to derive the same secret key. And I do not believe they can control exactly what that value is. This is why it's called a key exchange protocol and not an encryption protocol: they can't choose what the resulting value is. But they know that they each end up with the same color here. Questions about this?

The way this works is easiest to see by going through an example. Alice and Bob both agree to use some numbers, which in this terminology are called p and g. Here p is 23 and g is 9. These are small numbers, just for the example. Alice chooses a secret value of 4 and Bob chooses a secret value of 3. They each do this individually; this is generating the orange and the teal. Then Alice sends Bob A, which is g raised to her secret, so 9 to the 4th, mod p, which gives 6. And Bob sends Alice B, which is the same g raised to his secret power of 3, mod 23, which gives 16. So 6 and 16 are shared in the clear, and the values that everyone knows are 23, 9, 6, and 16. The idea is that it's mathematically difficult to go back from these values to the exponents. Then Alice can compute the secret value: she takes the 16 that she got from Bob and raises it to her secret value, 4, mod 23, to get 9. And Bob takes the value that he got from Alice, 6, and raises it to his secret value, 3, mod 23, to also get 9. So by going through this process they both derive the same value, and they can use this 9 as an encryption key. Obviously you want to use much bigger numbers, so that you increase the brute-force search space. Questions? Cool.

Now we get into RSA. There's not a good visual depiction of RSA.
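As a quick check on the Diffie-Hellman numbers above, here is a minimal Python sketch of the same exchange. The variable names (`a_secret`, `b_secret`, `alice_key`, `bob_key`) are only for illustration and are not from the slides; Python's built-in three-argument `pow` does the modular exponentiation.

```python
# Toy Diffie-Hellman exchange with the small numbers from the example.
p, g = 23, 9              # public values, shared in the clear (the "yellow paint")

a_secret = 4              # Alice's secret exponent, never sent
b_secret = 3              # Bob's secret exponent, never sent

# Each side mixes the public base with its own secret and sends the result.
A = pow(g, a_secret, p)   # Alice -> Bob, visible to Eve
B = pow(g, b_secret, p)   # Bob -> Alice, visible to Eve

# Each side mixes what it received with its own secret.
alice_key = pow(B, a_secret, p)
bob_key = pow(A, b_secret, p)

print(A, B, alice_key, bob_key)
```

This prints `6 16 9 9`: the two exchanged values match the ones in the example, and both sides arrive at 9 without ever sending it. With numbers this small Eve could simply try every exponent, which is why real parameters are enormously larger.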
With RSA, I think it's important to understand at a high level how it works, what the steps are, and what's public and what's private. So it's important to go through this so that you've seen it, but I would probably not ask you to derive or to prove why RSA provides public and private key encryption.

And remember, this is done by one party. To generate those two keys that are linked, you generate two distinct prime numbers, p and q. Then you compute n by multiplying p and q. This is exactly what our friend from the 1800s did in his book, when he wrote down that number and said, I don't think there's anyone who will ever be able to factor this number. He found two large prime numbers and multiplied them together, which is an easy operation. So basically, if I give you n, you should not be able to derive p and q, because it would be computationally difficult to do that. What would make that not true? Yeah, if n is a power of two or any even number, that would be trivial to do. Technically two is prime, but you don't want to use two. So the idea is that, given n, it's hard to factor it into p and q. We'll see that n is part of the public key. You can give n to anybody, because you're saying, hey, there's no way you can actually break this n down into p and q.

The rest of this follows from modular exponentiation, using the mod operator. For any a and any e, this is actually very easy to calculate: you take the number a, raise it to the power of e, and do mod n. So what's n? Two primes multiplied together. And what's the mod operator? It's exactly what you would think. It's the percent sign in programming, the remainder. So this is something you're actually very familiar with.
This isn't a brand new concept, and you all know about exponentiation, so this is good. The idea is that you can calculate c: you take a number a, raise it to any e, calculate mod n, and that gives you c.

RSA actually relies on two difficult things. One is factoring large numbers: finding p and q given n is computationally hard. That's one property of RSA. The second property has to do with this equation. If I give you c, e, and n, it's difficult to calculate a. There's a whole theory of mathematics here that has to do with groups and fields and modular arithmetic, and we're not going to go into any of that at all. But these are the hard problems behind RSA.

So then we compute m, which is p minus one times q minus one. We choose an e between one and m that shares no common factor with m. And then we compute d, which is e to the negative one mod m. This is something that we can actually do easily, and it means that e times d is equal to one mod m. Because of the way we chose m, this works out. I'm hand-waving a lot of this.

At this point we have our keys. The public key is n and d. This d value is related to p and q through m, but it's difficult to go back: from d and n it's difficult to derive e, because, I believe, you would need m, and m is difficult to derive since it's based on p and q. The secret key is e and n.

To encrypt a message: if Alice wants to send a message to Bob, she needs Bob's secret key, right? That's something we've been talking about. I'm sorry, you were supposed to catch that. Alice must have Bob's private key, which in this case consists of the n.
So the n that Bob chose and the d that Bob chose, right? Public key. I said private key the first time, I think. Did I say it again? You said secret the first time and then private the second time. Good. That was a test. Public key. So Alice needs Bob's public key, which is Bob's n and Bob's d.

So you take the message. You somehow turn this message into an integer, because this all depends on integer arithmetic. So you turn the message into some integer m (this m is the message, not the m from key generation) such that m is less than the n that Bob chose. So what does this mean? How does it actually impact the way we'll use RSA? All messages need really big keys. Yes. The message we encrypt is limited by the size of n, which comes from the prime numbers p and q that we chose. So we either need to use really large numbers in order to encrypt messages, or we need to send very short messages. Or, as we'll see, we choose n such that we can encrypt a 128-bit AES key that we can then use to communicate. We can do all kinds of things here, but this is a fundamental limitation of RSA.

So Bob, no, this doesn't make sense. Sorry, this is backwards on the slide. Alice takes the message that she wants to send, raises it to d, and calculates mod n to get the ciphertext c. Then Bob takes that ciphertext, raises it to e, which only Bob knows, and does mod n on it to get the message back. Bob knows n and e; Alice knows n and d. Eve sees only the ciphertext, the public key of Bob, and the public key of Alice.

So, some of the properties. As we just talked about, this only allows us to send messages that are less than n. So the bit length of n is basically the size of the numbers that we can actually send.
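To make the key generation and the encrypt/decrypt steps above concrete, here is a small Python sketch with tiny primes. It follows the lecture's labels, where the public key is (n, d) and the private key is (n, e); many references swap those two letters. The primes 61 and 53, the exponent 17, and the function names are assumptions chosen for illustration, and `pow(e, -1, m)` needs Python 3.8 or newer. This is textbook RSA without padding, so it is a sketch of the arithmetic and not something to use for real messages.

```python
# Toy RSA using the lecture's labels: public key = (n, d), private key = (n, e).
p, q = 61, 53              # assumed example primes; real ones are hundreds of digits long
n = p * q                  # safe to publish
m = (p - 1) * (q - 1)      # must stay secret, since it reveals the link between e and d

e = 17                     # private exponent, chosen to share no factor with m
d = pow(e, -1, m)          # public exponent: the inverse of e mod m

def encrypt(message: int) -> int:
    """Anyone holding the public key (n, d) can do this."""
    if not 0 <= message < n:
        raise ValueError("message must be an integer smaller than n")
    return pow(message, d, n)

def decrypt(ciphertext: int) -> int:
    """Only the holder of the private exponent e can do this."""
    return pow(ciphertext, e, n)

message = 65
ciphertext = encrypt(message)
print(n, d, (e * d) % m, decrypt(ciphertext) == message)
```

The `ValueError` branch is the size limitation discussed above: with n equal to 3233, anything that does not fit in an integer below 3233 cannot be encrypted in one step.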
So how do we turn this into an actual cryptosystem? With a cryptosystem, we should be able to send arbitrary messages. So how can we turn this into a real one? You can use it to encrypt an AES key: you send the RSA-encrypted key, and then you send the message encrypted with that AES key. That'd be one way. What else? What about breaking up the text into some kind of fixed block size and encrypting each of the blocks with RSA, with the public key? Think about it in terms of letters. You'd be using the same encryption key to encrypt a high volume of very small messages. Right. So first, assuming n and d are fairly large, doing the exponentiation and then the mod are expensive operations. The exponentiation is not cheap. But if we look at this, what does the ciphertext depend on? It depends on n, which for a specific person's public key is what? It doesn't change; it's a fixed value. So n is fixed. What else does c depend on? On the d and the m, but only the m changes per message? Correct, only the m changes. So if we apply this encryption to each letter of "hello", we have to encrypt it to somebody's public key, and the public key is constant and does not change. So the two Ls will get encrypted to what? We don't know exactly what it's going to be, but we know they'll be encrypted to the same thing. Plus, let's say Eve knows we're doing this and sees the message. Who's to say she can't switch the letters around and then send the message on to Bob? So we could try this, but it would be fundamentally broken, right? We get all the same problems that we had with encrypting blocks in ECB mode, right?
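The letter-by-letter idea can be tried directly. The following Python sketch reuses a hypothetical toy key (n = 3233, public exponent 2753, private exponent 17, all assumed for illustration, not from the slides) and encrypts each character of "hello" separately. It shows both problems raised above: the two Ls produce the same ciphertext block, and Eve can reorder blocks without knowing the private key.

```python
# Per-letter textbook RSA with an assumed toy key (lecture labels: d public, e private).
n, d, e = 3233, 2753, 17

def encrypt_letter(ch: str) -> int:
    """Encrypt one character with the public key (n, d)."""
    return pow(ord(ch), d, n)

def decrypt_letter(c: int) -> str:
    """Decrypt one block with the private exponent e."""
    return chr(pow(c, e, n))

ciphertext = [encrypt_letter(ch) for ch in "hello"]

# Problem 1: equal plaintext blocks give equal ciphertext blocks.
repeated_l_visible = ciphertext[2] == ciphertext[3]

# Problem 2: Eve rearranges the blocks without decrypting anything.
tampered = list(reversed(ciphertext))
what_bob_reads = "".join(decrypt_letter(c) for c in tampered)

print(repeated_l_visible, what_bob_reads)
```

Bob decrypts the tampered blocks to "olleh" and has no way to tell that anything was changed, which is the same weakness seen with ECB mode.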
We're using the same encryption method on each chunk of data, so there would be statistical properties that leak, all kinds of stuff. So the idea we talked about is to use RSA to encrypt the key that was used to encrypt the actual message with AES. You'd send both of those to the other party, and this way only the person who has the private key for the RSA public key that we used can decrypt the key and then use that to decrypt the actual message.

I've hinted at this a little bit before. If I receive a message from one of you, and I decrypt it, and it turns out to be gibberish, completely random, how do I know whether something went wrong or whether you actually meant to send me random data? Maybe the message you were trying to send was a bunch of random bytes that you stole from the random number generator of a secure computer or something, and you're trying to share those results with me. I'm trying to think of a scenario where you'd actually send completely random data. At a primitive level, do AES or RSA have any way of differentiating between those cases? Essentially, our messages at this point have no integrity properties. All the cryptosystems we've seen give us either confidentiality or maybe confidentiality plus non-repudiation. So we want some way to be able to say, here's how you know that the message is correct. It's actually an interesting and difficult problem, because you're the one sending them the message.

One of the cases we want to be able to detect is: what if an attacker flips a bit in the ciphertext before it gets to us? Remember we saw CBC mode. We didn't actually look at this property, but you can maybe tell based on the way that CBC mode works.
If you flip a bit in the middle of the ciphertext, then when you decrypt it, everything at the beginning is still a valid message, and right where the bit flip happened you get garbage. But CBC mode has this interesting property where it eventually self-heals and goes back to normal. Those garbled places may be important data that you want, so you want to know that there was a bit flip. An attacker could use this to make you think, oh, maybe the original document was just corrupted. You have no way of knowing: was the original document corrupted, was this garbage that I'm seeing part of the submission, or was it intentionally induced by an attacker?

Even if an attacker doesn't do it, what if a bit just gets corrupted? How can bits get corrupted? Gamma radiation. Gamma radiation. So there's actually a story about this. I don't know the details; it was told to me in one of my courses. I believe it was a government organization or a university that was building these supercomputers, and they built two identical machines, and one had twice the error rate of the other one. So they did what normal people do: you look at all the components, you figure out what's different, and you replace the parts of the second one with parts from the first, or you put brand new components in both. That didn't fix it; the error rate was still twice as high. So they tried to figure out what else was different, and what they found was that one facility was in the mountains and the other facility was basically at sea level. They realized that it was gamma radiation, because these were massive supercomputers doing massive amounts of computation.
All it took was one gamma ray flipping one bit somewhere to cause a program to crash, and once they lined everything with lead, or whatever you do to block these, the error rates were identical. So this is a thing that does happen, and it's kind of weird to think about how many of your computer crashes could have been caused by something like this.

So this happens, right? And you want to know: was this part of the original message that whoever sent it intended for me to read? How can the receiver know that the message has been tampered with, or that it's not what the sender wanted to send? So how does Ethernet deal with this problem? Isn't it a checksum? I don't know the details. Doesn't it basically flip all the bits and add them together, and you should get ones across the board if the messages matched? Yeah, something like that. I actually don't remember, but the idea is that with Ethernet you have a physical medium that ones and zeros are traveling over, so it's very possible for some bits to get corrupted, and you want the other side to know when there's some corruption, or maybe even try to fix it. One thing you could do is, for every byte of data you send, add an additional bit that acts as a parity bit. Like we said, if you added all of the bits together you'd want them to come out to zero, I believe. So you add a parity bit that ensures that's the case, and that way the receiver on the other end can tell whether a bit has been flipped.

What if there's an active adversary in the middle of your Ethernet connection flipping bits? Does this prevent that? Exactly, it doesn't. All I have to do is figure out, within this byte or however long the checksum covers, which bit I want to flip from a zero to a one.
Then I don't even have to change the rest of the data; I just need to flip the parity bit as well. So if I can control two bit flips, I can alter the contents of that message without being detected. Checksums, as we normally think about them, are designed for random failures, random errors and bit flips, but they're not designed for an active adversary.

So part of the study of cryptography was coming up with a way of defining, and people use different terms here, a digest, or a hash. The idea is that you have some arbitrarily large amount of data, and you want to derive from it some kind of digest that is smaller than the data. In this sense, your parity bit is a one-bit value that depends on the eight bits that came before it, that byte. You're trying to compress information about those eight bits down to one bit. Cryptographic hash functions are functions that map arbitrary-size data to a fixed-size bit string. The size depends on exactly which hash function you use; I believe most of them are around 256 or 512 bits, depending on what you need. The idea is that no matter what input you give it, no matter what size, it will do something and spit out a number of that fixed size. For this discussion, let's go with a 128-bit number.

One thing: whenever something has "cryptographic" in the name, you have to assume that there are certain mathematical properties that you want from it, because you're going to use it for cryptographic purposes. So we have arbitrary-size input, literally any size possible. You can think of some hard limits, but fundamentally it's unbounded input, and from it we're trying to generate 128 bits.
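Going back to the parity bit for a moment, the difference between a random error and an active adversary can be seen in a few lines of Python. This is a sketch of a generic even-parity scheme, not of what Ethernet actually does; the function names and the choice of the byte "H" are assumptions for illustration.

```python
def parity_bit(byte: int) -> int:
    """Even parity: the extra bit that makes the total count of 1 bits even."""
    return bin(byte).count("1") % 2

def check(byte: int, parity: int) -> bool:
    """Receiver's test: does the parity bit still match the data?"""
    return parity_bit(byte) == parity

data = ord("H")                # 0b01001000
parity = parity_bit(data)

# A random single-bit error in the data is detected.
corrupted = data ^ 0b00000001
random_error_detected = not check(corrupted, parity)

# An adversary flips the same data bit AND the parity bit: not detected.
forged_data = data ^ 0b00000001
forged_parity = parity ^ 1
forgery_accepted = check(forged_data, forged_parity)

print(random_error_detected, forgery_accepted, chr(forged_data))
```

Both flags come out `True`, and the receiver accepts "I" where the sender wrote "H". Two controlled bit flips are enough to defeat the check.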
So think back to the checksum example. The purpose of a checksum is to see whether any of the bits got flipped. So what would be some properties we would want from a cryptographic hash function, at a high level? Ideally, different inputs would give different hashes. Can we actually guarantee that? No. Now why not? How many different numbers can 128 bits represent? 2 to the 128, right? You don't have to know the exact number; you just need to know it's 2 to the 128. And how many possible inputs are there? Infinitely many. For this to work, every input must map to some 128-bit number. Does this mean that there's a unique mapping from every arbitrary-size input? Is there a one-to-one mapping, so that every possible input value has its own unique 128-bit output? That can't possibly be the case, because there are only 2 to the 128 unique outputs. And if your input size is only 128 bits, you're not really squeezing anything down into less space. You could just send the value itself in that case; you don't need a hash. You may still want one, but you could just send the value and represent all of them.

And to be clear, for those of you taking 355, this is the pigeonhole principle. That's how you prove that there's not a unique mapping. You know there are more than 2 to the 128 inputs, which means there must exist two inputs that map to the same hash value. Look at that, 355 stuff coming back. Okay, so we know it's impossible to guarantee that every input has a unique hash. But what do we want? The way we think about this is that we want it to be a one-way function, right?
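Both points above, the fixed output size and the pigeonhole argument, can be checked with Python's standard `hashlib`. The sketch uses SHA-256, so the fixed size is 256 bits; the 128-bit figure in the discussion is just a working number. The `tiny_hash` function is a made-up 8-bit hash (the first byte of SHA-256), introduced only so that a collision is small enough to find by hand.

```python
import hashlib

# Fixed-size output: the digest length does not depend on the input length.
sizes = []
for data in (b"", b"hello", b"x" * 1_000_000):
    digest = hashlib.sha256(data).digest()
    sizes.append(len(digest) * 8)

def tiny_hash(data: bytes) -> int:
    """Hypothetical 8-bit hash for illustration: the first byte of SHA-256."""
    return hashlib.sha256(data).digest()[0]

# Pigeonhole: 257 distinct inputs, only 256 possible outputs, so two must collide.
seen = {}
collision = None
for i in range(257):
    msg = str(i).encode()
    h = tiny_hash(msg)
    if h in seen:
        collision = (seen[h], msg)
        break
    seen[h] = msg

print(sizes, collision)
```

The loop is guaranteed to find a collision by its 257th input at the latest. A real hash has exactly the same property; what changes is that 2 to the 128 or more inputs is far beyond what anyone can enumerate.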
We want it to be easy to go from any size input to the fixed 128-bit output, but difficult to go back. I thought what we wanted was to make sure that the thing that I torrented hasn't been tampered with. Isn't that the purpose of this? No. That's a use case, not the purpose. We were talking about error detection and correction in network transmission, and we talked about why those techniques are not good for an adversarial use case. So we're expanding on that here. We're transmitting an arbitrary-length message, and along with it we generate and transmit a hash. So when the message gets to the receiver, they can recompute the hash and make sure it matches the one that I sent. Yes, at a high level. We'll go into the details of how this is actually done. Hash functions have a lot of different uses, so we have to touch on all of them. It's hard to derive the properties from just one or two use cases, so we're going to try to cover both.

So, one property: if I give you a hash, it should be difficult to get back to the exact text that generated it. What else would I want? Say you have the message "hello" and the message "goodbye". You don't want both of those to collide in the hash function. But if you have "hello" and... Okay, yeah, that's a good way to start thinking about this. The question is what kind of collisions we care about. So say I have "hello" and a string that's one bit off from it. It's a little difficult with this example. What's one bit off from H, do you know? By flipping one bit, I'd guess G, but somebody can look it up. (In ASCII, H and I differ in exactly one bit; H and G differ in four.) So how many bit differences are there between these input strings? One bit. There's a one-bit difference. So what would we expect?
So let's say we have some hash function H. What would we want to be true about the difference between the two hashes? We want them to have absolutely no relation to each other. Yeah. So a one-bit difference in the input should cause fundamentally, completely different hashes. I think they call this the cascading principle or something; the usual name is the avalanche effect. You want one bit to cause a completely cascading change in the output of the hash function.

What else would we want? Let's go back to our example of using this to pass messages. Say I have some message, I've computed the hash of that message, and I give both to you and say, here's my message, go give that to somebody else. So I have some message m, say "hello", and you have m and H of m, the hash of m. What would I want to be the case? Say Alice wants to send this to Bob, but Eve intercepts it and substitutes some new m prime, while the hash stays H of m. Let's assume for now that she can't touch the hash; we'll talk about how that is done in a second. So Bob gets m prime along with the hash of the original message. If the hash of m prime equals the hash of m, then Bob will think that this is a legitimate message. So can we prevent this from ever happening? No, we've already seen that. How many tries would Eve have to make in order to find some message m prime that hashes to that value? Many, exactly. On the order of 2 to the 128, or whatever the output size of the hash function is. If you try enough of them, you will eventually hit the same one. So we want the property that it's difficult to find these. We call that a hash collision. This is from hash tables, right? You have two values that hash to the same thing. With a checksum, finding one would be trivial, right?
It's trivial. For a lot of checksums, even CRC-32 and CRC-64, if you give me a message, it's trivial to create a brand new message with the exact same checksum as the original. This is why checksums and CRCs are not cryptographic hash functions. So we want it to be difficult, given one message and its hash, to find another message with that same hash. And this is how people demonstrate that they've broken a hash function. MD5, for example, was broken, and there are demonstrations where they give you the hash of a PDF and then five different PDFs that all say different things but each hash to that same value. So, very cool stuff.

So let's see what other properties we have. A one-way function is what we talked about. We want it to be easy to compute and, as we discussed with cryptographic functions, fast to compute. This shouldn't be a difficult operation. We want it to be incredibly quick, but difficult to go back. You can think of it as a one-way function; people also use the picture of a trapdoor. Think about a trapdoor: you can fall into it, but you can't come back up. We talked about how it's not a one-to-one mapping. It's deterministic: it's a function in the mathematical sense, in that the output depends only on the input that you give it, so it generates the same hash value every single time. And the other thing is that a small change, one input bit, should completely change the output.

So one way we can use this is as we talked about. We just showed that public key cryptography is actually pretty expensive.
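The properties listed above can be observed with Python's standard `hashlib` and `zlib`. This is a sketch under a few assumptions: SHA-256 stands in for "a good hash function", `additive_checksum` is a made-up checksum (sum of bytes mod 256) used only because its collisions are easy to see, and the example strings are arbitrary. The CRC-32 line does not construct a collision; it shows the XOR structure of CRCs, which is the kind of predictable relationship that makes forging them easy.

```python
import hashlib
import zlib

def bit_difference(x: bytes, y: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(a ^ b).count("1") for a, b in zip(x, y))

def xor_bytes(*parts: bytes) -> bytes:
    """XOR together byte strings of equal length."""
    out = bytearray(len(parts[0]))
    for part in parts:
        for i, byte in enumerate(part):
            out[i] ^= byte
    return bytes(out)

# Deterministic: the same input always gives the same hash.
deterministic = hashlib.sha256(b"hello").digest() == hashlib.sha256(b"hello").digest()

# Small input change, large output change: 'h' and 'i' differ in one bit.
input_bits_changed = bit_difference(b"hello", b"iello")
output_bits_changed = bit_difference(
    hashlib.sha256(b"hello").digest(), hashlib.sha256(b"iello").digest()
)

def additive_checksum(data: bytes) -> int:
    """Hypothetical simple checksum: sum of all bytes mod 256."""
    return sum(data) % 256

# Reordering bytes never changes a sum, so a collision is trivial to build.
trivial_collision = additive_checksum(b"pay 19") == additive_checksum(b"pay 91")

# CRC-32 of the XOR of three equal-length messages is the XOR of their CRCs.
a, b, c = b"hello", b"world", b"abcde"
crc_has_xor_structure = zlib.crc32(xor_bytes(a, b, c)) == (
    zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
)

print(deterministic, input_bits_changed, output_bits_changed)
print(trivial_collision, crc_has_xor_structure)
```

For a well-behaved 256-bit hash, a one-bit input change should flip roughly half of the output bits, so `output_bits_changed` should land somewhere near 128.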
So when we wanted to create a signature for a message before, we would take, not our public key, but our secret key, encrypt the message with our secret key, and then show that to the world, and now anybody can use my public key to verify that I indeed encrypted that message. But as we saw, this is an expensive operation, so we don't necessarily want to do it over an entire message every time. So Alice can use a hash function in this case. Alice, like before, wants to make a statement M that everyone knows is from Alice. She doesn't care about hiding the message itself, but she wants everyone to know that it actually came from her. So Alice takes her secret key, encrypts the hash of the message, computed with a good hash function, and then shows everybody: here's the message and here's the signature of the message. So Bob has the message and the signature. How does Bob, or anyone else, verify it? Does he know the hash function? Yes, he knows the hash function. It's considered part of the scheme; you can think of it as metadata that comes along with the signature. If he hashed the message and got the same value, he could be relatively sure that it was the same message. Since it's not one-to-one, he can't be absolutely sure, but... Right, but what does he have to do with the signature that he got? Yeah, he has to use her public key. So he hashes the message, decrypts the signature using Alice's public key, and checks that the two values match. So what does Bob know for certain at this point? He knows the message came from Alice, or, to go a little bit weaker, we can say he knows that this signature that Alice produced belongs to this message. So assuming Bob trusts that with this hash function it is very difficult for somebody to generate a new message with that same hash value, he can be fairly certain that Alice sent this message M. So we can think about...
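The hash-then-sign flow described above can be sketched with textbook RSA. Everything concrete here is an assumption for illustration: the two small Mersenne primes, the exponent 65537, the function names, and the choice of SHA-256 reduced modulo N. There is no padding and the modulus is far too small, so treat this as a picture of the idea and not as a usable signature scheme.

```python
import hashlib

# Toy parameters (assumptions for illustration only; far too small to be secure).
P = 2**61 - 1                        # a Mersenne prime
Q = 2**31 - 1                        # another Mersenne prime
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # secret exponent (needs Python 3.8+)

def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce the digest so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Alice: 'encrypt' the hash of the message with her secret key."""
    return pow(digest_as_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Bob: undo the signature with the public key and compare with his own hash."""
    return pow(signature, E, N) == digest_as_int(message)

statement = b"I, Alice, said this."
signature = sign(statement)
print(verify(statement, signature))                  # the genuine message
print(verify(b"I, Alice, owe Eve $50.", signature))  # an altered message M'
```

Only one modular exponentiation is performed per signature, on a fixed-size hash, no matter how long the statement is, which is the cost saving the lecture is pointing at.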
So what if Eve alters the message to be some M', right? This is the example we've been thinking about. If Eve alters M to M', it should be the case, as we just discussed, that the hash of M' does not equal the hash of M. It should be very difficult for Eve to find some new message where the two are equal. So cryptographic hash functions are actually used in a lot of places. What are some that you know? Somebody mentioned torrents. The BitTorrent protocol uses hashes to check the integrity of chunks of a file. The file you want to transfer is broken up into chunks, and when somebody sends you a chunk you can compute its hash and verify that they actually sent you the right data, which means nobody can poison you by sending you false pieces of data. And this is actually what a torrent file contains: the location of the tracker, then, I believe, the file metadata, and then a list of chunk hashes. And that's all that's in there. That's all you need to fetch the pieces and check that you've put that file back together correctly. It's a very interesting file distribution protocol if you're interested in looking at it. What other uses? Making a secure connection between two parties: they send keys to each other, and then they use a hash to make sure that whoever they're communicating with is actually who they say they are. You can't decrypt a hash, but you can check the hash to make sure that the key being sent to you is a valid key for the party you want to communicate with. Exactly. So basically, using the thing we just talked about and similar techniques, you send hashes along with the messages you're trying to send, and this way the other side can verify that it actually received them correctly. Yeah, well don't work directly.
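The chunk check described for torrents can be sketched as follows. The chunk size, the function names, and the sample data are all hypothetical; SHA-1 is used because that is what the original BitTorrent piece hashes use, but the same idea works with any cryptographic hash.

```python
import hashlib
from typing import List

CHUNK_SIZE = 4  # tiny, hypothetical chunk size so the example is easy to follow

def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Break the file contents into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def chunk_hashes(data: bytes) -> List[bytes]:
    """The list of per-chunk hashes that would be published ahead of time."""
    return [hashlib.sha1(chunk).digest() for chunk in split_chunks(data)]

def verify_chunk(index: int, chunk: bytes, expected: List[bytes]) -> bool:
    """Check a chunk received from a peer against the published hash list."""
    return hashlib.sha1(chunk).digest() == expected[index]

original = b"some file contents"
published = chunk_hashes(original)       # what the downloader already trusts
chunks = split_chunks(original)

print(verify_chunk(1, chunks[1], published))   # honest peer sends the real chunk
print(verify_chunk(1, b"evil", published))     # poisoned chunk is rejected
```

Because each chunk is checked on its own, a bad chunk can be thrown away and fetched again from another peer without restarting the whole download.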
Never touch hashes or hash functions. Passwords, right? What about passwords? They just store hashes of the correct passwords, that's all. Yeah, so we'll actually look at this. I think we're going to authentication after this; that module comes after crypto. So the idea there is that a website wants to check your username and password to make sure you are the same person who registered the account and originally chose this password. One way to do that is to just store the password in the database and compare, right? I mean, it works. But the problem is that the website owner can then see your password, and if that website gets compromised, which they often do, then everyone knows your password because it's right there in the database. Now with a hash function I can take an arbitrary password and squeeze it down to a gibberish 256-bit hash. So when you register, the website hashes your password and stores the hash in the database, and when you later log in, it takes whatever password you send, hashes it, and compares the result with the database. If it matches, it knows it's highly likely to be the same password; if it doesn't, it knows it was not the right password. It's much more complicated than this in practice, and we'll go into it: there are salts and all kinds of problems with plain hashing, but that's the general idea. So what else?
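Here is a minimal sketch of the register and login flow just described. It goes one step beyond the plain hash in the lecture by adding the salt that was mentioned in passing, and it uses PBKDF2 from Python's standard library as one reasonable choice. The function names, the iteration count, and the sample passwords are assumptions for illustration.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 100_000  # illustrative work factor, not a recommendation

def register(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest); this pair is what the database stores, never the password."""
    salt = os.urandom(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def check_login(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the submitted password with the stored salt and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)  # constant-time comparison

salt, stored = register("correct horse")
print(check_login("correct horse", salt, stored))  # the right password
print(check_login("wrong horse", salt, stored))    # the wrong password
```

With a per-user salt, two users who pick the same password end up with different stored digests, so a leaked database does not reveal who shares a password.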
One other area where you could use this is the hash table itself, although usually that's not done, because hash tables are such integral parts of languages that they want the hash to be very quick to calculate. But that has bitten a lot of programming languages. I believe in Java and Ruby the hash function was known, so you could make a web request with variables that would all hash to the same value in a hash table. What used to be an O(1) lookup would become O(n), so inserting all n values cost O(n^2), and you could send a single request with, I think, something like 10,000 characters and completely overwhelm the server, just because it was putting all the values in the same location in a hash table. The way they fixed that was not by using a cryptographic hash function but by randomly choosing a key on startup. I believe there's a way to disable it, but I know Python does this: when it starts up it randomly generates a key that it mixes into what it would normally do for hashes. So what else? Code signing. What does that do?
It proves the developer is the person who released it. Yeah, so it's similar to what we just talked about with authenticating a message. It doesn't have to be something we want to read, like a readable message; it can be a software package that you're going to install. You want to verify: did anybody, say some government agency, tamper with this file before I install it? And if you know and trust Alice's public key, then you know that only the person with Alice's secret key could have signed this package. The ZFS file system also does this; some of you may have heard of it. Hard drives have controllers that watch for drive failure and how data is being stored, and usually they'll tell you when the drive thinks there's an error. But you can also have what they call silent bit corruption, where bits in some block on your hard drive just randomly flip and the controller doesn't know about it, so your operating system doesn't know about it either, and the file that you think is stored correctly actually isn't, because the real world of hardware sucks. So what ZFS does, and I don't know exactly which function it uses, if anybody knows, maybe MD5, but I believe it applies a hash function to the blocks, so that it can detect when any of this silent corruption happens, and at least you know about it at that point. So: file integrity, message integrity, package integrity, and password verification, which we talked about. Anybody use git?
Ever wondered what those long random hexadecimal strings are? That's a hash of that version. Git uses a hash function to identify commits, and I can't remember exactly what goes into it, but each commit's hash covers the current contents plus the hashes of the previous commits, so you get to the next state of the whole history. Is it 256 or 128 bits? It's SHA-1, which is 160 bits, with SHA-256 support added in newer versions, but either way it's some hash function. Anybody play with Bitcoin? How does Bitcoin work? Blockchain, it's like the hottest technology ever, and nobody has read the Wikipedia article on it? You're not going to get more time when you graduate. So the idea is, literally, the chain in blockchain comes from hashes: you group transactions into blocks, and each block includes the hash of the previous block, which in turn covers everything before it. The next block in the chain uses that previous hash plus its own transactions to generate its hash. Then you have everything that's trying to mine Bitcoin. There's a difficulty, which I believe is essentially how many zeros have to be at the beginning of the hash. So basically you take all the transactions that you want to turn into a block, you take the previous block's hash, and you hash all of that together with some value that you get to choose, the nonce. The first miner to get, I don't remember, however many leading zeros the difficulty requires in the overall hash, has found the new block in the chain. Then you transmit it to everyone, saying I found the next block in the chain, and everybody starts mining the next one off of that hash. This is called proof of work: you've proved that you did a lot of work calculating hashes, because we know hash outputs should be essentially randomly distributed, so to get one with a fixed prefix means you must, on average, have done a certain number of calculations. So look it up, it's cool. And this is actually a good technique that I like to
use: if you have data that you want to store, you can store it by hash. You hash the data itself and use that as the file name. This gives you file names that are unique for all practical purposes, which is handy for data processing. So these are some of the more formal definitions of the properties we basically came up with when we were talking about hash functions. Pre-image resistance means that if I give you some hash value, it should be difficult to find some message that hashes to that value. So if I just give you a hash, you should have to make on the order of 2^128 tries to find something with that value, for a 128-bit hash. Makes sense, right? This is exactly what we were talking about. The other one, second pre-image resistance, is basically what we discussed: I give you some message m1, and it should be difficult to find another message m2 such that the hashes collide. Here, too, you should have to make around 2^128 tries to find one, depending on the hash function. The other general property is collision resistance, which essentially generalizes this and says it should be difficult to find any two messages m1 and m2 such that the hashes collide. Another super useful application of hash functions is the message authentication code. The hash functions we saw don't have any keys, right? There's no key: the input is just the thing you want to hash and the output is some hash. We really didn't talk about keys, and there's actually a really good reason, which I'll get to in a second. So the problem here is how to create a signature for a message with a secret key, such that you can only verify that the message is correct if you have the secret key. If you just take the message and hash it, that is not the case, because anyone can compute that hash. So the scenario is: let's say you're a web server and you want to store, in every user's browser, cookies that specify who
this person is, so some kind of user ID. Let's say you want to store the value user ID equals 50 in their cookies; that's your message. Now, as the user, not only can you trivially remove your cookies, you can change them. Cookies are simply a way for a web server to ask a web browser: store this data, and when you talk to me later, send it back to me. And because that data lives on your computer, you, the owner of the computer, can change those values and send whatever you want. So if you knew my user ID was 1, you could change your cookie to say user ID equals 1, and now the website thinks that you're me, and you're impersonating me. But what if I send this message along with a message authentication code, which is a hash based on the message and some secret key that only the server knows? Then if you change the message, what should be the case? Think about it this way. The server knows some key K, and this is the message. The MAC takes in the message and the key and outputs some hash based on both. So you have user ID equals 50 along with this MAC. You change the message. Do you know K? Assume it's a 128-bit number, so it takes up to 2^128 tries to guess. So do you know this number? No. So can you create the right MAC for your new message? Not the same one the server would compute. And the server doesn't have to store anything, that's the beautiful thing here. So let's call the MAC value little h. The server says: hey, web browser, store this message m, which is user ID equals 50, and the message is in clear text, it doesn't matter, and also store little h. This h is not just a hash of the message, because then you could set user ID equals 1, hash that message yourself, and send your new m' back to the server with a matching hash. So let's say you have some m', some new message that you want to send. What do you not have in this scenario?
The key. So you should not be able to create a valid h for your message. You can always try to brute force the key: you can compute the MAC of m' under every possible key until you find the one that matches, but the cost of that depends on the size of the key, so if you use a long enough key it should be very difficult. So the idea is that only the person with the key can generate this hash. You can think of it as keying the hash function, such that you will get the same hash for a value only if you have that key and know that secret value. And this way, whenever we get back a message, if you send me some message m' and some h', I can calculate the MAC of m' with the key and verify that it matches h'. If it does, I know I issued that MAC; if it doesn't, I know something has been messed with and I know to deny you access. So we get authentication and integrity. Now, building this seems like something trivial; here is one of the first things you would think about doing. Here the double bars mean concatenation. The idea would be: take your secret key, append the message at the end, and hash it with some hash function, so MAC = H(K || m). This seems reasonable. Why does it seem reasonable? Because it's hard to reverse the hash. The attacker is given the output, the MAC, and they can't go backwards. They know the message but they don't know the key, so to break this they have to brute force the key: try every possible key at the beginning, concatenated with the message. Unfortunately this can actually be vulnerable, depending on how the hash function is implemented. We're not going to go into all the details, but it's really interesting. Basically, hash functions do some kind of crazy operations, and their output is the state of the system at that point. So if you give me the MAC of some key concatenated with some message, then I know the internal state of your hash function at a certain point, at the very end here,
and then, let's say I want to add extra parts to the message: I can recreate the hash function state and hash whatever extra bytes I want to send. So say our message is user ID equals 50, and I have some H for that message. I want to send back user ID equals 50 and admin equals 1. I can't change the user ID equals 50 part; that is fixed, I can't ever change that, because on the other side the MAC is a hash of the key concatenated with the message. But this MAC that I'm given, H, tells me everything about, let's say, MD5's state at the point where it has processed the entire key plus the message. So I can recreate that state here, and then say, okay, now hash "&admin=1", and give me the hash, some new H' that I can send. So I can essentially extend the length of the message, as long as the prefix remains the same, by continuing the hash function from that state. This is called a length extension attack, and it actually works on real hash functions: MD5, SHA-1, and SHA-256 are all built this way, and I believe SHA-3 is one of the few that is not vulnerable to it. So the other idea would be: okay, then let's put the key at the end. Take the message and concatenate the key, H(m || K). Now if you extend the message you completely change the hash. The problem here again depends on the hash function. Like we said, if I find some message m' where the hash of m' is equal to the hash of m, then with this kind of hash function the collision carries through once the key is appended, so I can use m' in place of m with the same MAC. So this is a case where, rather than the security depending on the key K and the size of the key, if I can break the hash function I can forge a message without knowing the key at all. It's more of a theoretical attack, but it definitely can happen. I just want to get through this. So the idea is to use a secure construction that
people have studied, which is a hash-based MAC, or HMAC. The idea is essentially that you pad the key out to the hash function's block size, XOR that with an inner pad, concatenate the actual message, and hash that; then you XOR the padded key with an outer pad, concatenate the inner hash you just computed, and hash again. So HMAC(K, m) = H((K XOR opad) || H((K XOR ipad) || m)). The opad and ipad are just fixed constants; in theory they only need to differ from each other, but the standard values differ in many bits. Anyway, the actual implementation doesn't matter here. The important thing is that you should use an HMAC and never build this yourself. So if you ever get the feeling of, I need some way to verify integrity here: stop, and use HMAC. And when we come back we'll talk about public key crypto and some of its weaknesses.
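To tie this back to the cookie scenario, here is a minimal sketch using Python's standard hmac module. The hand-written `hmac_sha256_by_hand` is there only to show the padded-key, inner-hash, outer-hash structure and to check it against the library; in line with the advice above, real code should call the library. The server key, the cookie format, and the function names are all assumptions for illustration.

```python
import hashlib
import hmac
from typing import Tuple

BLOCK_SIZE = 64  # SHA-256 processes its input in 64-byte blocks

def hmac_sha256_by_hand(key: bytes, message: bytes) -> bytes:
    """Illustration only: the HMAC structure written out step by step."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()        # over-long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")          # pad the key to the block size
    inner_key = bytes(b ^ 0x36 for b in key)      # key XOR ipad
    outer_key = bytes(b ^ 0x5C for b in key)      # key XOR opad
    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).digest()

SERVER_KEY = b"hypothetical-server-secret"  # known only to the server

def issue_cookie(message: bytes) -> Tuple[bytes, str]:
    """Server: hand the browser the clear-text message plus its MAC."""
    return message, hmac.new(SERVER_KEY, message, hashlib.sha256).hexdigest()

def cookie_is_valid(message: bytes, tag: str) -> bool:
    """Server: recompute the MAC for whatever came back and compare."""
    expected = hmac.new(SERVER_KEY, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

message, tag = issue_cookie(b"user_id=50")
print(cookie_is_valid(message, tag))        # untouched cookie
print(cookie_is_valid(b"user_id=1", tag))   # user edited the message, kept the old MAC
```

The server keeps only the one key and no per-user state: everything it needs to check a cookie comes back from the browser with the request.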