 Okay, folks! Good to see you again. How was your trip? Uh, trip was good. It was rainy, actually. I hit all the rain that you guys got on Monday, on Wednesday, and then I delayed my flight for like an hour, but for an hour long flight. Uh, but yeah. Uh, sad news. No class on Tuesday. I know. Fall break. You leave and then you don't even cancel class on Tuesday? I know. You guys just want to cancel this class. Cool. So, thank you very much for Ferris for filling in. And now we're going to finish some crypto stuff. We'll get to the next stuff. And we'll get on to the next stuff. So, yeah. Any word on the first exam? Yeah, the first exam will be in two weeks. From today? From today. It's on the website. How many exams before the final? We get two. I don't know. Whatever it says on the syllabus. Midterm on the 18th. Two weeks. In here. This room. Has assignment three been posted as well? We'll talk about it in a second. Cool. Okay. Yeah. Yes, these are the same slides. So, we're filling in some of the blank pieces there. Okay. So, when we talk about let's bring it back to symmetric encryption or any type of message passing. So, what's our entire goal with cryptography? Yeah. So, we have Alice. We have Bob. They want to communicate messages to each other. We have our adversary. We can call Eve or Carol. It's also not a frequently used name. So, you get the nice ABC acronyms. Okay. So, we have some. I'll draw this channel in terms of like dots to say that this is some insecure channel. So, A wants, this did not work very well. A wants to communicate some message M to B. And what does A want to happen? Properties does A want to hold. Confidential. Confidentiality in what sense? Doesn't want E to be able to read it. So, it doesn't want E to be able to read the message or know what's in it. What else? Yeah. Efficiency. Efficiency in terms of what? I don't want it to take three days for me to send someone a message. Yeah. 
Or you don't want the receiver to take three days to read the message, because maybe the message is time sensitive. It's about something that's happening tomorrow. If your surgery is tomorrow and they get the message three days from now, that's not very useful. What else? Integrity. Integrity in what sense? B should know that E didn't change the message. B should know that E didn't change the message. Okay. So, thinking through the things that we've talked about, do we actually have any integrity properties? Think about symmetric key crypto. Symmetric key crypto involves what? A key. Alice and Bob share a key out of band, some key K that they both know. Cool. And then A can encrypt the message. Wow, sorry. I'm going to blame it on the sensitivity here. A can encrypt the message M with K and then send it to B. This outputs, we'll call it C, the ciphertext: C = E_K(M). And then B does what? Decrypts with K. B decrypts C with the secret key K: D_K(C). And what do we want to have happen? That it returns M. Is that guaranteed? No, because E can modify the ciphertext in transit. Right, because E could modify the ciphertext. Let's call the result, well, prime is a little hard to see, so we'll do C hat. Little hat on it. So, what is C hat? A new ciphertext. Yeah, not C. So, when B decrypts C hat with K, what's it going to decrypt to? M hat. M hat, or something gibberish. Is it going to make sense? Probably not. It could, but it's not the message that was intended. Yeah, A didn't intend for that message. So, in what circumstances is it going to be gibberish? Let's say I change the ciphertext and just flip some bits randomly. That is probably going to decrypt to garbage. Or should. When is M hat going to be something recognizable?
But how do I change the message entirely? E doesn't have access to the message. E could get the key. What was that? If you have the key, you've broken everything. What if E has old messages? So, if E has old messages: Alice and Bob are sending messages. So, we've got "surgery tomorrow", and, I don't know, I'm losing the message here. A sends message one, message two, message three, and these encrypt to, we'll call them, C1, C2, C3. What can we assume about Eve? Does Eve know the messages? No. What does Eve see? The ciphertexts. So if Alice now wants to send message four to Bob, what can Eve do? Substitute one of the other ciphertexts. Right: M4 is going to give a C4, but Eve can choose any of the earlier ciphertexts to replace it with. And is Bob going to be able to tell that the replacement took place? No, because he doesn't know the original message that was intended. Right. Whatever he gets, he decrypts. If it's garbage, he knows, okay, something maybe went wrong. But if it decrypts to a message, how does Bob know that that's not actually the message Alice tried to send? So this is the key problem in message integrity that we haven't really talked about. Or even, you know, what if a bit is corrupted? Maybe it's not an adversary, maybe it's just the network. Maybe the network accidentally flipped a bit. How would we know that we should discard this message and not try to understand what it means? And we also had the problem here that Bob doesn't actually know that this message... well, actually, Bob knows that whoever encrypted this message encrypted it with K. Cool. Okay. So how do we try to solve this problem? What property are we missing here? We're missing integrity. Well, couldn't you just sort of encode that in the message? Encode what? What the message itself is supposed to be?
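The substitution attack described here can be sketched in a few lines. This is a minimal illustration, assuming a made-up toy cipher (`toy_encrypt`, a SHA-256-derived keystream XORed with the message); it is not a real or secure cipher, and the messages and key are hypothetical. The only point is that Eve needs neither the key nor the plaintexts to swap an old ciphertext in for a new one.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream built from SHA-256(key || counter). Illustration only, not secure."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def toy_encrypt(key: bytes, message: bytes) -> bytes:
    """XOR the message with the keystream; the same call also decrypts."""
    return bytes(m ^ k for m, k in zip(message, keystream(key, len(message))))

toy_decrypt = toy_encrypt  # XOR is its own inverse

shared_key = b"K, shared out of band"   # hypothetical key known to Alice and Bob
m1 = b"surgery is tomorrow"             # an old message
m4 = b"surgery is cancelled"            # the message Alice sends now
c1 = toy_encrypt(shared_key, m1)
c4 = toy_encrypt(shared_key, m4)

# Eve sees only c1 and c4. She drops c4 and delivers the old c1 instead.
delivered = c1
received = toy_decrypt(shared_key, delivered)
print(received)  # a perfectly readable message, just not the one Alice sent
```

Bob gets a well-formed plaintext and has nothing to compare it against, which is exactly the missing integrity property.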
Yeah. Well, so, you say... But then, okay, keep going. Like, if they managed to switch the messages it wouldn't really work, but in most scenarios I feel like you could just have some kind of code, like they do in spy movies and whatnot. Like, oh yeah, the first word has to start with a C or something. Mm-hmm. Then you know. Something like that. I don't know. But then every one of these earlier messages would also have that same property. Well, yeah. So that's why I wouldn't go with that. Right. Okay. Yeah. But at least you'd catch the bit flip, maybe? Wouldn't this be solved with timestamps? Timestamps? So put some timestamps in the messages? Yeah. Would that help? Because if you send a message at a later time and it decrypts to an earlier time... Yeah. You could use timestamps there. You can get into some issues. Anybody deal with any code that works with timestamps? Yeah. Having written some submission servers, having the time zone not match Phoenix's time causes some major weird issues. So now you have to deal with time zones: what's the time zone of Alice and Bob, and can they even agree on a time zone? And some time zones are offset by a half hour rather than a whole number of hours. So it gets even more complicated there. Yeah. So you can do away with the programming part of it and just include in the message the time at which you wrote it. But what time according to who? Let's say you encode it in your time zone and you send it to me. And I see, wow, this message is from six hours ago, I should throw it away. Well, whether you put it in your time zone or their time zone, regardless there should be a sequential order in the messages. Okay, so maybe you can keep track of these messages as they're going. That could be one way. Yeah.
What if you timestamped it with the current milliseconds? The current milliseconds of what? From, like, the beginning of time. Yeah, so you'd want something like a Unix timestamp. You need to agree on the format. You need to agree on how precise. I mean, that's one option. What else? Yeah? Maybe we can number the messages. You could number the messages? Yeah. So you could put a number, just like I have them here. You could give some kind of format to the message in order to give it some idea of what the ordering of messages is. Don't use symmetric cryptography. Symmetric cryptography, how? Oh, don't use it. Okay, let's use public-private key crypto. How does that change things? Yeah, please. Okay. So Ferris said on Tuesday that we could encrypt... I'll see if I can get this started. Alice encrypts the message with Bob's public key and then encrypts it with her private key. Okay, so Alice wants to send a message to Bob, the message being M. She does what first? Encrypts it with Bob's public key. So take Bob's public key? Yeah. Which means that now who's the only one that can decrypt this package? Bob. Bob, with his secret key. Okay, and then do what? Encrypt it with Alice's private key. Encrypt it with Alice's private key, which gives us what? This gives us some ciphertext, we'll say C. And then what does Bob do to unpack this message? Apply Alice's public key. Public key of Alice on C. And then Bob's private key. And that gives us what? M. M, perfect. So now do this with M1, M2, M3, and have E substitute an old ciphertext like C1 for C4. All right, so call this M1, call this C1. It's the other way around. Yeah? It's the public key of Bob on the outside. Because the adversary can intercept it, decode the outer layer with Alice's public key, replace it with their own message, and encrypt that with Bob's public key. Oh. Oh. Interesting.
OK, so let's say we have this construction. And then Eve gets C. What can Eve do? Decrypt it. Eve can use the public key of Alice on C, which returns, let's say, PB of M, the message encrypted under Bob's public key. Right? And then now what can Eve do? She can what? Sorry, say that again? She can encrypt with Bob's public key. This thing? No, no, no. Her own message. OK, so she can take some message, call it M1, and encrypt it with what? Bob's public key. Public key of Bob. And then encrypt it with what? Her own private key? And she sends this to Bob. So Bob tries to decrypt this with Alice's public key. And what's going to happen? Gibberish. Gibberish, it's going to fail. Right? It's not going to be the intended message, so you can't spoof a new message. But yes, you could maybe mess with it. In general, you could just replace it with any C hat garbage, and then it will result in M hat, right? Yeah. Wait, shouldn't Alice encrypt the message using her secret key first and then encrypt that using Bob's public key? OK, let's go through that. So, secret key of Alice on the message? Yeah. That's how I saw it in the presentation. And then the private key of Bob? The public key of Bob. Mm-hmm. And that's C? Yeah. So then let's say Eve intercepts this message. Eve now knows C. What can Eve do to this message? Not much. Yeah, Eve can flip bits in it, but nobody else can unpack this outer layer, because they don't have the secret key of Bob. Wait, Bob has a secret key that he can use to decrypt... Yeah. So Bob can use the secret key of Bob... Wait, no. Alice's public key first? Can't do that, because the last operation here was the public key of Bob. The only thing that can undo that is the secret key of Bob. And then the public key of Alice. And that gives us the message. Yeah.
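The layering just described (Alice's secret key on the inside, Bob's public key on the outside) can be sketched with textbook RSA. This is only an illustration under loud assumptions: the primes are tiny, there is no padding, and `make_keypair` and `apply_key` are hypothetical helper names, so nothing here is usable as real cryptography. The message is a small integer, and Alice's modulus is chosen smaller than Bob's so the inner value fits in the outer layer.

```python
def make_keypair(p: int, q: int, e: int):
    """Textbook RSA key pair from two small primes. Toy parameters only."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)          # modular inverse (Python 3.8+)
    return (e, n), (d, n)        # (public key, secret key)

def apply_key(value: int, key) -> int:
    """Apply a public or secret key: modular exponentiation."""
    exponent, modulus = key
    return pow(value, exponent, modulus)

alice_pub, alice_priv = make_keypair(61, 53, 17)   # n = 3233
bob_pub, bob_priv = make_keypair(89, 97, 5)        # n = 8633

m = 1234                                  # the message, as an integer below 3233

# Alice: secret key of Alice first, then public key of Bob as the outer layer.
inner = apply_key(m, alice_priv)
c = apply_key(inner, bob_pub)

# Bob: the outer layer only comes off with Bob's secret key,
# then Alice's public key recovers M and shows Alice produced the inner layer.
unwrapped = apply_key(c, bob_priv)
recovered = apply_key(unwrapped, alice_pub)
print(recovered)
```

Eve can still replace `c` with arbitrary garbage, but she cannot open the outer layer, and anything she wraps with her own secret key will not unwrap sensibly under Alice's public key.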
So my understanding was this was to, like, solve the whole non-repudiation issue. Right, so here, once you encrypt this with the public key of Bob, nobody else can delve inside it and mess with or move anything; only Bob can, by decrypting with his secret key. So wouldn't the order not matter? Say it again? Like, if you did the public key first and then the secret key, wouldn't that not really affect it much, because you'd still have non-repudiation? We'll talk about that in a second. You do have this ability for anyone... once you encrypt it with the secret key of... sorry, that doesn't make sense. Right: if the outer layer is encrypted with Alice's secret key, anybody can remove that layer. Ideally, you'd want to make sure that nobody can remove anything. And so that's why you'd use Bob's public key as the outermost layer. You can think of it as protecting everything that's inside of it, so you can't even unpack that unless you have Bob's secret key. So you can think of the public key as basically locking this whole thing, so you can't mess with it. But, like we just talked about, Eve can still put in random bits, and at the end of the day Bob doesn't really know whether maybe you were trying to send a random message. Maybe you're trying to send a key. Compared to symmetric crypto, public-private key crypto is much slower. So one of the ways public-private key crypto is used is that the message you send is actually a symmetric key. You generate a 256-bit AES key, which is essentially random bytes. You send that over to them using this method, and that way you're now able to communicate with symmetric key crypto.
So this is why you can't really rely on the message itself being random or not random; you need some format or something else added to the message. The whole idea is we want a quick and easy way to check: did this message get altered or changed in transit? So, have you taken a networking class yet? Your networks? Yes, a networking class, I don't care which one it was. So you have data being sent, let's say through the air over 802.11 or LTE or whatever, or even over Ethernet. Are physical mediums perfect? Yes. Yes, they are. No. No, and why does that matter? What can happen when you're sending data across an imperfect medium? It could get corrupted. Yeah, just random bit flips, and you don't know why. So how do a lot of networking protocols handle this? They detect it. How? Handshakes? Handshakes. You also send the size, like three thousand bytes, and then you check if it's the same amount. But how do they actually do it? What's one of the ways? Like, how does TCP do it? Or I think Ethernet uses the same one, although I'm not 100% certain. They have a header before each packet? They have a header before each packet. Anyone ever enter their credit card number wrong? When you click submit, it tells you that's the wrong credit card number, even though it sometimes never actually talks to the credit card company to figure that out. So a similar type of idea is there. A lot of network protocols use cyclic redundancy checks, CRC, which is a way of basically, I honestly don't recall the specifics of it, essentially reducing all of the input down to a short value. So let's go look it up real quick. Cyclic redundancy check. So the idea is this is an error-detecting code designed to, oh yeah, it's definitely in Ethernet, that's where I remember it from. And essentially, it's checking that, yeah, it's actually a fairly simple thing.
It's shifts and XORs: you divide the message, treated as one long binary polynomial, by a fixed 16-bit or 32-bit generator, and the final value is the remainder. But how it's calculated is actually not super important. The idea is you can take a message, calculate a CRC on it, and get some value. So how big are your messages? I don't know, how big can a message be? As big as you want, generally? Yeah, I mean, different hardware or different physical mediums have different maximum message sizes, but in essence, as long as you want. So essentially you can think of it as: we're encoding, along with the message, some kind of check, we'll call it H for now, so that we can send it along, and the receiver can run the check on the message and confirm that it equals H. So do we want H to be as long as the message? No. Why would that be bad? That's an awful lot of space right there. You're doubling every bit of data that you send, just to make sure there are no errors or problems in it. So usually CRC, let's say CRC32, generates a 32-bit value as the output, and I'm deliberately not calling it something right now, but it's some kind of error-checking code, so that you can recalculate that CRC32 on the other side and compare to make sure that it matches. A similar thing happens with the digits in your credit card. It's called a Luhn check. You can go look this up, I can't remember the exact algorithm, but they do some operation on the digits and the last digit of the resulting sum should be a zero. So you can actually check in JavaScript on a web page whether the credit card number you're entering is valid. Which is why, if you fat-finger it and accidentally enter an incorrect credit card number, they can detect it right away. So this concept, and we're going to be very careful about the terminology we use, is known as essentially hashing. The idea is you have some message, and you want to reduce it such that a given message M always hashes to the same hash, right?
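Both checks mentioned here can be sketched briefly. The CRC part uses Python's standard `zlib.crc32`; the Luhn part is the usual "double every second digit from the right" rule written out by hand. The message text and the card number are made-up sample values (the number is just a string of digits chosen so the check passes), and the helper names are hypothetical.

```python
import zlib

def crc32_of(data: bytes) -> int:
    """32-bit CRC of the data, as an unsigned integer."""
    return zlib.crc32(data)

def luhn_valid(number: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:        # every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9           # same as adding the two digits of the product
        total += digit
    return len(digits) > 0 and total % 10 == 0

# Sender computes the check and sends it along with the message.
message = bytearray(b"surgery is tomorrow at 9am")
check = crc32_of(bytes(message))

# The network flips one bit; the receiver recomputes and compares.
message[0] ^= 0x01
corrupted_check = crc32_of(bytes(message))
print(check != corrupted_check)                     # the mismatch flags the corruption

print(luhn_valid("4539 1488 0343 6467"))            # sample number that passes
print(luhn_valid("4539 1488 0343 6468"))            # last digit mistyped
```

Both catch accidental errors cheaply with a short check value. Neither is meant to stop an adversary who can recompute the check.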
The same message outputs to the same hash. Yeah, like a signature. So you can think of it like a signature of the message, yeah. Yeah, okay. So, yeah, we want the property that, so we'll call it H. So we want some property that if we call H on some message, it produces, what do we want to call it? We'll call it S, lowercase S for signatures. It's going to mess this up with private keys. I don't think so because we're talking about different things right now. S for the signature of that message. And so, right, this makes sense. If this function was somehow non-deterministic and gave you weird output, you would not be able to tell if the message was sent correctly. So let's say that the size of S is, we'll use an easier one now, 16 bits. Why is that important? What other properties would we want in this hash, in this function that's like a signature? Since we have messages of potentially unbounded size, if the length of the hash is too small, then a lot of messages will map to the same hash. Yeah, so one thing that seems very intuitive would be, hey, I want a property that I have two messages, M1 and M2. If I hash M1, should that be equal to the hash of M2? Unless, while you could do this a little more formally, you could say, if this is true, then what must be true about the messages M1 and M2? It should be the same. Can you actually guarantee this? No. Why not? Because the hash... Well, how could you guarantee this? The hash size was literally large enough to cover every possible message. Yeah, you just output the message itself, right? Your hash is the message. But we just talked about we don't want that, why? Yeah, because it's super redundant, right? We don't really want that. Cool. So we think we want this property. But if we do this, then we can't actually limit the size of our messages. So if the size of our hash is 8 bits, would you be able to... 
Sorry, if you had 8 bits, would this property hold? If it's 16 bits, would this property hold? You can't say "probably not". That's not a great answer. Be more firm, be okay with being wrong. No. No. Okay, what about 32 bits? What about 128 bits? What about 256 bits? No. Yeah, as long as the messages can be as big as you want, you get what's called a hash collision. The idea is you have two messages that hash to the same value; they are now colliding. So you can't actually prevent this. This seems crazy, then: why are we even talking about this if you can't implement this nice property that we want? Because how could you find one? So think about this. Let's say I give you a message M1, and I say find me a message M2 that has the same hash. How would you do it with 16 bits? I guess you could hash 2 to the 16 plus 1 different things, and then two of them will have to collide. Right, because you only have 16 bits to put this hash into, so there are only 2 to the 16 possible hash values. So if one message hashes to all zeros and you keep trying others, eventually you'll find another one that hashes to all zeros. So what property do we want? Okay, so let's start from scratch here and say, man, this is too difficult. And actually with CRC32, I won't go into it, feel free to do this as an exercise yourself, I think I do this in my grad 545 class, there's actually almost no brute force. You don't have to test 2 to the 32 messages. You just take a message and mathematically figure out exactly what bytes you need for the new message to hash to the same thing as the old one. So what property do we want to hold? That a collision is unlikely. That a collision is unlikely. How unlikely? Hopefully we won't get a collision until we insert n plus 1. Yeah, so let's say I give you m1, right? You can easily take the hash of m1. We'll call it s1.
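The counting argument can be tried out directly. The sketch below assumes a made-up 16-bit hash, `tiny_hash`, obtained by keeping only the top 16 bits of SHA-256; the message format `message-<i>` is likewise just an illustration. With only 2^16 possible outputs, any 2^16 + 1 distinct messages must contain two with the same hash, so the loop is guaranteed to return.

```python
import hashlib

def tiny_hash(message: bytes, bits: int = 16) -> int:
    """Hypothetical short hash: the top `bits` bits of SHA-256."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def find_collision(bits: int = 16):
    """Hash distinct messages until two share a hash value."""
    seen = {}                                # hash value -> first message with it
    for i in range(2**bits + 1):             # pigeonhole: this many always suffices
        message = f"message-{i}".encode()
        h = tiny_hash(message, bits)
        if h in seen:
            return seen[h], message, i + 1   # the two messages and how many we hashed
        seen[h] = message
    raise AssertionError("unreachable: more messages than hash values")

first, second, hashed = find_collision(16)
print(first, second, hashed)
```

In practice the loop stops long before the worst case, since any pair among the messages tried so far may collide. The 2^16 + 1 figure is only the guaranteed upper bound.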
Then, okay, we're going to be a little hand-wavy here and say something like: it should be difficult to find an m2 such that the hash of m1 is equal to the hash of m2. And specifically, how difficult? What do we mean by difficult? That we would have to enumerate all possible combinations of other characters before we come across one. Right. How do we know that we can find one at all? Yeah, we just proved it, but how many do we have to try? 2 to the 16 plus 1 for 16 bits, right, n plus 1. Based on what? The number of possible hash values, plus 1. Yeah, so it's about 2 to the size of s, we'll call it for right now. So basically, however many bits your hash is, if you can find a collision more easily than by brute forcing all possible combinations, then that's a bad hash. So CRC does not have this property, because you can find a colliding message much faster than brute forcing over the size of the hash. That's why there's a distinction between hash functions and cryptographic hash functions. For right now, since we're talking about cryptography, we're going to assume they're the same, that they have these properties we're talking about. Are there any other properties we want? They should be easy to compute and difficult to reverse. Why difficult to reverse? Because it would be really bad if we could find out the message from the hash. Yeah, well, in some sense you could ask, is it even possible to reverse them, right? If you think of it as a function, you have an infinite input space, and the hash maps it to 2 to the 16, or however many, 2 to the size of the output, values, so every output has to have multiple inputs that hash to it. So, H of M equals S. Basically, given S, you should not be able to discover M. Unless you do what? You just keep trying messages: A, B, C. Although let's say your hash size is 8 bits. How many do you have to try to be guaranteed one message that collides? 257. What was that? 257. Say that again?
257. 2 to the 8 plus 1, for 8 bits. Yeah. Is that right? Yeah. Okay, perfect, good. I think I had the wrong number in my head. But is that necessarily the original message? No. Why not? Because multiple messages can have the same hash. There are infinitely many messages that hash to the same output value. Cool. Are there any other properties we want? Ideally, a slight change to the message should change the entire hash. Why do we want that? That way, it would be much more difficult to work backwards. Right, so we want a property that similar messages should have completely different hashes. I'll put that as, like, number 4. I don't want to express it mathematically; I'm going to say: similar messages, very different hash. Cool. Yeah, so if you flip one bit in the message and you get a hash that differs by one bit, then you could probably pretty easily go backwards, right? That kind of follows on from the previous one. Anything else that we may want? So now let's say we have this as a primitive. We have some hash function. What can we do? Think of it this way. What if I gave you some hash, let's say lowercase h? And let's say this is going to be, what's today? October 3rd. 10-3, I give you this hash. Is that right? Is it the 3rd? I have no idea what day it is. The 4th. Wow. I was gone yesterday. Yeah, two days ago. So on 10-4, October 4th, I give you a hash, and I say I've predicted the winner of this year's Super Bowl. Could you tell which team I picked based on this? No. Yes. Well, maybe. It depends. Maybe. How would you try? So you have, like, a list of teams maybe, and you know that you hashed a particular team. And the hash algorithm? I use a certain hash algorithm. Yeah.
So if you, if you gave us the hash and you know, and we know like how the input looks like, how you hashed it, then we could, after the Super Bowl is over, we can like hash those same teams, that same input and then compare. Ah, okay. But you could also do that beforehand, right? Yes. And you could figure out which team I picked. Yeah. Because you could go over the team names. Yeah. But what if I, the message is I am picking team X to win the Super Bowl in 2018? You'd have to know exactly what the message is, right? Yeah. So then on the Super Bowl, I don't know what the Super Bowl is this year, whatever. And then I say, I'm super awesome because I released my message of what team won the Super Bowl, what team that says in the message what team won the Super Bowl. How could you verify, so let's say I do this in 2019, how could you verify that I was super awesome and guessed the Super Bowl correctly? Yeah. Hash the message M, get something H prime, and if that equals H, then you know that I actually, so it's kind of crazy, right? I can reveal a message that I know something on a certain date, but not actually reveal what that message is. Even if you put I am picking whatever, as soon as you know anything about the message, even like a length constraint that severely limits the amount of space you have to enumerate to find the hash. Sure. Did I tell you what the message is? So what do I give you on the today? Fix size hash. Do you know anything about the message or the format or even what the message is? Maybe it's a picture of me holding a sign of what team I think is going to win. So we also know the hash algorithm? Yeah, for sure. So I'll tell you it's a hash in whatever, whatever. We'll go over them in a second. We get super cool properties like this that you can do things. It actually does come in handy. Let's see. So I think we actually derived all of these things that we want from hash function. 
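The Super Bowl trick is a commit-then-reveal pattern, and it fits in a few lines. The sketch assumes SHA-256 as the agreed hash algorithm; the team names and the helper names `commit`, `verify_reveal`, and `guess_from_candidates` are placeholders invented for illustration. It also shows the student's objection: if the message format is known and there are only a handful of possibilities, anyone can enumerate them before the game.

```python
import hashlib

def commit(message: bytes) -> str:
    """Publish this hash now; keep the message secret."""
    return hashlib.sha256(message).hexdigest()

def verify_reveal(commitment: str, revealed: bytes) -> bool:
    """Later: hash the revealed message and compare with the published hash."""
    return commit(revealed) == commitment

def guess_from_candidates(commitment: str, candidates):
    """Enumeration attack: works only if the exact message can be guessed."""
    for candidate in candidates:
        if commit(candidate) == commitment:
            return candidate
    return None

# October 4th: the prediction is hashed and only h is published.
prediction = b"I am picking Team C to win the Super Bowl"
h = commit(prediction)

# After the game: the message is revealed and anyone can check it.
print(verify_reveal(h, prediction))

# If the format is known, guessing bare team names fails,
# but guessing the full sentence for each team succeeds.
teams = ["Team A", "Team B", "Team C"]
bare_names = [t.encode() for t in teams]
full_sentences = [f"I am picking {t} to win the Super Bowl".encode() for t in teams]
print(guess_from_candidates(h, bare_names))
print(guess_from_candidates(h, full_sentences))
```

The hash pins down the message without revealing it only as long as the space of plausible messages is too large to enumerate.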
So we think of them and when we say function, are we talking about like a programmatic Python or Java C program function? No, why not? Well, it can do anything, right? A function can change global data. It can do all kinds of stuff. Here we're talking about a mathematical function. When we say mathematical function, what does that mean? No side effects. So for a given input, it always produces the same output. Think about the addition operator as a function. It takes in two integers and returns an integer. So here we're mapping arbitrary size data to a fixed size bit string, which is exactly what we talked about. It's a one-way function. So what does this mean? Very difficult to go back, exactly. Or it should be the only way to go back. It would be to enumerate all possible messages as we just discussed. But even then, unless you know exactly what the message is that you're looking for, it will be difficult to do that. Cool. Easy to compute. And again, this goes back to efficiency. Like we talked about, it should be very easy to compute a function, but difficult to go back. And we can tell. We already talked about this. So is it a one-to-one mapping? No. Why not? Exactly. Arbitrary size data. That is an infinite set mapped to a finite set. There's no possible way that you could do that. Cool. Deterministic. This was our first criteria, which makes sense. We don't want to have random values in our hash because that will produce bad things. Cool. And then a small change in an input bit should completely change the output. So this is what we just talked about. So there should be no way to go backwards. So we can go back to our example. How do you know that I made this proclamation and I didn't steal it from somebody else? So let's say in 2019, you're shocked. You think I'm the best football predictor ever. But it turns out I stole this from some other analyst. How would I do that? I steal what? The message contains something you generated. But let's say it doesn't. 
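The properties listed here (arbitrary-size input to a fixed-size output, deterministic, a small input change scrambling the output) are easy to observe with a real hash function. The sketch uses SHA-256 from Python's `hashlib` as one example of a cryptographic hash; the message text is made up.

```python
import hashlib

def sha256_as_int(data: bytes) -> int:
    """SHA-256 digest interpreted as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

# Fixed-size output regardless of input size.
short_digest = hashlib.sha256(b"a").digest()
long_digest = hashlib.sha256(b"a" * 1_000_000).digest()
print(len(short_digest), len(long_digest))   # both are 32 bytes (256 bits)

# Deterministic: same input, same output, every time.
original = b"surgery is tomorrow"
print(sha256_as_int(original) == sha256_as_int(original))

# Small change, very different hash: flip a single bit of the message.
flipped = bytearray(original)
flipped[-1] ^= 0x01
flipped = bytes(flipped)
differing_bits = bin(sha256_as_int(original) ^ sha256_as_int(flipped)).count("1")
print(differing_bits, "of 256 output bits differ")
```

For a well-behaved hash, roughly half of the output bits are expected to change, so the two digests look unrelated even though the inputs differ by one bit.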
Let's say it just has the phrase, I am so awesome. I predicted the 2019 Super Bowl champions are XYZ. You're shocked. I didn't make this prediction. Couldn't you map any message you wanted? There's no way of telling if it's you or not. Yeah, almost. If you use public key cryptography, you could sign it by encrypting it with something. Yeah, think about it for a second. Where did H come from? Do you know that I came up with H? What do you know about H? You could just say that I posted on this date. You have no idea that it actually came from me. Just that I posted it. Right? So if I was real smart and I wanted to do this trick on you and there's somebody else I trusted to make predictions, you could take their hash, send it to you, and then later afterwards take their message and send it to you and look at how awesome I am. I'm super good at predicting Super Bowls. So then how could I verify that I'm the one who sent that? Would you use, maybe you could use your private key to encrypt a message and then hash that ciphertext and push that hash out. And then later instead of giving us the message, you would give us the ciphertext that you could do with your private key. Interesting. But I'm already keeping my data private anyways, right? I'm not releasing the message at all. So if I wanted to really convince you, well, yes, okay, perfect. So I could at least convince you that it came from me. Actually this has gotten into a really interesting area that I probably haven't thought about. So I could encrypt this hash with my secret key. So you know at least that this thing came from me. But like we said, then that means you can get h and somebody else could actually re-encrypt it. Not the hash, like the message. And then you hash the ciphertext that you get. Okay, so the message, what do you want to do? And then h? Yeah, so you hash the ciphertext that you get and you post that. So at 10-4 you post the hash of the ciphertext and then in 2019 you release the ciphertext. 
I'm going to call it capital H. Everyone cool with that? Yes. So I hash the output, the ciphertext of my message being encrypted with my private key, which is going to be a hash function. And then afterwards instead of releasing the message, I release this of the message, which means what? How do people verify this? They can decrypt it with my public key to get the message and they can take the hash of the message encrypted with my secret key and verify that it matches up there. And if it does, then they know I was the one who actually made that because it's encrypted with my secret key, which only I know. That's cool. But isn't the message I'm construed be stole from somebody else? In some sense yes, but in this other scheme the message M wasn't stolen. Remember, I don't actually know who's going to win the Super Bowl. I'm just passing somebody else's message of who they think is going to win because I can pass their hash along. But in this way everyone can re-verify by doing this computation that I'm the only one who could have computed this because I was the only one with the message and I'm the only one with my private key. I like to do the same thing though, like take someone else's message and then encrypt it with your key. Sure, I can take anyone's message here, right? You don't actually know that this message comes from me. But here in this case, did I know the message here? I didn't. No, I don't. I take this hash from somebody else smart who's doing the same thing. And I claim that as my prediction. And then later I can take their message and say, look, I was so smart. I guessed this thing earlier. But here I can't do that in this new situation because I have, because this, if I did this and tried to do this, nobody would be able to decrypt, what was it that was down here? 
Nobody would be able to decrypt the, they would take this that was encrypted with somebody else's secret key and they wouldn't be able to decrypt it with my public key, which means this message originally was not from me. But since there's such a gap between now and 2019, is it possible to add, to come up with, I guess, I don't know. I was gonna say, if you could make a ciphertext after the fact, after the Super Bowl happens, that happens to hash to the same value. So you could do that if, what's true about the hash function? If it's broken. Or a bad hash function. Otherwise, you have to enumerate. If this is, let's say, a 256-bit number, then I would have to try on the order of two to the 256 possible messages to find one that also said what I wanted. So this is actually, the reason why I bring this up is because I think it was SHA, we'll talk about the specific algorithms in a second, but SHA-1, why do I keep this phone? SHA-1 was, I don't know, it's MD5, I think MD5, yeah, was an older hash function that was broken such that somebody could do this. They created PDF files, and they announced, I think it was some, I wanna say it's the winner of some presidential election, and every file hashed to the same hash value. Was it SHA-1 too? Okay, perfect, yeah, that's kind of the classic case because you have two documents that hash to the same value. Oh yeah, there was, I think it was Phrack, this hacking magazine that had an article, the entire magazine hashed to a hash that's on the front page of it, like this thing hashes to a value that's printed in it. So, anyway, it's pretty cool stuff. So, another thing that we can do: in this case we're hiding the message, but oftentimes we may want to send a message to everyone, and part of the problem, and again, this is anytime you're doing public key crypto, the main problem is that it's slow, right?
So this hash of message, and actually usually the slowness depends on the size of the message itself, so if you have this really long email you're trying to send to somebody, and you're encrypting all of that with your secret key, why are you doing this? So that others can read it and know that it came from you. Not only do they need to do this whole operation, but you need to do this whole operation, and it's very slow. So, assuming we have a hash function H, a cryptographically secure hash function, what can we do to try to speed this up? Because I don't care about the confidentiality of this message, because anyone can take my public key and read the message. So how can I speed this up using a hash function? I want to say you could hash the message and then encrypt that with your private key, like the hash. Hash the message, encrypt the hash with my private key, and we'll call that a signature. And how many bits is the signature based on? However big the output of my hash function is, right? 32, we'll say 256 is usually the safe one now. So I can send somebody the message along with the signature, the hash of the message encrypted with my secret key, send that to everyone, and now can they verify that I sent this message? I feel like, but in this case, anyone can take the message and hash it and then wrap that in their private key. So they try decrypting that with my public key because they think the message came from me, what's it gonna say? Gibberish. Yeah, it's gonna come out to gibberish, right? So exactly, and the whole reason is, you think this message is from me, you're verifying that this message came from me. So you're verifying it with my public key. If you verify it with the attacker's public key, then you're screwed, alright. Because exactly, they can write their own message, calculate the signature very easily based on that. Cool.
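A small sketch of the hash-then-sign idea just described: hash the (possibly long) message down to 256 bits and apply the slow private-key operation only to that digest. This is a toy under stated assumptions, not a real signature scheme: the RSA key is built from two well-known Mersenne primes (trivially factorable), there is no padding (real schemes pad the digest before signing), and `sign`, `verify`, and `digest_as_int` are names invented for this example.

```python
import hashlib

# Toy RSA key for illustration only (Mersenne primes, trivially factorable).
P, Q = 2**521 - 1, 2**607 - 1
N = P * Q
E = 65537                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))       # private exponent (needs Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """SHA-256 of the message as an integer (always smaller than N here)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


def sign(message: bytes) -> int:
    """Apply the private key to the 256-bit digest, not to the whole message."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Undo the signature with the public key and compare to a fresh digest."""
    return pow(signature, E, N) == digest_as_int(message)


email = b"a really long email... " * 10_000      # about 230 KB of message
signature = sign(email)                          # one private-key operation on 32 bytes

assert verify(email, signature)                  # the untouched message verifies
assert not verify(email + b"!", signature)       # any change to the message fails
assert not verify(email, signature + 1)          # a made-up signature fails
```

The cost of the private-key step no longer grows with the message; only the fast hash sees the whole email.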
And so this is actually what, so using this primitive, we get these nice signatures, and this is how in public key crypto, which you're gonna get hands-on experience with, you actually attach signatures to messages, and you can sign a message such that anyone can verify that it actually came from you by attaching basically exactly this: a hash of the message, encrypted with your private key. So also some other cool uses for hash functions. They show up actually in a lot of different places, which is why we cover this here, and why they're such an important topic. File integrity, so why is file integrity important? Care about the integrity of your files? No, they go away, it's fine. Computer ate your homework, or hard drive ate your homework. One thing is making sure that what you're downloading and installing hasn't been tampered with by some attacker. Yeah, so one thing, maybe if you're downloading an executable, you may want to verify that the executable you have is the same as the program that was published, and if it's a gigabyte program, again, you want to be able to calculate a hash of that, which it does very quickly, and compare those two. You have other problems though of where you got that message from, or where you got the hash from, actually, that you're doing the comparison against, because if somebody's gonna feed you a fake binary, they could possibly change the hash value that's posted there too. So you'd want to sign the hash again, but now you have a problem with what public keys you trust. It's a whole big issue. Also, I don't know if you know this, but sometimes hard drives actually experience bit rot without saying that they have errors. This happens more often with spinning disks than SSDs, but so new file systems, ZFS is one that I know and use, actually store not just every file, but a hash of every file.
So you can actually go through your file system to check if any of your files are corrupted silently, which is one of the worst ways that you can have data corruption. Other issues that we'll actually talk about more in the next section, when we talk about authentication, is password verification. So what's an easy way to check for passwords? String equals. Yeah, string equals. We talked about this, right? You just store the user's password in the database, and you check if what they typed is exactly that. What's the problem with that approach? Yeah. You have to store every user's password on your system, which means if somebody breaks into your database, then they have the password of every user on your system, and your admins have everybody's password. Do you all use unique passwords for every site that you use? You should be. Maybe I'll convince you by the end of this course, but many people who have not taken this course don't, and so an attacker could reuse those passwords. We'll see. You have to actually do this very carefully to do it correctly. So I know a couple of lectures ago, we talked about how if you do that naive comparison, then there could be side channel or timing attacks, where the comparison returns early as soon as a character doesn't match. Would hashing it fix some of the problems associated with that, because the hash is completely different even if a single bit has changed? In most cases, but I would still be safe and do it correctly. That way you're not leaking any information about the password or the hashed version of the password. It could be you're using a really crappy implementation of the hash, or maybe it's a bad hash function. You would still want to try to make sure you're doing constant time operations and do basically a constant-time memory compare rather than a string comparison to tell you if those memory regions are the same or not. So yeah, I'd still do that. Proof of work. Anyways, many cool uses of hash functions that show up all over the place. They're super useful.
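A hedged sketch of the password check discussed above: store a hash instead of the password, and compare in constant time. The per-user salt and the use of PBKDF2 with many iterations are additions not covered in this part of the lecture (the next section on authentication is where that belongs); the iteration count and function names are assumptions chosen for the example.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000   # assumed work factor for this example


def hash_password(password, salt=None):
    """Return (salt, digest). The database stores these, never the password."""
    if salt is None:
        salt = os.urandom(16)           # fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_password(password, salt, stored_digest):
    """Re-hash the attempt and compare without an early exit on mismatch."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_digest)


# Registration: only the salt and digest go into the database.
salt, stored = hash_password("correct horse battery staple")

# Login attempts.
assert check_password("correct horse battery staple", salt, stored)
assert not check_password("correct horse battery stapler", salt, stored)
```

`hmac.compare_digest` is the standard-library comparison designed to avoid the timing leak of a plain `==` on secrets.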
We actually talked about some of these properties, so I'm not going to go into any of them in depth, but these are kind of the more technical hash function properties. This is exactly what we talked about. Again, this actually uses the language that we've said of it's difficult, but again what we mean is that there's nothing better than just brute forcing. So basically if I give you a hash value, it should be very difficult for you to find a message that hashes to that hash value. Another thing is if I give you a message, it should be very difficult to find another message that hashes to that same value. And these are again the properties we already went over, so this is not really new stuff. The other thing is collision resistance. So basically it should be difficult to find two different messages that hash to the same value. What's the difference between these? In the first one you're given a message. No. Look at it again. What are you given in the first one? Given an input M1. Sorry, I meant between second pre-image and collision. Yes, perfect. So in the very first one all you're given is the hash. It's basically testing the one-wayness of the hash function. Can you go from a hash back to the message? In this one it's saying that, basically, second pre-image resistance is if I give you an input, it should be difficult to derive another input that hashes to the same value. So that kind of deals with the bit flips and the fact that the things... What am I trying to say? Yeah, so that tiny changes to the message should give different hashes. It should be very difficult. The collision one is actually giving the attacker a lot of capabilities of choosing their own messages. So you can think of this as more attacker capabilities versus less. So a lot of hash functions start to show weakness in collision resistance first. But usually at that point you want to abandon them completely. Cool. Okay, so very quickly go over this. So using these primitives we can...
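To make the gap between second pre-image resistance and collision resistance concrete, here is a toy experiment. It assumes a deliberately weakened hash, `tiny_hash`, which keeps only the top 16 bits of SHA-256 so that brute force finishes quickly; the function names and the choice of counter-based messages are made up for this sketch.

```python
import hashlib
from itertools import count

BITS = 16   # toy output size so brute force is fast


def tiny_hash(data: bytes) -> int:
    """Top BITS bits of SHA-256. Intentionally weak, for demonstration only."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - BITS)


def find_collision():
    """Attacker picks BOTH messages: remember every hash seen until one repeats."""
    seen = {}
    for i in count():
        msg = str(i).encode()
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg


def find_second_preimage(given: bytes):
    """Attacker is handed one message and must match ITS hash specifically."""
    target = tiny_hash(given)
    for i in count():
        msg = b"guess-" + str(i).encode()
        if msg != given and tiny_hash(msg) == target:
            return msg, i + 1


m1, m2, collision_tries = find_collision()
other, preimage_tries = find_second_preimage(b"hello")

assert m1 != m2 and tiny_hash(m1) == tiny_hash(m2)
assert other != b"hello" and tiny_hash(other) == tiny_hash(b"hello")
print("collision after", collision_tries, "tries;",
      "second pre-image after", preimage_tries, "tries")
```

For an n-bit hash, a collision is expected after roughly 2^(n/2) tries (the birthday bound) while a second pre-image takes roughly 2^n, which is why collision resistance is usually the first property to fall.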
And we actually talked about this when we talked about how to make sure that the messages we sent were in order. We talked about different strategies with counters. We talked about time. But all of those are basically adding a protocol in there of what messages should look like, what they should be like. I will say that it is very difficult. There's a lot of complexity here depending on what type of message you're sending, or what the specific properties are of the hash function versus the public/private key crypto you're using. So it's going to be difficult to get right. So an idea is we want to create a signature for a message with just a secret key. So why is that helpful? When do we want that? Electronically signing documents. Yeah, so I want to send you a message and I say, in five minutes, send this message back to me. What do I want to verify about that message? That it's the same one that you sent to him? That it's the same one that I sent originally, right? It's the same person on both ends, so I don't need public or private key cryptography at all, because I'm just talking with myself, right? But I want somebody else to store data for me, and I want to check and make sure that they didn't actually alter any data. And this happens literally every single time you're using the internet and the browser. So web browsers, essentially if you look and study the HTTP protocol, there is no concept natively in HTTP of sessions. So what is a session? Why is a session important? So you don't have to log in every time you go to a different page on a website? Yeah, so a session stores information about who the web server is talking to. Without any session information, every time you make a request to a server they say, hey, I've never seen you before, here's the page you want. Then you make another request. They say, hey, I've never seen you before, here's this other page that you want.
So how can you ever do login capabilities which say, hey, remember me from three requests ago, I logged into your website and I'm user Adam, right? Otherwise you'd literally have to keep logging in every time, it would be kind of a nightmare. So the web server can ask the browser, hey, store this piece of data and send it back, and I'll kind of try to remember who you are. But you can look at the cookies in the browser and mess with them. So there's nothing that secures them from the user, they're not trustworthy. It's like handing data to somebody and expecting them to hand the same thing back to you. So for instance, if I set a cookie on your browser that says, hey, your user ID is 50, what would happen if you change this? Hey, you could be a different user logged into the system. So this is the scenario: cookies can be trivially altered, we basically do not want that capability, and we want to make sure that nobody's messing with this message. And the cookie value, does this cookie store any, like, confidential information? Right now no, right? You know what user ID you are. I don't care if you know it. I do care if you change that user ID to be a different user and are checking out somebody else's information. So the idea is we have some random key that we're generating as a server. A first implementation of this, you would think, well, just take, so the double bars here are concatenation, take the key and concatenate the message and run that through a hash function H to get, in this case we'll call it the MAC. It turns out, and this is why this is difficult, this is vulnerable to what's known as a length extension attack, and the key idea is that the hash function actually stores state up until here, so you can append to the message and continue computing the hash function over just those additional characters, without knowing the key, and it will allow you to mess with it. Cool.
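A minimal sketch of the cookie scenario above: the server hands the browser a value plus a tag computed with a key only the server knows, and rejects anything that comes back altered. Instead of the naive H(key || message) construction, which is open to length extension, this uses Python's standard `hmac` module. The cookie layout (`message|tag`), the key handling, and the function names are assumptions made for this example.

```python
import hashlib
import hmac
import os

SERVER_KEY = os.urandom(32)     # random secret generated by, and kept on, the server


def make_cookie(message: str) -> str:
    """Attach a keyed tag so the server can later detect tampering."""
    tag = hmac.new(SERVER_KEY, message.encode(), hashlib.sha256).hexdigest()
    return message + "|" + tag


def read_cookie(cookie: str):
    """Return the message if the tag checks out, otherwise None."""
    message, _, tag = cookie.rpartition("|")
    expected = hmac.new(SERVER_KEY, message.encode(), hashlib.sha256).hexdigest()
    if hmac.compare_digest(tag.encode(), expected.encode()):
        return message
    return None


cookie = make_cookie("user_id=50")              # what the server asks the browser to store

assert read_cookie(cookie) == "user_id=50"      # returned untouched: accepted

tampered = cookie.replace("user_id=50", "user_id=1")
assert read_cookie(tampered) is None            # user edited the ID: rejected
```

The value is not confidential, the user can still read it; the tag only guarantees that what comes back is what the server originally issued.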
So it actually turns out, and the key idea here is it's complicated, and this is why, when you're sitting down to design something, especially when you want to do some kind of crypto thing which seems like it should be very simple, hold this thing for me and give it back to me later, the proper technique here is called an HMAC, which uses a good hash function. You actually need to XOR the key with one pad and append the message, hash that, and then hash that result again together with the key XORed with a different pad. So you have to do actually all of these things. Anyways, it's crazy. So crypto is nuts. Unless you're well versed in this stuff, and I'm definitely not, you would not look at this and be like, aha, this is a trivial attack that you can clearly do, unless you study all these different types of attacks. So what I want to leave you with, before we talk about the next homework assignment: crypto research. There's a lot of different things. You can spend all of your time doing theoretical things and breaking crypto. So let's say you have a 64-bit hash function. If you're able to find a message that hashes to a given value using only 2 to the 63 tries, that hash function is now considered broken, because you don't have to do the whole 2 to the 64 tries, even though in practice that's probably not much of a difference. But as we've seen, attacks always get better over time, so that's when it's time to move on to a new hash function. There's a lot of cool research in actually creating new theory, new implementations, new types of public key, private key crypto, new types of symmetric encryption that are faster, crazy types of new cryptography. One is homomorphic encryption, which basically says, how could you let Google store your data and also give you a search functionality on that data if that data is encrypted? As we've seen with encryption, once you encrypt it, it's basically random and no data remains. So how could Google index all your emails and have a nice email search functionality if all the emails
are encrypted? So homomorphic encryption is a way to store data without knowing what the contents are while being able to run queries over that data, which is crazy. Secure multi-party computation: if you think about large healthcare providers, they want to share data and information about people, about like trends of diseases, but they do not want to share their databases with each other because they are competing in the same space. So secure multi-party computation says, how can we each run a function on our own data and compare those results so that we only get the high level output, but you don't actually leak any information about each other's data? It's crazy. Anyways, applied crypto is how to actually build these things, how to attack applied crypto systems. It's a super cool area of cryptography. Any questions before we get on to the assignment? Yeah. Are hash functions implemented similarly to the encryption functions? When you implement hash functions, in terms of data structure, you don't really worry about reversibility? I don't know enough about them to definitively answer that. I think some of the original functions run some symmetric crypto with like a fixed key, basically, and then because of the size of the input you have to deal with it, so you have to have some way of doing that. I don't really know how modern hash functions work besides using them and these high level properties, but yeah, you can spend a lot of time reading about these things. More questions? So as I mentioned, the 18th, midterm exam. The next day, the next homework assignment. So this is a homework assignment designed to do two things. One is to get you actually familiar with public and private key crypto in a way that is not at the theoretical level. You will generate a public key for yourself, along with, obviously, its corresponding private key, that will identify you. And you will also be working on your social engineering skills. What's social engineering? Yeah, like hacking humans instead of
computers. So for instance, some of the cool social engineering things you can find: somebody calls up your telephone company and gets them to either turn off your service or to move the number somewhere else. You can attack DNS this way by getting the registrar to reset the password to whatever you want. All kinds of crazy stuff. So I'm going to first describe it very quickly and then we'll set some ground rules. Okay, so the key thing here, and I think it was mentioned on Tuesday, one of the key things that we talked about here is, what should Bob be able to do in this public key scenario? Does Bob need to know whose public key is whose? Yes. If Eve convinces Bob that her public key is Alice's public key, then she can easily intercept all the messages and send whatever fake messages she wants, pretending to be Alice. So how do you actually establish identity in the real world? Identities, like who you are. Yeah, you could meet them. You could look at government issued identification and then maybe get their public key. You could look at a website you think they control and take their public key from there. You could rely on somebody else to keep track of who owns what. And so we're going to be looking at the GPG web of trust model, which basically relies on people validating, you're supposed to validate in person, people's identities along with their public key, and then and only then signing their public key as a form of trust, to say, I trust that this public key belongs to this person. And then you can say, how many people have signed that, what people do those people trust, how much do I trust them, and it forms this web of trust. So we are going to create our own web of trust and distrust in this class, so you're going to learn a lot about a lot of different things. Essentially, at a high level, you're going to create a public and private key pair whose name will be
exactly the name that ASU has for you in this class, so I have all of your names from the ASU registrar's office on the website. It's not up yet. So to get started, start reading about GPG, start reading about GPG key creation, start playing around and learning with this very soon. And the whole idea is, so A, we need to know whose keys are actually in this class. Why do we need to know that? So that you know that somebody's key, so one of the things is, how do you know that's actually their key? So what you'll do: every one of you will generate a public/private key pair, the name on your key will be your name, and then you will upload that public key to the submission server, which will sign it with a special key and then give you that signature, so that you can prove to everyone else that the CSE 365 key signed your key. So right off the bat, any key that's not signed by CSE 365 is garbage. We don't consider those keys, we don't trust those keys, we don't like those keys. Okay, then, and please, please do not lose your key pair. What happens? Yeah, you are hosed. You all have Dropbox accounts, you have free Dropbox accounts, you have free GitHub accounts with private repos. Store that key somewhere. Figure out, with whatever software you're using for that key, how to back up your secret key. Don't just put it in a public place, because that will cause problems. Please do not do that. Okay, then, so there's two parts to this assignment. One is you will get 20 of your classmates to sign your public key. Pretty easy. On the flip side, you will sign 20 of your classmates' keys. Right, so you need to do 20 signatures, and you need others to sign your key 20 times. Okay, so that's kind of the basics, the signing and trust things. Now when we get to the adversarial part: every one of you, I had to decide if I'd let you choose the name. No, I don't think so. I will generate each of you a fake adversarial key with a fake name of somebody who is
not in this class, and this will be your adversarial key. So you will get public and private keys there. The next 10 points of your assignment is: do not sign invalid or fake or adversarial keys. So if you don't sign any, you get 10 points, because part of this is validating that you're signing the key of somebody who is who they say they are. If you trick somebody else into signing your adversarial key, you will get extra credit. Extra credit per person? Yes, per person, so rack up as many signatures as possible. Yeah, it will depend on the total. That's really the incentive I have to get correct, because I really want to incentivize people to trick people into signing their adversarial key, and incentivize people to check, because if you're not incentivized to check at all, you just sign anything anybody gives you, so you're not tricking anybody into anything. It will depend on how many actually happen. I'll have something that's fair for people who got adversarial keys signed. Yeah, that's why you don't know. That's a good point. Yes, you will not know. I will assign it based on what I think is fair for those people who did it. Yes, I thought about that too. Yes, that was a concern. So the idea is, let's say it was a point and a half for every signature on an adversarial key, then you all could talk to each other and just say, let's sign everyone's keys, and then you all get extra credit and more points than you lose from the penalty. Yeah, so I'll know at the end. I'll calculate it, and it will be something fair. So really, here's the tips that I've learned on this from people who have done this: you need to really go through and understand what a GPG key looks like. How do you sign it? How do you verify that it was signed with the CSE 365 public key? What does your adversarial key look like? What kind of things can you do to your key that other people will try to do to you to trick you into signing theirs? It should be fun. And you have to talk to each other too. Oh, computer people talking!