Okay folks, happy Tuesday. Let's get started. We're going to pick up right where we left off with the Caesar cipher. So what's a cipher? So yeah, we'll go with [unclear]. What's a cryptosystem? What are the elements that make up a cryptosystem? Let's go with the encrypt and decrypt. But yeah, E and D. Encrypt. Decrypt. What else? More specific. Maybe a key. Key. So the set of possible keys. So how many different types of keys can you use in this cryptosystem? What else? Yeah. The set of possible plaintexts and the set of possible ciphertexts. So think about that as the alphabet. So in some sense, what types of messages can you encrypt in this cryptosystem and what types of things can you decrypt? So why do we have to limit the set of keys? Why can't we just use any key we want? Yeah. Because it'd be wrong. Yes. Or another way to think about it is the lock itself, fundamentally. I mean, does everybody know how locks actually work? The lock, and the ridges on your key, how they push a bunch of pins, sort of like... Yeah. So there's pins, exactly. So inside the lock, I should get a diagram of this, but inside the lock itself there are pins, and I think there's probably one, two, three, four, five, six, maybe seven pins in whatever this goes to. And each of those pins, in order for the door to be opened, needs to be at a certain height. And so as you put the key into the lock, it pushes up all the pins until they all line up, and then you're able to actually turn the lock. So it doesn't make sense if you're trying to break into whatever this goes to and you have, I don't know, a key that looks like this, right? You can't see it in the video, but this key is very different, right? It doesn't have the same heights, it doesn't even have the same number of pins, it looks like. So you're trying the wrong key in the wrong lock.
So you need to know, for your specific cryptosystem, what's the set of keys, what's the set of plaintexts, what's the set of ciphertexts, like what is the language of possible ciphertexts and messages. Cool. So the Caesar cipher is a cryptosystem. It's a very primitive cryptosystem, as we saw. This was actually used back in Caesar's day. And so, conceptualizing this in terms of what we just talked about for our cryptosystem, sequences of letters are going to be the messages. So can we encrypt a JPEG image with this cryptosystem? Maybe. Why maybe? Yeah, you could go even simpler, right? You could take every bit and say if it's a zero, it's an A, if it's a one, it's a B, and just make a bit string of A's and B's and run some kind of encryption on it. So obviously you're taking a different type of input and you're transforming it such that it fits inside this language of letters. So really it doesn't matter, but it may; we'll get to it in a bit. So, keys. So what is the Caesar cipher? The encrypt and decrypt functions add and subtract numbers corresponding to letters. What numbers, corresponding to what letters? More specific. Is that the Caesar cipher specifically? Yeah. You would add three to every number. Why three? [unclear] Well, what about a more general cryptosystem? You can add anything between zero and twenty-five. What specific thing are you talking about? The number of letters you shift each letter. Exactly, what is that number? Yeah, I mean, do you just make it up? It's the key. The key, exactly. It's the key of the cryptosystem, exactly. So the key tells you, when you're encrypting, how many characters to shift a given character forward, and to decrypt it tells you how many characters to shift a character back. Perfect, you're saying all the right things. I just want to get us back to the terminology that we're using. So here we can say the key.
So if you think about this, we're talking about the English language, twenty-six letters. So that means the set of possible keys is every integer from zero to twenty-five. If you want to shift it, well, if you want to shift it back twenty-five, what's that equivalent to? Well, I guess, yeah. It depends on your cryptosystem. So it all depends on how you define the encrypt and decrypt functions, right? Because right now, with just this, we're saying we're encrypting letters and our keys are zero through twenty-five, but it doesn't tell us how to actually perform the encryption or decryption operation. But here we can say, given a key k in K, for all letters m, to encrypt m you do m plus k mod twenty-six. So that's the encrypted value. So what does this mean? What is the mod twenty-six there for? So when we reach the end, it goes back around. So if your key is three and you're encrypting Z, what does that map to? C, yeah, so forward three. And you can easily check: twenty-five plus three mod twenty-six should be two, right? Zero is A, one is B, C is two. I didn't actually do that. Check it. I don't know. Make sure it makes sense in your head. So then, what in this entire operation says that we're shifting the characters forward when we encrypt? I? I don't see an I here. K? So K is positive? Okay, partly. How can I keep K positive and change this to go backwards? Exactly. I'm adding K, right? That's the only operation in here. So, to go back to your question, if I want to change this around to shift backwards, I would subtract K. I'd take M, move backwards K, mod twenty-six. There we go. So then how would you write the decrypt function for this? Yeah. It's going to be the opposite. The opposite. What does the opposite mean? Opposite is easy to say. Harder to write. So if we're adding anything, we're going to subtract from that. So yeah, so we need to know the key.
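The encryption rule just described, m plus k mod twenty-six, can be sketched in a few lines of Python. This is only an illustration under the lecture's numbering (A is 0, B is 1, up to Z is 25); the names `ALPHABET` and `encrypt_letter` are made up for this sketch and do not come from the lecture.

```python
import string

# Assumption: letters are numbered A=0, B=1, ..., Z=25, as in the lecture.
ALPHABET = string.ascii_uppercase

def encrypt_letter(m: str, k: int) -> str:
    """Shift one uppercase letter forward by k positions, wrapping past Z."""
    return ALPHABET[(ALPHABET.index(m) + k) % 26]

print(encrypt_letter("Z", 3))  # (25 + 3) mod 26 = 2, which is C
print(encrypt_letter("A", 3))  # (0 + 3) mod 26 = 3, which is D
```

The `% 26` is what makes the shift wrap around at the end of the alphabet; replacing `+ k` with `- k` would shift backwards instead.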
Why do we need to know the key? To decrypt. Yeah, to decrypt. We need to know how much to shift. And so we could do it kind of a number of ways. So you could do it as twenty-six plus c minus k, mod twenty-six, where c is the ciphertext letter as a number. So we're using lowercase m because this letter is in the set of plaintexts. And we're using c because it's in the set of ciphertexts. They're the same set. And so now, with this cryptosystem, we can know exactly what keys are valid. So if I said use this cryptosystem, the key is one hundred and two, does that make sense? It's not in the set that we defined. Yeah, it's not in the set that we defined, right? So it's not in this key set. So we can't use that key. We can only use zero through twenty-five. Questions on this? Questions on any of the notation here? Cool. So, all right, let's go back to thinking about our, and now I'm going to start drawing. Ah, okay, I've got to pause something real quick. And we're back. Minor time warp. So now I will introduce the most famous people in cryptography. We have Alice and Bob. Alice wants to send a message to Bob. So Alice has some message M. So what do they need to do in order to use the scheme that we just talked about? They both have to have the same key. They both have to have the same key. Let's call it k, lowercase k as in key. Keep it simple. So Alice knows K, Bob also knows K. Alice wants to send this message to Bob. So what does Alice do? Encrypt it. Encrypt it. So she takes the message, and what does she need to use in order to encrypt this message? K. Does everyone know K? Ideally no. Why ideally no? Encrypt it, steal the message, intercept the message, write your own. Yeah, exactly. So this outputs, we'll say that this outputs C, and Alice sends C to Bob. Does Alice send the message M to Bob? No. Why not? Because it's plaintext; anyone who sees that message can know exactly what Alice is sending.
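To make the pieces above concrete (the key set zero through twenty-five, encryption, and the "twenty-six plus c minus k" form of decryption), here is a minimal Python sketch. The function names, the `KEYS` constant, and the sample message HELLO are assumptions chosen for illustration; only the arithmetic comes from the lecture.

```python
import string

ALPHABET = string.ascii_uppercase  # plaintext and ciphertext share this set
KEYS = range(26)                   # the key set: integers 0 through 25

def check_key(k: int) -> None:
    """Reject any key outside the key set, such as 102."""
    if k not in KEYS:
        raise ValueError(f"key {k} is not in the key set 0..25")

def encrypt(message: str, k: int) -> str:
    """c = (m + k) mod 26 for every letter of the message."""
    check_key(k)
    return "".join(ALPHABET[(ALPHABET.index(m) + k) % 26] for m in message)

def decrypt(ciphertext: str, k: int) -> str:
    """m = (26 + c - k) mod 26 for every letter of the ciphertext."""
    check_key(k)
    return "".join(ALPHABET[(26 + ALPHABET.index(c) - k) % 26] for c in ciphertext)

k = 3                     # shared secretly by Alice and Bob beforehand
c = encrypt("HELLO", k)   # Alice's side
print(c)                  # KHOOR
print(decrypt(c, k))      # Bob's side: HELLO
```

Calling `encrypt("HELLO", 102)` raises a `ValueError` in this sketch, mirroring the point that 102 is not a valid key for the system as defined. A message containing anything other than uppercase A through Z is likewise rejected, because it is outside the plaintext set.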
If Alice had a secure way to send the plaintext message to Bob, she would use that. So then Bob gets C. How does Bob get the message back? Decrypt it. How does Bob decrypt the message? Take C. Take C. K. And like magic, it outputs back M. Cool. Any questions on how this operates? How do Alice and Bob both know K? They could choose it beforehand, so they could agree on it beforehand? Yeah. Maybe they met in person and they were able to exchange it secretly. Okay, meet in person, exchange it secretly? Why doesn't Alice just send C and K along together? No point in that. That's exactly the same thing as sending the message in plaintext, right? Because for somebody to decrypt C, what do they need? The key. The key, K. So the key fundamentally must be kept secret, because in this scheme it's the entire linchpin behind this. I'll try to be louder without this thing being annoying. Okay, so now we're bad people, we're attackers. What's our goal? Get the message. Okay, I heard get the message. All right, what else do I hear? So why do we want to get the message? Try to decrypt C. Yeah, so we want to understand the contents of this message, because fundamentally we have C and we want to know this private communication. So as attackers, we want to know M. What else would we like to know? K. K? Decrypt. Say it again? Decrypt future messages. So we can decrypt future messages. Which one's more powerful? K, K, K. That depends on what you want to do, right? If you only care about that one specific message, then you don't care about the key as long as you can break that message. Right? If you have the key, then it's great because you can break all the messages, yeah? Sorry, with the key couldn't I also impersonate Alice? If you had the key, you could do what? Impersonate Alice? How? By encrypting a message the way Bob expects from Alice and sending it to Bob.
Right, because once you have the key, you could then encrypt some message M' with K, output some ciphertext C', send that to Bob, and when Bob decrypts it, what's Bob going to see? M', this new message that you tampered with, that you controlled. Wouldn't you also have to know the encryption slash decryption function, which in practice isn't a huge issue because there's like five of them that are used, but... Yeah, so why do I need to know E and D? Like back in the day, if no one had ever heard of, like, Caesar's cipher, then it's like, all right, what do I do with this? You know, if they intercepted one of his messages, even if they knew the key, they're like, well... Yeah, good point, yeah. Well, if you have M and K, you can derive E and D. Ooh, if I have M and K, I could derive E and D? Well, if you have the original message and the ciphertext and the key, you can derive... So maybe if I have all three, the message, the key, the ciphertext, and maybe if I have multiple copies of those, maybe I can try to reverse engineer and figure out what the encryption algorithm is, and then from there infer what the decryption algorithm is. What else? Well, I guess the question is, what should we assume that the attackers know? Yeah, the algorithms for encryption and decryption. The algorithms, why? For the most part, I would say, at least in technology, I feel like those algorithms are public knowledge. So I would assume that... Let's assume they're not. So maybe they're not public knowledge. So what? So to get back to this, I'm thinking about it this way. If Alice and Bob can guarantee the security of K, can they guarantee that their communication is confidential? Let's say they can wave a magic wand and say nobody on Earth can know or guess K besides Alice and Bob. Can they communicate confidentially? Yeah, what do you think? Somebody can figure out what M is.
Okay, so they need to secure... Let's say they need to secure... I don't want to use a different color. No, it's going to be fine. Okay, so they need to secure the key. They also need to secure this operation, and also on the decryption side. Let's say they could wave a magic wand, 100% secure. Well, that would only protect the message in transit, but it wouldn't stop you from intercepting the message after Bob has received it, or as Alice is writing it. Okay, yes. So as the message is being written, let's say we go back a little bit to our example of things that happen in person. That's difficult to spy on, but for things in transit, it's a lot easier to intercept a message or do something like that. So what is the security, the confidentiality of this entire scheme? I was going to say, like, an attacker could guess the key. Let's say they can't. I mean, I have a magic wand, I'm perfect, and they cannot guess the key. Nobody knows K or can guess K besides Alice and Bob, and nobody can see their encryption and decryption operations. Would it all depend on how easy it is to intercept, I guess, C and decrypt it? So, okay, yes. How strong their algorithm is and how well C is protected. Yes, okay. Is it a reasonable assumption that somebody can intercept the message or get a cop... Oh, sorry, not the message, but the ciphertext C as it's going from Alice to Bob? Yes, sure. Why? In the scope of the internet, you can just pull traffic, pull packets. In the scope of the internet you can pull traffic, but more fundamentally, based on what we're doing here. Well, they're using encryption. Yeah, they're using encryption, so they must think that somebody could snoop on their communications. Otherwise, they would just deliver the message over whatever channel this is. Right, via pigeons or internet or whatever, it doesn't really matter. Cool, okay, so we know they have to do that. So then, let's say, so the attacker...
So we should assume, to be reasonable, that the attacker can get what out of this scheme? Ciphertext C. The ciphertext C. So if the attacker cannot, let's say, guess or know K, how can Alice and Bob be sure that they can't get M? Or can I rephrase it at that point? Even if they don't know the algorithm or the key, I feel like, just for this particular algorithm, if you had the ciphertext, because you're just shifting everything by a certain number of places, there would still be some structure in the ciphertext, so it wouldn't be completely random. Let's think less about that and more... But that would still be somehow figuring out K. But let's say that there's no way that they could guess K. Use the ciphertext to deduce it? Yeah, I mean, I guess there's a couple things, right? So A, this is a difficult problem, as we've just dived into. And really what I want to get to is that the security here, a lot of it depends on K: not being able to guess K, not being able to brute force K, not being able to just discover K magically, because if you have that, then it's very trivial to break this. So that gets back to, if we considered, well, the attacker never knows what encryption and decryption algorithm I'm using, is that a reasonable assumption to make? Why? Not anymore. Not anymore, but you could go home and write a crappy encryption algorithm. And it would be crappy? It would be crappy. That's actually what we're going to learn as part of this section. It's not anything about you. It's basically, never write your own crypto, because it will be terrible. But, assuming you wanted to do that, would you feel safe and confident in the knowledge that nobody else knows how your cryptosystem works? It's almost as if the knowledge of the encryption algorithm and the decryption algorithm then becomes what?
If that's core to the security of your algorithm. Key? Key. It almost becomes part of the key, right? So you have to protect that with the same level of security that you would use to protect the key. So if you have the choice between two cryptosystems and somebody says, well, you just can't tell anybody that you're actually using this one, versus another system that says you can tell everyone on Earth that you're using it, but you're still going to be secure, which one would you want to use? Yeah. The second one, the one that says, hey, tell everyone you're using it. So this is a kind of long way around to get to the fact that, let's assume, so actually, maybe if we're more technically drawing these boxes, we would put the operation... Well, this is really bad, but let's put the operation itself of doing the encryption and decryption inside of our secret box, but the knowledge of the decryption and encryption algorithm is known to everyone. Okay, so going back a little bit, we talked about the attacker's goals. We want to get M, we want to get K. Is this a different property? So if we get the message, we're breaking what security property of this communication? Confidentiality. Confidentiality, right? We're breaking the fact that Alice and Bob wanted this communication to be confidential, but it's no longer confidential. If we are able to guess or break and extract the key, what can we do? Fake messages. We can fake messages? What else can we do? Decrypt. So we can do the encrypt operation, or we can do the decrypt operation, which means we can break confidentiality. If we can spoof messages, what security property does that break? Integrity. Integrity, right? Because the data that's in transmission is being changed, therefore the integrity of the data is changing. Well, do we envision a scenario where we could do this without knowing K? I was going to say you can repeat a message that you've already observed.
I may be able to repeat a message, so maybe they're sending multiple messages. I drop one and send an earlier one. Do I need to know the key for that? Not necessarily. So we'll state this more formally later, but let's say I could just find some C prime that decrypts to some M prime that is a message that I want. So that could be reusing an old message. Maybe I just change bytes in the ciphertext, and without knowing the key I understand how those changes carry through. Okay, cool. So what does the attacker have access to in this model that we're building? The encryption and decryption algorithms. Encryption and decryption algorithms, what else? Ciphertext. Ciphertext, how many? Which is a more powerful attacker? The one that has all the ciphertexts. Yeah, the one that has all the ciphertexts, right? And the way to think about these things is, why do we care about how powerful an adversary we're considering? Because it's easy to make a system that's secure against really weak attackers, but if it's a stronger attacker and we're still defended against that, then it's more secure. Exactly, it goes back to the threat modeling and risk assessment that we talked about, right? What are Alice and Bob worried about? Should they be considering a nation-state-level attacker that monitors every single communication that they make, and so will have access to all those ciphertexts? Should they be worrying that their adversary has put a back door on their computer systems at build time, which is able to extract K and send it to them? Do they have to worry that these encryption and decryption algorithms are actually built by the adversary and have an inherent back door in them? These are all things that change depending on your threat model. And so, okay, we can think about what the attackers have.
They could have ciphertexts: zero, I mean one, or maybe all of them. What else do they have? Supercomputers. They could have really good computation. What else, I mean in this diagram? So we already assume, we'll always assume, they have knowledge of the encryption and decryption functions. The key, okay, but if they get the key, they break everything. I mean, if you assume a powerful adversary that can get the key, you're done, right? You've lost, there's no hope. Why is the message helpful? Because they can figure out the key. Yeah, what else would you have with the message? The ciphertext, perhaps? Yeah, so you could have a pair of original message and ciphertext. And you may have many of them. Why is that helpful? Is that more useful than just having ciphertext? Yes. Is it a key? What's that? It tells you the key. It might tell you the key. But think of it in terms of information. Do you have more information if you have the corresponding messages and ciphertexts? Already, if you assume the attacker has the messages, you could say that they could take any ciphertext and substitute it with a ciphertext they already know. So you can think about whether the cryptosystem will accept that or not accept that. Okay. So there's one more level of attacker that we haven't covered. So who sent these ciphertexts in our example? Alice. Who chose these messages and these ciphertexts? Or who chose, let's say, the messages here? Alice, again, right? So here, I mean, assuming we've recovered these from that communication, we have messages. We have ciphertexts. What if Alice let me encrypt some messages? I'm actually breaking our model a bit, but... Chosen plaintext will tell you a lot. Right. So if I, as the attacker... So what's the difference if Alice chooses a message or I choose a message to encrypt?
If you choose a message, you can design it in a way that makes figuring out the function or figuring out the key easier. For example, you can just encrypt a whole bunch of the letter A, and then that allows you to deduce a lot more about the process. So with the Caesar cipher, what happens if I encrypt all A's? Or just, how many A's do I need? One. One A. If I encrypt one A, what's the ciphertext? Depends on the key. And it will be what exactly? A plus the key. A is what? Zero. Zero. So the ciphertext will be... The key. The key. The key. You will get the key. Encode A, you will get the key. So in this cryptosystem, if an adversary is able to encrypt a message of their own choosing, is breaking this easy? Yes. Yes. Trivially easy, right? You all just did that. But is it reasonable for an attacker to be able to control what message gets encrypted? Somewhat. Somewhat? Well, give me a situation. You're saying is it reasonable to assume that they... Not assume. I guess the question is, what's the difference between an attacker who just knows the algorithm and one who's able to encrypt a message of their choosing? World War II, the Enigma machine. Explain. Like, if the Allies... I actually don't remember if the Allies got their hands on one or not. I have no idea. They did. [unclear] So once they have the machine, they can set the key to whatever they want and then generate their output. But they still had no idea what key the Germans were using for their transmissions. Okay, so in that example, what asset is that machine here? You're right. That's the function. Yes. So that's getting access to E. So, you know, you can say it's easy to know E and D, and now it is, especially when everything's in software. But back then, it was very difficult. But even after stealing the machine and understanding the encryption and decryption algorithm, that's still not a chosen plaintext attack, because why?
So let's say you set up this Enigma machine. It turns out it's a Caesar cipher. You run A through it. What is it going to tell you? Your key. Your key. Is it going to tell you Alice and Bob's key? If you're lucky. I mean, if you're one-out-of-26 lucky, you will get it. Yeah. So then what's the difference, go back to this Alice and Bob scenario, how can an adversary get their own plaintext encrypted? Yeah. Kind of a stupid example, but back to the Enigmas, if you know that they're transmitting messages from this center and you're intercepting them, you could maybe, like, oh, they have a radio guy who just transmits this stack of messages, and you just sneak in a bunch of your own. Yeah. Spies, right? I think they've got spies. Maybe there's a spy who slips Alice a message that just says A, or it says, ah, like A and then H at the end. I'm like, yeah, this is a message that somebody would want to send. And you know that, and you're intercepting that communication. Have you stolen the key, K, directly from Alice? I don't know. Well, the key is still very secret, but you're influencing and choosing exactly what plaintext is getting encrypted. Is that more difficult than just stealing messages? Yeah. Stealing messages is fundamentally easier than tricking someone, especially if we go to this physical war scenario, right? That's actually a high-risk activity. Cool. Okay. So then you could say, basically, I want to use prime, but I already used prime. Somebody give me, like... I'll put a hat on it. M hat. And you'll get C hat, all the way to, oh, I did this, M n hat. I'm just making up this notation, by the way. This is not, like, standard notation. And what's the important difference between these two cases? In the bottom one, you choose what the message is, and in the top one, it's a message that you retrieve. Right?
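The difference between owning the machine and having a chosen-plaintext capability can be sketched as follows. This is a hypothetical model, not something from the lecture: `make_oracle` stands in for "whoever encrypts what you hand them under their own secret key", and the key values 4 and 11 are arbitrary.

```python
import string

ALPHABET = string.ascii_uppercase

def encrypt(message: str, k: int) -> str:
    """Caesar encryption: shift every letter forward by k, mod 26."""
    return "".join(ALPHABET[(ALPHABET.index(m) + k) % 26] for m in message)

def make_oracle(secret_key: int):
    """Hypothetical encryptor that hides its key but encrypts any message given."""
    return lambda message: encrypt(message, secret_key)

def recover_key(oracle) -> int:
    """Chosen plaintext A: since A is 0, E_k(A) = (0 + k) mod 26 = k."""
    return ALPHABET.index(oracle("A"))

alice = make_oracle(4)             # the attacker never sees the 4 directly
print(alice("A"))                  # E
print(recover_key(alice))          # 4

stolen_machine = make_oracle(11)   # a machine the attacker keyed themselves
print(recover_key(stolen_machine)) # 11, the attacker's own key, not Alice's
```

In this sketch the stolen machine only gives access to the function E. The attack works against Alice only when the chosen message is encrypted under Alice's key, which is why slipping a message into her outgoing stack matters.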
So the difference here is that the attacker chooses, whereas before it's just whatever that communication is. Cool. And so, okay, we'll shift now into more crypto terminology, and we'll talk about adversaries, because we're considering that adversary. And we will oftentimes be putting ourselves in the shoes of the adversary, somebody who wants to break a cryptosystem. And we've talked about the different ways that we can break it, right? Not all breakage is necessarily the same. And like we said, we'll assume they know the algorithm, but not the key. So, some of the capabilities of an adversary that we talked about: access to just ciphertext. Known plaintext, so access to known plaintext and ciphertext pairs. And then chosen plaintext. These are all the things that we just talked about. Questions? Cryptanalysis, breaking cryptosystems. So how can we break this system? So let's go back to our Caesar cipher. So M was the set of, let's say, A through Z. Is that a Z? I need to sleep. Okay. The key is something from 0 to 25. Okay, so we talked about a chosen plaintext attack. So we can choose M1 hat. How do we break the Caesar cipher? Choose A and see what the key comes out as. So if we choose M hat to be the string A, I guess I'll do uppercase to be consistent, because the slides also do uppercase, and we get back E, what's K? Four. Four. Right? So is the Caesar cipher resilient to a chosen plaintext attacker? No. You just did it. You just broke it. Cool. Okay. What's the chosen plaintext attack one more time? A chosen plaintext attack is where the attacker gets to choose what messages are encrypted. So they're able to influence the system. So let's go to a slightly weaker adversary. You have two messages. You have, I won't be able to do this in my head, but let's try. Anyways, you know the message is, I'm just going to make this very simple.
So you have M1 is ADAM, C1 is DEDN. Wait, that's not right? Yeah. So this is where I should get you guys to come up with examples. That's usually the better option than me just doing them on the fly. How do you go about breaking this? Subtract C. Do what? Subtract C. Subtract C1, so basically C1 minus M1. And you think about doing this for every letter, just to make sure you're right. And what do you get? One. One. One. Cool. You just broke it. So is a Caesar cipher resilient to a known plaintext attack? No. No. Because given any plaintext and ciphertext pair, you can easily derive the key. So now, what's the weakest of the adversaries that we discussed? Ciphertext only. Ciphertext only. So we only have ciphertext. And let's think about this. So could you have done the same thing if I just gave you the message A and the ciphertext was B? Yes. And similarly for this? Cool. So now, all right. Okay. So we don't have M. We have some ciphertext. What's the key? How do you break this? We... You can only brute force it? You can only brute force it? We only have one letter. Well, then we would have to check. You'd have to check what? Well, you would have to check every item. But without more messages you won't be able to do this. Why do you need more messages? You were able to break it with one letter before. What's the difference? Yeah. For this, if we're assuming that the plaintext is in English, it's a little hard with one character because, like, A is a word in English and I is a word in English. So we can't really determine the key from just one letter. Okay. Yeah. We'd need at least M, and that would make us what kind of attacker? Say it again. Plaintext attacker. But I only have ciphertext. I'm not able to extract the plaintext. All I get is ciphertext. So think about Caesar, right? You come across the rider.
You politely ask the person for the scroll that they're carrying, and you see some message. Yeah. You need some more context for this, because you're not really sure what kind of information they're communicating. So you'd have to get that either from more messages or from, like, what the transmission is like or what context it is in. Right. So thinking about this, what if we're not necessarily sure that it's English, or maybe we've only got a fragment, the first character of the communication? Right? So we can't even narrow it down to just I and A. It could be fundamentally almost any character, right? So we could try every possible key. Is it easy to try keys? Yes. How many are there? Twenty-six. Twenty-six. Twenty-six. It's trivial. I guess twenty-five, I guess, but let's say twenty-six keys. We can try them all. But how do we actually know if we're right? We don't, unless we have context. Yeah. We don't actually know if we're right. How much information do we have here? Very little. Pretty much none. One character. One character. I wouldn't say a bit, but we have a character. Right? But fundamentally this gives us no additional information. We'll see why much later, but intuitively, breaking it doesn't make sense. If I wrote this on the exam and said, give me the key K, you would just, I don't know, you'd probably be within your rights to walk out. But, like, don't actually do that during the exam. Oh, that would be funny, but don't do it. Right? So the problem here is that all keys could potentially be valid. So what would we need in order to start? Like, how would we start? Let's compare this scenario, where we have one character, to another scenario where we have a huge string. I'm not going to come up with one, because I think I have a slide for later, so it's not a big deal. But we have a huge string. How do we try to figure out the key? Yeah.
You only do, like, the first couple of letters and see if it makes sense. So you don't have to test the key on the whole message that's covered. You just test every key on the first couple of letters. If the first couple of letters are natural words, you use that key to do the rest of the message. Okay. So you could start by going through every key. I mean, well, I guess it does depend on how long this is, but fundamentally, if it's, let's say, even a 10-character-long or 20-character-long message, you could shift each of the letters in the string by each of the keys 0 to 25. And then you know which one is the message. Whichever one makes sense. Whichever one makes sense? What does that mean? Based on the context. If you know Alice is saying, like, a... So what property are you using to then derive this key? English words? Yeah, English words, or essentially knowledge of what they are encrypting. I mean, is the message going to be some random string in M? Maybe. If it is, then how will you know which one is right? So let's say, okay, this is a good example. So let's say Alice is randomly picking letters from M. She encrypts them with the key and gives you the ciphertext. So she gets M1, which you don't get. You'll never know M1. It is, let's just say, a 10-character-long string, each character a random letter in the alphabet. And then she gives you C1. You get C1. And she'll give you as many of these as you want. So M2, C2, you get C2. Let's first actually stop with this one. Okay, so M1, C1. So you have a string of 10, 20, 30, 100 random characters. So what happens if we use our previous algorithm, the one we just talked about, for trying to attack this? What would we do? So you look at each character, right? You go do 26 shifts on each of them. How will you know which one is right?
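The brute-force idea above, trying every key and looking for the candidate that makes sense, can be sketched like this. The ciphertext `DWWDFNDWGDZQ` is an invented example (it is ATTACKATDAWN shifted by 3), not one from the lecture, and `all_candidates` is a hypothetical helper name.

```python
import string

ALPHABET = string.ascii_uppercase

def decrypt(ciphertext: str, k: int) -> str:
    """Shift every letter back by k, mod 26."""
    return "".join(ALPHABET[(ALPHABET.index(c) - k) % 26] for c in ciphertext)

def all_candidates(ciphertext: str):
    """Return (key, candidate plaintext) for each of the 26 possible keys."""
    return [(k, decrypt(ciphertext, k)) for k in range(26)]

# Assumed example ciphertext; a human reader picks out the English-looking line.
for k, candidate in all_candidates("DWWDFNDWGDZQ"):
    print(k, candidate)   # the line for k = 3 reads ATTACKATDAWN

# With a single ciphertext letter, every key yields a different, equally
# plausible letter, so nothing distinguishes the right key from the others.
print([candidate for _, candidate in all_candidates("B")])
```

Nothing in this sketch decides which candidate is correct. That judgment relies on knowing the plaintext is not uniformly random, for instance that it is English, which is exactly the property that disappears when Alice encrypts random letters.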
It might be over the top, but you could maybe look up the statistical frequency of letters in the English language. It's not English, it's random. I randomly picked from M to generate M1. Yeah? If M is randomly generated, then there's no information being passed, so the cryptosystem is meaningless anyway, right? Yes. In some sense, yes. What is Alice communicating to Bob if it's a random string? The random string could be her chosen password. Right, could be. I mean, we don't know if that's going to be used later, but maybe there'll be more semantic information in future messages. Or maybe it's part of, I don't know, something bad. A location. What was that? A location. Maybe. Yeah. But the key idea here is that before, if it's an English string, we can brute force, decrypt everything, and look at it, because we know that the plaintext is not a random, uniform distribution over all possible letters in the alphabet. There are certain limits. What are those limits? People like to communicate with each other? Yeah. With a language. They like to communicate information. Right?
Let's change this slightly. We'll say Alice reuses K. She generates some M2 and gives you ciphertext 2. Now can you break the key? No, there's no difference between having one random string and having two. Perfect. So we can get as many ciphertexts as we want, but fundamentally, even though the key is only 26 bytes, I mean, only 26 choices, because there's no information content in the messages, you still can't tell which decryption is right. Cool.
Okay. So fundamentally, how did we break this algorithm, the Caesar cipher? We just broke it in three different ways, trivially, without doing much of anything. So if we can choose the plaintext, what were we doing? Like, why did we choose to encrypt A? It's easier. It's easier? But how do you know what to do? You just guess? Because A is 0. Then you add a key on top of it, mod 26. A is 0.
You add a key on top of it, mod 26. You just get the key. Yeah. So you analyzed the algorithm. You understood that the encryption process is reversible in some sense, right? So you know that if you can control M, and M is 0, you can say 0 plus K mod 26 gives you K. You can solve for K. So that means E_K of 0 is the key. What about the second one, where we have a known plaintext and the corresponding ciphertext? How did we figure out how to break the Caesar cipher there? So we just discussed the chosen plaintext attack, where we get to choose; we used knowledge of the algorithm to figure out what to do. How did we break it when, let's go back here, when we were given Adam and we had the other thing that Adam encrypted to, which I don't want to rewrite? So here we have a known plaintext attack. We have the plaintext, we have the ciphertext. How are we able to derive this key? Subtract the input. Yeah, similar idea, right? We analyzed this algorithm and we said, well, if we know the result and we know M, then we can solve for K very easily. What about the ciphertext only attack? Can we break it if we just have one character? No. We just talked about that. What if we have multiple characters? How did we start to break that? It's in English. Yeah, so we essentially brute forced the key to figure out what the message is likely to be. Cool.
So these are actually basically the main ways you go about breaking cryptosystems. So one, mathematical flaws. This is basically: understand the cryptosystem and find a flaw. So this was very easy. I mean, if somebody told you that the Caesar cipher is resistant to a chosen plaintext attack, you can see mathematically that is not true. What property would you want to hold for the Caesar cipher to be resistant to a chosen plaintext attack? Yeah? Maybe it doesn't necessarily map...
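The two key-recovery arguments above (encrypt A, which is 0, and read off the key; or subtract a known plaintext letter from its ciphertext letter) can be written out as a small Python sketch. The function names, the `oracle` callable standing in for "Alice encrypts whatever we hand her", and the secret key value 17 are all hypothetical, chosen only for illustration.

```python
def letter_to_num(ch):
    """Map 'A'..'Z' to 0..25."""
    return ord(ch) - ord("A")


def num_to_letter(n):
    """Map an integer to a letter, wrapping mod 26."""
    return chr(n % 26 + ord("A"))


def encrypt(message, k):
    """Caesar encryption: E_k(m) = (m + k) mod 26, applied letter by letter."""
    return "".join(num_to_letter(letter_to_num(ch) + k) for ch in message)


def recover_key_chosen(oracle):
    """Chosen plaintext: ask for E_k('A'); since A is 0, the answer is k itself."""
    return letter_to_num(oracle("A"))


def recover_key_known(plain_ch, cipher_ch):
    """Known plaintext: k = (c - m) mod 26 for any one matching letter pair."""
    return (letter_to_num(cipher_ch) - letter_to_num(plain_ch)) % 26


if __name__ == "__main__":
    secret = 17  # hypothetical key, hidden inside the oracle

    def oracle(m):
        return encrypt(m, secret)

    print(recover_key_chosen(oracle))
    print(recover_key_known("H", oracle("H")))
```

Both functions need only a single letter, which is why the lecture calls these breaks trivial: the attacker does no searching at all, just one subtraction mod 26.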
I was going to say it doesn't necessarily map to the same character, but does it? Think about it at a high level. So what does the attacker know? Given the ciphertext, so given E_K of some message, and the M that the attacker chooses, should they be able to derive K? No. That's what you want to prevent. Here we can see that that's not true. You just proved that's not true. Cool.
So other attacks we can do, and this is the main basis for a lot of classic crypto attacks, are statistical attacks, where you're trying to make some kind of assumption based on the underlying language. So this would be your knowledge, as we talked about with the Caesar cipher: you have a long string, you try all 26 keys, and you say only one of these looks like English. You know Alice and Bob speak English, so they are highly likely to be speaking English to each other. Cool.
And what about an implementation attack? What would that be? So assume your math is perfect; your cryptosystem is so awesome it leaves no opening for any statistical attack. There are no statistical traces relating the plaintext and the ciphertext. Are you secure? Yeah, maybe, so like in a computer system, if you're not even encrypting the messages before you send them. Theoretically it is unattackable via math and whatever, but you decided not to encrypt in the first place. Yeah, what if I have a bug that accidentally sets the key to be zero? You all look at me like that's impossible. What if you had a check like this in your code? This has bitten some of you before, I can tell. Right, or you can think about, I don't know, maybe one or something like that. I don't know, zero is actually better, because it won't take this branch.
So you can easily have an implementation bug: the math is beautiful, perfectly secure, no statistical problems, but because of the way it's implemented the key is always zero, and the adversary can know that and decrypt all of your messages. Yeah? Even less innocent stuff, like CRIME and BREACH, both exploited how data was compressed before being encrypted to leak information. Yeah, and I don't know that I'd call that implementation. I guess that depends on where you fall on which parts are implementation and which parts are theory and math. I'd say that may be a problem on the mathematical side in some sense. Wasn't the issue with that, though, that the algorithm itself was fine, but then once it was compressed there was... Yes, so the point is that if you're doing that, you should take it up into your mathematical understanding, right? As opposed to an implementation rule of, like, don't compress it. And this is what happens oftentimes when cryptographers, who are mathematicians, come up with these beautiful, elegant algorithms that need to actually be implemented in the real world. For a long time there was this big disconnect, where you come up with this great algorithm, you write a theoretical paper, and then you wait for somebody else to implement it, and then they say, well yeah, you'd want to compress things before you encrypt them, and you get all these different types of attacks that actually leak information. So these implementation attacks could be all kinds of really interesting things. There are a lot of these types of attacks, and honestly a lot of the cool crypto attacks are against the implementation, not necessarily the math. Which makes sense, right? Otherwise, if we're all using encryption that has known mathematical flaws, what are we doing, right? Like why use those? Cool.
Okay, so we're going to first start off with classical cryptography.
So this is the scenario we've been talking about, where both the receiver and the sender share some kind of common key. We briefly mentioned it, but how do Alice and Bob get that key? What must they absolutely ensure? Say it again? That only they both have the key? Right, that key is something that both of them know and nobody else knows. As we'll see, this can be difficult. Why? If you're communicating with a web server, how are you going to give them the secret key? Right. Again, in this case Alice and Bob have a need to communicate securely, which means they must be thinking that somebody is monitoring their communications. So if you're in that situation, how do you actually transmit a key? It's the same reason that one-time pads, even though they're so secure, are not very often used: you have to meet face to face to exchange the exact same pad. Yes, we will get there. So it's very clear, as we said when we started this off, that you just cannot share K over this channel. So basically what people assume is that there's some trusted way for Alice and Bob to share K. Maybe they meet in person. But if they meet in person and whisper the key to each other, how do they know the room's not bugged? That's why this is a difficult problem. Another name for this is symmetric cryptography, and we'll see asymmetric cryptography in a bit, which is super cool.
There are two basic types. So what was really the underlying way that the Caesar cipher was working? What was it doing to our plaintext? Manipulating it? How was it manipulating it? You can think of it as a mapping. It was mapping all characters to other characters. So for a given key, if you give me A, I will give you E. If you give me B, I will give you... and I'm going to stop. So you can think of these, in the broad case, as substitution ciphers. You're substituting one letter for another. Another way that we'll look at is mixing letters up.
So what do I mean by transposition, or mixing things up? Swapping letters around. Swapping places in the string. Swapping maybe the first character and the last character, and maybe having some complex way of swapping characters, as we'll see. Cool.
Caesar cipher. Oh, good, see, I did have examples here. So if our key was 3, then Hello World would encrypt to this, which I'm not going to pronounce. And we've already talked about this. We actually already went over this; we already attacked the Caesar cipher. Why can we try all possible keys? Because there's not that many. So should we extend the English language to get more letters? Yes. Should we include, like, emojis? So we can exhaustively search, we can try all the keys, which we talked about. What would another way of attacking this be? Let's say there are more possible keys. Yeah? You could look for, like, a three-letter set that repeats itself a lot. That repeats itself. Why is repeating itself important? It's probably the same word. Yeah. So it's probably the same word, because all the characters have been shifted by the same amount. So TAG will be shifted the same no matter where it appears. What else? Yeah? Letter frequency. Letter frequency. Instead of looking for a common word, you're just looking for a common letter. Right. So why is that important? Aren't all the letters important? Some letters are used more, because it's not an even distribution for how much each letter is used. Some are used way more frequently than others. Yeah. Let's go back. So going to this example, the two L's in Hello World become two O's, O-O. So yeah, I could look at that. Yeah? Oh, I was going to say something similar, that Z is not very likely to appear, but two Z's together are more likely to appear, I think. I don't know if it was two Z's or something else.
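To make the Hello World example with key 3 concrete, here is a minimal Python sketch. The function names are assumptions, and it follows the convention mentioned in the lecture of keeping only the letters A through Z, so spaces and punctuation are dropped before encrypting.

```python
def caesar_encrypt(message, key):
    """Uppercase the message, keep only A-Z, and shift each letter by key mod 26."""
    letters = [ch for ch in message.upper() if "A" <= ch <= "Z"]
    return "".join(chr((ord(ch) - ord("A") + key) % 26 + ord("A")) for ch in letters)


def caesar_decrypt(ciphertext, key):
    """Decryption is the same shift in the opposite direction."""
    return caesar_encrypt(ciphertext, -key)


if __name__ == "__main__":
    c = caesar_encrypt("Hello World", 3)
    print(c)                     # KHOORZRUOG
    print(caesar_decrypt(c, 3))  # HELLOWORLD
```

The doubled L in the plaintext shows up as a doubled O in the ciphertext. Because every position is shifted by the same amount, repeated letters and repeated words stay repeated, which is the structure that the pattern-based and frequency-based attacks rely on.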
But certain letter pairs appear more frequently. Right. So if the ciphertext is full of Z's, here we'd say, well, that's probably not Z. Right. And what you can do with the ciphertext is create a distribution of letter frequencies and compare. So this is what we were just talking about: a one-gram model, a gram just being one letter. So look at the one-letter frequencies and compare them to English. Where do you find this? Google. Where do you find everything? Google. And then Google takes you where? Google doesn't have it. Wikipedia. Yeah, you look up the frequency distribution of letters on Wikipedia. Where do they get it from? Stats? It just magically pops out the frequencies of the English language? Yes. Yeah, but what do they use as input? Text. Close. People put it in. Just by guessing? Yes. Yeah: analyze text. Use a bunch of text, generate statistical frequencies. This should be something that you could do very easily. Right? Here's a bunch of documents. Literally go character by character and increment a hash table: if you see an A, increment A by one. Figure out how many letters you've seen, and there you have the frequency of every character in your data set. Cool.
Alright, break this. What's the key? One, two, three... You'll get it eventually. I won't tell you if you're right. So we can easily do 26 different versions of this. Should we do that, though? I mean, you can always think about extending the alphabet, making it bigger or more difficult. Let's say we have uppercase characters, lowercase characters, spaces, all kinds of stuff. So, a different way to do that is the statistical approach, which we haven't looked at. So what do we do? Walk me through this. Calculate each letter's frequency in English? In the ciphertext. Calculate each letter's frequency, then what do you do? Do we account for spaces? What do you think? Why not? Why are we assuming A through Z?
Why are we assuming A through Z? Well, we're not assuming anything. Our cryptosystem, I don't know how far back I've got to go. Let's see, there we go. In our cryptosystem, the messages are sequences of letters, and we can tell that based on the key: the key is 0 through 25. So what happens if we have a space? It just doesn't make sense, right? We just remove them in this example. That is not to say that you couldn't imagine a Caesar cipher with spaces; you'd just have to figure out which position you want the space to be in, what things shift to spaces, all that kind of stuff. So it has a lot more complexity, and usually these classical cryptosystems just ignore spaces and basically let people figure it out when they've decrypted it. So, walk me through this. Do you want to do it? Together? Find out the frequency of the letters in this string and see which English letters those frequencies kind of match. What was that? What are you looking for? Or how do you test? Or how does that help you, I guess, going forward? Say in the string the frequency of C is like 0.6 or something, and in the English language the frequency of the letter D is 0.6; then maybe you could say, oh, it could possibly be D. Cool, and then how would you validate that assumption? You would substitute D for C. Exactly, because that gives you a possible key to test, right? The difference between C and D gives you a possible shift. So you try that on all the characters and see if you're right, if it makes sense, as in it reads as English, right? Cool. Alright, this will be fun. We can probably do this really quickly. That's a bummer. Alright, we'll pretend like that's not there, because I think this will be better. So, to prove to you that you can do it... Sorry, my system is a little broken. Okay, cool. So, can everybody read this? No? How about now? Okay, so we have our ciphertext C, and I want to calculate what?
Frequency. The frequency of what? Each letter. Each letter. So, how would I do this? Let's see. I want a dictionary, so I'll create a frequency dictionary, and I'll say for, this is not a very good name, I'll do i in c, so I'm going to go over each character in the ciphertext. Okay, I want frequency of i plus-equals one, but this won't work. Why won't this work? Yeah, so let's see: if not i in frequency, frequency of i equals zero. Live coding is always fun. Okay, and then frequency of i plus-equals one. Okay, so this gives me just the raw counts. How do I calculate the frequencies? Yeah, so I think I can easily do that. Let's say, how do I do a dictionary comprehension? I don't remember. Is it key, value like this? So I want the letter k, and then I'm going to take the value, and because it's Python 2 I have to make that a float, divided by the length of c. The length of c is the number of characters in my ciphertext. Okay, so then we have the frequencies. So now I need to figure out which is the most frequent letter in this. Yeah, well, I need to figure it out somehow, right? So I can do, what was that? Let's say I want to sort these, so count dot... I want to sort, but I want to pass in a lambda. You're going to make me figure this out. I can't remember which one's the... yeah, not compare, that's key, I think. We can look at the list. Cool, there we go. B is the most frequent, followed by H. So we'd go to our handy-dandy Wikipedia and look at English letter frequencies. When would we not want to use this information from Wikipedia? If it's not in English. Maybe it's not in English, it's a completely different language. Or a low sample size of letters; it might not be super accurate. A low sample size of letters meaning what?
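The live-coded frequency count described above might look roughly like this when written out. This is a reconstruction under assumptions, not the code from the lecture: it uses Python 3 (where `/` is already true division, so the Python 2 float conversion is unnecessary), `collections.Counter` in place of the hand-rolled dictionary, and hypothetical function names and sample ciphertext.

```python
from collections import Counter


def letter_frequencies(ciphertext):
    """Return {letter: fraction of the ciphertext made up of that letter}."""
    if not ciphertext:
        return {}
    counts = Counter(ciphertext)  # raw counts, like the manual dict with += 1
    total = len(ciphertext)       # number of characters in the ciphertext
    return {letter: count / total for letter, count in counts.items()}


def most_frequent(ciphertext, n=3):
    """Return the n most frequent letters as (letter, frequency) pairs."""
    freqs = letter_frequencies(ciphertext)
    # sorted() takes a key function; reverse=True puts the largest first.
    return sorted(freqs.items(), key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    c = "KHOORZRUOG"  # hypothetical ciphertext, far too short for real statistics
    print(most_frequent(c))
```

With a ciphertext this short the ranking says very little, which is the low-sample-size caveat raised in the discussion; the same functions become useful once the ciphertext runs to a few hundred letters.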
Like if the string is four letters long, then it's not going to be... Ah yes, so if the string's only four letters long, we probably won't be able to derive very much information from it. What if it's, I don't know, paleontologists sharing information? Weird words. Yeah, weird words with characters that don't normally appear often in English, right, and so that actually may skew our understanding of which things to try. Cool.
Okay, so this narrows down the window. So now we try what? What about B shifted to E, because we just saw that E is the most common letter? B to E doesn't really make sense, because in the string that you had, I think LB was the start, and that would map L to O, and OE is not commonly the start of any word. Okay, then what would I try next? The next most frequent, H. So I try H to E, yeah, which is a shift of, well, we'll see. Another thing you could do is draw this histogram, right? You could draw this graph, compare it to this one, and figure out which shift matches up most closely. So basically what this is doing is taking the frequency of each character in the ciphertext and multiplying it by the frequency in English of the letter it would shift to; for every key you're trying to figure out the likelihood that that shift actually gives you English. So this will give you a score for every possible key, which is essentially a mathematical way of doing our intuition of trying the shift that lines up the most frequent letters, while considering all the frequencies. So here you'd order your search by these scores. And so then you could say, well, we try it with 23, which maps H to E, and this gets back to what we were just talking about: this is not a sensible sentence, so this is highly likely not the correct key. We could try 13, and we get a nice phrase, that you should never build your own crypto. Still good advice; never do it. And we could try the other ones, but
here we've clearly seen that we've... Cool. So what are some of the problems here? As we talked about, the key is really short. What do I mean by short? The number of possible keys is very small, right? The size of the key space is only 26, so for a given message we can easily try all the keys, and that's essentially brute force, right? We can brute force all possible keys. So what if we make the key longer? We'd have to revise our cryptosystem; as it stands it still doesn't really make sense, because we're still modding by 26, so it doesn't matter if your key is 0, 26, or whatever 26 times 2 is. So what we'll see as the next iteration of this is: what if we have multiple letters in a key? We can have basically a 3-, 4-, 5-, whatever-letter key, shift each position of the text against the key, and then move the key forward 3 or 4 characters. We'll see that later, and this will be the cipher that we'll study. So, see you on Thursday.
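The key-scoring idea from the frequency discussion above (for every candidate key, multiply each ciphertext letter's frequency by the English frequency of the letter it would decrypt to, add those up, and try the highest-scoring keys first) can be sketched as follows. Several things here are assumptions: the function names, the English frequency table (approximate values of the kind published in common letter-frequency tables, in percent), and the sample plaintext, which is only a guess at a phrase like the one mentioned in the lecture, since the actual ciphertext is not in the transcript.

```python
from collections import Counter

# Approximate English letter frequencies in percent (assumed reference values).
ENGLISH_FREQ = {
    "A": 8.167, "B": 1.492, "C": 2.782, "D": 4.253, "E": 12.702, "F": 2.228,
    "G": 2.015, "H": 6.094, "I": 6.966, "J": 0.153, "K": 0.772, "L": 4.025,
    "M": 2.406, "N": 6.749, "O": 7.507, "P": 1.929, "Q": 0.095, "R": 5.987,
    "S": 6.327, "T": 9.056, "U": 2.758, "V": 0.978, "W": 2.360, "X": 0.150,
    "Y": 1.974, "Z": 0.074,
}


def shift_letter(ch, k):
    """Shift one A-Z letter by k positions, wrapping mod 26."""
    return chr((ord(ch) - ord("A") + k) % 26 + ord("A"))


def rank_keys(ciphertext):
    """Return (key, score) pairs for all 26 keys, best-scoring key first."""
    total = len(ciphertext)
    freqs = {ch: n / total for ch, n in Counter(ciphertext).items()}
    scores = []
    for k in range(26):
        # Under key k, ciphertext letter ch would decrypt to ch shifted back by k.
        score = sum(f * ENGLISH_FREQ[shift_letter(ch, -k)] for ch, f in freqs.items())
        scores.append((k, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical plaintext and key, used only to produce a ciphertext to attack.
    plaintext = "YOUSHOULDNEVERBUILDYOUROWNCRYPTO"
    ciphertext = "".join(shift_letter(ch, 13) for ch in plaintext)
    for k, score in rank_keys(ciphertext)[:5]:
        guess = "".join(shift_letter(ch, -k) for ch in ciphertext)
        print(k, round(score, 3), guess)
```

The scores only order the search. On short messages the true key is not guaranteed to come out on top, so each candidate decryption still has to be read to see whether it makes sense as English.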