 Alright. First off, new homework assignment. I know I was getting a lot of complaints that you weren't getting a homework assignment, so I thought I'm just kidding. Nobody did that. Except for the one thing. Just kidding. Okay, so in this homework assignment, this should be a fun homework assignment. They're all fun, right? We all love doing them. Specifically, in this assignment, you will be breaking a crypto system. So you will be performing the role of a crypt analyst. So what questions do you have? I mean, what? Well, let's not say what you have. Okay, so we talked about, like we talked about in the original, on Tuesday when we talked about breaking crypto system, you have the algorithm, the encryption algorithm is here in Python code. And there are two parts to this, two different tests. And you can get your personal ciphertext on the website. This will all make sense in a second. But first, before we dive into this, we will talk about first, any questions on this? I mean, it's pretty straightforward. So what I want submitted is, so you'll need to submit the key. Make sense? Why do you need to submit the key? Yes, that's how you break the encryption, right? So, and submit your name, ASU ID, plain text. So decrypt it. So in your reading file, include the plain text that you discovered. Where did this plain text come from? So do a little digging, figure out, was this just random stuff I typed in, figured out. And if you use any code, which I encourage you to write some code, technically, I think you could do this by hand, but that'd be a little crazy. I would not do that. So write some code in whatever language you're comfortable with. Again, I don't care, it doesn't have to submit and compile on our system. This is just whatever you wrote to help you break it. The code to include your code in your submission and the description of how you like, how to use that code. So, oh, I wrote a function to do this to iterate over this thing and try these keys. 
And then I have these other functions to test which one more likely. And an overall description of how you broke it. So any questions on that? Two parts, break stuff, have fun, do next Friday. You got a little bit of time. Is that right? Yeah. Questions? So some of you learned an important lesson on homework one. Yeah, starts early, I would say, not just don't procrastinate, but start early. As measured with something like this where you're trying to break a crypto system. So what should be kind of some of your first steps? Panic? Yeah, start panicking now. Yeah. Yeah. So look at the, look at the lectures. You could read the book to understand a different perspective on how to break a crypto system. What else? Yeah, figure out what the set of, what the plain text that are allowed and perhaps the keys that are allowed. Exactly. So study this algorithm to try to understand what is it encrypting? What else do you try to figure out? Yeah, the function itself. How does this work? What is the encryption algorithm here? And this is key. So you should be able to use this when you don't have to use this. If you don't know Python, don't use this Python. It's more or less pretty straightforward. I'm totally happy if somebody wants to post on the Piazza alternative implementations of this and other languages. That's totally cool as well. I don't really care about that so much. I mean, I don't care about that at all really. It's more about just understanding this algorithm. So you should be able to encrypt things with it, decrypt things with it, understand how it works. And then you can progress from there to do some of the breaking style stuff that we talked about. Any questions on that? This is going to be fun. Yes, exciting. You have to use some sort of algorithm. You have to break it. That's so I don't know if you get lucky and guess a key. You should put that here that like, you know, I may be slightly skeptical of that. So you better have a good explanation for that. 
You say like a magic fairy told you the key that is also not acceptable. So really, I don't care how you do it, as long as you know how you did it and that you know how to do it. Does that make sense? No, can't tell. You're being awful quiet. Are you already starting to panic? Wait to panic until after class. No need to panic. I'm not saying anything. I'm just saying start. Don't start the day before like, you know, but a lot of, and this is a good, not just in general for coding assignments, but for things like this, we need to break something. Is it easy to get stuck? Yes. Yes. What are you doing to get stuck? No, you don't give up. Bad advice. Bad advice. Have a bad life. What do you do? We're talking about what to do and get stuck. Go to office hours, get help. So go to office hours. We will probably ask you more questions than give you answers. But that's okay. That's the goal. Google. What else? Yeah. Piazza. You can ask questions on Piazza. Sleep on it. Sleep on it. Yeah, you could just think about it. You ever think about a problem? No, you just start typing randomly and hoping things work out. Okay. Right. And specifically about thinking about a problem. This is why if you have more time, you can take a couple hours a day off and then come back with fresh eyes. The other important things to do, reread the assignment description. There are many people for assignment one that just weren't implementing the spec correctly. Their logic actually was right, but they weren't following the specifications. And so by revisiting that, they just missed a critical part. So kind of with fresh eyes, revisiting the assignment description. With fresh eyes, revisiting how you're attempting to do this. I mean, these are all good valid techniques to not get stuck. Getting stuck is normal. It happens for everyone for all types of especially I'd say on these like kind of hacking style assignments where it's you're going to get it or you're not going to get it. 
So the path you take, you may have to take multiple paths. The other key thing that I'll recommend is think like a scientist when you're trying to break these things, and in general. What I mean by that is create a hypothesis. If you're stuck, it means that you think your approach should work, but your approach does not work. Right? Similar to a homework assignment: you submit it, you think, I've done all of this, and you only pass one test. Right? So at that point, there are two possibilities. What are they? Either the test is wrong, meaning the server is wrong or the test cases are wrong, or what? Or your code is wrong, or there's some assumption in your approach or your code that is wrong. So that's what you need to start thinking about. Okay, what assumptions did I make? How can I test that I actually can do this? One thing to do would be, once you understand this algorithm, encrypt your own thing, and try to recover the key and break it from the ciphertext. Are we just given the ciphertext? Yes, and the algorithm. So you have this, like we talked about. So what type of adversary are you? Ciphertext. A ciphertext-only adversary. So you get one ciphertext, and you need to derive the key, or break the key, from that. Base 64. Is that C++ code? We will get to that in one second, what base 64 is. So can we really only submit it once? Can you what? Can we only submit once? I don't remember. I need to look at it. I think you have multiple submissions, but I don't know, you shouldn't be using the website to brute force the key. You should be brute forcing it, I mean, figuring it out on your own, right? And submitting actually is not working yet, but you can get the ciphertext from the site. Any other questions?
So I'm submitting a site, it'll tell us like, okay, sir, like it's not going to manually read it, it will just be like automatically say again, like when we input or leave me file or whatever, like, yeah, when you submit, there'll be a field for the key. And so you submit the key along with your files, and then it will tell you if you're correct or incorrect. So assuming you're correct, everything goes good. At the end, after all your submissions are in, we'll read through all of them just to make sure that you actually looks like you actually did it and all that fun stuff. Hey, other questions? Yeah. So do you think the key is 10? What would you submit as the key? A zero x a. Yeah, so it's specifically says they're starting with zero x. So that's what you submit. And then we'll see and make sure that your key is correct. Cool. Those will be fun. Yeah. There's no submitting for this. Correct. It's not up yet. I will post it later. But you can get the ciphertext so you can work on it. You'll know when you break it. The key is certain decimal numbers. You will have to figure that out from this algorithm. Yes. Is the plain text gibberish? Or is it in English? I don't know. Figure it out. Would you be able to tell if you got it correct if it was gibberish or random? Probably not through the submission side. No, that's not. You'll you'll know when that's when I don't tell you anything you have to play the metagame of thinking what am I likely to do to test you on this? Give this to me. I gotta just output random data and so you will have no way of knowing exactly what it is when you decrypt it. Exactly. So depending on how horrible you think I am, and how much I want you to suffer, then maybe you can think about which one of those. No, that will be done. You'll see grades. Let's do it. We'll do it after we do this assignment. I'll email you individually or grades and I'll be a lot easier. 
For this assignment, or in general, I am more of a fan of emailing you your grades directly. So at various checkpoints throughout the semester you'll get an individual email that says here are the grades we have for you on these assignments. If you think there are any discrepancies, you let us know. That way you know where you stand. We know where you stand. Everybody's happy. Cool. Alright, let's dive in real quick.
But first, we're going to talk about base 64. So what is base 64? Does anybody know? Has anybody played with it before? If we don't know something, we go to Wikipedia and we figure it out. So base 64 is a group of similar binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a base 64 representation. So it's exactly that. Just like binary is base what? Base two. Hex is base 16. And so base 64 is base 64. So why is this useful? Why does this exist? Yeah. There you go. So that's a good one. Actually, I don't know how to type those in. Let's see. Text emojis, live. Okay, once this starts up. Okay, so I can type in hello, and I can do Control-D to end the input. And so it'll tell me that this is hello base 64 encoded: aGV and so on. And if I take this, echo it, and pipe it through base64 -d, it says hello. So here I'm transmitting the message hello. Again, you need to think about what you think I think. Okay, now I'm gonna do something really lame and just type in an emoji. I know, that's a challenge. Paste this in. Wow. Okay, cool. So now I can echo this and base 64 decode it. It gives me the emoji. Wow, this is amazing. We are entering a new terrible. So what are those actual characters, right? When we had hello, it was very simple. But how does the computer actually represent hello? In binary, right, as a sequence of bytes. So we can look at the ASCII table, which will tell us what each of the characters maps to in ASCII. And that's hex. There we go.
So we can see, there we go, capital A is 41 in hexadecimal. So if I do this, and instead of base 64 decoding it, I output it to test. Cool. So hexdump is a way for me to know what the actual bytes in a file are. So if I echo hello to test2, and I say hexdump test2, and I look this up in the ASCII table, I have the bytes, and these bytes in hexadecimal, 68 65 6c 6c 6f, translate to hello. And what's the 0a? The newline. Yeah, exactly. It's a newline because echo, even though I didn't pass that in, automatically adds a newline. So if I cat test2 and print it out, right, my terminal knows how to translate these bytes into these characters, h e l l o. So similarly, our other, oh, I see what I did. Right. Okay, we will base 64 decode it and output that to test. So if we look at test, we can see that it has a very interesting f0 9f 98 8b 0a. We know that 0a is the newline again. These other bytes are the emoji, and if we cat test, we'll see that awesome emoji. It's very complicated to have you do this by hand, mostly because I don't understand all of the Unicode things. But basically, these four bytes represent that emoji when interpreted using Unicode. If we just, like, yeah, what's, okay, yeah. Anyway, you can learn all about Unicode code points, which we're definitely not going to go into. But the core idea is that to easily transmit the bytes that this emoji represents, right, f0 9f 98 8b, one way to do that would be to base 64 encode it. So turn that into a base 64 representation and tell somebody else that it's base 64. And so they can do what with it? Decode it back to the original bytes that I sent. And we can look at the Wikipedia page for base 64, and we can see the exact table here of what everything means. I actually don't know base 64 off the top of my head, so it doesn't really matter. But the point is, is base 64 a crypto system?
A lot of answers. I don't know if you want to expand on one of those guesses, or, no, I think one person is just going back and forth. Yeah, there's no key. Why not? I mean, you can say there's a key, but you cannot change the key. So what would the set of possible keys be? The empty set, right? There is no key. So if I send you a base 64 message, can you decode it? Do I need to send you the key beforehand? No. It's just like hex, right? If I said this byte has the hex value f0, do you need a special key to decode that? Well, isn't the key in a way kind of the mapping between the symbols and the numerical values? Is the key a mapping? I would say no, because it's part of the algorithm. In crypto systems, the key exists outside the algorithm itself, and it's an input to the encryption function. But here, I mean, I do need to tell you that it's base 64 so you can decode it, but you don't need any additional information to decode it. So this is great if you want to share something as text. Like, say, an image file. Is an image file just printable? You can print it to a printer, but can you turn it into ASCII characters that you can just copy and paste? Yes? I mean, it's not ASCII, right? So if you want to easily copy and paste an image as an ASCII string, one nice thing to do is to base 64 encode it. Or if you have any type of binary data you want to share, such that, I mean, sorry, such that you want to make sure somebody gets the exact same thing, you'd use base 64. Yes. So this is used all over the place, which is why I'm bringing it up and spending time on it. And also why we're using it here. So we can see that if we walk through this encryption function, the encryption function takes the clear text, does some stuff to it, and then takes the string to return and base 64 encodes it. So why is it doing that? Exactly.
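If you want to replay that terminal demo in Python, here is a minimal sketch. It uses only the standard base64 module; the variable names are made up for illustration, and this is not the assignment's encryption code. The point it shows is that encoding and decoding take no key at all.

```python
import base64

# Encoding: arbitrary bytes in, printable ASCII out. No key is involved.
encoded = base64.b64encode(b"hello\n")    # echo adds the trailing newline
print(encoded)                             # b'aGVsbG8K'
print(base64.b64decode(encoded))           # b'hello\n'

# The four bytes from the hexdump, interpreted as UTF-8, are one emoji.
emoji_bytes = bytes.fromhex("f09f988b")
emoji_text = emoji_bytes.decode("utf-8")
round_trip = base64.b64decode(base64.b64encode(emoji_bytes))
print(round_trip == emoji_bytes)           # True

# With the default (non-validating) decoder, a stray newline in the
# copied text is discarded before decoding.
print(base64.b64decode("aGVs\nbG8K"))     # b'hello\n'
```

Anyone who knows the data is base 64 can run the decode step, which is why it only helps with transport and adds no secrecy.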
So that when you go to the website and you get your ciphertext, you can use it. If I just have random bytes, how do I display that on a web page such that you can easily put that into your program as a string? And the other nice thing about base 64 is it doesn't care about newlines or spaces. So these newlines don't mean anything. You can just copy this string, base 64 decode it, and then you have the exact bytes that the algorithm produced. Does that make sense? So really the way to think about it is that this is just a way to make transportation of that data easier. But don't let this part trip you up. We just had a 10 minute conversation about it. Yes, you'll have to figure that out. That's just the part I wanted to talk about, because it has nothing to do with the actual encryption itself.
And now, do any of you know French? Yes? Can you pronounce this name? Vigenère. Vigenère. I will never be able to pronounce that right. Okay, I will try. So we talked about the Caesar cipher. So what are some of the important things we learned about the Caesar cipher? Bad cipher. Why is it a bad cipher? Easy to decode. How? Why? Yes, the key size, right? And this is where having formalism is nice, because we can say very specifically: the set K is these keys, and the cardinality of K, the number of possible keys, is fairly small. It was 26. So we could easily enumerate all of those. So this is a similar idea. And I believe Vigenère, see, I already messed it up, is who actually came up with this other idea for this next crypto system. The idea is your key can be any size, and the way it's represented here is as a string. And what you're going to do is, you can think of it as every element of that string, well, let's actually just go through this. I'm sorry, no beautiful drawings today, because my tablet doesn't work yet for that. So that's not helpful. Okay, so we have the message that you want to encrypt: the boy has the ball, which is very security critical.
And so what was the first thing we would do in a Caesar cipher? Determine the key? What else, before we can even encrypt the message with a key? We take out the spaces. Right, we can't encrypt the spaces. And so the idea behind this cipher is we can actually have any length key we want. And we'll represent the keys here as letters. So let's say our key in this example is ABC, because that's going to be easy. What we're going to do is go along the plaintext, repeat the key over and over, and shift each element of the original message by whatever the key letter is, from essentially zero to 25. So rather than have a key just be a number zero through 25, we're going to represent this as a word, which is a lot easier to think about. But that just tells us the number of shifts to do. So we will repeat ABC, ABC, ABC, ABC, ABC, ABC, and this doesn't work because we don't have monospace fonts. Okay, slightly bigger. Okay, so similar information. So we have ABC, ABC, ABC, ABC, ABC. And then what do we do at the end? We add a B? No, there's no more message. Cool. Okay, so what we're going to do is, for every A, we're going to shift the corresponding character in the message by zero. For B, it'll be one; for C, two; D would be three, but we don't have a D in our key. So for every one of those, we'll shift that character of the message. So what's the first character of the ciphertext? T? What's the next one? Do I have like 100 people here? B? Oh, B, it wraps around. Cool. So then what do we transmit to the party we want to send this message to? Yeah, we need to transmit the key, just like always, right? We need to have some way to transfer the key to them securely so nobody else knows it. And then we transfer the ciphertext. And then what do they do? The opposite: they subtract, right? So they take the key and they do the same thing, they go ABC, ABC, ABC, ABC, ABC, ABC.
And then they will get the boy has the ball, hopefully, if that goes right. Yeah? So what's the process again? You subtract, it's back this way? No, so the regular one, encrypting, you're shifting it forward. So first, you've basically mapped every letter in the key to how much it shifts. So A is zero, B is one, C is two. So let's take T, the first character, shifted by A. Which is zero. Zero, so it's gonna be T. And then take H shifted by B, which is one, and that's gonna be I. And then take E shifted by C, and it's gonna be G. I think this is actually the key we used. Nope, okay, cool. And so what's the benefit of this over the Caesar cipher? It's gonna be C. What was that? Is it more difficult to brute force it? More difficult to brute force? Why? Because the key is of length three. Yeah, so let's say you know that the key is of length three. How many do you have to brute force here? 26 times three. 26 to the three. Yeah, 26 to the three. Is it to the three? Yeah. Right, because it's 26 tries with the first one, multiplied by 26 tries with the second one, times 26 tries with the third one. Anybody know that off the top of their head? 17,576. Did you actually do that in your head? No. Okay, I was gonna be really impressed. I was able to read it. So 17,000 tries. Could you do that by hand? Yes. Eventually. Eventually, yes. I guess the question is how long is it gonna take you to do this for every possibility, right? And so then what happens as we add more characters to the key? The key space grows larger and larger. So what about size eight? Could you do this? Yeah. And finish before you die? Yes. Yes? Do you think you can do this? Challenge. Challenge? All right, $5 now. I don't think I could probably do that legally. Okay. But what's that, yeah, 208 billion. There's no way you're gonna be able to try all of those. And what if I have a 10 character key, right?
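To make the shifting concrete, here is a minimal sketch of the letters-only cipher from the board example, plus the key-space arithmetic. The function names are made up for illustration, it assumes uppercase A to Z with spaces already removed, and it is not the homework's encryption algorithm.

```python
import string

ALPHABET = string.ascii_uppercase  # A=0, B=1, ..., Z=25

def shift_text(text, key, direction):
    """Shift each letter by the matching key letter; direction is +1 or -1."""
    out = []
    for i, ch in enumerate(text):
        k = ALPHABET.index(key[i % len(key)])   # the key repeats over the message
        out.append(ALPHABET[(ALPHABET.index(ch) + direction * k) % 26])
    return "".join(out)

def encrypt(plaintext, key):
    return shift_text(plaintext, key, +1)       # shift forward

def decrypt(ciphertext, key):
    return shift_text(ciphertext, key, -1)      # shift back

message = "THEBOYHASTHEBALL"                    # spaces already removed
ciphertext = encrypt(message, "ABC")
print(ciphertext)                               # TIGBPAHBUTIGBBNL
print(decrypt(ciphertext, "ABC"))               # THEBOYHASTHEBALL

# If the key length n is known, there are 26**n keys to try.
for n in (1, 3, 8):
    print(n, 26 ** n)                           # 26, then 17576, then 208827064576
```

The loop at the end is the brute-force count: every extra key letter multiplies the work by 26.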
This number just keeps growing exponentially larger. So if I want to increase the security here, I just add more characters to the key. So do we agree this fixes the problem of key size? Yes. So essentially, what we can't do is brute force the keys. Okay, let's look at another example of this. I'm gonna continue this way. Cool. So similar idea, but with a different key of VIG, we can do the same thing, and we'll see that the ciphertext changes significantly. So now, if we put ourselves in the shoes of the attacker, as we like to do, we'll think about attackers in decreasing order of power, just like we did with the Caesar cipher. So let's start with the most powerful attacker, which is what? What was it? Chosen plaintext. So what plaintext do you wanna choose here? Yeah, all A's, however long you think the key is. Why? Yeah, so let's say, what happens is they take the key, A, B, C, A, B, C, A, B, C, A, B, this is actually more difficult than it looks. And then they shift each character of the input that many down, so what's the result gonna be? A, B, C, A, B, C, A, B, C. Why? Forward, because when we encrypt, we're going forward. So this is a chosen plaintext attack, so we choose the plaintext. But when you're shifting it back, when you're decrypting, A, A, A, A, A, A, wouldn't it be A, Z, Y, A, Z, Y? Well, so, okay, wait, wait. We have the chosen plaintext attack, right? So we choose the A's, they're gonna take their key, which we technically don't know, and then we're gonna get this ciphertext. Isn't the, oh, oh, okay, yeah. This A, B, C, A, B, C, A, B, C. So we're encrypting it. Exactly, yes, yes. Is there, like, a version of this crypto system that's, like, backwards? So, like, a subtractive version, where you subtract to encrypt and then you add? Possibly, though it's just a different crypto system, right?
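The all-A's trick can be sketched in a few lines. This is only an illustration under the same letters-only assumption as the board example: the encrypt function stands in for whoever holds the key, and the names secret_key and recover_key_stream are hypothetical.

```python
def encrypt(plaintext, key):
    """Letters-only cipher: add the repeating key to the message, mod 26."""
    return "".join(
        chr((ord(p) - 65 + ord(key[i % len(key)]) - 65) % 26 + 65)
        for i, p in enumerate(plaintext)
    )

def recover_key_stream(plaintext, ciphertext):
    """Subtract plaintext from ciphertext, letter by letter, mod 26."""
    return "".join(
        chr((ord(c) - ord(p)) % 26 + 65) for p, c in zip(plaintext, ciphertext)
    )

secret_key = "ABC"                    # in a real attack the attacker never sees this
chosen = "A" * 11                     # all A's: a plaintext made of zero shifts
oracle_output = encrypt(chosen, secret_key)
print(oracle_output)                                  # ABCABCABCAB
print(recover_key_stream(chosen, oracle_output))      # ABCABCABCAB
```

Because A counts as zero, the ciphertext is the repeating key itself. The same subtraction works for any plaintext and ciphertext pair the attacker happens to hold.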
So again, if you assume that the adversary knows which one it is, it doesn't matter, because going backwards is just the same as going forwards, right, as long as you're doing the reverse to decrypt. So we choose this, and then we get this ciphertext. So, what's the key? Yeah, very simple, right? So we're able to easily break this if we can choose the message. So what about the other case, where we don't get to choose the message? So if I gave you the boy has the ball and this T-I-G-B-P-A-H-B-U-T-I-G-B-B-N-L, could you get the key? Part of it. So you subtract. So what's T minus T? A. A, and what's next? I minus H, not H minus I. Right, so if I know the plaintext, I can easily recover the key from the ciphertext, right? So we still didn't fix the problem against a chosen plaintext adversary or a known plaintext adversary. But what about an adversary that just has this, the ciphertext? Well, the first thing we need to think about is applying our previous techniques against this new cipher. So what did we do before? Okay, we brute forced it. That's one way, but we don't want to brute force it, because there are 17,000 tries here. Frequency analysis. What was it? Frequency analysis. Frequency analysis, right? So what we did is we looked at the frequency of each letter and we compared that with the English language. What's that going to show here? I mean, a uniform distribution there. Yeah, why a uniform distribution? Because we assume the key is chosen uniformly at random. Right, so think about this: why, in a Caesar cipher, did the frequencies of the characters in the English language remain the same? Yeah. Every I becomes the same letter. Right, so that entire graph is just shifted, right? Because every character has just moved by a fixed amount. But here, what does the shift depend on? Not just the key, I mean it does depend on the key, but what else? What is it? How long the key is?
Which letter in the key, again? Which letter, the position of the, sorry, the position of the character in the message, right? Modulo the length of the key. Exactly, modulo the length of the key, right. So, cool. So if we look at this, we could use our frequency analysis on this one. If we tried to break this as a Caesar cipher, using the techniques we talked about on Tuesday, we'd say, well, maybe a likely key is 22, which we'd try, and it's this; or maybe a likely key is 10, and it's this; or maybe a likely key is four, and it's this. You can do this for all of these, and are you ever gonna find the message? No, because fundamentally there's no way to get back to that message using a one-key Caesar cipher. So, a couple of things, and actually, as we'll see when we eventually get there, we'll try to think about this. So we're gonna define some terms. One thing is the period, which is the length of the key. So think about a pendulum in physics, right? The period is the time it takes to repeat, and here it's essentially how often the key repeats over the message. Okay, cool. So, how can we break this? We can't insert our own message. So here we're ciphertext only. This is all we have. You have this, T-I-G-B-P-A-H-B-U-T-I-G-B-B-N-L. We know the length of it. Let's say we don't even know, well, okay, first, we'll start by assuming we know the length of the key. So we know it's length three. So we should break up this string into groups of three. Why would we break up the string into groups of three? So what would that mean? Like this: T-I-G, B-P-A, H-B-U, T-I-G, B-B-N, and then L. Yes. Why? Because we perceive it. Very well. Because you know that if the key's of length three, then it's just repeated on each of these. Okay, if the key's of length three, then we know that the key's been repeated there. How does that help us? Yeah, go ahead. We can do a frequency analysis on each position.
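Grouping by position is easy to sketch in code. Assuming the period is already known (three here), a slice with a step pulls out every letter that was shifted by the same key letter; split_columns is a made-up helper name.

```python
def split_columns(ciphertext, period):
    """Group letters by their position modulo the guessed key length."""
    return [ciphertext[offset::period] for offset in range(period)]

ciphertext = "TIGBPAHBUTIGBBNL"
for offset, column in enumerate(split_columns(ciphertext, 3)):
    print(offset, column)
# 0 TBHTBL
# 1 IPBIB
# 2 GAUGN
```

Each printed column is what one key letter touched, so each one can be attacked on its own.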
So now we know that the first characters of all of these are encrypted with what? The same letter. It's a Caesar's. The same letter. So this is a Caesar cipher just on these first letters, right? Essentially, here, we have T-B-H-T-B-L. So these are all essentially encrypted with a Caesar cipher, is that right? Yeah, with the first letter of the key. And so we could do I-P-B-I-B, is that right? And that's with key letter one. And G-A-U-G-N, and that's with key letter two. So then what do we know about each of these letters? We know that they've been encrypted with the same key letter. So then what does that tell us? Offset. They have the same offset? Yep. They're two positions away from each other. They're every third position. The length is the third of the... What was it? Why can't we just do analysis on each one of them? So what kind of analysis do you wanna do? Letter analysis, the same thing, a count of frequency? Yeah, so then we can do frequency analysis, because we know each of these letters in a group has been shifted by the same amount, and each group should have a different shift. So essentially, now how many Caesar ciphers do we have to solve? Three. Three, right? So we can solve each of these. Are they connected? Can we use techniques, like I think last time we used techniques looking for a T-H-E in the ciphertext itself? Can we use the fact that this B and this B follow each other? They've been encrypted with different key letters, right? They've been shifted by different amounts. So the fact that these two letters are the same in the ciphertext and next to each other tells us nothing. But once we're able to figure out one of these key letters, and we know that it's correct, then what can we do? Then we can just start guessing. Okay, say we have more characters here. We get a frequency distribution and we say, oh, this is likely encrypted with A, which means it's not shifted at all.
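Here is one crude way to guess the shift for a single column, as a sketch only. It assumes the most common ciphertext letter in the column stands for E, which is a simplification of full frequency analysis and is unreliable on columns as short as the five or six letters in the board example. The column below is hypothetical data, and the helper names are made up.

```python
from collections import Counter

def guess_shift(column, assumed_plain="E"):
    """Assume the most common ciphertext letter stands for assumed_plain."""
    most_common_letter, _count = Counter(column).most_common(1)[0]
    return (ord(most_common_letter) - ord(assumed_plain)) % 26

def undo_shift(column, shift):
    """Shift every letter in the column back by the guessed amount."""
    return "".join(chr((ord(c) - 65 - shift) % 26 + 65) for c in column)

# Hypothetical column: the letters E, E, E, T, A shifted forward by 3.
column = "HHHWD"
shift = guess_shift(column)
print(shift, undo_shift(column, shift))    # 3 EEETA
```

With a real ciphertext you would run this once per column and then sanity-check the combined plaintext, since any single column can guess wrong.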
Now you know these characters, and then maybe you can start guessing the other key letters based off of that one. So you say, oh, I know this first one is a T. I wonder if this is like, and then you maybe break the third one and figure out that this character is an E, because it's been shifted forward two. And you say, wow, it's highly likely that this is T-H-E, so you'd know the second one here. Is it also like, because I think someone said we have a repeated T-I-G as well, can we also use that? Yes, we could definitely use that. So yeah, especially when we group this up here, we'd say, well, T-I-G and T-I-G are the same. So that means in the message, those letters must be the same. So yeah, you can think about three-letter words and try to guess it based on this. The thing I want to talk about now is, what if we don't know the period? We knew to split this up into three different alphabets, why? We knew that the key was length three. What if we don't know? Well, we just said we saw T-I-G and T-I-G, a repeated sequence, so we can look for stuff like that as well. Right, so we can look for repeated characters and see. But these repeated characters are, let's see, what is it, this T is one, two, three, four, five, six, seven, eight, nine. Nine characters away from the other one. So should we guess that the key is length nine or eight? No, but we might be able to guess that chunk of the key. So if we think about what the possibilities are for how these characters ended up the same, it means it's likely that the same parts of the key went over characters that were the same in the plaintext. Otherwise, those characters would essentially be random. So one of the things we'd look for is repetitions like this. We could say, okay, I know T-I-G is repeated, so I know that the key length is probably a factor of nine. It could be nine itself, or it could be three.
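Counting by hand gets old quickly, so here is a small sketch of the repetition search on the board example. It looks for repeated three-letter sequences, measures the distance between the first two occurrences, and lists the factors of that distance. The helper names are made up, and the length of three is just a starting assumption you can change.

```python
from collections import defaultdict

def repeated_sequences(ciphertext, length=3):
    """Map each repeated substring of the given length to its start positions."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - length + 1):
        positions[ciphertext[i:i + length]].append(i)
    return {seq: pos for seq, pos in positions.items() if len(pos) > 1}

def factors(n):
    """Divisors of n greater than 1 (a period of 1 would just be a Caesar cipher)."""
    return [d for d in range(2, n + 1) if n % d == 0]

ciphertext = "TIGBPAHBUTIGBBNL"
for seq, pos in repeated_sequences(ciphertext).items():
    distance = pos[1] - pos[0]
    print(seq, pos, distance, factors(distance))
# TIG [0, 9] 9 [3, 9]
# IGB [1, 10] 9 [3, 9]
```

Both repeats are nine apart, so the candidate periods here are three and nine.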
How do we know that it's not one, that the key isn't of size one? Because then it would be a Caesar cipher. It'd be a Caesar cipher, right, and we could tell based on the frequency. We could calculate the frequencies and see that, wow, it's pretty random. So we could try this. We could look: are there any other repetitions in here? Yeah, so we actually have T-I-G-B, right? T-I-G-B and T-I-G-B, which is also nine apart, is that right? So it could be nine or three. Do we now have more confidence? Yeah, it's probably nine or three, because we have more characters matching, right? So rather than these just randomly occurring because of the way the shifts worked out, it's highly likely that those plaintext characters are repeated. So we can look at that. We'll look at some other ways to do it. But essentially, you just figured out how to solve a Vigenère cipher. And this is the fundamental approach. First, figure out the length of the key. And then once you have the length of the key, you know what to do. You split the characters up so each key position gets its own alphabet. Each of those is essentially a Caesar cipher. And you break each of them individually. So we'll walk through that. But this is the core idea. You all just figured this out. Good job. Cool. So say we're given some random stuff, and the spaces here don't matter. I've just split it up into groups of five to make it a little bit easier. And so this person was, I guess, the first person to figure out that these repetitions in the ciphertext occur when the same characters of the key appear over the same characters in the plaintext, or at least it's highly likely. And we already saw that here, but this is just a different key, right? This was V-I-G instead of A-B-C. But we can still see that same repetition in the ciphertext. And so this is exactly what we talked about.
So we take the distance between the repetitions, nine, and we say it's likely that the period is a factor of nine. So one, three, or nine. And so we can do the same thing here. So does any of you notice any repetitions? What was it? E-Q, are you just shouting out random letters? No, it's on the left, second column, third row. Second column. Ah, E-Q-O-O-G, that's good, okay. E-Q-O-O-G, and where's the other one, first column, second column? Oh, here, E-Q-O-O-G. Anything else? [inaudible student question] Yeah, that length is five. That's pretty big. We'll just go look at this, because you'd want to look at kind of all of them. So you'd see M-I repeated, O-O, and the E-Q-O-O-G actually has an O at the beginning, so it extends longer. M-O-C is the second longest. And so you figure out the distance, then figure out the factors. So based on this, what's a likely key size? Three, two, three, two. So what are the possibilities based on these factors? Two, three, five, close. Seven, 11, and also all the combinations of these. So 22, 15, 10, and also six, two times three. So what would you try if you only had one shot, or you're trying to make the best use of your time? Probably six. Why? Because, A, we want to make sure whatever we test first is a factor of the distance for the longest repetition. Yeah, a long repetition is less likely to occur due to chance. The short repetitions could happen just because of the way the key ends up and the way the ciphertext ends up. It doesn't necessarily mean that the plaintext is the same. So we definitely want to try two, three, six, 10, 15, and 30. I mean, we think 30 would be kind of crazy. That's pretty long compared to our ciphertext. And then we look at the other ones and we say, OK, which of all those factors matches with most of the other repetitions? Two and three, or six, right? So yeah, you'd think we could try that.
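Tallying which candidate periods divide the most repetition distances is easy to automate. The following is a rough sketch; `factor_votes` and the list of distances are hypothetical (the distances are made up for illustration, not read off the lecture slide).

```python
from collections import Counter


def factor_votes(distances, max_period=20):
    """Count, for each candidate period, how many distances it divides."""
    votes = Counter()
    for d in distances:
        for period in range(2, max_period + 1):
            if d % period == 0:
                votes[period] += 1
    return votes


if __name__ == "__main__":
    # Hypothetical distances between repeated ciphertext sequences.
    distances = [30, 72, 12, 48, 6, 5]
    for period, count in factor_votes(distances).most_common(5):
        print(f"period {period}: divides {count} of {len(distances)} distances")
```

A tally like this only ranks candidates. As in the lecture, the distance from the longest repetition deserves more weight than the short ones, so a candidate that does not divide it is suspect even if it scores well here.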
There are some that won't work, but that's the way it goes. So it could be any of these. So this is exactly the analysis we just did. And we could even calculate how many of the others share each factor. So seven out of 10 have two in their factors, and almost as many, six out of 10, have three in their factors. So we could try six based on this. Now, how do we actually verify our guess? Do you want to split these up into alphabets and then try to solve each of them individually? What if you guessed the key length wrong? You just wasted a lot of time. I'm trying to save you time. So to think about that, let's go back to here. So what if we guess a period of two? Would we ever get it? No. No. Couldn't you tell after the first few characters? After the first few characters, so what I could do is split it up just like we did. So T-I, G-B, P-A, H-B, U-T. So can you tell I'm on the wrong track right now? Because I can. Yeah, there's no repetition, but I don't know. Maybe the message isn't that long. It says a word is repeated within 10 characters, so 15. How did we break this when we guessed period three? Yeah. None of the 15 characters repeat. Why is that odd? An English message will probably repeat some of the letters. It has to repeat the letters in exactly the same positions, right? Because in this example, we put every third character into the same alphabet. I mean, you could say, well, the odds of that occurring are however long it is divided by the key length or something. What's also weird about not having any repeated characters? The frequency? Yeah. Every character has the exact same frequency. Does that match the English language? No. How did we break it when we guessed period three? Frequency analysis. We use frequency analysis. We can count the frequency of each of the characters and try to map that to English. But when we do that here, our frequencies essentially become random.
So the idea that we can use for this, and it's based on your intuition, is essentially: how random are the characters in each of our alphabets? Are they close to English or are they not close to English? And if we guessed wrong, the characters in each of our alphabets should be essentially random. Does that make sense? Questions? This is definitely the kind of check-yourself moment you want to do, so that you don't spend time trying to break a cipher when you have the period wrong. Because then you're just asking for trouble. OK. So we're going to define a term to help us quantify this. And this is the index of coincidence. It's the probability that two randomly chosen letters from the ciphertext will be the same. So what does that mean? Let's start with the opposite of English. So you have a completely random input, random characters. You grab one character. What's the likelihood that you grab another and it's the same? One in 26. Yeah, one out of 26, right? Because it's completely random, whatever you're going to get. So let's say it's English. You grab one character and you grab another. Is it going to be one out of 26 that they're the same? No. Why not? Because of the frequency of the letters. The frequency of the letters changes, right? It's different. And this was in some sense the key problem with the Caesar cipher. We had this distribution of English that just got moved around, right? It got shifted. So say we add an extra key character, to shift some characters by one amount and some characters by another. The idea is that if you only have a period of two, you can actually calculate how much the index of coincidence deviates from English. And you can find tables of this online. So for a period of one, the index of coincidence once you calculate it is, I don't know, on average 0.066.
For a period of three, it's 0.047. For five, it's 0.044. And then you can see, hopefully, that it's approaching 1 over 26, which is, I believe, what it's doing. Is that right? 1 over 26. Yeah, which is 0.038. Can you see that? Yep. So as the period increases, it's essentially random. That make sense? Because each character in the ciphertext has just moved by some unrelated amount. There's no relation between them at all. So there's a couple different ways we can do this. And so this is computing the index of coincidence. I'm not really going to go into how to derive this. It's just a mathematical formula. We can break it down very quickly. So N is the length of your ciphertext. F_i is the frequency, the number of times that letter i occurs in the ciphertext. So you go through all the letters. This 0 through 25 is all the letters. So you say, what's the frequency of A in the text? And you take the frequency of A times the frequency of A minus 1. Add that to the frequency of B times the frequency of B minus 1. And keep going with that. Divide all of that by the length times the length minus 1. At some point, I didn't know how to derive this, but I honestly don't think it's that important. So we can compute the index of coincidence for our original example here. If we do that, we'll get 0.043, which is about where? Yeah, 5, or kind of between 5 and 10. And this is why you don't want to use this as your initial thing to tell you exactly where to go. A, this is based on a lot of data. And the granularity here, I mean, look at the difference between 4 and 10. Very, very small. And it could just be weirdness, or you could get lucky. So you just kind of want to use this as a check. So this roughly tells us that our period is between 5 and 10, which jibes with our guess, which was 6. So we'd say, OK, that's awesome. This agrees, so that's good.
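The formula just described, IC = (sum over i = 0..25 of F_i * (F_i - 1)) / (N * (N - 1)), translates almost directly into code. A minimal sketch, with `index_of_coincidence` as an assumed name:

```python
from collections import Counter


def index_of_coincidence(text):
    """Probability that two letters drawn without replacement are equal.

    Implements sum(F_i * (F_i - 1)) / (N * (N - 1)), where F_i is the count
    of letter i and N is the number of letters. Non-letters are ignored.
    """
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0  # undefined for fewer than two letters; 0.0 by convention here
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


if __name__ == "__main__":
    print(round(index_of_coincidence("AABB"), 3))  # 4 / 12
```

On a real ciphertext you would compare the result against the table values quoted in the lecture (about 0.066 for a period of 1, falling toward 0.038 as the period grows), keeping in mind that short texts give noisy values.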
And then when we go and split our alphabets up, we can calculate the index of coincidence on each of these alphabets. So if we go back to our old example, what should the index of coincidence be for this alphabet? What period should it look like? One. Why one? Each individual character is shifted. Yeah, each individual character is shifted by the same amount, right? It's a Caesar cipher. You can think of a Caesar cipher as essentially a special case of a Vigenère cipher where the key size is 1. So here now we have a bunch of these. So when we calculate the index of coincidence, they should all be roughly around the value for a period of 1. So what would the indexes of coincidence be for these alphabets when we said the period was 2? Zero? Not zero. We could calculate it ourselves, although that's interesting. Actually, I don't know what it would be in this case. But in the general case, what would we expect? Significantly off from a period of 1. Significantly off from a period of 1 and probably closer to random, because we're essentially pulling together characters that were shifted with different keys. And assuming a random key, the key characters are completely unrelated to each other. So we're essentially just pulling random characters together. So we should get something close to 1 over 26. And so this is how we can use this. This is why the index of coincidence is awesome, because we can use it in two different ways. One, to verify our period. Does the period actually match what we expect based on what we calculated from the repetitions? Here it's roughly the same. And then after we split it up into six different alphabets, we can calculate an index of coincidence for each of those alphabets, which should be close to the value for a period of 1, the value for the English language. It's actually very similar to looking at the distribution of characters and comparing it to English. It's another type of measure for that. So we can do that.
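The second use, checking a guessed period by splitting into alphabets and computing the index of coincidence of each, might look like the sketch below. The names `split_alphabets` and `alphabet_ics` are assumptions, and the index-of-coincidence helper is repeated so the block stands alone.

```python
from collections import Counter


def index_of_coincidence(text):
    """sum(F_i * (F_i - 1)) / (N * (N - 1)) over the letters of `text`."""
    n = len(text)
    if n < 2:
        return 0.0
    return sum(f * (f - 1) for f in Counter(text).values()) / (n * (n - 1))


def split_alphabets(ciphertext, period):
    """Group every `period`-th letter together: one string per key position."""
    text = "".join(ch for ch in ciphertext.upper() if "A" <= ch <= "Z")
    return [text[i::period] for i in range(period)]


def alphabet_ics(ciphertext, period):
    """Index of coincidence of each alphabet under the guessed period."""
    return [index_of_coincidence(a) for a in split_alphabets(ciphertext, period)]


if __name__ == "__main__":
    print(split_alphabets("ABCDE FG", 3))  # ['ADG', 'BE', 'CF']
```

If the guessed period is right, most values from `alphabet_ics` should sit near the English value of about 0.066; if they cluster near 1/26 (about 0.038), the guess is probably wrong. With only a sixth of the ciphertext in each alphabet, individual values can stray quite a bit.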
So these are every sixth character. So the first character, then the seventh character, and so on. This is the first alphabet. So we can see the index of coincidence calculated is 0.069, which suggests a period of what? 1. And then we'll do that for the other ones: 0.078, 0.078, 0.056, 0.124. This one is 0.043, should we be worried? Yeah, not necessarily, right? It's something that we'd keep in mind, right? Our previous measure checked out. A lot of them are very close, and this one's also very high. I don't know why. It's random, essentially, right? So these are just approximate measures. Nothing's going to match exactly, especially with a small alphabet, right? Because from the original ciphertext, how much do we now have in each of these? A sixth of it? A sixth of it, right? So now our sample size is smaller, so we'd expect a higher variation in these values. But if they were all close to 1 over 26, we'd say, man, I messed up. We should restart from the beginning. And then we solve, just like we did in the Caesar case. So just like in the Caesar cipher, can we just brute force each character? Because this is how we solved the Caesar cipher, right? We just brute forced all the 26 possible keys. Can we do that here? Reach out for that? Okay, so brute force that first alphabet, that's the key. How do you know if you're right? So why do you need to brute force three alphabets? Why can't you do it on one? That's what we did on the Caesar cipher. This is just like random, just every sixth letter. Yes, exactly. So it's every sixth letter, right? So previously, if I can ever find it. Previously, we were able to use the positions of the characters, right? So when we shifted, we knew when we saw English, because we'd get, ah, now the message makes sense. It's actually English. But here, even if you just magically were able to get it, how do you know that every sixth character is English?
Like how do you know when that's actually English? Yeah? We'll match the... Right, so you can try to do that, but how do you know that you're right? Because in the Caesar cipher, we can say, okay, match the frequency analysis, and then we say, bang, we got it. It's English. Well, let's say you shift the distribution such that it matches English. How do you know that every sixth character is actually English? You don't, you fundamentally don't. It's very difficult to be able to tell right away that you've got one part of the alphabet. Because there may be a couple of shifts that are actually pretty good, right? So you don't know which one it is. So a common technique is that you basically guess, in some sense. So this is why you can do this by hand. It's a little tedious, but you can do it. Writing a program to do this is very nice. And so this is basically a way of doing frequency analysis. In this case, rather than computing the frequency of every character and figuring out which shift we want, here we're just counting the number of times that each of these characters appears in that alphabet's ciphertext. So we can look at it for all of these. And so we could ask something like, is the fourth alphabet at the right shift? Like in English, do you frequently encounter a lot of Ws, Xs, Ys, and Zs? No, so this is definitely not at the right shift. And what we could do, and this is kind of a cheating by-hand version, is that rather than that precise graph of all of the letter frequencies of English, here we're just putting high, medium, or low for each of the characters. And so we can roughly see, based on this, what should the shift be for one? For four, so if we shift it back, what? Four, one, two, three, four. So that becomes one, two, three, four.
So that would mean there are a lot of Ws in this text. Six, a shift of six, or is it a shift of six? Four, five, six, so I put this back, one, two, three, four, five, six. So then one of the most frequent characters becomes C, which is like a medium frequency. What was that? No shift on the first one. No shift, why no shift? So A is frequent, E is frequent. O is frequent. And I is also frequent. So it seems like a reasonable thing to say, maybe this was no shift. Does it completely ruin the cipher? I don't think we've talked about it explicitly, but it's pretty easy to reason to yourself that a no-shift Caesar cipher, a key of zero, is terrible. Why? The ciphertext is the plaintext. Yes, it's the plaintext. The plaintext and the ciphertext are exactly the same, right? But here, do we have that same problem? No, not necessarily, right? Because even though this one shift is zero, we have a whole bunch of other shifts that are shifting the other characters around. Okay, cool, so we go through that and we'd say, okay, what if we assume that this first alphabet has a shift of zero? We could also maybe see, I think, five in here. This one stands out to me because of the five Cs here, and then the five Ms. So it's probably likely that one of these is an A and one of those is an E. Does that make sense? No. Maybe one is an E. A and O. A and O, oh, okay. So, anyways, we can play with this and try to figure out what some of these things are. This third alphabet, what does it say? If I shifts to A, so you have A, B, C, D, E, so that's kind of nice. You have the two As, four Es.
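The high/medium/low comparison done by hand here can be approximated in code by scoring each of the 26 shifts of one alphabet against English letter frequencies. This is a sketch that swaps in a chi-squared score for the by-hand comparison; `rank_shifts`, `shift_text`, and the frequency table (commonly quoted approximate percentages) are assumptions, not something from the lecture. It returns every shift ranked best first, since, as noted, a couple of shifts may look pretty good.

```python
from collections import Counter
import string

# Approximate English letter frequencies in percent, A through Z (assumed table).
ENGLISH_FREQ = [8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094,
                6.966, 0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929,
                0.095, 5.987, 6.327, 9.056, 2.758, 0.978, 2.360, 0.150,
                1.974, 0.074]


def shift_text(text, shift):
    """Shift each letter of an A-Z string forward by `shift` (may be negative)."""
    return "".join(chr((ord(ch) - 65 + shift) % 26 + 65) for ch in text)


def rank_shifts(alphabet_text):
    """Return all 26 shifts, most English-looking first (lowest chi-squared).

    `alphabet_text` must contain only uppercase A-Z, e.g. one alphabet
    obtained by taking every k-th ciphertext letter.
    """
    n = len(alphabet_text)
    if n == 0:
        return []
    scores = []
    for shift in range(26):
        counts = Counter(shift_text(alphabet_text, -shift))  # undo the shift
        chi2 = sum(
            (counts.get(letter, 0) - n * freq / 100) ** 2 / (n * freq / 100)
            for letter, freq in zip(string.ascii_uppercase, ENGLISH_FREQ)
        )
        scores.append((chi2, shift))
    return [shift for _, shift in sorted(scores)]


if __name__ == "__main__":
    sample = shift_text("ITWASTHEBESTOFTIMESITWASTHEWORSTOFTIMES", 6)
    print(rank_shifts(sample)[:3])  # a few top candidates to try
```

For a single alphabet with only a handful of letters, the top-ranked shift is often wrong, so it is safer to keep the top few candidates per alphabet and let the combined plaintext decide.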
You can try to make some guesses, essentially, and then start substituting. From the ones that you can guess, we can use our knowledge about likely two-grams, so what are frequent two-character sequences and three-character sequences, to try to decipher. So we could look for various clues here. Basically in bold here, we now have all of the ones that we think are correct. You'll have to go through this on your own, but you start guessing alphabet by alphabet. Once you have a guess, now you've got more of the text. And then you can keep going. I don't agree with kind of any of these things, like none of these are obvious to me. I'm pretty bad at these kinds of things, but I can code stuff to do this for you, so you don't actually need to do it by hand, which is nice. And computers are a lot better at guessing than you are. So then once you're here, you can start to group letters or start to pick out words and figure out what the eventual thing is. So here you could use something that I guess I haven't used before. If you know that there's a Q somewhere, the letter following it should be a what? A U. A U, that would be the first thing to try. And so then you get this terrible limerick, which is "a limerick packs laughs..." Yeah, whatever, you can figure it out later, I'm not gonna read it. All right, good stuff, look at that, we're breaking ciphers every day. Here's the next line on the end of each line.
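The "substitute what you have guessed so far" step can be sketched as a partial decryption that fills in only the alphabets whose shift you think you know. The name `partial_decrypt` is hypothetical, and the code assumes a standard Vigenère shift (plaintext letter = ciphertext letter minus key shift, mod 26), which may differ from the homework's algorithm. The sample ciphertext is THEBOYHASTHEBALL under key VIG, not the lecture's example.

```python
def partial_decrypt(ciphertext, shifts, unknown="."):
    """Decrypt with a partially known key.

    `shifts` has one entry per key position: an int 0-25 for a guessed
    shift, or None for a position not yet guessed (printed as `unknown`).
    """
    text = "".join(ch for ch in ciphertext.upper() if "A" <= ch <= "Z")
    out = []
    for i, ch in enumerate(text):
        shift = shifts[i % len(shifts)]
        if shift is None:
            out.append(unknown)
        else:
            out.append(chr((ord(ch) - 65 - shift) % 26 + 65))
    return "".join(out)


if __name__ == "__main__":
    # Key V, ?, G: the middle alphabet has not been guessed yet.
    print(partial_decrypt("OPKWWECIYOPKWIRG", [21, None, 6]))
```

Seeing a pattern like `T.EB.Y` makes it much easier to guess the missing alphabet from common two- and three-letter sequences than staring at frequency counts alone.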