Okay folks. The program crashed during the last 15 minutes of Tuesday's lecture, so we don't have a recording of the end of it; good for you for coming in person. I can see at least some familiar faces, so some of you came to both, good for you. I don't know what happened, but luckily the whole thing didn't crash. I was slightly worried about that when we left. Let's see, bad news, good news. Has anybody come to office hours? Have you started yet? People came to your office hours? Okay, people came to your office hours. I'm just very scared. Okay, you don't have to answer; just think about it. Well, I'll tell you, we had a very good success story yesterday: somebody came in who was only able to compile their project, I helped them fix just two minor bugs, and then bang, all the test cases passed. I think that only took about 15 or 20 minutes. I only had two people yesterday, which is why I was slightly worried that either not many people are starting early enough or you're getting help elsewhere, which is fine, so I'm not worried about that. But anyway, do feel free to come to my office hours. It was just me sitting in there, reading or something. Cool, so let's break some ciphers. All right, so we have some ciphertext up here. What are the two ciphers we've talked about so far? The Caesar cipher? Yeah, shifting the letters by the amount that the key specifies, and then to decrypt, you shift back, okay? Then what about the second cipher that we've been talking about? Yeah, the Vigenère cipher, and what's the difference there? How does the Vigenère cipher work? It's just a bunch of Caesar ciphers glued together. Ooh, it's a bunch of Caesar ciphers glued together. Okay, yeah, so it's a multi-letter key, right?
And then each letter of the key shifts its alphabet by that amount. We saw the example with a three-letter key, VIG, and the message THE BOY HAS THE BALL. By shifting each letter of the plaintext by the amount that the key specifies, we've essentially shifted every third character by a different Caesar cipher amount. Cool, okay. So we have this ciphertext, and we need to decide what to try to figure out first. We've talked about many general techniques for attacking a cipher. What do we need to determine first? The key length. Why is the key length important? If it's small enough, you can brute force the key. If it's small enough that we can brute force it, like how small? It depends how fast you can check. It depends how fast you can check, yes. So one, we know a short key is pretty easy; if the key is longer, it's probably harder to brute force, and we still need to know its length. Yeah. You can determine the interval that the different ciphers are on, so that if you see a repetition, you can tell whether it's actually the same sequence of letters or just a coincidence. Yeah, so let's think about that. Going back to what you're saying, we need to be able to split this ciphertext into different alphabets, where each alphabet is encrypted with the same Caesar cipher. If we can't do that, we are essentially shooting in the dark and can't really ever solve this. So, going back to our original example, what were some interesting patterns that made it through the encryption process into the ciphertext? Yeah, THE is encoded twice as OPK, and even more, THEB is encoded as OPKW both times.
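To make "a bunch of Caesar ciphers glued together" concrete, here is a minimal Python sketch. It assumes uppercase A to Z input with spaces and punctuation already stripped; the names `caesar_shift` and `vigenere` are hypothetical, chosen for illustration. Each key letter picks the Caesar shift for the characters it sits over, and running it on the lecture's example reproduces the repeated OPKW.

```python
from itertools import cycle

A = ord("A")

def caesar_shift(ch: str, k: int) -> str:
    """Shift one uppercase letter forward by k positions, wrapping past Z."""
    return chr((ord(ch) - A + k) % 26 + A)

def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Apply the Caesar shift named by each key letter in turn (A=0, B=1, ...).

    Assumes text and key contain only uppercase A-Z.
    """
    sign = -1 if decrypt else 1
    return "".join(
        caesar_shift(ch, sign * (ord(k) - A)) for ch, k in zip(text, cycle(key))
    )

plain = "THEBOYHASTHEBALL"  # "the boy has the ball" with the spaces removed
cipher = vigenere(plain, "VIG")
print(cipher)                                # OPKWWECIYOPKWIRG
print(vigenere(cipher, "VIG", decrypt=True))  # THEBOYHASTHEBALL
```

OPKW shows up at positions 0 and 9: the same plaintext THEB sits under the same key letters V, I, G, V both times.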
And we talked about why: it just so happens that those characters in the plaintext were repeated, and they were a multiple of the key length apart, so the exact same parts of the key were over those same characters. Cool, so how does that help us uncover the period of the key? Yeah, we know the distance between the two occurrences of OPKW in the ciphertext: one, two, three, four, five, six, seven, eight, nine. There's a difference of nine between the repetitions, so the key length must be nine or some divisor of nine, right? In this case it's three; we know exactly what it is because we built the example, but in general this only tells us what's very likely. So what should we do with the bigger ciphertext? Where are the repetitions, anybody remember? OEQOOG. And the O before it is repeated too: O, E, Q, O, O, G. Yeah, so that's a pretty good one. Let's see, these are in groups of five, so that's five, 10, 15, 20, 25, 30, so they're 30 apart. Any others? MOC. MOC, like a MOOC, a massive online course, just without the online part. MOC, and where's the other one? There we go, awesome. We can count that: five, 10, 15, 20, and so on... 72? All right, I have it on another slide, so you can count it later. Any others? Could you write a program to do this? The answer should be yes. You're just looking for repetitions, and you can print out each repetition and the distance between its occurrences. This is exactly what we talked about: repetitions in the ciphertext occur when the same characters of the key line up over the same characters of the plaintext. That can happen for a lot of reasons; with common words like THE repeated often in the plaintext, it's more likely that the key will line up over them the same way.
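The "write a program to do this" step can be sketched in a few lines of Python. This is a brute-force search, assuming the ciphertext is an uppercase string with no spaces; `find_repeats` and `repeat_distances` are hypothetical helper names. It records every substring of at least `min_len` characters that occurs more than once and the gaps between consecutive occurrences (this is usually called the Kasiski examination).

```python
from collections import defaultdict

def find_repeats(ciphertext: str, min_len: int = 3) -> dict[str, list[int]]:
    """Map each substring of length >= min_len that occurs more than once to its start positions."""
    positions: dict[str, list[int]] = defaultdict(list)
    n = len(ciphertext)
    for length in range(min_len, n // 2 + 1):
        for start in range(n - length + 1):
            positions[ciphertext[start:start + length]].append(start)
    return {s: p for s, p in positions.items() if len(p) > 1}

def repeat_distances(ciphertext: str, min_len: int = 3) -> dict[str, list[int]]:
    """For each repeated substring, the gaps between consecutive occurrences."""
    return {
        s: [b - a for a, b in zip(p, p[1:])]
        for s, p in find_repeats(ciphertext, min_len).items()
    }

for s, d in sorted(repeat_distances("OPKWWECIYOPKWIRG").items()):
    print(s, d)
```

On the VIG example this prints OPK, OPKW and PKW, each at distance 9. It stores every substring, which is fine for a few hundred characters but not meant for large inputs.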
But yeah, we already saw this here. In this case, I should say we don't know for certain, because again, a repetition could just be due to chance; it's possible that it happens randomly. So the difference is nine, and the period is a factor of nine: it could be one, three, or nine. How could we easily check if it was one? Yeah, if the key is length one, what does that mean? It's a Caesar cipher, so we can break it with the techniques we already have: frequency analysis, or just brute forcing all possible keys. Awesome. Okay, so now look at this ciphertext. You can stare at it for a long time, or you can write a program to do this, and we can find all of the two-character repetitions too, not just three: two-character, three-character, and as long as we can find. And we'll get a ton of them. So we have MI at distance 10 and OO at distance five, OEQOOG at distance 30, as we said, and MOC, which was 72. Yeah, look at that, I can count. 72 has a lot of factors, so the period could be six or any of these factors and combinations. Now think for a minute about this one, QO. Its distance factors as seven times seven; compare that to all the other ones. That's probably a repetition that just happened by chance, because everything else is telling us the period involves two, three, or five, while this one says it must be seven or 49. That's highly unlikely, so we can probably rule it out as a spurious result. What should give us the most confidence that a repetition actually occurred because the key lined up over repeated plaintext? The longest one, yeah, look at the longest string first.
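One way to turn "which factors keep showing up" into numbers is to let each repetition vote for every candidate period that divides its distance. The Python sketch below weights each vote by the repeat's length, so longer repeats count for more; that weighting, the candidate range, and the name `period_support` are assumptions for illustration, not the lecture's exact method. The example uses three repeats discussed above, reading QO's distance as seven times seven, 49.

```python
def period_support(repeats, candidates=range(2, 11)):
    """Score each candidate period by the total length of repeats whose distance it divides.

    repeats: iterable of (substring, distance) pairs, e.g. from a Kasiski-style search.
    Weighting by substring length (an assumed heuristic) trusts longer repeats more.
    """
    scores = {p: 0 for p in candidates}
    for substring, distance in repeats:
        for p in scores:
            if distance % p == 0:
                scores[p] += len(substring)
    return scores

# Repeats named in the lecture; QO's distance of 49 is inferred from "seven and seven".
print(period_support([("OEQOOG", 30), ("MOC", 72), ("QO", 49)]))
```

The lone QO repetition is the only support for seven, which is why it looks spurious next to the others. The slide lists more repetitions than the three used here, so these scores alone don't settle the period.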
Right, again, what matters here is not the distance, how far apart they are, but how long the repetition is. So this is the longest one, OEQOOG. We start there: its distance is 30, and the prime factors are two, three, and five, so the period could be any divisor of 30: one, two, three, five, six, 10, 15, or 30. Do we just try all of those? What would your intuition be at this point? Three. Could be three, what else? Yeah, six. What can we use to validate those hypotheses? We can check which of them make sense with all the other repetitions, right? So we have two, three, five. It's probably not 30; using a 30-letter key on about 150 characters of ciphertext seems kind of crazy. Okay, so now let's look. Are we going to consider one? No, it kind of seems silly, right? We could just test that separately: pretend it's a Caesar cipher and break it that way. So we're looking at two, three, and six, is that what we said? Yeah, and maybe five, so one, two, three, five, six. So one we're going to rule out immediately. Two, can we rule out two? Yeah. Because you have that longer sequence, do you weight it more, saying it's more valid because it's a longer sequence? Yes, that's why we're starting with that sequence and taking all of its divisors. That's how we got one, two, three, five, six, and then 10, 15, and 30. So we start with those as our possible set, we'll narrow that down a little bit, and then we'll see which of these matches up most with the rest of the repetitions, knowing that any individual repetition could be spurious. So let's think of things we can rule out.
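The candidate set in this step is just the divisors of the longest repeat's distance. Here is a small Python sketch; the cap `max_period` stands in for the "30 seems kind of crazy" judgment and is an assumed parameter, and both function names are hypothetical.

```python
def divisors(n: int) -> list[int]:
    """All positive divisors of n, in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]

def candidate_periods(distance: int, max_period: int = 10) -> list[int]:
    """Divisors of the longest repeat's distance, skipping 1 (a plain Caesar cipher,
    checked separately) and anything above max_period (an assumed cap)."""
    return [d for d in divisors(distance) if 1 < d <= max_period]

print(divisors(30))           # [1, 2, 3, 5, 6, 10, 15, 30]
print(candidate_periods(30))  # [2, 3, 5, 6, 10]
```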
What about five? Two and three are factors of a lot of them, but five? Yeah, two is a factor of a lot of them, actually almost all. Three is a factor of not all of them, but many: three, three, three, three, three, three. And with our weighting, we can weight MOC more heavily than the shorter ones, right? So three is in there, that's good. What about five? Five appears in only three of them, including the longest one, which is why it's on our list, but the only other times five appears are MI and OO. So we can probably ignore five. What about six? How many have six? One, two, three, four, five, six. Six of them, including both of our longest ones. So what do we think? Yeah. What about four? Four isn't a divisor of 30, so that's why it's not there: we listed all the divisors of 30 on the other slide and copied over the smaller ones. We got rid of 10, 15, and 30, saying those are probably pretty big, and we can check them later if we need to. We can also rule them out here, because five only appears in three of the repetitions, and if we drop five, we also drop 10 and 15. OO has five, and OO is a substring of the long one; does that tell us anything? I think the answer is no, because it's a different repetition. It starts at 22, which is actually before the long one, and its next occurrence is at 27. If it were just part of the same repetition, we wouldn't count it, because that doesn't give you more information. Yeah. I just want to make sure I'm following correctly. We haven't eliminated four as a possibility; we're just trying to identify what's probable so we know where to throw our resources, right?
We've eliminated four because it's not a divisor of 30. We're only looking at the numbers that divide 30 evenly, because those are what the key length could be if the longest repetition is real. That's why we're starting with one, two, three, five, six, 10, 15, and 30. Does that make sense? Like, if the key length was four... The longest repeated ciphertext string has the highest probability of not being due to chance. Exactly. Okay, yes. And so we're confident eliminating four. Yes, and again, it could be spurious: maybe only four of those characters matched for real, and the fifth just matched by chance. But we have limited resources. Exactly. We're trying to narrow down and limit our guesses for the period, and we'll have different checks later to see roughly how right we are. Those checks are also statistical, so there'll be some error there. You're breaking cryptography; it's a bit of guess and check as you go, yeah. You've used it a few times: does spurious mean suspicious? Spurious means due to chance. I don't want to move away from this slide, because I don't want to ruin all of my great writing here. It means the repetitions in the ciphertext are not because the same part of the key was over the exact same letters of the plaintext; they just happened by chance. Questions on this slide? You're all going to be quiet. You have to be over here to write. Okay, all right. So we eliminated five, so we can rule out some of these; well, two we can't rule out, eh.
So we look at this: two is in a lot of places, three is in a lot of places, and six, the combination of them, is in a good number of places. How many? This one, this one, this one, four, five, six: six of them, including both of our longest, have six as a factor. I'd say it's probably a good bet to try that one first, unless we have a strong argument against it. I was curious, would you want to lean towards a longer key in general? Because if there are a lot of characters, wouldn't it make sense that the key wouldn't be only two? Yeah, you'd use your intuition about the problem, and if you want to meta-game it, if this was an assignment, you'd also think: a two-character key is probably not that interesting, because it can be trivially brute forced, so it's probably something longer-ish. And we saw the key spaces for all the different sizes on Tuesday, right, so yeah, you'd factor that into your guessing. Wouldn't trying the larger multiples first be better in general, because a smaller key would show up as a repeating pattern inside the larger one, so it would still be captured? Yeah, yeah, yeah, this is good. Actually, this came up a lot last semester. You could consider our example key a nine-character key that's V-I-G, V-I-G, V-I-G, a six-character key that's V-I-G, V-I-G, or a three-character key that's just V-I-G. So in a case like this, you might try the largest number, because that will at least tell you immediately if you're wrong. Let's put it this way: if it was a three-character key and we treated it like a six-character key, we would see that the recovered key was the same thing repeating, and we'd have solved it and figured out the key.
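If you guess a multiple of the true period, the key you recover comes out as a repeated block, like VIGVIG. A small Python sketch, with the hypothetical name `shortest_repeating_unit`, collapses such a key to its shortest repeating unit:

```python
def shortest_repeating_unit(key: str) -> str:
    """Shortest prefix whose repetition reproduces key exactly, e.g. 'VIGVIG' -> 'VIG'."""
    n = len(key)
    for size in range(1, n + 1):
        if n % size == 0 and key[:size] * (n // size) == key:
            return key[:size]
    return key  # only reached for the empty string

print(shortest_repeating_unit("VIGVIGVIG"))  # VIG
```

This works because encrypting with VIG and with VIGVIG produces identical ciphertext, so a too-long period guess still succeeds and just reveals the repetition.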
Okay, so that was our reasoning. We could also count how many distances have two as a factor and how many have three, so maybe we try six first. But we want to check, so how do we check that six is correct? Yeah, when we looked at the frequencies of all the characters in this whole text, we saw something closer to uniform, because all the frequencies get mixed together. But if we split this into alphabets, where every sixth character goes into the same alphabet, and we graph the frequencies there, each of those should approximate English, right? The problem is sample size: each alphabet is only a sixth of the size of our ciphertext, so the noise gets a little higher. But that's essentially the idea: we split the ciphertext up and use the same kind of statistical measures we did before. I'll show you a different statistical measure here, just to show you different ways to check whether the text you're looking at is English text, but shifted. That's not to say it's a better or worse way. We could do our statistical frequency analysis to determine the correct shift; we'll use that later. For now we're going to use a different measure: the index of coincidence, which is the probability that two randomly chosen letters from the ciphertext are the same. Think about English: in some sense, this is one measure of how English-ish a text is. Let's look at the math and then we'll come back. So have you taken statistics yet? I got a yes and a no, something?
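Splitting the ciphertext into alphabets for a guessed period is just slicing with a stride. A minimal Python sketch (the name `split_alphabets` is hypothetical), shown on the VIG example, where each alphabet was encrypted with a single key letter:

```python
def split_alphabets(ciphertext: str, period: int) -> list[str]:
    """Alphabet i holds characters i, i + period, i + 2*period, ... of the ciphertext."""
    return [ciphertext[i::period] for i in range(period)]

print(split_alphabets("OPKWWECIYOPKWIRG", 3))  # ['OWCOWG', 'PWIPI', 'KEYKR']
```

Each of these strings is an ordinary Caesar ciphertext (key letters V, I, and G respectively), just a short one, which is where the extra statistical noise comes from.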
You know something about probabilities, right? Okay, me too. So the index of coincidence: what is the probability that we choose two characters from some text, ciphertext, anything, and they're the same? Yeah, it's the probability that the first is a given character, times the probability that the second, drawn from what's left, is also that character. Let's see if that's right. Yep, okay, cool. We'll call f_i the frequency of character i, meaning just how many times that character appears in our ciphertext. So what's the probability of drawing two A's? It's the count of A times the count of A minus one. But I need to divide that by what? I want a probability, and the frequency here is just the count. Divide by n for f_A, and by n minus one for f_A minus one. So it's f_A times (f_A minus one), over n times (n minus one), everyone agree? That's the probability that I picked two letters from my ciphertext and they're both A's. What if I want that probability for every single letter in the alphabet? I was going to use sigma to represent the alphabet; nope, that's ugly. I'm just going to call this IC. I want to add this up over all letters, right? So it's f_A times (f_A minus one) over n times (n minus one), plus f_B times (f_B minus one) over n times (n minus one), plus... I don't need to keep writing this, because I'm running out of room, so I'll dot, dot, dot that. And I can write it more nicely as a sum; we'll turn the letters into numbers now. It doesn't really matter which.
We let i run from zero to 25, which covers all 26 letters. So for all 26 letters, we take f_i times (f_i minus one), the count of that specific character times the count minus one, divided by n times (n minus one). Everyone agree? And we can simplify this a little further. Yeah, we can pull out the n's, because they don't depend on the summation index, right? We factor out one over n times (n minus one) and multiply the whole sum by it. So here we have it, and it's all just based on counting, right? There's nothing special here. n is just the length of the ciphertext, and each f_i is the number of times that letter appears in it. So this is one statistical measure of how often characters repeat in a given ciphertext, or actually in any text, so you can apply it to English too. And does shifting affect this? If I have A's in my plaintext and I shift them all to Z's, does this measure care? No, because it sums over all the letters. It doesn't care which letter is which, which makes it perfect for using on our ciphertext. People have pre-calculated this measure for English text, and we can use it in multiple ways. First, we can split our ciphertext into the number of alphabets we guessed, calculate the index of coincidence for each, and see whether it gets close to English. If it does, that suggests each alphabet was encrypted with a single shift. We can also do what we talked about earlier and look at the frequency analysis.
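Written out, the derivation on the board is the following, where f_i is the count of letter i (A = 0, ..., Z = 25) and n is the length of the text:

```latex
\mathrm{IC} \;=\; \sum_{i=0}^{25} \frac{f_i}{n} \cdot \frac{f_i - 1}{n - 1}
\;=\; \frac{1}{n(n-1)} \sum_{i=0}^{25} f_i \,(f_i - 1)
```

As a tiny check, for the text AAB we have n = 3, f_A = 2, f_B = 1, so IC = (2·1 + 1·0)/(3·2) = 1/3, which matches the fact that exactly one of the three possible pairs is AA.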
So we can look at the graph and compare it with English to see how close we are. Okay, so we can do exactly that. Now we've split our ciphertext into six different alphabets, and we're just going to calculate the index of coincidence for each of those. Yes? You said there are pre-calculated values for English; what are those? I don't want to get into it too much, but 0.066 for right now. We'll come back to this in a second. And again, remember the important thing: this is a statistical measure, and each alphabet here is only about 22 characters long, so these values can vary a lot. Here are the values: 0.069, 0.078, 0.056, 0.124, and 0.043. So which of these are roughly around 0.066? The first one's pretty good. The second one? Yeah, pretty good, close. The third one? Yeah, pretty good. What about this one? No. And this one? Pretty far away. So what do we think? Overall we're actually pretty good, right? If they were all way off, we would know the alphabets were probably just random, meaning we messed up our period, and we'd go back and double check it. The other cool thing about the index of coincidence is that you can calculate its expected value for different periods, which helps when deciding whether things match or not. For a period of one, it's just English text, about 0.066. People have pre-calculated it for other periods too, and as the period gets higher and higher, the value gets closer to 1/26, about 0.038: every character is equally likely, so the letters you pull are essentially random and don't follow any distribution. So this is another way we can check; we can double check in multiple ways.
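The formula translates directly into code. Below is a minimal Python sketch; `index_of_coincidence` and `alphabet_ics` are hypothetical names, and non-letters are assumed to be ignored. The second function computes the per-alphabet values used to test a guessed period.

```python
from collections import Counter

def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn without replacement from text are equal.

    Non-letters are ignored; raises ValueError if fewer than two letters remain.
    """
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters")
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))

def alphabet_ics(ciphertext: str, period: int) -> list[float]:
    """IC of each of the `period` interleaved alphabets (every period-th character)."""
    return [index_of_coincidence(ciphertext[i::period]) for i in range(period)]

print(round(index_of_coincidence("AAB"), 4))                  # 0.3333
print([round(x, 3) for x in alphabet_ics("OPKWWECIYOPKWIRG", 3)])
```

Because the IC only looks at the multiset of counts, HELLOWORLD and its shift-by-one IFMMPXPSME give exactly the same value, which is the shift invariance described above.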
For our ciphertext as a whole, the index of coincidence is 0.043, which falls between the table's values for five and 10, just past five, but you can see these values are very close to each other. That's why you need to keep double checking. It indicates a key slightly longer than five. And notice that now we're double checking our previous estimate with something that has nothing to do with repetitions, right? So you can think of it as two methods independently coming to the same conclusion that the key is likely six: our analysis of the repetitions, and then our double check using the index of coincidence overall and for each of the alphabets. Questions? Yeah. When you said it could be larger than five, was that because you were testing the value five? Sorry, we're going to have to go back and forth a little bit. Okay, so our index of coincidence is 0.043 for that ciphertext. And looking here, 0.043 is somewhere between five and 10, at least for the periods in this table. But it's not an exact estimate, especially with the difference between four and five being only 0.001. These are pretty small deltas, so we use this as a ballpark: are we roughly where we think we are? So now what do we do? Yeah, each of the alphabets is a Caesar cipher. I kind of mentioned this on Tuesday, but let's think. Should we just brute force each one, since we know each has its own key? We can brute force each one and try all 26 shifts, but how do we know when we're right? Yes, that's how we did it for the Caesar cipher, right? We took the ciphertext and tried all 26 shifts until we could read it. But the key difference here is that the letters in this alphabet are only every sixth letter of the ciphertext.
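The table of expected IC values per period can be approximated with a simple model: a randomly chosen pair of letters lands in the same alphabet with probability about 1/period (and then behaves like English), and otherwise behaves like random letters. The Python sketch below uses that large-text approximation, with the lecture's 0.066 for English and 1/26 for random text; inverting it gives a rough period estimate, often called the Friedman test. The slide's table may have been computed with a slightly more exact formula, and the function names are hypothetical.

```python
ENGLISH_IC = 0.066   # value quoted in the lecture for English
RANDOM_IC = 1 / 26   # uniformly random letters, about 0.0385

def expected_ic(period: int) -> float:
    """Approximate IC of a long Vigenere ciphertext with the given period.

    About 1/period of letter pairs fall in the same alphabet (English-like);
    the rest look random. Ignores small corrections for finite length.
    """
    return ENGLISH_IC / period + (1 - 1 / period) * RANDOM_IC

def estimate_period(ic: float) -> float:
    """Solve expected_ic(period) == ic for period (Friedman-style estimate)."""
    return (ENGLISH_IC - RANDOM_IC) / (ic - RANDOM_IC)

for d in (1, 2, 3, 4, 5, 10):
    print(d, round(expected_ic(d), 4))
print(round(estimate_period(0.043), 1))  # 6.1
```

For the lecture's overall IC of 0.043 this gives an estimate of about 6.1, consistent with "slightly longer than five", though with this little text the estimate is only a ballpark.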
So maybe a better way to write this would be something like this. Where are you? Pacing, the most difficult thing. Okay, so rather than thinking about it like this, which is where we'd think brute forcing might help us, what we really have to picture for each of these alphabets is something like this. What did I do here? Yeah, I left space for all the other letters, right, the letters from all the other alphabets. So this A is the very first letter of our ciphertext. The next characters are D, Q, Y, S, M, and I left those as underscores, and then we have I. So if we tried to brute force this, we could generate all 26 possible shifts. But how do we know when we're right? We're missing the five characters in between, which give us the context from all the other alphabets. So what do we do? Yeah, the key thing to think about is that we have six alphabets, but they're intertwined with each other, right? It's not like before, where a Caesar cipher was one plaintext encrypted with one shift, and we could easily brute force decrypt it. So we need to make educated guesses about each of the ciphers, shift them, and then start putting them together. Essentially it's guess and check: okay, I think this alphabet is shifted by this amount, and this other alphabet by that amount, and when I put them together, maybe I can find words. I can find things like THE, and that would help me break the next alphabet and try that shift. But I may be wrong, so I may have to go back. It's just like doing puzzles, Sudoku and stuff, right? Sometimes you have to guess what goes in a specific box, and if that leads you to a contradiction, you go back and change it, hopefully in pencil and not pen.
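The "underscores for everything else" picture on the slide is easy to generate. A minimal Python sketch with the hypothetical name `show_alphabet`; the second call uses the first seven ciphertext letters mentioned in the lecture (A, then D, Q, Y, S, M, then I):

```python
def show_alphabet(ciphertext: str, period: int, index: int, blank: str = "_") -> str:
    """Keep only alphabet `index` in its original positions; blank out the rest."""
    return "".join(
        ch if i % period == index else blank for i, ch in enumerate(ciphertext)
    )

print(show_alphabet("OPKWWECIYOPKWIRG", 3, 0))  # O__W__C__O__W__G
print(show_alphabet("ADQYSMI", 6, 0))           # A_____I
```

Seen this way, it is clear why brute forcing one alphabet alone gives no readable words: five of every six letters are missing.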
Cool, so this is going to be our plan. We're going to use the techniques for the Caesar cipher. It's not that we can't use brute force, but if we really brute forced this, we'd have to brute force all the alphabets together, which is the same as attacking the six-character key directly, and that means a lot of guessing. All right, so let me show you. We can do exactly what we did before: run the statistical analysis, computing the frequencies under each shift to see which gets us closest to this graph. Instead of doing exactly that, though, I'm going to show you a different, coarser technique. This graph is great, right, because it's the actual distribution of English characters, but it's a lot of information, and if you remember, even when we were breaking that Caesar cipher it was kind of hard to tell exactly which shift was right from the frequencies. So a different way to approach this, both for a single Caesar cipher and here for a Vigenère cipher, is to count. We take all the letters, A through Z, and for each alphabet we just count how many times each character appears. It's pretty simple; we're literally counting: in the first alphabet there are three A's, one B, zero C's, zero D's, four E's, one F, and so on all the way through. We do this for each alphabet; I didn't put all our alphabets up here, that would be kind of crazy. Then we simplify the model of the English graph down to just high, medium, and low, and think about it in those terms. With English, you have this pattern of highs and mediums, where the highs are the most common letters, and then a series of lows near the end of the alphabet.
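Here is a minimal Python sketch of the counting step plus a coarse high/medium/low profile. The thresholds (zero counts are L, counts above half the maximum are H, everything else is M) are an assumption for illustration, not the lecture's definition, and the sample string is a made-up toy whose first six counts mirror the ones read out above.

```python
from collections import Counter
from string import ascii_uppercase

def letter_counts(alphabet_text: str) -> list[int]:
    """Counts of A..Z, in that order, within one alphabet's letters."""
    counts = Counter(alphabet_text)
    return [counts[c] for c in ascii_uppercase]

def high_medium_low(counts: list[int], high: float = 0.5) -> str:
    """Coarse profile: 'L' for zero, 'H' above high * max count, otherwise 'M'.

    The thresholds are illustrative assumptions.
    """
    top = max(counts)
    return "".join(
        "L" if c == 0 else "H" if c > high * top else "M" for c in counts
    )

toy = "AAABEEEEF"  # toy alphabet: three A's, one B, four E's, one F
print(letter_counts(toy)[:6])                 # [3, 1, 0, 0, 4, 1]
print(high_medium_low(letter_counts(toy)))
```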
So how does this help us, or does it? Yeah, we can see how shifting this alphabet in different ways changes this pattern. Right, so we can essentially try decrypting. Let's look at the very first alphabet, ignoring everything else. How does that match up? What shift would you do? Maybe you're thinking: E is the most frequent, maybe that's actually A. So let's shift everything back by four: one, two, three, four. Oops. Right, and then what does that mean for this high? One, two, three, four. Now we've made W a very frequent character in this alphabet, so that probably doesn't make much sense. Yeah? It actually matches really closely without doing anything. It's actually really close with no shift at all. Is that good or bad? It could mean the key letter was A. Yeah, it could mean the key letter was A, which on the surface seems kind of silly. In a Caesar cipher, would you ever pick zero as your key? No, because with zero, it doesn't encrypt your message at all. And in the Vigenère cipher, would you want a key of all A's? No. But could you have A's in your key? Yeah, because they're hidden by everything else; that alphabet just isn't shifted at all. Right, so we could guess no shift for the first alphabet. It actually matches very well: it has peaks at A and E, and a peak at O. What about the second alphabet? Actually, let's look at the third alphabet. The other thing I like to do is look for a series of lows, because near the end of the English alphabet there's a stretch that's mostly lows, and in this alphabet there's a run of one, two, three, four, five zeros in a row. So if we think of that run as the end of the alphabet, what would that make the next character, I? A, which is high.
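The eyeballing above can be automated by scoring every possible shift of an alphabet against English letter frequencies and keeping the best one. This Python sketch swaps in a chi-squared distance, one common choice that is not necessarily the measure used on the slides. The frequency table is a commonly quoted approximation, and `shift_scores` and `best_key_letter` are hypothetical names.

```python
from collections import Counter

# Commonly quoted approximate English letter frequencies (percent), A..Z.
ENGLISH_FREQ = [8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.77, 4.0, 2.4,
                6.7, 7.5, 1.9, 0.095, 6.0, 6.3, 9.1, 2.8, 0.98, 2.4, 0.15, 2.0, 0.074]
TOTAL = sum(ENGLISH_FREQ)
ENGLISH_P = [f / TOTAL for f in ENGLISH_FREQ]  # normalize so the proportions sum to 1

def shift_scores(alphabet_text: str) -> list[float]:
    """Chi-squared distance from English after undoing each shift 0..25 (lower is better).

    Assumes alphabet_text is non-empty uppercase A-Z.
    """
    counts = Counter(ord(c) - ord("A") for c in alphabet_text)
    n = len(alphabet_text)
    scores = []
    for s in range(26):
        # Under key shift s, plaintext letter i shows up as ciphertext letter (i + s) % 26.
        observed = [counts[(i + s) % 26] for i in range(26)]
        scores.append(sum((observed[i] - n * p) ** 2 / (n * p) for i, p in enumerate(ENGLISH_P)))
    return scores

def best_key_letter(alphabet_text: str) -> str:
    """Key letter whose shift makes this alphabet look most like English."""
    scores = shift_scores(alphabet_text)
    return chr(ord("A") + scores.index(min(scores)))
```

For example, a toy alphabet of E's, T's, A's and O's shifted forward by ten comes back as key letter K. On a real 20-odd-letter alphabet the lowest score is only a hint; the runner-up shifts are worth checking too, exactly like the manual high/medium/low comparison.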
Counting on from there, we'd have A, B, and now I'd need more fingers than I can show you here. A, B, C, D: what would that make this one? E, which is a high-frequency character. E, F, G, H, I, which is also high frequency, is that right? Yeah, I is a high-frequency character. So maybe the third alphabet shifts A to I, meaning its key letter would be I. Does everyone agree that makes sense? Just like I mentioned before, this is essentially a puzzle you're trying to solve. When you're doing this yourself, it takes a while to try different shifts and see how they match up. I know these answers, so I'm going to lead us to them so we're not spending all day on just one problem. But it's natural for this to take a little while when you're doing it on your own, so don't freak out. And importantly, if you get stuck, come to us for help or double check your assumptions; maybe you got the key length wrong, and now you're trying to decrypt things that can't make sense. Okay, and we can look at the sixth alphabet. Just roughly looking at it: yeah, okay, V; here's another grouping of zeros. Treating that grouping as the end of the alphabet would map V to A, and you can see the other ones kind of line up too. So we can start with these three assumptions: the first alphabet has key A, so it's unshifted; the third alphabet maps A to I; and the sixth alphabet shifts A all the way to V. So now what do we do? Try it. We can try it on the ciphertext, right? We shift each of these three alphabets back, which we know how to do from the Caesar cipher. And I'm going to put in bold the things we think we know, so these three alphabets will be in bold among the other ones. And now what do we do?
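Applying only the key letters we believe in, and leaving the rest untouched, can be sketched as follows in Python. Uppercase plays the role of the slide's bold text and lowercase marks alphabets still undecrypted; that convention and the name `partial_decrypt` are assumptions for illustration. It is shown on the VIG example, since the full lecture ciphertext isn't reproduced here.

```python
def partial_decrypt(ciphertext: str, known: dict[int, str], period: int) -> str:
    """Decrypt only the alphabets whose key letter is known.

    known maps a 0-based alphabet index to its key letter, e.g. {0: "A", 2: "I", 5: "V"}.
    Decrypted letters come out uppercase; undecrypted ones stay as lowercase ciphertext.
    """
    out = []
    for i, ch in enumerate(ciphertext):
        k = known.get(i % period)
        if k is None:
            out.append(ch.lower())
        else:
            out.append(chr((ord(ch) - ord(k)) % 26 + ord("A")))
    return "".join(out)

print(partial_decrypt("OPKWWECIYOPKWIRG", {0: "V"}, 3))  # TpkBweHiyTpkBirL
```

With all three key letters known, `partial_decrypt("OPKWWECIYOPKWIRG", {0: "V", 1: "I", 2: "G"}, 3)` returns the full plaintext THEBOYHASTHEBALL.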
One thing to try first: see whether we notice any patterns in the bold characters. I think it said A, J, E, and that was right above your mouse. Oh, yeah, I know. I'm just trying to see if there's anything else anybody sees. The interesting thing is that at least some of the characters we've decrypted are next to each other, so we can look in any direction for something that might be part of "the." Is that a bolded A, G in the second row from the bottom on the left side? It is. Yeah, okay, that'd be one thing to try. That's great, keep that in mind. I don't know if that's right or not, so we'll see, but it's definitely something to try. What else? Yeah? There's also a T-W-E going from the second line to the third that could be a "the." Yes, there we go. There's a T-W-E, so that could be T-H-E. That's good. I'm really bad at word games, by the way, which is why I have all of you. Okay, anybody else have any ideas? Yeah? A, J, E, let me see. Oh, okay, yes. Maybe bold is not the perfect choice here, but the M is not bold. It shouldn't be bold, right? Oh no, it is bold. Okay, yes; it's the J that is not. So you could try maybe "mace," or whatever four-letter words fit M, A, blank, E. A tricky thing, though, is that there are no spaces, which messes you up a little: a space could be anywhere in there, so we need to take that into consideration. Another way to think about it is to treat A, J, E as its own word and think of a common three-letter word that fits, say "are," and try that. Turning that J into an R means mapping A to S for that alphabet. Then we ask: does this look more English-y? What do we get?
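Guessing a word such as "the" or "are" at a position pins down one key letter for each alphabet it touches, because for Vigenère the key letter is the ciphertext letter minus the plaintext letter. This is a minimal sketch of that crib step, assuming the key length is known; `implied_key_letters` is a hypothetical name, and the example ciphertext is "the boy has the ball" encrypted under the key BIG from the earlier example.

```python
import string


def implied_key_letters(ciphertext, position, guess, key_length):
    """Return {alphabet_index: key_letter} implied by assuming `guess` is the
    plaintext starting at letter `position`, or None if the guess is impossible."""
    letters = [c for c in ciphertext.lower() if c in string.ascii_lowercase]
    if position + len(guess) > len(letters):
        return None  # the guess runs off the end of the ciphertext
    implied = {}
    for offset, p in enumerate(guess.lower()):
        i = position + offset
        # Vigenere: cipher = plain + key (mod 26), so key = cipher - plain.
        k = chr((ord(letters[i]) - ord(p)) % 26 + ord("a"))
        alphabet = i % key_length
        if implied.get(alphabet, k) != k:
            return None  # the guess needs two different shifts for one alphabet
        implied[alphabet] = k
    return implied


if __name__ == "__main__":
    ciphertext = "upkcweiiyupkcirm"  # "theboyhastheball" under key "big"
    print(implied_key_letters(ciphertext, 0, "the", 3))
    print(implied_key_letters(ciphertext, 9, "the", 3))
```

When two different guesses imply the same key letters, as the two "the"s do here, that is good evidence both guesses are right.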
"Rick," "pace." Yeah, so now we're double-checking whether that guess makes sense: does it turn everything into gibberish, or does it actually produce something? "None." Yeah, that's good. What was that? How did our "the" go? Did any of our "the"s pop up? T, R, oh, that's a good one, that's a good sign. "None." Oh, R, there we go, nice. These missing spaces are really annoying. "Goo"? I don't know if that's a word. "Goop"? But it's a common word. All right, so let's say we started from here: what would be the next guesses? One, two... sorry, I don't get the numbering scheme. Oh, fifth, yeah. Oh, "the," yeah. "The clowns are taking over." Oh, that's creepy, okay. Yeah, so we can try T-H-E; that's a good one. Anybody have any other guesses? Yeah? Oh, good, that would be a good guess: the last block only has one letter missing. So what would you guess, though? "Me can"? Yeah, so one thing you could do is look up dictionary words, which is a nice way to kind of brute force it. You could say, okay, A-something; I guess you could have a phrase end with "act" or "an," but that's not great. You could say C-A-N, maybe "can," or if it's all one word, you could look at M-I-C-A and ask what a common ending for that is. What was that? "Mica," if that's the word, or if we knew something about the message. Yeah, anybody have any others? So it's not individual letters, it's alphabets. For instance, mapping the second alphabet from A to S means I take that whole alphabet, every key-length-th character starting with the second one, and shift each character from S back to A. So you're just doing the decryption operation for that one alphabet and then putting the results back in with the ciphertext.
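Decrypting one alphabet and merging it back into the ciphertext is easy to script, which makes it quick to try a guess and print the result. In this sketch (a hypothetical helper, assuming the key length is known), unknown alphabets are marked with `None`; decrypted letters come out lowercase and still-encrypted letters stay uppercase, playing the role of the bold text on the slides.

```python
import string


def partial_decrypt(ciphertext, known_key):
    """known_key has one entry per alphabet: a key letter, or None if unknown.
    Decrypted letters are lowercase; letters from unknown alphabets stay
    uppercase ciphertext so you can see what is still missing."""
    letters = [c for c in ciphertext.lower() if c in string.ascii_lowercase]
    out = []
    for i, c in enumerate(letters):
        k = known_key[i % len(known_key)]
        if k is None:
            out.append(c.upper())
        else:
            # Decryption is the Caesar shift backwards by the key letter.
            out.append(chr((ord(c) - ord(k.lower())) % 26 + ord("a")))
    return "".join(out)


if __name__ == "__main__":
    ciphertext = "upkcweiiyupkcirm"  # "theboyhastheball" under key "big"
    print(partial_decrypt(ciphertext, ["b", None, "g"]))
    print(partial_decrypt(ciphertext, ["b", "i", "g"]))
```

Adding one more guessed key letter and re-printing is exactly the try-it-and-see loop from the walkthrough.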
And you could write a program to do this too. It's actually nice to be able to try different shifts and print them out together to see what it looks like. Oh, shoot, yes. Which one? Oh, here, A and P. Okay, is that P in the same alphabet? Then that would also be a good one. Yeah, that's actually a great point. Yeah, I like this "goop." Maybe a better approach would be to color the remaining alphabets so we could see which characters are in the same alphabet. Then we could see that same P in two different places, where we think both would be very nice as Ds, and that would give us an excellent shift to try. Okay, if you're really big on English, you could do something like this and figure out that the last line suggests an "-ical" ending, which is common for adjectives, and that gives us the shift for the fourth alphabet. At this point, we could brute force the last alphabet. What happened to our Ds? Did our Ds appear? It did? Where is it? Why can't I see it? Oh, yeah, "roast." Okay, that's why I was going crazy. So "good" showed up, and our good buddy "and" showed up. Why did I do that? It's insane, all right. Yeah, so at this point brute forcing the last alphabet would be easy. Or we could use exactly the cues you're thinking about. We could also use, let's see, the QI here: if we know it's a Q, a U very likely follows it, so we could try that. There are all kinds of tricks you could try here. So we finally get our plaintext. I'm not going to read it; you can read it on your own time. It's a silly limerick, but it's hard to even read without spaces. Read it left to right, top to bottom. Okay, I'll just let you do this. There we go. I should have had you do that on the mic for everyone. Yeah, it's tricky. So there we go.
We just broke this cipher as a group, and you all had much better intuition and paths to try than I did. You can see how this is kind of a choose-your-own-adventure puzzle: you try different ways in, backtrack, and figure out different things. So when you're trying to solve one of these ciphers, it's really important to be very clear about what assumptions you've made to get where you are now. If you keep getting stuck and everything seems like gibberish, maybe you made a mistake with the period (the key length) or one of the other steps, or maybe your decryption code is wrong. I've seen this before. One of the best ways to practice is to take a Vigenère cipher implementation, take your own plaintext that you know exactly, encrypt it with a key that you know exactly, and try to decrypt it by hand or with your tools. If you can't do it that way, something is very wrong, and you'll never be able to break an unknown ciphertext if you can't break one where you already know the answer. Cool, questions on this? All right, cool. So far we've talked about substitution ciphers, which is what we've been looking at: substituting one letter for another. We can also do transposition ciphers, which rearrange the letters of the plaintext to produce the ciphertext. There are no substitutions, which gives you a very nice property: the one-gram frequencies stay the same. The letter distribution is exactly that of the English plaintext; the letters are just moved around. What is going to be different, though? Say it again? The position of the letters, yes. Exactly, their positions relative to the other letters.
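For the practice loop suggested here, a small Vigenère implementation is enough to generate puzzles whose answer you already know. This is a minimal sketch, assuming lowercase a..z with spaces and punctuation dropped; `make_practice_puzzle` is a hypothetical helper.

```python
import random
import string


def vigenere(text, key, decrypt=False):
    """Classic Vigenere over a..z; non-letters are dropped and output is lowercase."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    sign = -1 if decrypt else 1
    out = []
    for i, c in enumerate(letters):
        k = ord(key[i % len(key)].lower()) - ord("a")  # shift for this position's alphabet
        out.append(chr((ord(c) - ord("a") + sign * k) % 26 + ord("a")))
    return "".join(out)


def make_practice_puzzle(plaintext, key_length, seed=None):
    """Encrypt your own plaintext under a random key that you can check against later."""
    rng = random.Random(seed)
    key = "".join(rng.choice(string.ascii_lowercase) for _ in range(key_length))
    return key, vigenere(plaintext, key)


if __name__ == "__main__":
    key, puzzle = make_practice_puzzle("the boy has the ball", 3, seed=42)
    print(puzzle)  # try to break this, then compare your answer against `key`
```

Because you hold the key, any step where your analysis disagrees with it (key length, one alphabet's shift) shows you exactly which assumption went wrong.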
The one-gram frequency is for a single character, the two-gram (bigram) frequency is how often one character follows another, and three-gram (trigram) frequencies are for three characters in sequence. The two-gram and three-gram frequencies all get mixed up and broken when you move characters around. Just as we analyzed the one-gram frequencies for English, you can analyze different n-gram frequencies. The interesting thing is that the index of coincidence will be exactly the same as for English, because we haven't done any substitutions. There are a number of different transposition ciphers. A simple one breaks the message into blocks of letters, and the key says how you move the letters around within each block. For example, take the key 3, 0, 2, 1. This means our block size is four, so we break the characters into blocks of four. The way this key works (I'm using zero indexing, I hope that's okay) is that the character at index 0 goes to index 3, the character at index 1 goes to index 0, the character at index 2 stays at index 2, and the character at index 3 goes to index 1. We do that swap on every single block, and you get something like this. Again, we've just mixed up the order; we haven't changed any letters. We moved the A to the last spot, index 3. We moved the S to index 0, the front. The U stays at index 2, so it never moves. And the last one goes back to index 1. Pretty simple; it's called a simple transposition cipher. So how would you attack this? Yeah, you can use similar ideas and approaches to the ones we've talked about: look at each block and ask what makes sense. Can we make words by swapping letters around within the blocks? But what's the key size?
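The block transposition with key 3, 0, 2, 1 can be sketched as follows, reading the key the way it is described here: `key[i]` is the destination index of the character at index `i` in each block. The sketch assumes the message length is a multiple of the block size (no padding), and the function names are hypothetical. It also checks the claim that transposition leaves the letter counts, and therefore the index of coincidence, unchanged.

```python
from collections import Counter


def transpose_encrypt(plaintext, key):
    """key[i] is the destination index of the i-th character of each block."""
    n = len(key)
    assert sorted(key) == list(range(n)), "key must be a permutation of 0..n-1"
    assert len(plaintext) % n == 0, "this sketch assumes no padding is needed"
    out = []
    for start in range(0, len(plaintext), n):
        block = plaintext[start:start + n]
        moved = [""] * n
        for i, ch in enumerate(block):
            moved[key[i]] = ch
        out.append("".join(moved))
    return "".join(out)


def transpose_decrypt(ciphertext, key):
    """Inverse: the character at block index i was sent to key[i], so fetch it back."""
    n = len(key)
    out = []
    for start in range(0, len(ciphertext), n):
        block = ciphertext[start:start + n]
        out.append("".join(block[key[i]] for i in range(n)))
    return "".join(out)


def index_of_coincidence(text):
    """Probability that two letters drawn without replacement are the same letter."""
    counts = Counter(c for c in text.lower() if c.isalpha())
    n = sum(counts.values())
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


if __name__ == "__main__":
    key = [3, 0, 2, 1]
    ct = transpose_encrypt("meetmeatnoontoda", key)
    print(ct, transpose_decrypt(ct, key))
    print(index_of_coincidence("meetmeatnoontoda"), index_of_coincidence(ct))
```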
Suppose we try to brute force it. How many possibilities are there? Yeah, it depends on the key size. Correct. There's also an inherent problem if there are extra characters that don't fill a whole block; let's ignore that for now and assume the message length is always a multiple of our key size. So what about brute force: how many four-element keys are there here? Yeah, why? What's the operation? Yeah, it's a factorial. For a key of size four, it's four factorial, which is 4 × 3 × 2 × 1 = 24. With five, it's five factorial; six, six factorial. That increases very quickly: with a key size of 13, you have about six billion (13! = 6,227,020,800) different keys to try. So brute forcing becomes a lot more difficult. But we have our friendly technique that we've been looking at: English language analysis, specifically bigrams and trigrams, meaning which letters are likely to follow other letters. Again, these are exactly the same techniques we've used before. We'd look at each block and ask which arrangement is most likely, given how often one character follows another. If there's a Q and a U in one block, we'd swap them so they're next to each other, and then keep working from there. We'll see more of this with the next cipher. Okay. I don't know why that slide was messed up; I'm going to fix that real quick. Okay, so the rail fence. The problem with the previous cipher is this notion of blocks: we're only mixing and swapping letters within one block. A rail fence cipher is a different, clever way to do this. The idea is that you write the plaintext out in a different layout and then read it off differently. For instance, take the plaintext HELLO WORLD. We're going to write it from top to bottom.
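Brute force over a block transposition key means trying all n! permutations and ranking the results by how English-like they look. This is a minimal sketch, assuming a tiny hand-picked set of common English bigrams as the score (a real attack would use proper bigram and trigram statistics); the bigram set and the function names are assumptions, not part of the lecture.

```python
from itertools import permutations
from math import factorial

# Assumed, deliberately tiny set of frequent English bigrams, for illustration only.
COMMON_BIGRAMS = {"th", "he", "in", "er", "an", "re", "on", "at", "en", "nd",
                  "ti", "es", "or", "te", "of", "ed", "is", "it", "al", "ar"}


def transpose_decrypt(ciphertext, key):
    """Undo a block transposition where key[i] is where character i of each block went."""
    n = len(key)
    return "".join(
        "".join(ciphertext[start + key[i]] for i in range(n))
        for start in range(0, len(ciphertext), n)
    )


def bigram_score(text):
    """Count adjacent letter pairs that appear in COMMON_BIGRAMS."""
    return sum(text[i:i + 2] in COMMON_BIGRAMS for i in range(len(text) - 1))


def brute_force(ciphertext, block_size):
    """Try all block_size! keys; return (score, key, plaintext) tuples, best first."""
    candidates = []
    for key in permutations(range(block_size)):
        plain = transpose_decrypt(ciphertext, key)
        candidates.append((bigram_score(plain), key, plain))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates


if __name__ == "__main__":
    for n in (4, 5, 6, 13):
        print(n, factorial(n))  # 13! = 6,227,020,800, roughly six billion keys
    for score, key, plain in brute_force("etemetamononoadt", 4)[:3]:
        print(score, key, plain)
```

With only 24 keys for a block size of four, listing every candidate is cheap; the factorial growth printed above is why the scoring, rather than the enumeration, has to carry the attack for larger blocks.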
So we take H-E-L-L-O-W-O-R-L-D and write it as if we're putting it on the rails of a fence: top to bottom, then left to right, using two rows. Then we read it off normally, left to right, top to bottom. So the ciphertext is H-L-O-O-L-E-L-W-R-D. What's a benefit of this versus the previous approach? The range over which characters can move is as large as the plaintext. Our key here is how many rows we use; here we're using two. We could use three with the same technique, and we could also write it out and read it off in different orders. But fundamentally, we're now mixing and transposing characters across the whole message, so an attacker can't just reason about it in terms of blocks; they'd have to reason about it more globally. Yeah? Ooh, interesting. I don't know how that would affect it. Maybe for a language written in a different direction, this particular way would be a bad fit. I don't know if right-to-left, top-to-bottom writing would affect this; that's interesting. I bet you could do it the opposite way, and that would be unexpected. Cool. These are simple ciphers that you could just do on paper; there's no complex algorithm here. Cool. So, at a higher level: suppose we're given some ciphertext, and maybe we don't even know which algorithm was used. How could we try to determine that? Okay, for the previous example, H-L-O: first, we know it's encrypted somehow because it doesn't make sense. Yeah, so we can check the one-gram frequencies against the distribution of letters in English.
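The two-row fence can be sketched as below. It follows the write-down-then-across, read-row-by-row description used here, with the number of rows as the key; the function names are hypothetical. The textbook rail fence usually writes in a zigzag instead. For two rows the zigzag and this column-by-column layout produce the same ciphertext, but for three or more rows they differ.

```python
def rail_encrypt(plaintext, rows):
    """Write column by column (top to bottom, then left to right), read row by row.
    Character i lands on row i % rows, so each row is a strided slice."""
    return "".join(plaintext[r::rows] for r in range(rows))


def rail_decrypt(ciphertext, rows):
    """Refill the rows in order, then read the grid back column by column."""
    n = len(ciphertext)
    out = [""] * n
    pos = 0
    for r in range(rows):
        # Row r holds the plaintext positions r, r + rows, r + 2*rows, ...
        for i in range(r, n, rows):
            out[i] = ciphertext[pos]
            pos += 1
    return "".join(out)


if __name__ == "__main__":
    ct = rail_encrypt("HELLOWORLD", 2)
    print(ct, rail_decrypt(ct, 2))
```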
If it matches English exactly, or very closely, but we can't read the message, then it's probably a transposition cipher. If not, that tells us it's a substitution cipher. Then, once we know it's a substitution cipher, how do we go further to distinguish Caesar from Vigenère? Yeah, exactly. We can easily test for a Caesar cipher by brute forcing all 26 shifts and seeing whether any of them works; if none does, you know it's not a Caesar cipher. We can use the index of coincidence. We can use correlation. We can use our statistical analysis and look at the frequencies. These are all different ways to try to determine what it is. In real-world crypto, they often use XOR instead of shifts. Why is that? Somebody remind me, what is XOR? Exclusive OR, so what does that mean? Yeah: zero XOR zero is zero, one XOR zero is one, zero XOR one is one, and one XOR one is zero. So we've talked about what it is; why do real systems use it more than a shift? Yeah, that's true: it's very fast for computers, and you don't have to worry about mapping values into zero through 25 and converting ASCII values to numbers; you can just XOR. You XOR a key into a value, XOR that same key back in, and you get the original value. Okay, I'll show you an example; we'll do this really quickly. But first: don't implement your own crypto algorithms. This is what I've been trying to say for a long time. There are plenty of ways to get this wrong. There are side-channel attacks.
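Timing leaks are the easiest side channel to show in code. The sketch below contrasts an early-exit byte comparison, whose work grows with the length of the matching prefix, with Python's `hmac.compare_digest`, which the standard library documents as designed to avoid content-based short-circuiting. `comparisons_made` is a hypothetical stand-in for elapsed time, so the leak is visible without a stopwatch.

```python
import hmac


def naive_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: it returns at the first mismatching byte, so its
    running time depends on how many leading bytes of the guess are correct."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Standard-library comparison intended to resist timing analysis."""
    return hmac.compare_digest(a, b)


def comparisons_made(guess: bytes, secret: bytes) -> int:
    """How many byte comparisons naive_equal performs: a proxy for its running time."""
    count = 0
    for x, y in zip(guess, secret):
        count += 1
        if x != y:
            break
    return count


if __name__ == "__main__":
    secret = b"abcd"
    for guess in (b"xxxx", b"axxx", b"abxx", b"abcx"):
        # More correct leading bytes -> more work -> an attacker can guess byte by byte.
        print(guess, comparisons_made(guess, secret), naive_equal(guess, secret))
```

This is the failure mode described in the lecture: an operation that fails early on wrong input tells an attacker how much of the input was right.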
There's some crazy research showing that if somebody can get access to the power readings of your system while it's doing an encryption operation, they can recover the key that way, because different CPU operations use different amounts of power, and you can figure out which operations are running from the power usage measured just outside the server. They went even crazier and used the sound the fan makes: fan noise is correlated with power draw, which is correlated with which CPU operations are running, so they could break your cryptographic keys just by listening to the fan. There are also techniques where, if somebody is executing code on your machine, they can use a cache shared between your processes to try to determine your keys. And there are timing attacks, which are a very classic one. This is why it's incredibly difficult to write crypto code: the time an operation takes must be identical whether the input is correct or incorrect, because otherwise it leaks information to an adversary. You can break crypto systems if they sometimes fail early, or if some operations take less time than others. Before we go, I want to tell you about one example: a DEF CON Quals 2011 challenge that I worked on. It was called Binary L33tness, worth 300 points. It was a tar archive, basically just a bundle of files, containing a .dex file. Does anybody know what a .dex file is? Maybe if you do Android development. It's basically an Android app. It also contained a JPEG-S file, where the S ostensibly stood for "secure." It's insane that this app still exists. They used a real app from the Google Play Store: it's a free version, and it says it encrypts your photos. What it actually does is use an eight-byte key and XOR.
It's essentially a Vigenère cipher: the eight-byte key is repeated over your picture to produce this "secure" version. Your goal was to find the key. We broke it because certain fields in a JPEG are always the same; they're magic bytes dictated by the structure of a JPEG file. You can use those to figure out at least three or four of the eight key bytes, and then brute force the rest until you generate a valid JPEG file. When we finally broke it, the picture was of a whiteboard with the flag written on it. And this app still exists: you could install it and think you're securely storing your pictures when you are not. It's really sad. The free version does not do any real encryption, even though it says it does; you can pay money to get a version that allegedly uses real cryptographic operations. So people still use this stuff. Don't do it. Don't be that person. When we get back, we're going to talk about modern encryption systems. Hey, the recording.
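The app's scheme amounts to repeating-key XOR, and the magic-byte trick is a known-plaintext attack: wherever a plaintext byte is known, ciphertext XOR plaintext gives the key byte at that position. This is a minimal sketch, assuming the file is a JFIF-style JPEG that starts with the bytes FF D8 FF E0; the real app's file format and key are not given in the lecture, and the key below is made up.

```python
def xor_repeating(data: bytes, key: bytes) -> bytes:
    """XOR every byte with key[i % len(key)]. Running it twice with the same key
    returns the original data, since (b ^ k) ^ k == b."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def recover_key_bytes(ciphertext: bytes, known_plaintext: bytes, key_length: int):
    """Known-plaintext attack: key byte = ciphertext byte XOR plaintext byte.
    Returns a list of length key_length, with None where the byte is still unknown."""
    key = [None] * key_length
    for i, p in enumerate(known_plaintext):
        key[i % key_length] = ciphertext[i] ^ p
    return key


# Assumption: a JFIF-style JPEG begins with SOI (FF D8) followed by the APP0 marker (FF E0).
JPEG_PREFIX = bytes([0xFF, 0xD8, 0xFF, 0xE0])

if __name__ == "__main__":
    secret_key = bytes([0x13, 0x37, 0xC0, 0xDE, 0xBA, 0xAD, 0xF0, 0x0D])  # made-up key
    photo = JPEG_PREFIX + b"...rest of a hypothetical image..."
    encrypted = xor_repeating(photo, secret_key)
    print(recover_key_bytes(encrypted, JPEG_PREFIX, 8))
```

With four of the eight key bytes known, the remaining four are 2^32 candidates, which is the kind of brute force, checked by whether the output parses as a valid JPEG, described above.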