All right folks, first up, a couple announcements. First thing, some of you in here have come to talk to me about the cybersecurity concentration and the Scholarship for Service program we have specifically for cybersecurity students. We're having an information session on Friday. Is it this coming Friday? Is that right? Yes. I will post more information on the PANZA Friday, 3 again at 4.30 p.m. UAC 270. We're going to go over the cybersecurity program, and how the job market for cybersecurity is insanely hot. I think they said it has a negative unemployment rate, so basically they can't fill all the positions that they have. We'll talk about our cybersecurity Scholarship for Service program, and we will be available to hang out and talk. So, more information on the scholarship and the cybersecurity concentration. Please attend. Any questions on that? You can. Yeah, so it's basically the scholarship pays for a year's worth of tuition plus a monthly stipend plus book support, and in exchange, for every year of support you pledge to work with the federal government for a year. So it doesn't have to be the FBI. It could be FBI, NSA, CIA, actually any federal government agency will do. So a lot of people go to a lot of different places. So yeah, we're looking to support people. We have funding from the National Science Foundation to do that. So if that's interesting to you, if you'd like to see us out here in this course, please attend. We'll talk more there. Any other questions? All right. I made this announcement on PANZA. I'll make it again in here. Unfortunately, no office hours today. But surprise, we don't have a homework assignment. So I assume none of you were going to show up anyway. In case you were, if you want to meet or you need to meet, you know, we can find a new time, a better time, just for today. Let's see. There's something else. What did you think about this second assignment? Are you all expert C++ code readers? Did anybody learn anything? Yeah. 
I just wanted to say that was the most fun kind of homework assignment ever. Cool. Which part? The bandit or the breaking stuff? Both of them. Both of them. Cool. Noted. And you're not just saying that to try to get easier homework assignments. Did you learn anything new from reading this other C++ code? Yeah. Anybody? Yeah. I definitely learned new stuff, like the stringstream. I had no idea you could use that to parse a string and then use the arrow operators to extract things. That was actually pretty cool. It made the C++ parsing really nice. Yeah. The policy admitted that we didn't have to break it. Were they from, like, this class? Some of them. Some were faster than others. Yeah. I took a sampling of many things. Some from us. Was there any shock to find your own sample in there? All these 30 bugs added to it. I'm not certain that I would say they already happened all, so I'd say them. We may do some more. I think it's a super helpful exercise, and we can get a little bit more complicated than in the past. Cool. All right. Let's rock and roll back to crypto. Great stuff. So here we are. We're given this ciphertext. How do we break it? What are the steps we're going to take to try to break this? Look for repeated characters. Why? Yeah. Do we know what kind of encryption it is yet? Technically no. We're just given this, right? Of course, you all know that we're doing this in the context of Vigenère ciphers, so it's highly likely it's encrypted with that, right? But how do we check and try to see whether it is a Caesar cipher, which we've seen, or is it a... man, I lost the name. I always have to refresh my mind. I'm going to ignore it for now. Or this other cipher. Vigenère. There we go. French pronunciation in my head. What was that? Statistical analysis. Statistical analysis of what? Letter distribution. So we calculate the frequency of the letters in the ciphertext, and then do what with that? Yeah. 
So a Caesar cipher should maintain that shape of English with the correct peaks and valleys, whereas what about in this case? Yeah. It should be closer to an equal distribution, right? Depending on the key size, all those kinds of things. Cool. So we can do that. We can say, OK, we think it's not a Caesar cipher. Maybe it's a Vigenère cipher. So then we can look for repetitions in the ciphertext, and specifically why are we looking for repetitions in the ciphertext? Great. Yeah. Indicating repeated words in the plaintext, and not just that, but where the key is aligned exactly the same over those repeated words, right? And we're looking for that because that's going to tell us, hopefully, what? It gives us information about what? The period. The period, right? The size of the key, right? So we're trying to figure out the size of the key so we can break this thing. So we went over this. We went over a lot of the repetitions in the ciphertext. We all spotted the largest one, O, E, Q, O, O, G. And we looked through all of these, and we tried to think about different periods. And of course, with different key lengths, we may be wrong, right? So we're just going to start with the first one that we can kind of think of. We'll start with six, and then we'll work through that, and we'll have ways to check our guess to see if it works. So we may have to do some kind of backtracking here as we do this, right? It's not necessarily a deterministic process where you do X, Y, Z, and then you've broken the cryptosystem. Cool. So we can use another statistical measure, talking about statistics, right? We can use this notion of an index of coincidence. So in a sense, this is the notion of, if you randomly grab one letter from the ciphertext, what's the probability that the next letter you grab from the ciphertext is the same letter? So what is that telling us about the ciphertext, or about any text? Because we're going to apply that to any of them. 
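The repetition search described above can be sketched in Python. This is a minimal illustration, not the procedure used on the slides: the function names `repetition_distances` and `period_votes` are made up here, and the sample string is a toy, not the lecture's ciphertext. The idea is that if a repeated chunk comes from the same plaintext lined up with the same part of the key, the distance between the two occurrences is a multiple of the period, so candidate periods that divide many distances are worth trying first.

```python
from collections import defaultdict


def repetition_distances(ciphertext, min_len=3):
    """Map each repeated substring of length min_len to the gaps between
    its consecutive occurrences (spaces and punctuation are ignored)."""
    text = "".join(ch for ch in ciphertext.upper() if "A" <= ch <= "Z")
    positions = defaultdict(list)
    for i in range(len(text) - min_len + 1):
        positions[text[i:i + min_len]].append(i)
    return {
        chunk: [b - a for a, b in zip(starts, starts[1:])]
        for chunk, starts in positions.items()
        if len(starts) > 1
    }


def period_votes(ciphertext, min_len=3, max_period=12):
    """Count, for each candidate period, how many repetition gaps it divides."""
    gaps = [g for gs in repetition_distances(ciphertext, min_len).values() for g in gs]
    return {p: sum(1 for g in gaps if g % p == 0) for p in range(2, max_period + 1)}


if __name__ == "__main__":
    toy = "ABCXYZABCQQABC"  # toy input: "ABC" starts at positions 0, 6 and 11
    print(repetition_distances(toy))
    print(period_votes(toy))
```

Some repetitions are pure coincidence, so the votes are only a hint. That matches the point above that the guess may be wrong and may need backtracking.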
It's just the difference between, like, two characters within the key, that difference, and then the difference of that first character and the second character in the original string, and those differences added together. Okay, let's think about it. Well, yeah, let's think about what this actually means, right? And we can take it away from ciphertext, right? We can say, so if I had some text that was, let's say, all A's forever, right? Infinite. So I draw an A, what's the probability that my next one's going to be an A? It's 100%, right? There are no other possibilities. Now, what if this was completely random? Okay, this is not completely random, since I'm drawing it in order. What if I just repeated, like this, A through Z, over and over, an infinite number of times? So if I grab an A, what's the odds that I'll grab an A next? 1 over 26. 1 over 26, not 0, right? Because, assuming it repeats forever, right, I grab an A, that means there's still an infinite number of A's, but the distribution of each of the characters, right, is 1 over 26. There's an equal likelihood for each character I'm going to grab next. So I can see that, and this is just about the letters. So we're thinking about the index of coincidence for just the letter A, and in the first one it's highly skewed, and in the second one it's not. Then I can calculate that for every character, and actually, I guess this is a very poor example, because I think they have the same index of coincidence. It would be 1 for A and 0 for all the other ones. Here it would be 1 over 26, 1 over 26 for, let's say, A, B, C, and so on. But in some sense this gives you some notion of how random the text is, and it's squishing it all into one value, yeah. Is it 1 over 26? Or, like, if you had a whole bunch of A's, like 50 A's and 1 B, is it still like 1 over 2? So that's a good example, okay. So let's do that, we can probably calculate that pretty quickly. 
Let's not do 50, because I think I'll have to draw that one. Let's say, what was that? 4 A's and 1 B. So what's the index of coincidence of A? So you grab an A, 4 fifths, and what about B? 1 fifth, and C, 0, right, and so on. So yeah, and then here when we're calculating these numbers, this is summing all of this up. So we would assume for completely random text, yeah, places. That's a good question. I think with sufficiently large text, it doesn't matter. Which is what we'll go with for now. But yeah, we may need to tweak our probabilities there, because we only have five characters. But for our purposes here, we're really looking at large text, and we can calculate this, and it's going to give us some high-level notion of what the period is. So this could actually help us to say, are we completely off? So if we ran this, and it's at 0.038, we'd say, wow, that's really close to 1 over 26. So maybe this text is more random than we think. Why is, let's say, period 1 at 0.066? It's more likely that you're going to run into the same original text being encrypted with the same key. Right, so at period 1, how large is the key size? 1. 1? Which means what's the distribution of characters like? So it'd be the same as the Caesar cipher. Yeah, so it's the same as the Caesar cipher, which means the character frequencies look something like this. Right, so if you randomly pick from this distribution a character like A, the likelihood that you get A again is skewed by the letter frequencies in English. Right, and so you can calculate, and we'll look at how to calculate the index of coincidence, but the idea being that you can calculate this on, let's say, all of English, or on this type of frequency, and then you can calculate the same thing here. So you can actually use this as another way to verify or try to check. 
Again, if we're looking at different statistical measures for how to do this, one way to check is: if we calculate the index of coincidence and it's something like 0.066, or roughly around there, then maybe it's a Caesar cipher, right? We have other ways to check. We can also look at the distribution, which is much more information than just looking at a single value. So on the previous slide, we had those, like, A had 4 fifths or whatever. Are each of those indices of coincidence, or what is the index of coincidence in that situation? Yeah, so okay, it's probably helpful to look at computing it. So what we're going to do is, the minus one makes this a little weird, but we are going to sum over all of the characters, we'll just call each one I. So the frequency of I times the, all right, I don't want to move this down, sorry. So for every letter I, we're going to calculate the frequency of I in whatever text we're dealing with, that frequency of I times the frequency of I minus 1, all of that divided by the number of characters in our text times the number of characters minus 1. So this is the measure of exactly that probability that we just calculated, right. Choose one, what's the probability that you choose the same one again? Well, it depends on the frequency of that letter times the frequency of I minus 1, over the total number of letters times the total number of letters minus 1. And we sum that up for all the characters, essentially boiling this entire distribution down to one number. It's another way of looking at it, right. So rather than looking at this distribution because of what you think. So this is why these two examples are very bad. So what's the index of coincidence of the completely random text? We know it should be like 1 over 26, right. 
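The formula just described, the sum over every letter of f_i times (f_i minus 1), divided by N times (N minus 1), is short enough to write out. The Python below is a minimal sketch under that reading of the formula; the function name `index_of_coincidence` and the test strings are illustrative and do not come from the slides. It also shows why the "minus one" matters for tiny texts: for 4 A's and 1 B the formula gives (4·3 + 1·0) / (5·4) = 0.6.

```python
from collections import Counter


def index_of_coincidence(text):
    """Probability that two letters drawn from the text without replacement
    are the same: sum of f*(f-1) over all letters, divided by N*(N-1)."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0  # fewer than two letters: no pair can be drawn
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


if __name__ == "__main__":
    print(index_of_coincidence("AAAAAAAA"))        # every pair matches
    print(index_of_coincidence("AAAAB"))           # the 4 A's and 1 B example
    print(index_of_coincidence("ABCDE"))           # all letters unique
    alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    print(index_of_coincidence(alphabet * 100))    # flat distribution, close to 1/26
```

With this definition the all-A text scores 1.0 and the repeated-alphabet text scores close to 1/26, so the single number does separate the two board examples.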
And this, if we sum all this up, you'll have 1 over 0, 0, 0, all those 0's cancel out, so it's going to be the same thing, just because I messed up the way these distributions are. So we can calculate this, and this gives us some notion that we can use to double check our estimate. So we can say, let's calculate this on our ciphertext. And we can do this, right, it's just counting, right. So for each letter, yeah. I'm not really seeing what the value of this would be if there were only unique characters. Would that be 0? Or would that be 1? Or something else? If there are only unique characters, yeah. That would possibly be... That's interesting, yes. I think it would be 0 in that case. But again, that is kind of a pathological case that's probably very unlikely to happen in stuff that you care about, because there'll be enough letters that, assuming you only have the 26 letters, you'll have stuff that repeats, right. Yeah, and you'd have zero coincidence, right. There's no coincidence, there are no repeating letters. Cool. So we can calculate this on our ciphertext and we get something around 0.043, which, when we look at the periods here, where does 0.043 kind of fit? With kind of like... Yeah, of course, these numbers, why are these numbers so close? Yeah? Yeah, so those are the values for, like, different periods. Because n, from what I gather, is the entire length of the ciphertext. Yes. So how do we apply this for different periods of that cipher? Yes, so I did not calculate this myself, but you can look up the index of coincidence for different periods for Vigenère ciphers, so you can use pre-calculated values. You could calculate this actually for these different keys. You can randomize that, right? So actually, you can probably brute force something up to 5, I would say. There's probably better ways to do it with statistics, right? So you can calculate this for every possible key. 
It's going to shift all of the outlets in different ways, and you can calculate that for every key. Does that make sense? So, like, let's say we were doing a period in 1. Sure. So for period 1, so A for period 1, you wouldn't have to do anything because you know that period 1 is only going to shift the frequency. It's not going to change the actual rate of the frequencies, right? It's only going to shift the frequencies, so A's frequency is now going to be at D because a key of size 1 is only going to move the frequencies. It's not actually going to change the values of the frequencies. So you can use a value that you computed for English text and you can use a sufficiently large value to calculate that. Alright, so then that gets me about 0.2. So then for 2, to be perfectly honest, I'd say you use statistics to do that because you base it off probabilities instead of, like, the probability distributions instead of calculating it based on the number N here. But I don't know that that's 100% correct. So you sum up over the probability of seeing the letter B in English text. Yes, exactly. Okay. And then you can, I'm sure you can figure out how if A was shifted to B and you have every other letter in English, like the probability distribution is going to each Yeah, I mean if I was doing this, I'd do an honestly sufficiently large text and I'd start with that and that's what I'd use to calculate that. So is this average over every key? Yes, this would be every possible key size, exactly. So this will be the average over every possible key of this because it's sub-treated? Yeah. And that's how you get these values. I'm sure there's a better way to calculate this. So in that case it would be like the situation would be like that's a lot times F of minus 1 Yes. No, no, no, because that's calculating the probability of pick 1 then what's the probability of the next one to the same? And that depends on the frequency of those. 
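For the question of where per-period values like these can come from, one commonly used approximation is sketched below in Python. It is an assumption here, not necessarily how the slide's table was produced: it supposes two letters enciphered with the same key letter match with probability about 0.066 (English), and two letters enciphered with different key letters match with probability about 0.038 (1/26). The names `expected_ic` and `estimate_period` are hypothetical.

```python
def expected_ic(period, n=None, ic_english=0.066, ic_random=0.038):
    """Approximate expected index of coincidence for a Vigenere ciphertext.

    If n (the text length) is given, use the exact chance that two randomly
    drawn positions share a key letter; otherwise use the long-text limit.
    """
    if n is None:
        return ic_random + (ic_english - ic_random) / period
    same_column = (n - period) / (period * (n - 1))
    different_column = (period - 1) * n / (period * (n - 1))
    return same_column * ic_english + different_column * ic_random


def estimate_period(observed_ic, ic_english=0.066, ic_random=0.038):
    """Invert the long-text approximation to get a rough period estimate."""
    return (ic_english - ic_random) / (observed_ic - ic_random)


if __name__ == "__main__":
    for d in (1, 2, 3, 4, 5, 6, 10):
        print(d, round(expected_ic(d), 4))
    print(estimate_period(0.043))  # rough period implied by the lecture's 0.043
```

Under these assumptions the value starts at 0.066 for period 1 and drifts toward 0.038 as the period grows, which is why the numbers for larger periods sit so close together and why the measure works as a sanity check rather than an exact answer.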
And so does the index of coincidence number just go down when you have larger periods? Because it's getting closer and closer to random, right, because if you had, let's say, an infinitely large key size, and you assume those key letters are all random, then essentially each letter in your text is being randomly mapped to every other letter. And so your distribution would be completely flat. It's more important to know how to use them and be able to calculate them on text that you need. So yeah, the derivation is not as important as understanding the intuition of what this is trying to capture from the frequency, and understanding how the frequency changes for different size keys from that standard English frequency chart, right? So understanding how, when we have a Caesar cipher, right, the Caesar cipher shifts, and that's why it's very easy to break, because essentially the one-gram character frequencies of English are still there. And then as we have longer and longer Vigenère keys, how these will approach basically a uniform distribution. So this index of coincidence is not calculated case by case, this is something standard for all? These values that I'm presenting here are calculated across all. So these values here are all pre-calculated. So these are given to you, right? But you can then use this formula to calculate this on text of your choosing, right? Your ciphertext, to compare it to these values, to see roughly where does this fit and how does that match your guess, right? Because if you guessed 6, but the index of coincidence was, let's say, significantly larger, then you'd say something's really wrong here, or if it's significantly lower. It's just a good way to help check yourself, yeah. Do you just mean that it's 0.03 more? Yeah, I would say so. For here, right, so over here, 0.043. So you just use this as kind of a double check. Okay, I guessed the period was 6. 
And 0.043, which is right about at 5, maybe slightly greater than 5, so yeah, that actually matches with what I'm thinking about. So if it was, like, 0.04, would that be considered too off? I'd try it probably, but you'd keep it in the back of your head, right? If it was 0.065 you'd say something's weird, like why am I thinking that? I'd say either direction, right? It's probably not going to be lower than 0.038. Because there's some noise in here, right? Because you only have a sample and not all text, so there's going to be some noise, but this is just a nice way to double check and say, am I on the right track, without doing the whole thing? So if a relatively large key is used, are you out of luck as it's approaching that 0.038? What if the size of the key is the size of the ciphertext? Then there may be repetitions, but the key won't ever repeat over itself, so you won't have any repetitions from the key. The key size is exactly the size of the ciphertext, right? This should be roughly the same here as 1 over 26, and then how many different alphabets will you have to solve for? What was that? I was going to say N, to the N, right? Sorry, not alphabets, that's how many keys you have to try, but the alphabets would be N, the size of your ciphertext, which means as you're brute forcing that, you could get whatever you wanted, right? Because you could try keys that would match every possible word or phrase or sentence that is that long. So we'll actually see that coming up, but we'll talk about that later. It's called a one-time pad, where if you do that and the key is actually randomly generated, and it's not a word that we can brute force, which we talked about, and it's as large as the ciphertext and is sufficiently random and it's never reused, then it's actually perfectly secure. Well, what's the problem with that? A new key every single time. Yeah, you'd have to communicate a new key every single time, and that's how big?
Same size as the message that you're actually trying to send them. Yeah, or larger. Yeah, so you could send them a message that's larger, but you have to be certain you never reuse the key, right? So this is the main problem, like, well, at that point why not just give them the message, since you're already doing this? But there are circumstances where they actually use this. I think with, like, either nuclear codes or whatever, they exchange a book like this that's just random keys, and they use that to communicate to each other, and it's perfectly secure, it's a one-time pad, and I think they rip out that page so they never accidentally reuse it again. But yeah, that is one way that you can do that. But think about if you wanted to, let's say, securely transfer a file to your friend, that's a one-gigabyte file. You would need a one-gigabyte key that you would exchange in advance, and then later you could securely send them that one-gigabyte file. So we're just using this to test a little bit. So now that we've guessed that the period is 6, what do we want to do with the ciphertext? Split the ciphertext, right, so split every character into different alphabets 1 through 6. And then let's try to do that. So we can do this, and another nice thing we can do is we can again use the index of coincidence and calculate this on just one alphabet, or each of the alphabets. So if we're right in our guess, what would we expect the index of coincidence to be? 
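The splitting step and the per-alphabet check can be sketched as follows. This is a minimal Python illustration, with made-up helper names (`split_alphabets`, `alphabet_ics`) and a toy input rather than the lecture's ciphertext. Letter number 1, 7, 13, ... goes to alphabet 1, letter 2, 8, 14, ... to alphabet 2, and so on, with the spacing in the printed ciphertext ignored.

```python
from collections import Counter


def index_of_coincidence(text):
    """Sum of f*(f-1) over all letters, divided by N*(N-1)."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


def split_alphabets(ciphertext, period):
    """Group the letters by which key position would have encrypted them."""
    letters = [ch for ch in ciphertext.upper() if "A" <= ch <= "Z"]
    return ["".join(letters[i::period]) for i in range(period)]


def alphabet_ics(ciphertext, period):
    """Index of coincidence of each alphabet under a guessed period."""
    return [index_of_coincidence(col) for col in split_alphabets(ciphertext, period)]


if __name__ == "__main__":
    print(split_alphabets("ABCDE FGHIJ KL", 6))
    toy = "AB" * 50  # toy text with an obvious period of 2
    print(index_of_coincidence(toy), alphabet_ics(toy, 2))
```

In the toy case the whole text scores just under 0.5 while each alphabet scores 1.0, which is the same effect expected for a correct period guess: the per-alphabet values jump up relative to the whole ciphertext. With only a few dozen letters per alphabet the values will be noisy.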
Right, 0.066. We expect it to be around there because we've now split up the ciphertext into different alphabets that are each encrypted with one key letter. That's the reason why this is useful as kind of a self-check. So we can split this into alphabets, and we can calculate that. This is everything that's encrypted with the first letter of the key, or what we think is the first letter of the key, then the second letter, the third, the fourth, the fifth, and the sixth. There's a lot of variation. Why? Yeah, so we have really high values, 0.124, but the highest before was 0.066 or something, and we have low values of 0.043. So why do we see such high variation here? Would that mean that our period length is wrong? It could mean our period length is wrong. What else could it mean? Like, normal English, but the alphabets are not really normal, it's just a bunch of different characters. Yeah, it could be weirdness that came in with every fifth character. That's possible. What else? Yeah, the sample just got cut in sixths, right? So now instead of our massive text that we have, which has a decent distribution, I can't remember, how many letters did we say was in this line? Was it 30 or 40 or 50? 
Actually a decent amount of letters. Each of these alphabets has a sixth of that, so we'd expect a little bit of variation, and repeated letters that appear in there by coincidence, by chance, will affect our values. So again, this is where it's a little bit of art, a little bit of guesswork. We have to be okay with going backwards. If they were all around 0.043, or four out of the six of them were like that, that's probably a massive problem. That doesn't make sense. A lot of them, like the first three, are right around 0.066, which is period one. We have this one that's close-ish and this one that's much higher, which is weird, but okay, we can deal with that. And the last one is kind of a weird one. But we can say, of the six, we have five that are pretty close. It's not bad. If they were way off we would know something was really weird, which we would expect if we, say, guessed five and calculated those. Hopefully we would see that they were very different. So now we can start by trying to break each of these alphabets. What's the key difference that we talked about between breaking each of these alphabets as opposed to a Caesar cipher? Yeah. Yeah, so the easy thing is we could just calculate the 26 possible values of a Caesar cipher, try all of them, brute force all the keys until something makes sense. If we just did that for alphabet one, would we be able to figure it out and say this made sense? By the way, it's actually very difficult, because it "making sense" relies on all of the other characters. So we can use all these kinds of techniques. Again, this is kind of the puzzle-solving aspect, right? We're going to use multiple techniques at our disposal. The other thing that we'll do, and I kind of like this approach, is we could calculate this for every one of the alphabets and see which shift makes sense. It's a little 
bit difficult to do, and why is that? Yeah, the small sample again, right? Because here we have, I should probably count, can somebody just count? 1, 2, 3, 4, 5, 20, 21 letters in each alphabet. So we're not even going to get a sample of every letter. So it actually can be a little bit difficult to try to match this here. But what are some interesting things about this graph? There are extreme peaks. So this is like if you watch the game show Wheel of Fortune, right, you know the popular letters. Yeah, there you go, see. So what we can do is, rather than think about exact values, we can in some sense compress this down. So we have high peaks. What do we also have? I would say valleys, or like close to zero probability letters. Right, very low. So we have some letters that appear with very high frequency, some that appear with very low frequency, and some that we can say appear with medium frequency, just kind of in between, right? So we can compress this somewhat and say, okay, we have some high, some medium, and some low. And what other information is contained in this graph that we can use? So we talked about the vertical, right, we can say some of them have high frequency, some have low. What else is in here? An average distribution for a good set of letters, like B, F, G, P, U, W, they're kind of about the same level. B, F, G, yeah, so they're about the same level. We can call those medium, though we have some lows and we have some highs. Yeah, I also think the difference between adjacent letters, like E and F, is that E is dramatically more frequent. And not just that, so you have E and F, right, but what about E and G? What about E and H? Well, specifically, what about E and J? Right, so how many letters are there in between? One, two, three, four, five, right? So there's actually also the distance between highs and lows, right? If we look at the probability distribution, there should be about a five-letter difference between a high of E and a low of J. Going backwards 
there's also going to be a one, two, three, four, five letter difference between E and Z, right? So we can think in both directions, right: the highs, mediums, lows, and also the distance, in some sense, of each letter from the others. Because remember, our distribution should follow that, so there should be, in each of these alphabets, those same distances between those letters. So that's just another way of looking at this. There we go. So then we can try to solve each alphabet here. So what have I done here? For alphabet one I have A through Z. What am I doing there with those numbers? Yeah, just counting, right, the frequency of each letter. And I'm not dividing by anything, it's just the raw number of each of those, right? And I can do this for every alphabet that I have. I have six of them. Right, and I can create something from that graph of the letter frequencies, right? I can say, well: high, medium, medium, medium, high, medium, high, medium, high, medium, high, medium, low, high, high, medium, low, low, low, low, low, low, low, kind of all then. So now how does this help me? I can try matching each of these and see what shift matches the alphabet here, right? I can try that to help give me some guesses. So, what about the first alphabet? E's are frequent. What else? A's. What else? What's infrequent? Z's, Y's, X's, W's, B's. So what shift should we try? That's a what? I mean, how low do you want to go? So should we shift B to A and shift everything back one? So now we have three C's. It almost looks good as it is. So is there anything that says our key can't be A? In the six-character key, is that bad? So when we talked about Caesar ciphers, we probably don't ever want to use the key A, because why? It's exactly the same, there's no actual shifting being done. But here, with the six-letter key, is there any reason why one of the letters can't be A? We surely wouldn't want our key to be A, A, A, A, A, A, six A's, right, but 
does that mean that one of them can't be A? So maybe this isn't shifted at all, the first alphabet. So what? A is zero. So the shift. So A is the zero character, which I'm having in that thing. It looks like, I mean, the key is zero, zero shift. No shift, what's that? Right? So, like, A maps to A, B maps to B. So if it was B, A would map to B and it would all shift one over to the right. So that's actually why, when we talked about the Caesar cipher, we had that the key could be zero to 25, right? For a similar reason. Here, when we look at a Vigenère cipher, it could be zero to 25 for as many characters as I need. Cool. So this, I just got a nice, so we could just say, well, maybe we've gotten the first alphabet. Okay, what else would we look at? To the highs of K, U, and W. Number two, the highs of where? K, U, and W. All right, number two of K, U, and W. Yeah, four zero, four, and that one is the highlighted piece to me because you don't, I always want to go back to the main alphabet. I don't know if it was the closest to together highs because that seems like a very stable key here. Right, so you could have that four zero, four at the end of two. Maybe it's better if I have to draw instead of just saying it for the fourth. Oh, is that at the bottom of the regular alphabet? Yes, so this is the frequency distribution of the alphabet. Yeah, where it's three highs in a row, I have four over four right over that, and you just say that maybe there's a chance that one of our high levels isn't there. Okay, so if we shift it, let's go with that, right? So if we shift it back, let's say one, two. Is that right? One, two. So we shift everything back two. Let's see, how's that going to affect us? So this three goes back to i, this goes back to, there'd be zero a's in there. What about the five zeroes in a row? Could we shift that towards the end? One, two, three, four, five, this. That's a little bit longer. And then shift it one, two. What, this all the way, this all the way is e? 
Yeah. Let's shift up. One, two, three, four, five, six, seven, eight. So eight forward. So I would shift w two, one, two, three, four, five, six, seven, eight to e. Yeah, I would put four on here. I'd put four here, and then two back from that. So we'd put four on c, which has medium frequency. And then I would put one on a. So one, zero, four. What is what? I think that's the three on t. The three on t? Which looks pretty good, I feel like. I'm calculating that. It's s, isn't it? So it's also supposed to be a high number. It's s and equals. Yeah, that looks really good. All right, we can try here. So here we'd say that q is probably a, or that's a to a. Here we could try this eight. So it's going to be a to, what about the third one? Yeah, so we're trying to block a zero to the end. It's kind of a similar trick. Yeah, so we can shift all my writing on this. So we can try whatever, and it may not always be clean, right? Why? Yeah, maybe like here there are six zeros. So we need to figure out if there's still two or two on your side. So anyways, so we can do this, right? We can see how we can leverage maybe trying here, moving this five back to, so it matches there. And then here, just move that five to m from o to m, which is a medium. Or we would try other ways of shifting that to see how that affects everything. So this gives us some stuff to try, right? And maybe we're wrong and we can start putting it together and trying to piece that piece by piece. So whatever we're doing this, you could do this kind of in, so we can see, let's see, look at some guesses that we made so fast. So i to a in the third alphabet. So i shifts back to a, and I believe that will shift all these zeros back where we expect them. So then I think this was actually what we had just said, which is cool, for three. We can try v going to a for the sixth alphabet, sixth of v here, and that actually matches up all those zeros. 
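The shift matching done by eye above (lining up highs, mediums, and the run of zeros) can also be automated. The Python sketch below swaps in a different, standard technique, a chi-squared comparison against English letter frequencies, so it is not the high/medium/low method from the slides. The frequency table holds approximate, commonly quoted percentages, and the names `rank_shifts` and `shift_text` are hypothetical. With only about 21 letters per alphabet the ranking is noisy, so the top few candidates are worth checking rather than only the first.

```python
from collections import Counter
import string

# Approximate relative frequencies of letters in English text, in percent.
ENGLISH_FREQ = {
    "A": 8.17, "B": 1.49, "C": 2.78, "D": 4.25, "E": 12.70, "F": 2.23,
    "G": 2.02, "H": 6.09, "I": 6.97, "J": 0.15, "K": 0.77, "L": 4.03,
    "M": 2.41, "N": 6.75, "O": 7.51, "P": 1.93, "Q": 0.10, "R": 5.99,
    "S": 6.33, "T": 9.06, "U": 2.76, "V": 0.98, "W": 2.36, "X": 0.15,
    "Y": 1.97, "Z": 0.07,
}
ALPHABET = string.ascii_uppercase


def shift_text(text, k):
    """Shift every letter forward by k positions (a Caesar encryption)."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + k) % 26] if ch in ALPHABET else ch
        for ch in text.upper()
    )


def rank_shifts(column):
    """Return the 26 candidate key letters, best match to English first."""
    letters = [ch for ch in column.upper() if ch in ALPHABET]
    n = len(letters)
    counts = Counter(letters)
    scored = []
    for k in range(26):
        chi2 = 0.0
        for i, plain in enumerate(ALPHABET):
            # Under key letter k, plaintext letter i shows up as letter i + k.
            observed = counts[ALPHABET[(i + k) % 26]]
            expected = n * ENGLISH_FREQ[plain] / 100
            chi2 += (observed - expected) ** 2 / expected
        scored.append((chi2, ALPHABET[k]))
    return [key for _, key in sorted(scored)]


if __name__ == "__main__":
    sample = ("it was the best of times it was the worst of times "
              "it was the age of wisdom it was the age of foolishness")
    print(rank_shifts(shift_text(sample, 8))[:3])
```

A key letter of A means no shift, consistent with the point above that a single position of a longer key can legitimately be A.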
So we actually got, you can see, a lot of mileage out of dealing with the string of low values and shifting them to the end, right? So that was actually pretty useful for us. Yeah, this is that same third one that we said. So if we shifted I to A, that would shift all of these zeros all the way to the end, which was one of the guesses that we could try here. So now we can substitute into the ciphertext. So we can do these shifts on, let's say we just guessed these three alphabets, the first, the third, and the sixth. And everyone can see the bold here. Every first, third, and sixth character is bold. So you can keep that in mind. The first, third, and sixth. Okay, so now, we can kind of keep doing that. We can keep playing with that. But here we've got half of it guessed. So maybe, what can we do now? Yeah, there's the CK in the third thing. You can guess that there's a vowel before it. CK here. Yeah, so we can maybe think that there's a vowel here, so we can use basically clues based on what we know, right? And from what we see here that we've guessed, we can say, hey, maybe there's a vowel, which could help us with shifting. We could see, let's see, yeah, so there's a lot of different things we could do. Obviously there's clearly a track that this is taking here. We could take all that, and I'm sure we could solve this as a group because you're all very, very smart. So, we can look for clues. And this would be one thing: in the last line, maybe AJE. We could try here that it looks like the English word "are", which would be something that we would actually expect; it's a frequent word too. So we can maybe try mapping A to S in the second alphabet. And we can look back here at our frequency analysis. Does it make sense, mapping, shifting all of that this way? Kind of hot. Those are both frequent three-letter words, right? So we're making some kind of progress.
G-O-O looks like it could be "good", maybe. Which one? G-O-O, the fourth column, third row. Here. The upper one is right. Ah, sorry, sorry. Okay, so yeah, good. Yeah, so we could say G-O-O. We could try "good" here. What other things? Yeah. But T-H-E? Yeah. And especially if this P then maps to F and that shifts both of them the same amount. None? Yeah, so we're done. So the other thing that's tricky with these, right, is that adding this artificial spacing every fifth letter kind of messes with you trying to put these together, so you have to remember that when you're looking at these things. None? Rick. I mean, you can also look for other things. If there's a T-H and then something, you'd probably think that's "the". So you try shifting that next one to an E, which I don't think we have here. There's an H-E. Where's one? There's an H-E. The fifth one down, first column, you have an H-E in a row, so maybe try shifting the H to a T. Oh, this one? Yeah. So maybe you can try shifting this to a T. Or the, it's H-H-E. Oh, sorry, this H. Okay, yeah. How do you really tell the difference between these on this slide? Okay. Yeah, we can try that. Another thing, and this is kind of enough, I'm not sure how you would know this, you would try all these things. You could maybe say that the last line suggests an ending for an adjective, so ICAL. So you can try shifting the fourth alphabet O to A. And you get something that looks closer and closer. You're seeing words, "good", so that also shifts into "good", but we've got that there. And what are some other things that we'd try to guess? So now I have a nice letter here. Why do I have a nice letter here? Because you know that's the U. Yeah, so Q-U, U almost always follows Q, so we can try shifting that. We can also brute force the last alphabet. Basically, we're already there. So it's just a silly limerick thing, I don't know. Questions on this? We just broke this cipher without any knowledge of the key.
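The shifting we just did by eye on each alphabet can be sketched in code. This is only a minimal sketch, not what was shown in lecture: it assumes the period is already known and that only the letters A to Z matter, and it swaps in a chi-squared score against approximate English letter frequencies as the "does this look like English" test. The table `ENGLISH_FREQ` and the function names are illustrative assumptions.

```python
from collections import Counter

# Approximate English letter frequencies in percent (an assumed reference table).
ENGLISH_FREQ = {
    "A": 8.167, "B": 1.492, "C": 2.782, "D": 4.253, "E": 12.702, "F": 2.228,
    "G": 2.015, "H": 6.094, "I": 6.966, "J": 0.153, "K": 0.772, "L": 4.025,
    "M": 2.406, "N": 6.749, "O": 7.507, "P": 1.929, "Q": 0.095, "R": 5.987,
    "S": 6.327, "T": 9.056, "U": 2.758, "V": 0.978, "W": 2.360, "X": 0.150,
    "Y": 1.974, "Z": 0.074,
}


def shift_back(text, k):
    """Undo a shift of k on uppercase A-Z text (key letter A means k = 0)."""
    return "".join(chr((ord(c) - 65 - k) % 26 + 65) for c in text)


def chi_squared(text):
    """Lower is closer to the English letter distribution."""
    n = len(text)
    if n == 0:
        return float("inf")
    counts = Counter(text)
    total = 0.0
    for letter, pct in ENGLISH_FREQ.items():
        expected = n * pct / 100
        total += (counts.get(letter, 0) - expected) ** 2 / expected
    return total


def rank_shifts(column):
    """All 26 shifts for one alphabet, best-looking first."""
    return sorted(range(26), key=lambda k: chi_squared(shift_back(column, k)))


def guess_key(ciphertext, period):
    """Split into `period` alphabets and take the best shift for each one."""
    letters = [c for c in ciphertext.upper() if "A" <= c <= "Z"]
    columns = ["".join(letters[i::period]) for i in range(period)]
    return "".join(chr(rank_shifts(col)[0] + 65) for col in columns)
```

With short columns the top-ranked shift can be wrong, which is the same reason we mixed in word guesses above; printing the first few entries of `rank_shifts` for each alphabet and letting a human pick is probably the more realistic use.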
So this ciphertext, yeah. So when you shifted the third and the sixth alphabets in the beginning, why did you stop there and not try to shift the rest, instead of trying all of them? Just to show different ways of, like, after you've guessed a few of the alphabets, starting to put them together rather than just relying on the frequencies. That's a nice way to help guide you at the start. And then I would actually probably do both. I would say, okay, well, what's the right shift? What are the frequencies saying I should try for shifts? And then what does that actually do if I think that I have the first two or the first three alphabets? And then I would do this. When you start breaking them, it becomes easier and easier. Is there a way to automate this? It seems like a lot of it is non-sensitive. Good question. So how would you automate this? Can people tell me? Yeah? You know, have a dictionary set up, basically. Keep doing the guess and check type work. Okay, so you do guess and check in some sense, right? So you could try. So again, one way to do it, you'd have some way to verify whether you're correct, right? And that could be a dictionary. But I guess it's kind of tricky. Are you going to look for all the dictionary words in your text? Right? It's got to do with that. Yeah. These are like, what do you call them? Yeah, you can actually parse the dictionary for different types of, so that would be like a three or four gram, I think. You could look for very common two, three, and four letter words in English; those could help guide you to tell you when you're correct. And a lot of this is based on the idea that you know the probability of the words in the output. Couldn't you just really destroy this method by doing something like, okay, we're just not going to put vowels in places where it's obvious what vowel goes there.
And then we're just going to add to the end of every word, alternating Q and C. Okay. So Q, X, and Z or something like that. End of every word. And then to a human eye, you might be able to look at it and distinguish it. If you do a Q, you shift everything. Then to the human eye, you go, okay, this isn't exactly easy, but I see what they were writing in this message. But it would completely destroy your probability of things. Yes, in some sense. Although the interesting thing to think about is, you know, is this now part of your algorithm? So are you assuming that somebody knows, because it's part of the process of transforming plaintext, which doesn't have those characters, to ciphertext, which now does? So you're inserting characters. So one thing to think about is, is this just part of that thing? The other thing to worry about is, are you adding additional signals that maybe somebody could use, right? Because they can see, hmm, this same character is repeating every two to five. So you look at the distribution of the distance between those characters when they appear. You look at a frequency of all these characters and you'd say, hey, that's like word sizes in English. But how would they even realize that it was those letters to begin with when you're shifting over those? Actually, in that case, maybe you don't care. Maybe you just drop those letters and start focusing word by word. And you use those to maybe break up the... Yeah, there are interesting things you can think about doing where you could hide the statistical analysis here. But fundamentally, somebody can still... You can still use these principles to try to... And, you know, computers are very fast. So you can try brute forcing a lot of things. And you can have tricks to say, when it starts making sense, give it to a human so they can look at it and maybe go from there and figure out what's wrong. Yeah, right?
So against a sufficiently smart computer person, that would not be a great defense. Also, to confirm that you're not, like, over, you can also check if there's a sequence of things that are just impossible. Right, so, yeah, you could check. That would be a good one. Yeah, check the sequences of things so you could... Because basically, I mean, you've done dynamic programming. In some sense. Or just backtracking. The idea is you're making a bunch of decisions. So you try, you know, you sort all possible key lengths and you try them in order. Your algorithm will just try them from most likely to least likely of the periods based on the repetitions, right? You can write a program that finds repetitions. You can write a program that breaks the distances down into factors. You can write a program that figures out what are the most common factors and then orders the list of periods to try from most likely to least likely. Then you can have another program that does exactly what we did and then evaluates and returns what's the most likely key for that guess based on these measures. So you could define these different things that define when you're correct. So you could create a measure of how close this is to English, all this kind of stuff. So, yeah, you can definitely do it. But I wouldn't say that it works all the time, 100%, but you could say it pops out five answers and then you select which one you think it is. So there is some human intelligence here. We've been looking at ciphers that do what, fundamentally? Are there any rotation here? Shift. Yeah, so shift in terms of plaintext to ciphertext, right? So plaintext comes in and we're mapping one character to another. Another way to think about it is, what if we just rearrange letters in the plaintext? So what are some of the nice things here? So we just rearrange the letters in the plaintext. So we don't have to worry about...
So now the frequencies of letters are the same, but they're just not in the correct order. So this is actually good. So we get this, but what? So the one-gram frequencies, right? The one-letter frequencies are the same. But what's not the same as the plaintext? The frequency of one character following another character. So this is, we could say, 2-gram, 3-gram, all the way up to n-gram. We could actually calculate this for English. And we used that even when we were breaking the Vigenère cipher, right? We said, okay, hey, there's a Q. I mean, it's very likely that a U follows it, right? So that's a 2-gram frequency that we're looking at here. So here, we've kept the one-gram frequency, the single letter, for instance. But since we've swapped letters and moved letters around, it's going to have different n-gram frequencies. And the thing we've been talking about is exactly the same as it was in a Caesar cipher and in plain English, because we haven't transformed any of the letters or changed any of the letters to something else. So we'll look at this. You can actually make this very complicated, and of course the question is, what letters do you swap? In which positions? Because you need some algorithm to do this on any arbitrary size text, so your crypto system has to do this. There are many different ways to do this. We're just going to look at one, because I think it's kind of redundant to look at all these different types of transposition ciphers. So we're first going to break the message into blocks of the key length to do this transposition cipher. And we're going to assume right now that all of our messages come in chunks of the size of the key length. And the key is the transposition of the block. So for instance, we have a key 3021; we're always thinking zero-indexed.
So this means the key is size four, so we split our plaintext up into groups of four characters, and now we're going to rearrange each group such that the zeroth character of the plaintext maps to the third character of the output, the first character maps to the zeroth character, the second to the second, and the third maps to the first. So this key is giving us the order of swapping. So if we have a message "ASU is awesome", we split this up into blocks of four, and so what's the output here going to be? So this goes where? The A goes where? To the last spot in the block, so the zeroth character maps to the last, and this one maps here, to the first spot. The second one stays where? The same. And the third one maps to the first, and we can do that for each of them, just swapping. Yeah. So the key here says, the key, right, has position zero, position one, position two, position three, so basically take a block that's the size of the key length and map the zeroth character of the input plaintext to the third character of the ciphertext, and then map the first character of the plaintext to the zeroth character of the ciphertext, map the second character of the plaintext to the second character of the ciphertext, and map the third character of the plaintext to the first character of the ciphertext. You draw arrows, right? Zero goes to three, one goes to zero, two goes to two, and three goes to one. And this zero, one, two, three is implicit based on the position of each number. This would be a little bit easier to brute force for us, wouldn't it? So long as the key wasn't too long. Why? So what are the possibilities? So say I'm checking, I need to figure out two things. One, what's the key length, and two, what's the ordering. So if I'm making a guess that the key length is four, here's how many guesses I need to make. I mean, there are four possibilities for the first one, three for the second, two for the third, one for the last.
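Here is the 3021 key from above as a small Python sketch. The function names are made up for illustration, and it assumes, like we did, that the message has already been stripped of spaces and is an exact multiple of the key length.

```python
def transpose_encrypt(plaintext, key):
    """Plaintext position i in each block goes to ciphertext position key[i]."""
    n = len(key)
    if len(plaintext) % n != 0:
        raise ValueError("message length must be a multiple of the key length")
    out = []
    for start in range(0, len(plaintext), n):
        block = plaintext[start:start + n]
        cipher_block = [""] * n
        for i, ch in enumerate(block):
            cipher_block[key[i]] = ch
        out.append("".join(cipher_block))
    return "".join(out)


def transpose_decrypt(ciphertext, key):
    """Inverse mapping: plaintext position i is read from ciphertext position key[i]."""
    n = len(key)
    out = []
    for start in range(0, len(ciphertext), n):
        block = ciphertext[start:start + n]
        out.append("".join(block[key[i]] for i in range(n)))
    return "".join(out)


KEY = [3, 0, 2, 1]  # 0 -> 3, 1 -> 0, 2 -> 2, 3 -> 1
print(transpose_encrypt("ASUISAWESOME", KEY))  # SIUAAEWSOEMS
```

The blocks are ASUI, SAWE, and SOME, and each one is rearranged independently, so ASUI becomes SIUA, SAWE becomes AEWS, and SOME becomes OEMS.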
So I think it's like 24 attempts for a length of four. Yes, definitely. And, right? So we can see the case here. So yeah, there's that interesting factor. The way that the key size impacts the security against brute force is very different, right? Because in the previous example, when we add to the key length, that choice is independent from all the other choices. So it was 26 times whatever we had, or 26 to the power of whatever we were doing. Here, to get the equivalent input space, we need a much larger key that swaps things. So anyway, we could try to brute force it, but it's another way to be very accurate. You could also find the key length pretty easily, because the letter distribution won't have changed. Ah, okay, interesting. Which letter distribution, I guess, is the question? Because the whole thing, right? Because we haven't changed any character. So all of the frequencies in the original plaintext are exactly the same as in the ciphertext. As you can see, if you have a sufficiently long key length, you notice several vowels in an area and stuff like that. So we kind of know, depending on how long you think words are, how many vowels should be in a word, and things like that. Yeah, so again it goes back to the n-grams, right? We can use distributions of letters so we can see what are likely correct transpositions, and we can try brute forcing those. So we can see letters that are likely to be followed by each other, and if they're not followed by each other, then we can say maybe something's wrong. Like, they should be swapped, so that can give us a bounded key length, which is what you're going for there. Yeah, we can try that. What is the other thing we can try? I think this one would really end this up too.
You could just pick a small section of text and just try, you know, some different rearrangements until you've got something that makes sense, and you at least have that part of the key, right? It seems like you could work it that way. Yeah, it's interesting. So you could maybe, and I guess the question is, right, the scope that you look at is dependent on the key size, right? If you focus on something that's smaller than the key size, right, when you just look at these two letters, I don't know, you're not gonna be able to figure it out just based on that. Yeah. I feel like this one would be especially easy for a computer to do, because there are already simple programs that come up with every possible English dictionary word that can be made out of seven letters. Yeah, like Scrabble, yeah, anagrams and stuff. Basically, you hand them that, and then you look at that and say, okay, which one of these is most likely, so even if you only get a portion, and you think the key is small enough, you look at that and ask, okay, which one of these looks like a very probable one. Yeah, and you can see how that affects the rest of the ciphertext: all right, well, assuming this is it, what does the rest of the ciphertext look like? Yeah, yeah, these are all good. It's just a simple transposition cipher, right? Brute force, so you can brute force basically key sizes less than or equal to 13. I mean, you have to get up to 13, because what we're talking about is factorial, right, in the size of the key, so you need a key size of at least 13 to get up to like six billion tries. But, you know, six billion tries is actually not that much; computers are very fast, and you've got enough computers working on this. You can do that, and it's all very parallel.
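The brute-force idea above, sketched in Python. This is a rough sketch under assumptions: the key length is already guessed, the block convention is "plaintext position i goes to ciphertext position key[i]", and the ranking uses a tiny hand-picked list of common English bigrams (`COMMON_BIGRAMS`, a made-up helper list) rather than a real n-gram table.

```python
import math
from itertools import permutations

# A small, hand-picked list of frequent English bigrams (illustrative only).
COMMON_BIGRAMS = ("TH", "HE", "IN", "ER", "AN", "RE", "ON", "AT", "EN", "ND", "ES", "OU")


def decrypt_block_transposition(ciphertext, key):
    """Undo a block transposition where plaintext position i went to key[i]."""
    n = len(key)
    blocks = (ciphertext[s:s + n] for s in range(0, len(ciphertext), n))
    return "".join("".join(block[key[i]] for i in range(n)) for block in blocks)


def candidates(ciphertext, key_len):
    """Yield (key, plaintext) for every possible ordering: key_len! of them."""
    for perm in permutations(range(key_len)):
        yield perm, decrypt_block_transposition(ciphertext, perm)


def bigram_score(text):
    """Crude 'looks like English' measure: count occurrences of common bigrams."""
    return sum(text.count(bg) for bg in COMMON_BIGRAMS)


def ranked(ciphertext, key_len, top=5):
    """Return the `top` best-scoring candidates for a human to look over."""
    scored = sorted(candidates(ciphertext, key_len),
                    key=lambda kp: bigram_score(kp[1]), reverse=True)
    return scored[:top]


for n in (4, 8, 13):
    print(n, math.factorial(n))  # 4 -> 24, 8 -> 40320, 13 -> 6227020800
```

On a message as short as ASUISAWESOME the bigram score is too weak to be trusted, so the sketch returns the top few candidates for a person to pick from instead of a single answer.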
Other ways that we talked about this when we did this were basically bigrams and trigrams, right, so this is actually not, okay, and we'll just go over one example of a different type of cipher. So I'm gonna make this a little weird, but basically we're gonna take the letters of the plaintext and arrange them in columns, in this way. In this case, the number of columns is two, so here we're gonna have H, E, L, L, O, W, O, R, L, D. So we arrange the letters like this, and then our ciphertext is gonna be reading the letters in this direction, from left to right, so H, L, O, O, L, E, L, W, R, D. So the ciphertext goes like this. What are some of the benefits of this versus the other approach that we just did? There's definitely a lot more possibilities that you can do. You can count that in keys, you can brute force this pretty quickly, but most people would want to do factorial on the entire ciphertext, but that's... Yeah, for a lot of ciphertext, right, you look at where does this L come from, right, that came from all the way over at the back, right, this L came from all the way at the end of the plaintext, right, the hello world. We could do this of any size, you know, depending on how long our plaintext is. We could go three columns, four columns, five columns, right, that would be our key size, and that would rearrange these letters in different ways. Again, how to attack this, I actually just wanna go over it briefly, right. So we can look at anagrams, which is essentially the Scrabble idea that we talked about. We can rearrange the letters, exploit some pattern, and if it's a rail fence cipher, we can actually kind of easily check this. But there's all different types of ways you can do this, and again, swapping the order of letters kind of doesn't get you too much. So if we're given just some ciphertext, and we don't necessarily know exactly what algorithm is used, how do we go about that? Like, what are our steps?
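The HELLOWORLD arrangement described above can be written in a couple of lines of Python. This is a sketch with made-up function names: it writes the letters down in groups of `rows` and reads across each row. With two rows that matches the rail fence example; with more rows a classic rail fence zigzags, which this sketch does not do.

```python
def rows_encrypt(plaintext, rows=2):
    """Write letters top to bottom in columns of height `rows`, read row by row."""
    return "".join(plaintext[r::rows] for r in range(rows))


def rows_decrypt(ciphertext, rows=2):
    """Put each ciphertext letter back at the plaintext index it came from."""
    n = len(ciphertext)
    out = [""] * n
    pos = 0
    for r in range(rows):
        for i in range(r, n, rows):
            out[i] = ciphertext[pos]
            pos += 1
    return "".join(out)


print(rows_encrypt("HELLOWORLD"))  # HLOOLELWRD
```

The only secret here is the number of rows, so an attacker just tries 2, 3, 4, and so on until the output reads as English.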
The ratio of letters would be pretty good, because if it's close to English, you knock it off. Yeah, so we look at the frequency of letters. We can use some statistical tests that we've looked at to try to say, does this look like a normal distribution of English text? We can say, is it a Caesar cipher? That should be easy to test for, right? We can look at the index of coincidence, we can look at correlation, we can look at different frequency distributions, single letters, two letters, as many letters as we want, exploiting common English patterns. We can say Q is always followed by U, E is the most common letter, right? These are actually all things we did, using statistical measures to do these types of things. Okay, real-world examples. Real-world crypto doesn't use shifts, it's easy to break, but well, okay, I'll say this instead: real-world crypto usually uses XOR rather than shifts. Why? Yeah, so XOR is reversible, right? So rather than shifting the Caesar cipher alphabet, you can XOR the letter with a number, and then you've got a new letter, right? And it works in reverse, so you can go backwards, right, so encryption and decryption are actually exactly the same algorithm. Also, XOR is very fast on CPUs. I mean, so is addition, right, but when you're shifting, you need to worry about the modulus and making sure you're staying in the same range, right? So that can actually be slower than an XOR, usually. So basically, all modern crypto uses XOR under the hood to do this. And the super interesting thing is that, well, yeah, when we're looking at real-world crypto systems, we'll see they're actually just built up of substitutions and transpositions. And that's basically all they are at the root. And I'll say this again: everybody tries to roll their own crypto, everybody thinks they're smart and brilliant and they should be writing crypto. Don't do that. Side channel attacks.
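To make the XOR-versus-shift point from above concrete, here is a small Python sketch. The function names are illustrative, and a repeating-key XOR like this is only a toy for showing the symmetry, not something to use for real encryption.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte with the repeating key; the same call encrypts and decrypts."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def caesar_shift(text: str, k: int) -> str:
    """Shift uppercase A-Z by k; note the modulus needed to stay in range."""
    return "".join(chr((ord(c) - 65 + k) % 26 + 65) for c in text)


message = b"HELLOWORLD"
key = b"\x2a\x07"
ciphertext = xor_bytes(message, key)
assert xor_bytes(ciphertext, key) == message           # same function, same key
assert caesar_shift(caesar_shift("XYZ", 3), -3) == "XYZ"  # shift needs the inverse key
```

With XOR there is no separate decrypt routine and no wraparound to handle, while the shift needs both the modulus and the negated key to undo it.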
So these are crazy things that people have shown. They first showed that, so CPUs, everyone knows the CPU, the processor, right? It uses various amounts of power depending on what it's doing. So they found that if you put in a sensor to determine how much power a computer is using when it's computing crypto operations, you can actually infer the private key from those operations, because different operations use different amounts of power. So you could actually break the key just by observing the power. And then they went even further. So is anyone's laptop fan currently on? Does anybody's laptop fan ever go on? When and why does it do that? Yeah, when it's heating up, when you're computing something, when you're compiling something, running something. So they've actually found that just observing the fan noise is enough of a side channel, leaking the power usage of the CPU, that they can recover crypto keys from it. In addition to that, timing attacks, so this is a big thing that people often mess up. CPUs are very complicated and will often try to optimize different paths. And if it's optimizing based on the private key, you can actually leak the private key based on just timing different operations. So if you have a crypto operation that fails early when something doesn't match, you can actually use that to break and brute force the key oftentimes. And the timing difference can be very, very small, because you can use multiple samples and draw different distributions to determine things. It's actually crazy insane. I will say, one of the cool things, when I played DEF CON Quals 2011, there was a 300 point challenge called binary weakness. We were given a TAR archive. So just like a TAR archive that we've seen, with a DEX file and a JPEG-S. What's a DEX file? No, not a DEX 30 file. Anybody do any Android development? Never built an app? Nobody? Some people?
Yeah, so this is compiled Java, like a .class file but for Android, and I think it's maybe for old versions of Android. So it's a compiled version of a Java application. And the JPEG-S was this encrypted file. So you looked at it and it was all basically random-ish. We looked and we found that it was actually a real app that still exists. So this is an app that exists on the app store. And we looked at it, and this is an app that allows you to encrypt your pictures, and it has a free version and a paid version. And this was the free version. So what we found out they did is they took a picture of a whiteboard that had the flag written on it and then encrypted it using this app. And as we investigated it, the encryption was an XOR with an eight-byte key. So it was a Vigenère cipher with eight bytes, like, a period of eight. And so then we had to use this to break those JPEG files. So how do we do it? Yeah, so we know that it's an XOR, right? And we know that the length is eight. We can look for repetitions, which is kind of difficult in a JPEG file, with JPEG pixel values. How do we know it's a JPEG file? How does anybody know what any file is? No, not the extension, extensions are lies. How do we know? The file header, yes. Almost every single file format has a series of magic bytes in the header that tell you exactly what the file is. I don't remember what they are for JPEG, but almost every file has this. So what we were able to do, that gave us two bytes, so it gave us two bytes of the key. So we were able to reduce this from an eight-byte key to a six-byte key, because we knew exactly what those two bytes were, because they had to be the exact JPEG header. Then we used various other aspects of the JPEG header and we said, okay, this byte also has to be the same in a normal JPEG file. So you use basically a known plaintext attack, because we knew what the plaintext should look like.
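The known-plaintext step above can be sketched in Python. This is not the code from the challenge: the key and the "image" below are made up, and the only file-format fact it relies on is that a JPEG starts with the start-of-image marker `FF D8`, and that the marker after it also begins with `FF`.

```python
from itertools import cycle

# Start-of-image marker FF D8, then the first byte (FF) of the next marker.
JPEG_PREFIX = bytes([0xFF, 0xD8, 0xFF])


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; the same call encrypts and decrypts."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def recover_key_prefix(ciphertext: bytes, known_plaintext: bytes) -> bytes:
    """ciphertext XOR plaintext = key, for as many bytes as we know the plaintext."""
    return bytes(c ^ p for c, p in zip(ciphertext, known_plaintext))


# Hypothetical stand-ins for the real key and picture.
secret_key = bytes([0x13, 0x37, 0xC0, 0xDE, 0xBA, 0xBE, 0xFA, 0xCE])
fake_jpeg = JPEG_PREFIX + bytes([0xE0]) + b"pretend this is image data"
encrypted = xor_bytes(fake_jpeg, secret_key)

known = recover_key_prefix(encrypted, JPEG_PREFIX)
remaining = len(secret_key) - len(known)
print(known.hex(), "recovered;", 256 ** remaining, "keys left to try")
```

Every key byte recovered this way cuts the brute-force space by a factor of 256, which is why getting a few of the eight bytes from the header made the rest feasible to brute force.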
I think we broke three or four of the eight bytes, and then from there we just brute-forced it until a JPEG reader could read it, and at that point the image popped out and we got the flag. So people really do mess this up. When I keep saying people roll their own crypto, this is real stuff. There are real applications where it is not good. So don't be one of these people. Use modern symmetric encryption, which we'll talk about on Thursday.