Alright. Cool. Hello. Good to see everyone. You don't seem very... It's the homework assignment. It's the homework assignment? I highly doubt that. There were very few people at office hours yesterday. That's because it's Friday. Exactly. Cool. We will go right into it.
What were the types of cipher that we looked at previously? Vigenère. And what else? Caesar. The Caesar cipher. And what did they do? How did they try to hide messages? It's a simple transition. Offset. Offset? What does that mean? They used a key to offset the characters in the string. Okay, they used a key to offset characters. But what are they doing with the message? Shifting. You can think of it as mapping. They're mapping one character to another character, and the key tells you exactly how to do that mapping. If the key is three, you're shifting everything forward three. But you're essentially creating a mapping from character to character. Intuitively, why does that seem to work? It looks like gibberish. It looks like gibberish, right? It doesn't look anything like English because we're replacing characters.
So another type of cipher says: instead of replacing things, why don't we keep the original letters but move them around? Why is that helpful? Or why might it not be helpful? Not as vulnerable to analysis over the alphabet? Yeah, so maybe not as vulnerable to analysis over the alphabet. Why? Because I can't look at a string of characters and go, alright, there's a lot of I's, so I'll just replace those with E's or something. Or X's or Z's, right? If you look at the distribution of English characters and you just move all the letters around, you still have that same distribution. This is the key idea behind a couple of other ciphers that we'll look at. Basically, a transposition cipher says: break the message into blocks of some key length.
So what was the key length we saw in the Vigenère cipher? Wasn't it five? What was it, though? Not the actual value. What did that mean? The repetition, right? How long the key is, how often you need to repeat that key in order to know the shift for every letter in your plaintext. So here we break the message into blocks of the key length and we move the letters in each block around. If, for example, the key here is three zero two one, the key length is what? Four. So we're going to break our plaintext up into blocks of four. We've got something like this: ASU is awesome. We break it up into blocks of four and then we transpose, we move around, each of the letters. The way to read the key is: this first character will now be at index three. This is zero-indexed, so it'll be the last character. The next character will be the first character. The third character goes to index two, which is where it already is, so it won't move. And the I moves there, to index one. We do that to every one of those blocks. So did we do this right? I hope so. And then we put it together.
So how do we break something like this? Is this secure? Is it not secure? Couldn't you take one of those blocks and kind of arrange it until it makes sense? So what do you have to know in that case? The length of the blocks. The length of the key, right? Because if I was going to send this to you, I'd put it all together and send it to you. So you need to know the length of the key. When you know that, you can try what? You can try rearranging all of them, and what are you looking for? Yeah, words that make sense. Why can't you use the way we broke the Caesar cipher? How did we break the Caesar cipher? Brute force every offset of the key and then look for what? Frequency analysis. Frequency analysis. We looked to see what matched English frequency. But as we just talked about, the letters here haven't been substituted at all.
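The block transposition described above can be sketched in a few lines of Python. This is only an illustration, not code from the lecture: the function names are made up, spaces are dropped from the message, and the key is read the way the lecture reads it (the letter at position i of a block moves to index key[i]).

```python
def transpose_encrypt(plaintext: str, key: list[int]) -> str:
    """Move the letter at position i of every block to index key[i]."""
    n = len(key)
    if len(plaintext) % n != 0:
        raise ValueError("pad the message to a multiple of the key length first")
    out = []
    for start in range(0, len(plaintext), n):
        block = plaintext[start:start + n]
        moved = [""] * n
        for src, dst in enumerate(key):
            moved[dst] = block[src]
        out.append("".join(moved))
    return "".join(out)


def transpose_decrypt(ciphertext: str, key: list[int]) -> str:
    """Undo transpose_encrypt: position i of the plaintext block sits at index key[i]."""
    n = len(key)
    out = []
    for start in range(0, len(ciphertext), n):
        block = ciphertext[start:start + n]
        out.append("".join(block[dst] for dst in key))
    return "".join(out)


key = [3, 0, 2, 1]
ciphertext = transpose_encrypt("ASUISAWESOME", key)
print(ciphertext)                          # SIUAAEWSOEMS
print(transpose_decrypt(ciphertext, key))  # ASUISAWESOME
```

Note that the ciphertext uses exactly the same letters as the plaintext, which is why single-letter frequency analysis says nothing about the key.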
So the frequency hasn't shifted at all. The positions of these letters have shifted, though. So then we can try all the possible combinations? What else should we do? What if the key is really big? Like, I don't know, what if it's blocks of 8 or 16 or 32? Compare the same blocks? In what sense? Try to make sense of that. But how do you know what maps to what? So let's look at this. Let's go. Okay. How come all of my stuff is gone?
Okay, so we have this ciphertext. How do we go about breaking this? So what was your idea? Let's say we know it's size 4. How does knowing this block and this block help us? Well, since the substitutions in each block are the same, when we're making a substitution in the first block, we can also make it in the second block at the same time. Sounds like substitution. We're doing transposition. Transposition? Yeah. And we can do them at the same time, and if it starts making sense we'll probably see that sooner. Or if it stops making sense, we'll see that. Okay, interesting. So in some sense it's kind of similar to the Caesar cipher way, right? Where you're trying to brute force the keys and you try different things to see how that affects things. So here you can say, well, what if we swap the first two characters in every one of these ciphertext blocks? Why would I want to do that? Why do I think that's a good start? Because then there's an "is" in the first block, right? So I could do something like I-S-U-A, E-A-W-S, and then E-O-M-S. But now what do I do? Then you see that the last word is "some", backwards. "Some" backwards? So you want to do what? Completely reverse them? Yeah. But then that's also going to mess up the "is" that you just made. Well, you know, no words start with I-S-W... Oh, it's S-M-O-E. Yeah, just close. So it's going to be ASU-I. Instead of that, we swap the E and the S. Which one? Here? Yeah. Swap which one?
So it would be ASU-I-S-A-W-E and some. Oh, now you want to swap these middle ones? The middle ones here? Yeah, we can do that too. So why did that work? Or was I guaranteed to have worked? No, we're just trying random combinations of transpositioning these characters to try to get something that looks English-like. So if we go back and we think about, okay, for this cipher, so it's keeping intact the one gram frequency of English characters, right? So what was the one gram frequency? What does that mean? The frequency for single characters in the English alphabet. The frequency for single characters in the English alphabet, right? That was the graph that we saw, which is the one gram frequency of English. So it keeps that intact. But what does this break? The sequence, so two grams. So what two letters? What are the frequency of one letter followed by another letter in English, right? Why is that going to break? Why is this cipher going to break that? Because it shifts those around. It's fundamentally shifting them around. No, exactly, right? So it's breaking what character follows another character and also three grams and all those. So you could think about similar into how we ordered the possible shifts into what is more likely. We can say we don't have any here, but if we had in one of our blocks a q and a u, we could try to rearrange the blocks such that I think u follows the q. Just like we did here, I think that was probably a good first step is trying to do is. Is is probably a common word. We may think that a could be the first letter of this, just the letter a as a word would have been the first letter of this block so we could try different things. But essentially those are the approaches that you take to break these. The problem is this gets very, very large in terms of brute forcing a transposition cipher. So if you're just brute forcing this, a key size of 13, so why is it 13 factorial? That's the number of permutations of a block of size 13. 
So that is, what, 6 billion tries. It's kind of a lot. It's 6.227 billion. Yeah. And we'll get to this in the other example: the other way to do it on a bigger key is to use what we just talked about, using the 2-gram, 3-gram, n-gram character frequencies to try to figure out what the likely transpositions are. And you could break it like that. That's one of the downsides of this cipher in terms of security.
Yeah. I actually have a question. What happens if the message doesn't break up evenly into blocks? So if you had some extra space at the end? Like extra letters, exactly. So it depends on what you want to do with your scheme. You could put padding at the end. You could just put random letters, because the person reading it will be able to understand where the real message ends and the padding begins, because they know the exact key. You could, let's see. Yeah, so what would happen if I just left off the last two characters here and did this transposition? Then either the last word wouldn't be changed at all, or you would be able to see that the key length is going to be at least larger than whatever the last chunk is. Yeah, so you would have spaces here, and you'd need to transmit those spaces so the letters go to the right place. Because if instead of S-O-M-E you just had S-O here and then you're trying to decrypt that, things would end up in weird places because the positions aren't constant.
Can you just describe how the message maps to the encryption from the key? Just how you got the encryption from there. Right, so it is basically, you take every entry of the key. So for position zero, this key tells you where to map position zero.
So position zero gets mapped to position three, which is here. Position one gets mapped to position zero, which is here. Position two gets mapped to position two, so it stays the same. And position three gets mapped to position one.
Do you notice any other problems with this? What are some problems with this? It still retains the original message. Yeah, so part of it is it still retains the original message, so we can use things to try to determine what the actual words are. There are still words in there you can try as well. Right, so that's more the bigram or trigram frequency, which is essentially a version of that, where you're trying to figure out what the thing is. So if I see a letter in this block, where do I know that letter came from? Did it come from anywhere in the message? No, just the first four letters. No, just the first four letters, right, or the key length. So a problem here is that you're only shifting around characters within a single block.
So there's a slightly more complicated version of this called the rail fence cipher, which, oh, that's weird. Let's do it by hand because that will be a little bit easier, and of course I didn't install the software to get this thing to work. Okay, so here, instead of swapping around the letters within a block. Because let's say you took, I don't know, the first four letters here and swapped those around. There's a repeated letter, right? There are two Ls, so half of those letters are the same. No matter how that's split up, you could probably pretty easily figure out what that first word is supposed to be. With the rail fence cipher, and the idea comes from a fence post, you write the message vertically, and the key is how you arrange the fence posts. So you would do h-e-l-l-o-w-o-r-l-d.
So I'm writing the word vertically and then I'm going to read off the ciphertext this way, across. So the ciphertext becomes h-l-o-o-l, e-l-w-r-d. So how do I decrypt this? Or what do I need to know to decrypt this? The length of the rail. Yeah, the length of the rail, how far I go down, right? So to decrypt this, excuse me, you would write it back out, and now you could read the message: h-e-l-l-o-w-o-r-l-d. You could go further down when you want to encrypt it, so with a different key you'd write h-e-l, l-o-w, o-r-l, d. And maybe you'd put padding characters here too. Then you read this off and now the ciphertext is h-l-o-d, e-o-r-x, l-w-l-x.
So what does this change about the previous method? Are you talking about the transposition? Yeah, exactly. What is different about this approach compared to the transposition approach that we just talked about? The letters can be in any part of the... Yeah, so one of the difficult things is the letters can end up in any part of the ciphertext, right? So the Ls that were in hello, instead of being in the first four characters, one of them ends up here and one of them ends up way over here. So you have characters being drawn from more of the message.
So yeah, basically you attack this in essentially the same way as we just talked about. You look at, hey, the one-gram frequencies match English, unlike with a Caesar cipher, but the n-gram frequencies do not. The two-gram or three-gram frequencies don't match, so it's probably some kind of transposition, and you can rearrange it. You can also just check everything. I mean, the rail fence is not very secure because there aren't a lot of possible combinations, so you can easily check all of them. It's an easy one to do by hand, which is kind of cool, if you ever find yourself in need of a very easy cipher by hand. So is the key, then, the vertical height? Exactly. Yes?
So, the primary security of a rail fence, then, is people not knowing that it's a rail fence cipher? Yes, you would have to extend this to make the key more interesting. The transposition cipher is very easy, but the problem there, maybe, is that for shorter messages your key would have to be roughly the size of the message, in which case you're basically doing some kind of rail fence, but in a different way. It's more of a historical thing.
Okay, so we can pop through this very quickly, how you would go about this. We have our ciphertext that we just computed. We'd look at the most frequent two-letter occurrences in the English language that begin with H. Why do we care about that? Because we know that the first word starts with an H. Yeah, we think it starts with an H, right? So we want to figure out which of the rest of these letters are likely to follow that H, because then we can try to rebuild the equivalent rail fence. So we look: H E is 0.0305, H O is 0.0043, and H followed by L, W, R, or D is very infrequent. So this would suggest that this E should follow this H in whatever method is being used. We can also look at pairs that end in H, because maybe this H is actually not the beginning. Maybe it's at the end, because they're transposing it in some weird way. We look and see those are all very infrequent. So out of all these probabilities, which one would you try first? H E. H E, right? It's pretty clear. So you would try that. You'd arrange it like this, and then you'd write H E, L L, O W, and so on, and then you've broken it. Okay, cool. You could do this. Yeah, anyways.
Cool. So I'm going to briefly go over this. Extending this idea is basically creating a matrix and having a key, where you write the message, ASU is awesome, and then you transpose each of these columns. So this is where you can extend this to get the benefits of transposition in addition to a rail fence cipher.
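Here is a rough sketch of the rail fence as it is drawn in this lecture: write the message down columns of a fixed height, then read across the rows. With a height of two this coincides with the usual zigzag rail fence; for larger heights the zigzag variant differs. The function names and the "X" padding character are assumptions for illustration.

```python
def rail_encrypt(plaintext: str, height: int, pad: str = "X") -> str:
    """Write the message down columns of `height` letters, then read off the rows."""
    while len(plaintext) % height:
        plaintext += pad
    # Row r holds every height-th letter starting at offset r.
    return "".join(plaintext[r::height] for r in range(height))


def rail_decrypt(ciphertext: str, height: int) -> str:
    """Split the ciphertext back into rows and read down the columns."""
    cols = len(ciphertext) // height
    rows = [ciphertext[r * cols:(r + 1) * cols] for r in range(height)]
    return "".join(rows[r][c] for c in range(cols) for r in range(height))


def try_all_heights(ciphertext: str) -> dict[int, str]:
    """The key space is tiny: only heights that divide the length are possible."""
    return {
        h: rail_decrypt(ciphertext, h)
        for h in range(2, len(ciphertext))
        if len(ciphertext) % h == 0
    }


print(rail_encrypt("HELLOWORLD", 2))   # HLOOLELWRD
print(rail_encrypt("HELLOWORLD", 3))   # HLODEORXLWLX
print(try_all_heights("HLOOLELWRD"))   # {2: 'HELLOWORLD', 5: 'HOLLRLOEWD'}
```

Because so few heights are possible, checking every one by eye is usually enough, which matches the point that the cipher's only real protection is the attacker not knowing which scheme was used.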
Really, this is about showing you what the basics are, because it's going to be kind of crazy, but all of the symmetric encryption algorithms that are actually used today to secure communications are really based off of these ideas. So it's important to understand that at a basic level, and you're practicing breaking some of these ciphers as well. So we talked about that. How do we decide which cipher text is which algorithm? So we assume, so think about it this way, when we're assuming that we are, when we're thinking about considering the security of a crypto system, what do we assume? That the encryption and decryption functions are public. Yes, that the encryption and decryption algorithms are public and that they're known to the adversaries. Why do we make that assumption? Because all of them are, and if the security of our encryption algorithm relies on nobody knowing the encryption and decryption algorithms, then they're basically the key. So when you're considering the security, we make that assumption, but if we say we just have some cipher text, do we automatically know exactly what encryption algorithm is used? No, maybe not. That actually would be a good, maybe addition to you. So how do we test which, how do we, from the things we talked about, how do we try to tell which is which? The frequency and see if it's like a one gram or two gram, like the colors. And what's that going to tell you? So yeah. It's a transposition cipher. Right, so how, okay, so then what specific case would tell you that a one or two gram is like, what are you looking at from those frequencies if it's a, just tell you that it's a transposition cipher? That anything higher than a one gram isn't going to give you any data. Right, so a one gram would match the English language and a two and three gram would not match. So that would say it's probably some transposition. So that helps you kind of cut the universe of possible encryption algorithms that way. What, what else from there? 
If the frequency of characters doesn't match up, it might be a Caesar cipher. Right, so you have some text and you say, okay, the frequency, excuse me, doesn't match English. Exactly, so it's not a transposition. The one-gram frequency doesn't match English, so it's not a transposition cipher. But how do you tell if it's a Caesar cipher? You can't just say, well, it doesn't match English. What are you looking for? Not the exact distribution, but it still has those peaks and valleys that English has, so it's a shifted distribution. Right, exactly. And if it doesn't have that, then what are you looking for? Okay, so yeah, different distributions at smaller sizes. Yeah, so you'd think maybe it's a Vigenère cipher, and you could go from there and try all those techniques we talked about to break that. Anyways, you can do all kinds of cool stuff.
I'll talk about some real-world examples, because these crypto systems are actually used in the real world. The main thing, and this is something that those of you who have hopefully started your assignment will see, is that we typically use XOR instead of shifts. Why is that? Computers do it faster. A shift is just addition, and isn't addition basically XOR? Maybe subtract, I don't know, yeah. The inverse is also XOR. Say that again? The inverse is also XOR, so if you want to encrypt you use XOR and then... What does that mean? Spell it out for me. Yeah, it's symmetric. Symmetric, so that means what? If I have A XOR B is equal to C, what does this mean I can do? B is equal to C XOR A, or A is equal to C XOR B. Right, so this is a nice property. But do we have the same property with shifts? Kind of, but not really, because we need a different operator. It does have an inverse, though: if we take A and shift it by B to get C, we can take C and A and recover B, or C and B and recover A.
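The XOR property from this exchange can be checked directly. The sketch below is illustrative only: `xor_bytes` is a made-up helper that repeats the key across the data, the same way a repeating-key shift cipher would, and the byte values are arbitrary.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


a, b = 0x41, 0x2A
c = a ^ b
print(c ^ a == b, c ^ b == a)   # True True: the same operator recovers either input

# A shift needs a different operator to undo it.
shifted = (a + b) % 256
print((shifted - b) % 256 == a)  # True, but only by subtracting

message = b"ASU is awesome"
key = b"\x13\x37"
ciphertext = xor_bytes(message, key)
print(xor_bytes(ciphertext, key) == message)  # True: encrypt and decrypt are one function
```

The last line also shows why XOR with known plaintext is dangerous: XORing ciphertext with the plaintext would hand back the key bytes just as easily.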
Here it's a lot nicer, because it's exactly the same operation, which is much, much easier. Why else might one of the properties... Just kind of a question going back. Why would we want it to be easier to decrypt? If we were encrypting something in a real-world scenario, why would we want to use XOR when it might be objectively easier to undo than addition? Interesting. What do you mean by easy? I guess, what do we mean by easy? Well, like you said, XOR can be undone using XOR itself, but if you do it using addition, the person decrypting might not know whether you used addition or subtraction. Right, so I'd say a couple of thoughts. One is ease of implementation: you want your implementation to be simple enough that you can see that it's correct. If you have two different operators, one for encryption and one for decryption, that might make it more difficult. We won't get into it here, but there are crypto systems that don't have separate encrypt and decrypt operations. It's just one function. You take the ciphertext, put it through the same system, and encrypt and decrypt are exactly the same, which also simplifies things a lot. I think it's the same with this, mostly. Like ROT13? No. CTR mode, I believe, is what I'm thinking of, but I can't remember what that stands for right now.
Say it again? What does that mean? It's a native operation? Yeah, so XOR is kind of native. I mean, these are native operations. The other thing about fast is, in some sense, it seems a little counter-intuitive. We might want our decryption to be very slow. Why would that be? Harder to brute force. Harder to brute force? Yeah, exactly. But on the flip side, if encryption or decryption is very slow, who's going to use it? Are you going to use it? Are you going to browse a banking website if it takes 30 seconds to load an encrypted page? Are you going to make a purchase with an encrypted account?
So this is actually something that I didn't realize, and it's super interesting. When I was doing an internship at Microsoft Research, the job of one of the people on the team was to get crypto test cases into the .NET Framework's test cases and into its performance tests, so that they could track and improve the performance over time, because they found that when encryption is super fast, people actually use it. If it's really slow, people don't use it. So you want to make it as easy as possible. The way you deal with the brute-forcing concern is that, even if it's super fast to decrypt, you make brute forcing completely infeasible: you use key sizes that are large enough that there's no possible way they can brute force it in time. Cool, good question.
And the other nice thing about XOR, I mean, it goes with being reversible, I believe, but anyway, that's why it's used in a lot of places. And again, which was the point mentioned just now, we're not restricted here to just the length of an alphabet or to shifting characters. Here we can use bytes and do XORs. If our system supports it, we can do bytes, 8 bits, we can do 32-bit values, 64-bit values, 128-bit values, and that basic XOR operation still just works. Cool.
So this is again another instance of me trying to drill into your head: do not implement your own crypto. Why not? What are some of the things that we saw that could fail? What was that? Brute force. Brute force? Yeah, maybe. So in what ways could you brute force a really crappy crypto system? If you just run a program through it, maybe whoever made it isn't going to think of the big cases, I guess. Yeah, what about their key size? How are their keys generated?
Maybe the key size is technically, I don't know, only 32 bits. Have you ever tried to brute force something that's 32 bits? It seems like it would take a long time. How many do you need to try? So you're going to brute force a 32-bit value, how many tries do you need? Yeah, you don't know that value? 2 to the 32? Memorize all your powers of 2. What is it? I don't know, it's like 4 point something. Is it a million, billion? Let's see, 4.2, yeah, about 4.3 billion. And it turns out that for crypto operations this is actually not a large enough key space. You can brute force this very easily. This is why key sizes often need to be much larger. So maybe they think that their key size is large enough, but it's actually not.
We'll stick with 32 bits because it's kind of easy. If you're saying their key size is 32 bits, what are you assuming when you brute force starting from 32 zeros and adding one until you get to about 4.3 billion, all ones? What are you assuming there? You'll probably hit it somewhere in the middle. Why are you assuming that? You're assuming that their key was randomly drawn from zero to two to the 32. But what if they only randomly drew a key from zero to two to the eighth? Right, so if you know that their key is weak, or that the key is confined to a certain range, you can break it much faster than people think.
You could go on and on and on, and I could talk for a really long time about all the different ways crypto systems can break. So please don't ever write your own. Especially if you're trying to do some, maybe I don't want to get into it, but cryptocurrency madness where you're creating your own thing and your own hash functions and you have no idea what you're doing. Do not do it, yeah. Say again? Louder, I can't hear.
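The key-range point above can be illustrated with a toy example. Everything here is hypothetical: `toy_encrypt` is a made-up 32-bit repeating XOR cipher, and the attacker is assumed to know that the plaintext starts with "attack" so a candidate key can be recognized.

```python
def toy_encrypt(data: bytes, key: int) -> bytes:
    """Toy cipher: XOR with a 32-bit key repeated over the data (also decrypts)."""
    key_bytes = key.to_bytes(4, "big")
    return bytes(b ^ key_bytes[i % 4] for i, b in enumerate(data))


def brute_force(ciphertext: bytes, known_prefix: bytes, key_range):
    """Try keys in order; return (key, number of tries) for the first plausible one."""
    for tries, key in enumerate(key_range, start=1):
        if toy_encrypt(ciphertext, key).startswith(known_prefix):
            return key, tries
    return None


print(2 ** 32)  # 4294967296 keys if the key really uses all 32 bits

# The key is nominally 32 bits, but it was only drawn from 0..255.
weak_key = 200
ciphertext = toy_encrypt(b"attack at dawn", weak_key)
print(brute_force(ciphertext, b"attack", range(2 ** 8)))  # (200, 201)
```

An attacker who knows or guesses the restricted range needs at most 256 tries instead of roughly 4.3 billion, even though the key is stored in a 32-bit field.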
Yeah, so those are done in a lot of different ways and they have very different properties. I think we'll touch on it a little bit, but if you want more details on that you should look it up. Those are essentially pseudo-random number generators. When you call a rand function and you need some random value, if an attacker can guess what that value is, that can cause massive problems. This has actually been used to break online poker games: attackers were able to figure out the seed used for the random number generator, and once they do that they know exactly what everybody's cards are and they can win the game. All kinds of really bad stuff.
There have been problems in, you know, so what happens when you get locked out? Anyone ever get locked out of an online account, forget their password? There are people in this class who've done that, so what happens when that happens? You're just stuck. They send you an email or something, you've got to verify your identity. They send you an email. What's in that email? The link to click on to verify that you are the one. And what's in that link? Was it a random generator? Yeah. In that link, right, because you're on the website and you say I want to reset my password, they need to make sure that whoever controls that email address actually requested the password change. So what they do is generate a random value that they put in the URL. You click on that URL. Ideally you're the only person who knows that random value, because you're the person who controls that email address, and then they let you change your password. But when people use poor random number generators, attackers can request that password reset, never see the email, guess what the random value is, and ultimately take over your account. So this is a problem that has definitely happened before.
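A hedged sketch of the reset-link problem: assume, purely for illustration, a site that seeds Python's general-purpose `random` generator with the Unix time in seconds and uses its output as the reset token. The function names and the timestamp values are invented; `random.Random` and `secrets.token_urlsafe` are real standard-library calls.

```python
import random
import secrets


def weak_reset_token(seed: int) -> str:
    """Hypothetical bad design: a token derived from a guessable seed."""
    rng = random.Random(seed)
    return "%032x" % rng.getrandbits(128)


def attacker_guess(issued_token: str, candidate_seeds):
    """Replay the generator for every plausible seed until a token matches."""
    for seed in candidate_seeds:
        if weak_reset_token(seed) == issued_token:
            return seed
    return None


def strong_reset_token() -> str:
    """Token drawn from the operating system's cryptographic random source."""
    return secrets.token_urlsafe(32)


# The site issued a token at some second inside a ten-minute window.
token = weak_reset_token(1_700_000_123)
print(attacker_guess(token, range(1_700_000_000, 1_700_000_600)))  # 1700000123
print(strong_reset_token())  # different on every call, not reproducible from a seed
```

The token looks like 128 random bits, but the attacker only has to search the few hundred seeds that could have produced it, which is the same weakness as a key drawn from too small a range.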
Other types of stuff, and this is why this rabbit hole of poor crypto implementations goes really deep: even if you have it perfect, there are demonstrated side-channel attacks. Think about password checking, this is kind of my classic example. How do you check whether a password is valid or not? Let's say the server knows your password. You're a client, you send your password to the server. How do they check if your password is correct or not? Compare it to the password they have. How does that actually happen? Usually they store a hash and they hash whatever you send. Yeah, we haven't gotten to hashes. So how does a string compare function work? If I have you write an algorithm called string compare, how do you do it? How are you testing if the strings are equal? You're not using equals for string comparison. So you start at the first character of each string and you compare them. If they're the same, you move on to the next character and compare them. If they're the same, you move on to the next character.
So let's say the string is some crazy long 32-character value. If you were brute forcing that just randomly, how many guesses would it take you? Yeah, however big your character set is, to the 32, which is a lot. Even with just lowercase letters, that's like 26 to the 32. That's a lot, especially when testing a remote service. So think about this. We don't know the remote password. The remote password is some random 32 characters. Let's start off by assuming that we know it's exactly 32 characters. Given what we just talked about, how we do string comparison, what happens if I send a string of all B's, 32 B's, to the server? What's going to happen? It takes my input and it does string comparison on it. So what does it do? It compares the first two characters and says what?
They're not equal. Stop. What happens if I change this first character to A? What happens? It'll compare the first characters, say that they're equal, and then execute the loop one more time and go to the next characters. So if I don't know this first character, how many tries do I have? The idea is, we'll go with lowercase, I can send 26 requests changing just the first character here, and on most of those, what's going to happen when the server compares the passwords? Yeah, it will compare the first characters and say not equal. But on one of them there's one additional character comparison, and it turns out you can actually detect that difference remotely, from a remote web server. It's a perceptible difference. I mean, you would not be able to do it by hand, but you write a program that does that timing and you can see a timing difference there. And then when you've got the first character, what do you do? Try the second one. Same thing for the second one, keeping the first character as the one you think it is, and then you keep doing that byte by byte. So instead of 26 to the 32, how many guesses do you need for the first one? 26. 26. How many for the second?
26. 26, so it's 26 times 32 instead of 26 to the 32nd, which is a lot less. So, you want me to put the number? I trust you to do the math if you're confused about which one is the bigger or smaller number. But yeah, 26 times 32: this would require only 832 guesses. In 832 guesses you could get the entire password that was stored there. This is a real thing that happens on real websites. Super cool. And this is just a super subtle timing attack on literally something that you use every single time you program, which is a string comparison operation. If you're using that incorrectly in your crypto system, you've just introduced a bug that somebody could potentially take advantage of. And this happens in more complicated cases too, not just in these password comparisons. It has happened in a lot of crypto systems, where they would leak information about the key through timing. So when you do crypto securely, you have to make sure that every single operation takes exactly the same amount of time, or that whatever time it takes doesn't depend on the key, which is very difficult to get right on a modern system.
Oh, yeah, okay. So actually I conflated these two a little bit, so that was more timing attacks. Side-channel attacks: anyone ever notice the fan turn on or off on their computer? Why does it do that? Yeah, when it heats up. Why does it heat up?
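The character-by-character attack on string comparison can be simulated without a network. In this sketch the number of character comparisons stands in for the response time an attacker would measure; real timing is noisy and would need many repeated measurements. The names `leaky_compare` and `recover_secret` and the sample secret are invented for illustration; `hmac.compare_digest` is the standard-library constant-time comparison.

```python
import hmac
import string


def leaky_compare(secret: str, guess: str):
    """Early-exit comparison; returns (equal, steps) where steps models elapsed time."""
    steps = 0
    if len(secret) != len(guess):
        return False, steps
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps
    return True, steps


def recover_secret(oracle, length: int, alphabet: str = string.ascii_lowercase):
    """Fix one position at a time, keeping the character that makes the check run longest."""
    known, guesses = "", 0
    for pos in range(length):
        best_char, best_steps = None, -1
        for ch in alphabet:
            candidate = known + ch + "b" * (length - pos - 1)
            equal, steps = oracle(candidate)
            guesses += 1
            if equal:
                return candidate, guesses
            if steps > best_steps:
                best_char, best_steps = ch, steps
        known += best_char
    return known, guesses


secret = "qwertyuiopasdfghjklzxcvbnmqwerty"  # 32 lowercase characters
found, guesses = recover_secret(lambda g: leaky_compare(secret, g), len(secret))
print(found == secret, guesses <= 26 * 32)   # True True

# A comparison whose running time does not depend on where the strings differ:
print(hmac.compare_digest(secret, found))    # True
```

The attack never needs more than 26 guesses per position, so the bound of 26 times 32, or 832, holds regardless of the secret, compared with 26 to the 32nd for blind guessing.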
Because, and now we're diving down from computer science into the hardware, you open another tab and, you know, Chrome is executing. Yeah, something is happening, fundamentally, right, some kind of computation. So there was originally the work that showed, well, I think the original one was that if you could figure out the exact power usage of a chip, you could figure out what operations it was doing, and that could leak the key. But of course having physical access to measure the power is kind of crazy. So then they figured out that if you could just measure the power draw of the whole system, so you could see how much power the entire system is using, you could infer a crypto key from that. And then they figured out that you could actually use the fan noise as a side channel that would leak bits of information about the key. So these are all crazy, and side-channel attacks are actually experiencing a renaissance now through cache attacks on chips, all kinds of crazy stuff.
So, crypto libraries: do not write it yourself. Do we all agree on that? Have I scared you enough? Shall I go on? If you have any weird desires to, it's totally cool, you could write your own thing and break it. I'm not saying don't experiment and have fun with your own crypto system, but do not introduce that at work. If you're ever at work going, huh, am I implementing my own crypto system?
One of the examples I want to use is a challenge from DEF CON Quals 2011. I still remember working on this. This was a challenge called... I don't know why it was a binary challenge, but it was a 300-point challenge, and usually the points range from 100 to 500, so it was a medium to upper level challenge. It was a tar archive that had a .dex file and .jpeg files. So what's a .dex file? No Android developers here? No? So a .dex file is like a .jar file but for Android devices, so all of your apps are .dex files, basically,
So if you ever mess around with Android apps... So it gave you this Android app, and then it gave you JPEGs, which are images. You all agree on that? Cool. So it gave you an app, and when you looked it up, this app was actually an app that still exists. This is an app called Picks Light, so it was an app that would, um, store your pictures safely. This was the light version, which obviously means it was the free version. And it's kind of crazy: this was not actually written by the DEF CON Quals people at all. They found this. So we reverse engineered this Android application, and we found out that the encryption was XOR with an 8-byte key, which is what kind of encryption? Vigenère cipher. It was a Vigenère cipher. So it was an 8-byte key, and it was just exactly what we talked about before, where every 8 bytes got XORed with whatever the key was, and so on and so forth throughout the thing, for all of these JPEGs, and we had to figure out the key. Does anybody know anything about the format of a JPEG file? A header. Why is that? To know what the program is reading, because it could be a GIF or something else. Most file formats have some kind of magic value at the start that tells you if it's a JPEG or whatever. So we used those bytes: we took the known header bytes, we XORed them with the bytes in these encrypted JPEG files, and we got part of the key. And then I think from there we just brute-forced the remainder until we got a valid JPEG file, and then eventually we got the key. But this is a real crappy crypto system that's probably... I didn't look at it, but it's probably still being used by this Android app. And so whether you're paying or you're not paying for it, you're getting real crap encryption. So this is to show that people do implement their own crypto systems, as you're learning. And this brings us to today's topic of modern symmetric encryption. So this is going to be a little crazy in what happens here, and I don't have all of the answers about exactly... in terms of... So I'll say a couple of things. One, these things are incredibly
complicated. Modern symmetric encryption algorithms are fully specified; you can go look up all the details. What I really want to convey is more how they work at a high level and why they work, and to try to develop that understanding, rather than... so don't be too freaked out, because we're going to look at real crazy stuff. So essentially they're in a class called product ciphers, which are combinations of substitution and transposition ciphers. So why not just use one? Exactly, we just saw that if you just use one, that can have problems. So let's use more, in crazy ways. This is actually an active area of development; there are all kinds of crazy ways that these things have been attacked. But what properties do you want from a symmetric encryption system? So what does "symmetric" mean, from what we've talked about? Yeah, so both sides have exactly the same key. So what properties do you want? You're sitting down to design a new symmetric encryption algorithm. What do you want? It's a trick question: you don't want to design one. But if you were to, what would you want? It's, like, verifiable. So you want some way to be able to tell that what you decrypted is the same as what was encrypted? That's actually not handled by these systems. We'll see another way that that's done, using hashing combined with these. So these kind of focus on one thing. You want the key to be unguessable by a third party, or you want it to take a real, real long time given current computing capabilities. I don't know, what would make you feel safe? So you have the most important document or thing that you can think of. Okay, we'll use this example: let's say you have 10 million dollars in bitcoins in your private wallet. What would you encrypt that with? So let's say you encrypt it with a key that takes 4 million guesses. That's small. Why? Yeah, so it depends on how long the operation takes, but even if it takes a second, what's 4 million seconds? A few months? Would you be willing to wait 4 months to get 10 million dollars?
A month and a half, possibly. And you may be able to make it go faster. Let's say it's some hard limit of 1 per second, but even then, 4 million is not that big. What would make you feel comfortable? What if it took a year? Yeah, well, I'm trying to ask what makes sense for you. A trillion? What's a trillion seconds? Go get it from Wolfram Alpha, that's what it's great for. It's a lot of years. How many years? 31,000. 31,000 years. Is that enough years? That's decent. Why? Maybe it's a secret you want to keep longer, so maybe you want even more than a trillion. So you kind of want the longest you possibly can. Again, it depends on what your threat model is. If this is just information that in a year will actually be public, then who cares? But you do want to think about that. So one of the things is the key size: is it large enough to resist a brute-force attack? That was kind of the core idea there. And again, that resistance to a brute-force attack depends on your personal definition and your personal opinions. What else do you want? So, immune to brute force: the key space is so large that brute force is not feasible. That you can encrypt a whole bunch of things. So, for example, on computers you want to be able to encrypt more than just text; maybe you want to encrypt images or videos. So the set of plaintext messages that we talked about is anything you would ever want to encrypt. For computers, that's basically any digital thing. What else? Even if they have the algorithm, you don't want them to be able to break it. Yes, even if you assume everyone knows the algorithm. Yeah, that's something. Easy implementation. Why is that important? Right, so if people can't install it and use it and understand it and know how it works, they would be highly unlikely to trust it. Would you trust me if I said I had this great new encryption algorithm, just encrypt all your bitcoins with it? If you actually look at how it works, it's sending all your information to me. Any other properties?
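To put numbers on the brute-force estimates from this exchange, here is a minimal Python sketch. It assumes the same hard limit of one guess per second used in the example; the function name `worst_case_seconds` and the `guesses_per_second` parameter are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def worst_case_seconds(keyspace, guesses_per_second=1.0):
    """Seconds needed to try every key in the key space (the worst case)."""
    return keyspace / guesses_per_second


# The two key-space sizes from the discussion, at one guess per second.
four_million_days = worst_case_seconds(4_000_000) / SECONDS_PER_DAY
trillion_years = worst_case_seconds(10**12) / SECONDS_PER_YEAR

print(f"4 million guesses: about {four_million_days:.0f} days")
print(f"1 trillion guesses: about {trillion_years:,.0f} years")
```

Four million guesses comes out to roughly a month and a half, and a trillion to a bit under 32,000 years. Raising `guesses_per_second` shrinks both numbers proportionally, which is the point about being able to make it go faster.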
Easy to compute. Can you encrypt and decrypt quickly? Right, so we may want encryption and decryption to be fast, as we talked about. Yeah, perfect. So these are all good properties. So what was the other type of attack? We were defending against brute-force attacks; what was the other kind of attack that we've talked about? Side channel. Well, side channels are more difficult. They're more about the implementation itself, so it's hard to, I'd say, design an algorithm that is itself immune to side channels. Timing attacks? Timing attacks. So I think for timing attacks it's a little bit more difficult; I won't say you can't. Maybe you don't want people to be able to analyze the ciphertext in order to determine patterns. So there should be absolutely no correlation between the plaintext and the ciphertext, or maybe between the key and the ciphertext. So really what you want is that there's no possible way to go from the ciphertext back to the key, or really, you want that to be incredibly difficult. What you want is that the security rests in the size of the key; that way you can have a very large key and things work. So let's look at one of the first encryption standards here. Why do you want it to be a standard? Yeah. Yeah, so again, this is about trust, right? How much trust do you actually have in this algorithm if it's something that nobody's actually using for anything secure? How do you know there's not a trivial mistake in it? So, yeah, it's just interesting to think about why you want this. And you want it to be a standard so that if you need to share documents between people, you can say, "Hey, we're using this algorithm to do that." So this was proposed by IBM as a standard for encrypting sensitive but unclassified government information. What is specifically not on there? Classified information. Yeah, classified or top secret or any of those other things, right? So that's important to think about. Super interesting, and this we'll get into in a little bit: it was standardized
in the mid-70s, 1976, 1977. What's super interesting, so think about this: the NSA actually gave suggestions on how to tweak the algorithm, and those were incorporated into the resulting system. So why did they do that? Why do you think they did that? So maybe to weaken it, so they could break it? Yeah, right, so maybe they wanted to intentionally weaken it. What's the NSA's job? Pretty... pretty... So at least, what about in the computer realm? What's their job in the computer realm? Being secure? What was that? To keep things secure. So actually, their main focus historically in the computer space has been to essentially, I mean, break into systems, or to have that capability. Essentially it comes from an intelligence and signals-gathering thing. So I'm actually reading a book right now about the history of the NSA. It's super interesting. I don't remember the title off the top of my head, but they talk about how the NSA at the start was really focused on, like, microwave emissions and other types of radio waves, and that's how they got a lot of their information. But as things moved more to digital communications, they had to shift their focus. But they also have an area, a part of the NSA, that's actually dedicated to securing computer systems. Why is that? It would be bad if they got hacked, probably. And not just them, but the US government, right, gets hacked. And so that's part of it: the government eventually discovered, "Hey, our computer systems are connected to everything, and we can easily hack into people, so that means they could probably hack into us, so we should do something about that." And not just for us, but also for companies. So, yeah, it turns out that actually the NSA had classified methods of doing basically, like, super advanced cryptanalysis, where they could recover bits of the key based on a lot of... I can't remember exactly what type of attack, but these tweaks actually made it more secure. Differential cryptanalysis? That's right. So it's a
technique called differential cryptanalysis, and the NSA actually knew about it. It was not public, but they intentionally tweaked this algorithm to make things safer, which is something that's super interesting. And it didn't come out till many years later, when the public invented differential cryptanalysis and then applied it here and realized that the previous version was vulnerable but this version was not, which is crazy. Okay, so, and maybe they did introduce something else that nobody's ever found. I have no idea. I wouldn't think that. So the idea was a 64-bit block size and a 56-bit key. So this means, what's the size of the key space? This is 56, yeah, 2 to the 56. How much is that? Is that within our safe range? 7.2 times 10 to the 16. 7.2 times 10 to the 16. How many years is that, in seconds? 2.2 billion years. 2.2 billion years. All right, so the key thing to think about is: when was this created? '76. '76. What was the average speed of a processor then? I don't know, not very fast. What's the average speed of a processor now? Right, so our one operation per second is an incredibly low number. You can do these computations very, very, very fast. I don't know off the top of my head exactly how that translates now into strength, but this is what they were working with. So the key idea was, and this is where again it gets crazy: again, it operates on blocks, just like everything else, right, like the Vigenère cipher and the transposition cipher. So we look at the data 64 bits at a time. But why don't we just do something with this? Why don't we just XOR it with the key? We don't have enough bits; we'd be leaving some bits. And it would just be, again... exactly, we wouldn't get any transposition benefits; we'd only be replacing things. So anyway, at a very, very high level, the plaintext comes in, 64 bits. It gets transposed and mixed around, and then parts of it go into this F function, whose output gets XORed with the previous results, and this happens for 16
different rounds. And then this F function basically has these different blocks; these are all different tables. So the key... actually, that's a mistake, because the key is 56 bits. And so, yeah, so this is generating different parts of the key for each of these rounds. And each of these... PC1, so this is just a transposition. This is exactly the transposition thing we talked about, but instead of only size 4, here it's, I believe it's... and it's not just a transposition, because maybe you can tell that there are more blue dots on the top one than the bottom one. These are all on Wikipedia, by the way, if you want to go check these actual images and zoom in. So it's moving all these bits around and throwing away some bits, and then it does some left shifts on some and combines them again, and does some left shifts here. And then this PC2 is a different permutation that permutes those bits in different ways. Again, this is what I said: this stuff is crazy. To be a professional cryptographer, you have to understand not just how it works. I mean, the how is very straightforward, because you can read this code and figure out how. It's more the why it works and why you get the properties that you want. And then this operation: it takes half of a block, passes it into this E function, takes 48 bits from the key, and then you have these, which are called S-boxes. So each of these is basically its own... you can think of it as a Caesar cipher, so it's going to transform these bits into different outgoing bits. And I can't remember which one of these S-boxes it was, but the NSA fix was in one of those S-boxes, to change certain values from one thing to another. And then you permute that output again. So this is happening for 16 rounds, all of these things essentially diffusing, in some sense, any of the initial input, but all in a way that's reversible. So these are the S-boxes, which are crazy. So you can look at this and you can say the input is this and this, and the output should be there. You guys
are going to memorize this for the test. I see nodding heads. Cool. So this is just a crash course. Definitely take... I believe we definitely have an undergrad crypto class that goes into a bit more depth, but also into the theoretical underpinnings of these crypto systems, which we will not get into here. Okay, so how do you use this? Something like DES, how do you use that? What are the inputs and what are the outputs? Like, the key and a plaintext, and you get out just a ciphertext. Okay, so the key has to be 56 bits, as we saw. But from what we just saw, can you put any plaintext into that algorithm? Say, a 64-bit block. Yeah, you can only put in a 64-bit block, right? Why is that important? So do you want to only ever encrypt messages of 64 bits? It'd be nice if you did; then you don't have to do anything. So how do you actually operate on more than one block? So now let's think about it. I'm missing my... all right, I'm going to draw with my mouse, which always ends up well. Okay, so let's say we have this black box. We'll call it DES for now; it could be anything. This is really bad. Okay, so what can we feed into this? A 64-bit block. We can feed in a key first, the key K, which is 52 bits? 56. And we can feed in a block of plaintext, and we said that was, was it 62? 64. And we get out what? A 64-bit block of ciphertext. Yeah, we get out a 64-bit block of ciphertext. Cool. And I mean, DES has been deprecated; there are newer algorithms than DES, but fundamentally these all kind of work in a similar way: you have a key, you have plaintext in a block size, and you output ciphertext. So how do you... and let's say we'll leave the statistical analysis alone for now, and we'll just say that even if you can choose the plaintext and see the ciphertext, you still cannot derive the key, however many runs of this algorithm you do. So it's immune to a chosen-plaintext attacker, which, as we've seen, is the strongest type of attacker. So, but does that hold for... how do you encrypt more than 64 bits of data? Seems very limiting, right? Yeah.
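The black box on the board, a key and one 64-bit block in, one 64-bit block out, can be sketched as a toy Feistel-style cipher, which is the general round structure DES uses. This is only an illustration under loud assumptions: the round function and key schedule below are SHA-256 stand-ins invented for this sketch, not DES's real E expansion, S-boxes, or PC1/PC2 tables, the initial and final permutations are left out, and it should not be used to protect anything.

```python
import hashlib

MASK32 = 0xFFFFFFFF


def round_function(half, round_key):
    """Hypothetical stand-in for DES's F function (NOT the real S-boxes)."""
    data = half.to_bytes(4, "big") + round_key.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def subkeys(key, rounds=16):
    """Hypothetical key schedule: one derived subkey per round."""
    return [
        int.from_bytes(
            hashlib.sha256(key.to_bytes(8, "big") + bytes([i])).digest()[:8], "big"
        )
        for i in range(rounds)
    ]


def _feistel(block, round_keys):
    # Split the 64-bit block into two 32-bit halves.
    left, right = (block >> 32) & MASK32, block & MASK32
    for k in round_keys:
        # One round: swap halves and XOR the F output into the old left half.
        left, right = right, left ^ round_function(right, k)
    # Final swap, so that decryption is the same loop with reversed subkeys.
    return (right << 32) | left


def toy_encrypt(block, key):
    return _feistel(block, subkeys(key))


def toy_decrypt(block, key):
    return _feistel(block, list(reversed(subkeys(key))))


key = 0x0123456789ABCD          # a 56-bit key
plain = 0x0011223344556677      # one 64-bit block
cipher = toy_encrypt(plain, key)
print(hex(cipher), toy_decrypt(cipher, key) == plain)
```

The design point worth noticing is that `round_function` never has to be inverted: each round only XORs its output into one half, so running the same rounds with the subkeys in reverse order undoes everything, which is why all the mixing can still be reversible.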
What you can do is you can split it. If you have something larger than 64 bits, but it's still a multiple of 64 bits, you can split the plaintext into blocks of 64 bits, like, all of those, and then send all of them. So then we can do... so this is plaintext, 64 bits, we'll call this P1. We'll call this P2, 64 bits. What do we use as the key here? Same key. And then what are we going to get as output here? Ciphertext 2, 64 bits. So I can do this for every block. What do I do with the extra bits? I need to figure out what to do at the end, and that's a whole separate problem, but we'll assume for right now we have a multiple of 64, just to keep that nice. So what are some properties that are going to hold here? So what if P1 and P2 are the same? We'll get the same ciphertext. We'll get the same ciphertext. Why do we get the same ciphertext? So if the key is the same, for the same input we should get the same output. If that wasn't the case, what could we not do? Decrypt. We could never go backwards. So that has to hold. What does that mean for a crypto system? What was that? Yeah, so it's a one-to-one mapping. And so you just invented electronic codebook, ECB mode, which is one of the older styles, and the idea is exactly like we just talked about: split the plaintext up into blocks, use the key, and encrypt each block into ciphertext. And where is my image? Cool. So one of the problems here is that, due to the block size, right, even though the ciphertext looks randomized, aspects of the plaintext message can still leak out. So here's an example. This is the Linux penguin in, I believe, a raw format, so there's, like, an RGB value per pixel; it's not a JPEG or anything. So taking this and encrypting it with a key in ECB mode shows this. And why is that? What was that? Yeah, so it's encrypting 64 bits at a time, right, so it's just mapping certain bits to other bits. Yeah, all the black colors have the same RGB value, so when you encrypt them, they'll all be encrypted to the same ciphertext. Yeah, and then all
the outside, the transparent value, gets encrypted to the same ciphertext, even though there's no real correlation here, because when one bit of the plaintext is different, the ciphertext should be different as well, like, very different. And so this is a problem, because why did you encrypt this image? So no one would know what it was. But if you send this to somebody and they saw this, do they know what you sent? Yes. Yes, you're still leaking patterns of the plaintext through here. Oh no, we're over time. All right, we'll revisit Tux in a second. Okay, think of ways to fix that.
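The leak in the penguin picture can be reproduced in a few lines of Python. This is a minimal sketch, assuming a made-up 8-byte toy block cipher (`toy_block_encrypt`, a small Feistel construction built on SHA-256 purely for illustration, not DES) and a pretend "image" made of runs of identical black and white blocks.

```python
import hashlib

BLOCK_BYTES = 8  # 64-bit blocks, as in the lecture


def toy_block_encrypt(block, key, rounds=16):
    """Hypothetical 64-bit block cipher (toy Feistel; not DES, not secure)."""
    left, right = block[:4], block[4:]
    for i in range(rounds):
        f_out = hashlib.sha256(key + bytes([i]) + right).digest()[:4]
        left, right = right, bytes(a ^ b for a, b in zip(left, f_out))
    return right + left


def ecb_encrypt(plaintext, key):
    """ECB mode: encrypt every block independently with the same key."""
    if len(plaintext) % BLOCK_BYTES:
        raise ValueError("plaintext must be a multiple of the block size")
    return b"".join(
        toy_block_encrypt(plaintext[i:i + BLOCK_BYTES], key)
        for i in range(0, len(plaintext), BLOCK_BYTES)
    )


black = bytes(BLOCK_BYTES)        # a block of all-black pixels
white = b"\xff" * BLOCK_BYTES     # a block of all-white pixels
image = black * 3 + white * 2 + black

ciphertext = ecb_encrypt(image, b"7bytes!")
blocks = [ciphertext[i:i + BLOCK_BYTES]
          for i in range(0, len(ciphertext), BLOCK_BYTES)]

# Label each ciphertext block by the position where it first appeared.
pattern = [blocks.index(b) for b in blocks]
print(pattern)  # [0, 0, 0, 3, 3, 0]
```

The printed pattern has the same shape as the plaintext (three black, two white, one black) even though the attacker never learns the key or the actual pixel values. That is the same effect that leaves the outline of Tux visible.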