Okay, let's get started. My name is Nathaniel McCallum and I'm a Principal Engineer at Red Hat. This class is Cryptography for Beginners, so we need to start out with a disclaimer. I am not going to teach you thorough cryptography in the next half an hour, okay? There are long courses on this, and people spend decades of their lives studying it, so we're not going to attempt to cram all of that into one hour. The goal of this talk is just to give you a simple introduction to some of the concepts, and then there'll be links at the end so that if you want to dive into more detail, you can go do that on your own. So let's get started.
The first encryption we're going to talk about is the Caesar cipher, which is probably the one that you are all familiar with. It's the one you used to pass notes to each other in fifth grade when you didn't want people to see what you were saying. It basically works with a simple shift pattern. So you shift by two: all the A's become C's, all the B's become D's. Is this familiar? Have you guys done this? Okay, so you take this little statement over here, "welcome to flock", and you shift it by two, and that's what it looks like. So it's hard to read for the average English speaker. A Dane might be able to speak that pretty well, but for the average English speaker, we have no idea what that says. So this was a very convenient way to do it, and if you don't know the little trick, then you don't know how to decode it. Well, the problem of course is that this really isn't secure. With any significant amount of analysis you can actually reverse engineer what this says. Decryption is just the reverse process: you shift it back two spaces, all the C's become A's, all the D's become B's and so on, and our nice Danish phrase simply becomes "welcome to flock" again. Now, lots of information is being leaked by this cipher, so it's certainly not secure.
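As a rough illustration of the shift described above, here is a minimal Python sketch. The function name caesar_shift is made up for this example, and it only handles the plain 26-letter English alphabet; everything else is passed through unchanged.

```python
def caesar_shift(text: str, shift: int) -> str:
    """Shift every ASCII letter by `shift` places, wrapping around the alphabet."""
    shifted = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            # Move within the 26-letter alphabet and wrap with modulo.
            shifted.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            # Spaces and punctuation are left as they are.
            shifted.append(ch)
    return "".join(shifted)


ciphertext = caesar_shift("welcome to flock", 2)
print(ciphertext)                    # ygneqog vq hnqem
print(caesar_shift(ciphertext, -2))  # welcome to flock
```

Decryption is the same function with the shift negated, which is exactly why the scheme hides so little.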
You can apply statistical analysis: you know the contents are English, so you can sort of figure out what some of the letters are and what they might map to going backwards. So this is not something you would use for anything more than a great lesson with your kids. It's really fun, by the way, to teach your kids this kind of stuff, because then you get all the little notes passed around the house, because they think they're keeping it secure from you.
So we're going to have a pop quiz here. Using this basic technique that I've taught you in the last 30 seconds, what does this say? Write it down. If you have a piece of paper or a computer in front of you, write it down, think about it. You know the answer? Okay, go. Genius. Now tell me how you figured that out. "I saw the two S's next to each other, and I was thinking to myself, what has the same letter in the third and fourth position?" Exactly. "And then there's the same letter in the fourth position of the next word. How many words do that?" So you did a very basic statistical analysis by noticing some of the patterns here that correspond to English phrases. And by that you were able to deduce, without knowing the actual answer or the shift that was used, the plaintext, as we call it in cryptography.
A step up from that, probably the most famous of all early cryptography was the Enigma machine. You can see that it was actually invented by a German guy named Arthur Scherbius. He filed for the patent in 1918 and it was in commercial use in the 1920s. Now, this was done right at the end of World War I, and it was used very heavily by the Nazis in World War II. So for the Allied powers, one of the very important things was to crack the cryptography. This was actually cracked by the Polish Secret Service. Three guys there were able to figure it out.
And there's a really fascinating history here that illustrates a lot of what happens in actual cryptography, which is that they did not actually crack the machine itself. Although there are weaknesses in the Enigma machine that we know of today and that we could use to crack it, they did not crack it because there was a defect in the machine at the time. They cracked it because people were using the machine wrong. And this is the way it is in a lot of cryptography. We can give you cryptographic principles, but the vast majority of the problems in cryptography come from procedural error. You're simply using the cryptography incorrectly.
Moving on, we'll talk next about the one-time pad. The one-time pad is the most basic form of encryption. This is actually in binary. If you haven't noticed, there's no numbers here besides zero and one. So we have this plaintext. This is the thing that we want to keep secret. We want to pass it to someone else, and we want nobody in between to know what's going on. That's the top line. Then we have a key that we use, and this is how we're going to hide the plaintext. We do it with a very simple mathematical operation called XOR. That's what this symbol over here means. XOR simply means only one of the two can be true. If this is one and this is one, then if you XOR them together, you get zero. If they're both zero, you get zero. Otherwise, if they're different, you get one. Very simple mathematical operation. The XOR operation is the bedrock of all cryptography. In fact, a lot of people joke that cryptographers really only know how to XOR. That's the only thing they know how to do, which is very true, but there's a lot of other stuff that goes into it as well. You can also see here that there's a decryption operation implicit as well.
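The XOR step just described can be sketched in a few lines of Python. This is only an illustration: the helper name xor_bytes and the sample message are assumptions made for the example, and the key is drawn from the standard secrets module so that it is random.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings, bit by bit."""
    if len(a) != len(b):
        raise ValueError("both inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


message = b"welcome to flock"
key = secrets.token_bytes(len(message))  # random key, used once

ciphertext = xor_bytes(message, key)     # encrypt: plaintext XOR key
recovered = xor_bytes(ciphertext, key)   # decrypt: ciphertext XOR key

print(recovered)  # b'welcome to flock'
```

The same function serves for both directions, because XORing with the same key twice cancels out.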
If you XOR in the opposite direction, you get the plaintext back. These are the two bedrock operations of cryptography: encrypt and decrypt. They are inverses, yes. We can go in both directions.
Can somebody tell me a downside to this encryption? What's that? It's binary? We can convert pretty much anything to binary if we want, so that's really not a problem. You know too much already, no more. This is for beginners. You can easily decrypt it? That's great, because we want the decryption operation to be very efficient. We want you to be able to go backwards. You can't decrypt it if you don't know the key? That's the point. You want to keep it secret. That's definitely a problem. That's not the specific problem I'm looking for, but it's related to that problem. You have to give this ciphertext to the other party and you have to give them the key. What's the problem that arises out of that? No, because you can always reduce your message to binary. Just assume that for the rest of the class. What if the message is longer than the key? Thank you. The message must be the same length as the key in a one-time pad. If you want to send a short message like this, sending a short key is fine, but you have essentially doubled the amount of data that you need to send to the remote party. What happens if you want to transfer 10 terabytes of data using a one-time pad? How large does your key need to be? Well, your key needs to be 10 terabytes. Your message is 10 terabytes, so you have 20 terabytes total. This is very inefficient. We don't want to double all of the traffic on the internet. This is one of the reasons why the one-time pad is not used directly; rather, it is used as a building block for other kinds of encryption.
I want to ask a question about the one-time pad. Is it secure? Does it have perfect security? Thank you. Seema's got it right. There's actually two principles.
It has perfect security so long as the key is random and the key is never reused. As soon as you start reusing the key for other messages, people can start to do statistical analysis against the messages, they can reverse engineer the key, and then you're completely hosed. The one-time pad is secure, but it has some very stringent criteria. We're going to see these criteria play out in some other ways.
The major problem with the one-time pad is that the key has to be as long as the message. One of the ways we get around this is by using a pseudo-random function, often called a PRF. A PRF takes n bits of input and produces n bits of output. One of the things a PRF can be used for is to essentially expand the size of the key, because our third principle will be that the output is indistinguishable from random. If you've ever used a hash function, a hash function like SHA-1, for instance, is a PRF: you provide some input, it provides some output, and the output is indistinguishable from random.
Building on this, we have the concept of a stream cipher. A stream cipher is essentially where we take a seed, which is our key. This is the basic building block. A real stream cipher is not actually this simple, but this is what it's built on top of. You take your key as the input to your PRF, and you XOR the output against the plaintext, which is what we saw on the previous slide. That allows you to expand the key. So I can pass you a key that's maybe 16 bytes, then I can send you the 10-terabyte message, and you can use the PRF to expand that key into 10 terabytes to do the XOR operation. Okay, now again, no stream cipher is actually this simple, because there are definite problems with this approach, but it's the basic idea behind stream ciphers. And we want to ask the question again: is it secure? What's that? I'm sorry.
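To make the key-expansion idea above concrete, here is a toy Python sketch. It is an assumption-laden illustration, not a real stream cipher: it uses SHA-256 over the seed plus a counter as a stand-in for the PRF, and the names keystream and toy_stream_xor are invented for this example.

```python
import hashlib


def keystream(seed: bytes, length: int) -> bytes:
    """Expand a short seed into `length` pseudo-random bytes.

    Stand-in PRF (an assumption for this sketch): SHA-256(seed || counter).
    """
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_stream_xor(seed: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XORing data with the expanded keystream."""
    ks = keystream(seed, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))


seed = b"0123456789abcdef"            # a 16-byte key, as in the talk
message = b"welcome to flock" * 10    # a message longer than the key
ciphertext = toy_stream_xor(seed, message)
print(toy_stream_xor(seed, ciphertext) == message)  # True
```

The 16-byte seed is the only secret that has to be shared; both sides regenerate the same keystream from it. Reusing the seed for a second message would reuse the keystream, which is the key-reuse failure the talk warns about.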
Yes, thank you. We actually have three principles now. It is secure if the seed is random, if the seed is never reused, and if the PRF is a true PRF. In other words, the output of that function is truly indistinguishable from random. So here are some examples of stream ciphers. The first one we have is ARC4. It's spelled this way because of a trademark issue: the original name is trademarked and you can't use it, so you call it ARC4. Salsa20 is another one. But generally speaking, we don't use stream ciphers that heavily today. There are some that are secure, but most of the stream ciphers do have known attacks. So from these very simple principles, we have to start building more complex layers on top.
So let's talk about key reuse. One of our principles is that we definitely can't reuse the key. If we reuse the key, an attacker can begin to do statistical analysis and reverse engineer the message that's being sent. So one of the things we need to talk about is this very simple question. If you know the answer to this question, don't answer. What are the odds that two people in this room have the same birthday? Anybody know? "He said if we know, we can answer." Well, if you think you know, you can answer. If you're not sure. "More than 50%." Who said that? Great. It is more than 50%. Well, yeah, we're probably more than 50% now, or we're very close to it. Let's go around the room. Let's try this as an exercise because it's kind of fun. "December 31st." Talk louder. Yeah, talk louder, and if you hear your birthday, yell out. What? "9th of April." Tomorrow? Happy birthday. That's yours? We got it. Okay, so yes, we did actually achieve it. With the birthday paradox, what you would think is that you would divide the number of people in the room by the number of days in the year, and that's actually incorrect. Your intuition is incorrect.
The birthday paradox is that the likelihood of finding two people with the exact same birthday increases far faster than linearly with the number of people in the room, because every pair of people is a chance for a match. So we don't even have to get close to, say, 180 people in order to get 50% certainty. We get 50% certainty with just 23 people in the room, which is about what we have here. And you saw that play out: there were, in fact, two people in here with the same birthday. So this is a real problem in cryptography, because your principle is that you can't reuse the key, and the more times you pick a random key, how likely are you to hit the same key again? The answer is: very likely. So there's a lot of techniques that go into making sure that is not the case, and a lot of care is given to this, because key reuse is one of the most fundamental errors in cryptography. It's the reason, for instance, why WEP on Wi-Fi is completely insecure: it reuses keys. All you have to do is observe a certain number of packets flowing over the wireless network. You can just observe them by sniffing. Once you have a certain number, you can use the birthday paradox to improve your odds, and you can guess the WEP key very, very quickly.
Next let's talk about block ciphers. Block ciphers are a lot like stream ciphers, except instead of operating on a continual stream of data they operate in blocks. They have a fixed size, both for the keys and for the blocks. I've given you two examples here because it's often misunderstood that AES-256 has a 256-bit block size, which it does not. Actually, they have the same block size. What changes is the size of the key that you use to protect the data. Now, block ciphers can be used on their own, but one of the problems we will run into is the question of key reuse.
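The 23-person figure from the birthday discussion above can be checked with a short calculation. This Python sketch assumes 365 equally likely birthdays and ignores leap years; the function name is chosen just for this example.

```python
def birthday_collision_probability(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday."""
    p_all_unique = 1.0
    for i in range(people):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_unique *= (days - i) / days
    return 1.0 - p_all_unique


for n in (10, 22, 23, 40):
    print(n, round(birthday_collision_probability(n), 3))
```

With 22 people the probability is still just under one half, and with 23 it crosses it. The same arithmetic applies to randomly chosen keys or initialization values: collisions show up long before the space of possible values is close to exhausted.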
So if you have a bunch of blocks of data and you're using the same key over all of those blocks, how quickly are we going to run into a repeat? Well, the birthday paradox tells us that we're going to run into one pretty quickly. So what we actually need is modes that prevent this. Modes also prevent the leakage of data based upon the broad structure of the blocks, and I'm going to use an image to illustrate that in just a second. I've given two modes here. The mode you might immediately think of is this one, ECB, where you have some plaintext and you have a key, you give them to whatever your cipher is, like AES for instance, and out comes your ciphertext. Then you move on to the next block and do the same thing: the plaintext for that block goes in along with the key, you get another block out, and then you just append them all together, right? Seems sensible. Well, the problem is, this is what ECB mode encryption looks like. If you were to use that technique, you can see that all we've really done is obscure the data. We haven't hidden it. It's obvious to everyone in the room what this is, right? So is this what you want your banking stuff secured with? The answer is no. You don't. So instead we use other modes that hopefully produce something that looks like that on the other side. One of those modes is called CBC, and there are many others. The way this works is it adds an initialization vector, and then each block uses the previous block's output as part of its input, so there's a chaining effect which obscures the data all throughout the encryption. The initialization vector is usually random, and as it's done in cryptography, you usually just prepend the IV to the beginning of the message you're sending and then you have the encrypted blocks afterwards. So again, what we're trying to get is this, not this. Now, is ECB mode ever secure? It's a good guess, but that's incorrect.
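The difference between the two modes described above can be sketched without any external library. Everything here is an assumption made for illustration: the "block cipher" is a toy 4-round Feistel network built on SHA-256, standing in for a real cipher such as AES, and it must not be used for anything real. Only encryption is shown.

```python
import hashlib
import secrets

BLOCK = 16  # bytes per block, the same size as an AES block


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Hypothetical round function: a keyed hash truncated to half a block.
    return hashlib.sha256(key + bytes([i]) + half).digest()[: BLOCK // 2]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation on one 16-byte block."""
    left, right = block[: BLOCK // 2], block[BLOCK // 2:]
    for i in range(4):
        mask = _round(key, i, right)
        left, right = right, bytes(a ^ b for a, b in zip(left, mask))
    return left + right


def split_blocks(data: bytes) -> list:
    if len(data) % BLOCK:
        raise ValueError("data must be a multiple of the block size")
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # Every block is encrypted independently with the same key.
    return b"".join(toy_encrypt_block(key, b) for b in split_blocks(plaintext))


def cbc_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # Each block is XORed with the previous ciphertext block first;
    # a random IV starts the chain and is prepended to the output.
    previous = secrets.token_bytes(BLOCK)
    out = [previous]
    for block in split_blocks(plaintext):
        mixed = bytes(x ^ y for x, y in zip(block, previous))
        previous = toy_encrypt_block(key, mixed)
        out.append(previous)
    return b"".join(out)


key = secrets.token_bytes(16)
plaintext = b"A" * BLOCK * 3  # three identical plaintext blocks

ecb_blocks = split_blocks(ecb_encrypt(key, plaintext))
cbc_blocks = split_blocks(cbc_encrypt(key, plaintext))
print(len(set(ecb_blocks)))  # 1: identical plaintext blocks stay identical
print(len(set(cbc_blocks)))  # 4: the IV plus three different blocks
```

Repeated plaintext blocks producing repeated ciphertext blocks is what makes the structure of an image remain visible under ECB.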
There's no IV in ECB mode, because you're just doing each block independently. The answer is yes. ECB mode is secure when you're doing one block. Right? Because if you're using only one block, then you can't see these relationships between the blocks. So if you want to do some very simple encryption of some very small data, then ECB mode is fine. Use it for just that limited case, and just make sure you don't reuse your key. What's that? Yes, it is essentially a one-time pad.
So now we have a problem. We've talked about stream ciphers, we've talked about block ciphers, and one of the problems that then arises is: how do we communicate a key for encryption without sending it over the wire? Right? Because if I just send you the key and the ciphertext, well, then anybody can decrypt it, because they just take the key and decrypt the ciphertext. So we have to have some other way of transmitting a key, or at least agreeing upon a key. One way we might do this is the classic espionage method. I take a big USB key, I put a bunch of keys on it, I leave it under a park bench and I walk away. And you come up in your fedora and your jacket with your glasses on, and you look like a spy, and you grab it. Only you're actually an attacker. You're not the person who was intended to receive the key. So now any message that I send is completely insecure, right? So if you can do a personal transfer of the key, it works, but it's very, very high latency.
We want something that's much more efficient, and so we have two brilliant guys, Diffie and Hellman, great cryptographers. They came up with this idea of key exchange, or key agreement. Basically, the way this works is it allows two parties to use a mathematical operation to agree upon a key, the key is random, and nobody who is listening in can figure out what the key is. So this is one of the other bedrocks of cryptography. Let's start by looking over on the left, or I guess your right, my left. We'll look at the paint colors. Alice and Bob both start with some yellow paint. They know that the yellow paint is the same, and the yellow paint doesn't have to be a secret. That's the huge advantage here: they don't have to meet in secret somewhere to say we're going to use this color for the yellow paint, right? They can announce it to the world: we're using yellow paint. Now they each generate a random number. In this case the random number is going to be their secret color, and they're not going to tell anybody what their secret color is. They're not even going to tell each other. So Alice is going to mix red in with yellow and she's going to get orange. Bob is going to mix in cyan, or whatever that is, and come up with blue. Now these two colors are the result of the operation, so they send them out over the wire in plaintext. Alice says, Bob, I'm using orange, and Bob says to Alice, I'm using blue, and so now they both have those colors. Now, the assumption here is that it's very cheap and easy to mix the colors but very difficult and expensive to unmix them. So if Alice now mixes her red in with Bob's blue, she will get brown, and if Bob mixes Alice's orange with his cyan, he will also get brown. Notice that they have the same common secret when it's done. But if somebody were listening in to what colors they were sharing here, no one would be able to calculate brown, because they don't
know the private colors. The point is that on both sides the same things have been added. Bob added cyan in here to create blue, and then Alice added red, so it's yellow, cyan and red. And then the same exact operation happened, just in a different order, on the other side. So nobody else knows the resulting brown except for Alice and Bob, even though Alice and Bob announced in public, I'm using orange and I'm using blue. And so the resulting key, the brown, can now be used for encryption, and nobody who was listening in to the conversation can use the key to decrypt the data. So this is the actual mathematical operation. It's actually very simple. It uses a concept of finite cyclic groups. I'm not going to explain that here; if you want to know more about this, you can see the material at the end. Basically, you create the random number, which is private: that's the private color that they're keeping secret. Then you have the generator, which is the yellow color, and you raise the generator to the power of the private number. That creates the public number. You share the public number over the network. Both sides do this, and once each side has the other's public number, you simply raise that public number to the power of your private number. Because the order of operations doesn't matter in this case, since we have the commutative property here, the end result is that they both get K, no matter which way they've done the operation. No, no, they do not. They only need to start with the same G, that's the yellow, and G can be public. Because effectively they both have G raised to priv one times priv two, because when you raise a power to another power, it just multiplies the powers. That's correct. Yeah, so yellow is agreed upon beforehand. Everyone in the world knows about yellow. When you're doing this, a particular cyclic group will typically have a defined generator, so it's just literally listed in a document somewhere saying the generator is
one, two, three, four, and that generator is then used for all cryptographic operations. So G does not need to be private. It's completely public. The private number that you generate at random is the one you have to keep secret. Yes. Yes, it is. You are exactly correct. All of this cryptography, both here and the next thing we're going to talk about, which is asymmetric cryptography, is based on Fermat's theorem and Euler's theorem. We have a problem with Diffie-Hellman key exchange. It's a really, really great little trick, and it's completely secure from anyone who's only listening in, but it's not secure from active attacks. That means someone who can actually intercept the message and then send their own message. The reason is that they'll just pretend to be Bob to Alice and complete the exchange, and then they'll pretend to be Alice to Bob, and the end result is you have someone in the middle. Now, the communications would be encrypted from Alice to the attacker and then from the attacker to Bob, but you have a party in the middle that's listening in. So Diffie-Hellman is a great technique, but it's a building block for other things. One of those other things is authenticated key exchange. Authenticated key exchange is a variant of Diffie-Hellman, and there's a lot of these. I've listed, what, five of them here. Some of these are patented, by the way. The basic idea behind authenticated key exchange is that it uses some other secret in order to prove that each side knows each other, and it gets mixed into the operation. There's lots of different ways to do this. Generally speaking, these are all called PAKEs, password-authenticated key exchanges, because you can use a password on either side: you mix it into the algorithm, and then at the end of the operation
down here where we get brown, if you're doing authenticated key exchange, both sides will get brown only if they both have the same password. Okay, so in that case you actually protect against a man-in-the-middle attack. Like I said, some of these are patented, which is why they have broadly not been used. These three on the bottom, I am not aware of any patents on them, and both of these are standardized in various different ways, including SRP v6, which is standardized in some RFCs. And we're going to be using SPAKE in Kerberos. That's the algorithm we're going to be using in Kerberos to strengthen up some things. So the nice thing about authenticated key exchange is that it protects against active attacks. Another method of doing this, for instance, which is the way TLS works, is that you can sign one of the public keys, and then by verifying the signature you can actually prove that the other party is who they say they are and you can trust their public key. One last thing to note about authenticated key exchange is that it's actually really useful for another technique, which is to increase password strength. If you want to encrypt something using a password, a password is very low entropy, right? Because there's only 26 letters in the English alphabet, and there's only so many combinations of those letters that form words, which is what everybody uses for passwords, because everyone's insecure. Sometimes "password" has been used as a password, for instance. If you try to encrypt something using a low-entropy password like that, it will be fairly easy to do a brute-force attack, an offline dictionary attack, against the packets that are sent and recover the data. But using authenticated key exchange, you can use the password only to prove the public keys, and then the public keys generate a very, very strong session key that you use for the encryption. So it's helpful to actually increase password strength. The last topic
we're going to talk about today is asymmetric encryption. Like I told you, we're not going to dive too deep into any of these topics; there's more resources at the end. In practice, what you're actually using is both symmetric and asymmetric encryption. Everything we've talked about up to this point has been symmetric, meaning the encryption was done based upon a key that both sides know. In this case we're going to break that assumption. This was discovered by a Brit, Clifford Cocks, who was working for the British Secret Service, or whatever it's called. So he actually discovered it, but his discovery was classified; they did not publish it. And simultaneously, in the United States, it was discovered by these three guys, which is where we get RSA: so RSA the company, and RSA the asymmetric algorithm. This is built on principles from Diffie-Hellman. The basic idea with Diffie-Hellman, of course, is that we have a one-way function: it's very easy to mix the colors together, but it's very hard to unmix them. What these guys came up with was a way to have some secret knowledge, so that it would be easy to mix them and hard to unmix them, but easy to unmix them if you knew something secret. That's called a trapdoor function, and asymmetric encryption is built on this principle of the trapdoor function. Then you actually have two keys, like in Diffie-Hellman. If you go back to Diffie-Hellman, you remember we have a public and a private, two separate keys. There's the private, there's the public. It's the same exact thing with asymmetric keys: the public key can be used for encryption and the private key can be used for decryption. Here's a nice little chart from Wikipedia. If you look on the left-hand side, this is all the encryption we've been talking about up until this point. What RSA does is it actually splits this one key into two keys, so that you can use the public key for doing encryption and the private key for doing decryption. Now,
this has a very marked advantage. If we were just doing symmetric encryption and I wanted you all to send me an encrypted message, all of you sending a message to one person, so many-to-one, I would have to go to each of you and exchange a unique key. Between me and you, and you, and you, I would need separate communication keys in order to keep the data private. With asymmetric encryption, however, I just split that key in two, and I can say to everyone in the room, this is my public key. Then you can all encrypt to me using that one public key, and in response I can decrypt all of your messages using my private key. But I've never told you my private key, so only I can decrypt them. This is the bedrock foundation of TLS, or SSL. When you go to a website, what you're actually doing is validating the public key of that website. So we're actually doing well on time here. We're almost done, and I wanted to leave time for questions. There's a lot of topics I have not covered. I've not covered things like message integrity, in other words proving that the message did not change in transport. I've not covered things like signing or verification, or lots of other topics. So there's a lot more to learn, and cryptography is a very, very big topic. There are two courses which I can recommend. Both of these are online and free and have a substantial amount of material. I can share these slides with people, or you can just google "learn cryptography" and I'm sure these will be near the top of the list. There's one from Khan Academy and there's one from Stanford. The Stanford one assumes that you have a lot of the math. So if all of the stuff that I've talked about was fairly mathematically easy for you, I mean, when I say finite cyclic groups, you know what I'm talking about, then go take the Stanford cryptography course. It's a great place to start. If you don't know what that stuff is, don't feel bad. Start with the Khan Academy course, because they can actually give you the
math as well, along with the cryptography. So that's all I have for today. Are there any questions? Actually, we were going to come back to this to see if anybody solved it, but we had a smart aleck in the room. The answer was "hello world" and the shift was 7. So, questions? Yes. Which is the private key? The difference is that in Diffie-Hellman, it's an exchange, and you're both agreeing on the same key at the bottom, and you're using that key to do symmetric encryption. You cannot reverse Diffie-Hellman, that's correct. In technical speak, Diffie-Hellman is a one-way function and asymmetric encryption is a trapdoor function, because it's one-way unless you know the secret information that lets you open the trapdoor and reverse it. There is actually a way to do that. It's called Hughes, excuse me, Hughes Diffie-Hellman. There's a variant of the Diffie-Hellman algorithm that can be used to transfer a key from one party to another. It's generally not used because it's not applicable in a lot of situations. But Diffie-Hellman has the advantage that both sides contribute entropy, which means that I contribute randomness and you contribute randomness. So if an attacker were, say, trying to do their part of that exchange without using a random number, it wouldn't matter, because I'm still mixing in my random number with theirs. Both sides contribute entropy, and we know the resulting key then has the entropy that both sides contributed. So you're not sending a key from A to B; you're actually mutually agreeing upon a key. Any other questions? I know I didn't explain it that well. What's that? "You didn't answer: what's the big deal about elliptic curves, and are you a secret agent?" Oh, okay, I can answer those two questions. Okay, so elliptic curves. I did not put them in this talk, you're right. So let's go... actually, we're on the right slide. Most of this math, you can see, is just going to be done with integers, right? Well,
the problem is that our computers are getting really fast, and the sizes of these keys are getting larger and larger so that computers can't crack them. And this has a negative result, because now we have to send all these larger keys over the wire, and they keep growing as computers get faster and faster. An elliptic curve is another finite cyclic group. One finite cyclic group is just the set of numbers below a prime number, which is what we're using in this case. But in the case of an elliptic curve, you can actually take an elliptic curve on a graph and do mathematical operations on that curve. It allows you to keep your key sizes very small while still being very expensive for computers to crack. So you can send much smaller keys, and this has the benefit of preserving your security. The answer to the other one, which was "am I a secret agent": I could tell you, but then I'd have to kill you. Yes? If you want to keep anything you have secret? Yeah, no, you should definitely not do your own cryptography. You should always rely on what has, number one, been proven in an academic paper which is undisputed, that's provably secure. And second of all, don't use your own implementation of those algorithms, because your own implementation will almost always contain a flaw. So use the standard libraries that you're given, and even then make sure you research a lot. And if you're designing anything, just don't design it. Correct, that's correct. In fact, you're probably designing a system that won't work, but if you can get it to work, it's probably incredibly insecure. So the way you design good systems is by open peer review. This is the bedrock of cryptography; it's the bedrock of all scholarship in general. And you obviously want to have a much broader introduction to cryptography than the half an hour I've given you. You probably want to have a degree in it as well as a degree in mathematics. You will then publish what you propose as a design. You
will publish openly and you will accept critique from anywhere. You let it sit out in public. You'll get critiques, you may have to mitigate some of those critiques, you may tweak the algorithm a little bit here or there. But after substantial peer review, and there's usually multiple years of peer review, if there are no attacks found, then at that point it's considered secure. And at that point people start implementing it in insecure ways. So then you have to deal with the fact that most of the implementations of these algorithms are also insecure. That has to mature over time as well, hopefully with public peer review. Never use closed-source cryptography. Just don't do it, ever, because it means it hasn't had the open peer review that you need to make sure that it's secure. So rely on open source code for all cryptography. So thank you very much.
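Looking back at the Diffie-Hellman exchange from earlier in the talk, the paint-mixing steps can be sketched in Python using modular exponentiation. The numbers here are toy assumptions chosen for readability (modulus 23, generator 5); they are far too small to be secure, and the function names are invented for this example.

```python
import secrets

P = 23  # toy prime modulus (assumption: real groups use very large primes)
G = 5   # public generator, the "yellow paint"


def make_keypair():
    """Pick a random private number and derive the public number G^private mod P."""
    private = secrets.randbelow(P - 2) + 1  # secret color, in the range 1..P-2
    public = pow(G, private, P)             # mixed color, safe to send in the clear
    return private, public


def shared_secret(their_public: int, my_private: int) -> int:
    """Raise the other side's public number to my private number, mod P."""
    return pow(their_public, my_private, P)


alice_private, alice_public = make_keypair()
bob_private, bob_public = make_keypair()

# Only the public numbers cross the wire; each side mixes in its own secret.
alice_key = shared_secret(bob_public, alice_private)
bob_key = shared_secret(alice_public, bob_private)
print(alice_key == bob_key)  # True: both arrive at the same "brown"
```

Both sides compute G raised to the product of the two private numbers, just in a different order. Note that this sketch has no authentication, so it is open to the man-in-the-middle attack described in the talk.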