Does that work better? Is it good enough? Alright, so I'm gonna talk about exploring cryptography. I like to put XKCD comics in my slides, so I'll give you a minute. I heard a story once, and I can't attribute it because I forget who said it, but they said they like strong cryptography because then they know what parts of the system not to attack. But it's an interesting thing to study. I am not a cryptography expert. I've taken a class and I know a couple of things, and I thought I'd share them with you because that helps me learn about it. I'm a software engineer at Yodel. If you want to know a little bit about what my team does, come talk to me or hit me up on Twitter. I'm a perpetual grad student, so I'm always taking a class. I'm on Twitter at Johnny Downs. So I want to know about you guys. Are there any people who know more about cryptography than I do? Have any of you had any exposure to abstract algebra? Do you know what groups are? Have you read a paper on cryptography? Anybody used an encrypted service? Yeah, probably. WhatsApp, an iPhone. Does everyone know what public and private key encryption is? Does anybody not? So I'm going to actually show an implementation of the ElGamal encryption scheme, but I need to tell you: don't roll your own cryptography. You'll get it wrong. There's an implementation of ElGamal. It's 300 bytes of code because there's all these edge cases you have to deal with. There are timing attacks, a bunch of things that I don't know about. So if you use the encryption scheme that I'm going to show you today and you actually want to secure a message with it, that's a really bad idea and I'm not responsible if you do. So every encryption scheme consists of three algorithms. Well, every encryption scheme that I know about consists of three algorithms. There's a generate keys, an encrypt, and a decrypt. It's pretty simple. The generate keys sometimes takes a security parameter that says this is how big I want my keys.
So you might have heard of this: in the comic it's at 128 bits of RSA encryption, and that's the large security parameter. The idea is that if you increase the size of the security parameter, it maybe takes longer to process, but it makes breaking it even harder. The generate keys algorithm spits out one or two keys, the encryption key and the decryption key. Sometimes they're the same, sometimes they're different. So when we talk about public and private key encryption, that's two keys. One you can share freely on the internet and the other you have to keep on a sticky on your monitor. I'm kidding. The encrypt takes that encryption key, so this is usually your public key, and a message that you want to encrypt. So I have some secret that says I'm talking about cryptography at the UN today. And I'll use my encryption key and run this algorithm. I'll get a cipher, and that's some scrambled message that hopefully you won't be able to crack, particularly in the time that that message is relevant. So maybe with the scheme I have here, if I were to encrypt that, you wouldn't know until I'm gone. And decrypt takes your decryption key, which is your private key that you have on your sticky, and the cipher that the encrypt produced. And then you get the same message back. So it's a two-way function. So classic cryptography is a variation on this dial here. You can turn this dial to get different letters to show up, and essentially what you're doing is rotating the alphabet. And so this is good for hiding Game of Thrones spoilers. You've used ROT13 to not tell about who died in the next season. So this is the same sort of algorithm. Classically this is known as the Caesar cipher. To generate the key, we're going to rotate the alphabet and return a cycle that we can use to match up with the message that we've got. Encryption is just looking up each letter of your message in the cipher and putting that together.
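A minimal sketch of the rotation just described, written here for illustration and not taken from the slides. The function names (`generate_key`, `encrypt`, `decrypt`) and the choice to handle only lowercase ASCII letters are assumptions; the key is simply the rotated alphabet, and decryption is the reverse lookup.

```python
import string

ALPHABET = string.ascii_lowercase


def generate_key(shift):
    """Build the rotated alphabet for a given shift (this is the shared secret)."""
    shift %= len(ALPHABET)
    return ALPHABET[shift:] + ALPHABET[:shift]


def encrypt(key, message):
    """Look up each lowercase letter in the rotated alphabet; leave the rest alone."""
    return message.translate(str.maketrans(ALPHABET, key))


def decrypt(key, cipher):
    """Reverse the lookup: map the rotated alphabet back onto the plain one."""
    return cipher.translate(str.maketrans(key, ALPHABET))


key = generate_key(13)  # a shift of 13 is ROT13
cipher = encrypt(key, "winter is coming")
print(cipher)                 # jvagre vf pbzvat
print(decrypt(key, cipher))   # winter is coming
```

Because the same rotated alphabet is used in both directions, anyone who learns the shift can read every message encrypted with it.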
And decryption is just the reverse. So this is symmetric key. Your encryption key and your decryption key are the same. And there's a number of weaknesses here. It's susceptible to frequency analysis. So in English the most common letter is E. If you find the most common letter in the message, you can probably figure out how much it was rotated just by counting the number of letters between it and E. And then you've broken the cipher. You know what the secret key is. It's also a really small key space. So even if you fail at frequency analysis, you could just try all the different keys and see if you get a sensible message. And it doesn't scramble the message. It just rotates it neatly. So I'm going to talk about perfect secrecy really briefly. This is also something we don't use. There's an idea of a one-time pad. And with a one-time pad, if you implement it correctly, you actually produce a ciphertext that's indistinguishable from the encryption of any other message. So for the thing that I've encrypted, there's an equal probability that it could be any other message of that length. So you can't actually tell what it is. You would just be guessing. The problem is it's really expensive to do this. You have to generate a real random number. I don't have an implementation for this because I don't know how to generate real random numbers. But if you have a key from a real random number generator, it has to be the same length as your message. And all you do is XOR the two. And that gives you a ciphertext. One of the things that makes this difficult is that the key has to be the same length as the message. You can't generate a shorter key. So you have to generate a long random string, and you need to never use it again. Otherwise it's not secure anymore. Decryption is the same operation: you XOR the ciphertext with the key. But like I said, it requires true randomness. You can only use it once. Otherwise you might leak some information. The keys have to be really long if you want long messages.
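A rough sketch of the XOR step of a one-time pad, for illustration only. It assumes Python's `secrets` module as a stand-in for the key source; that is a cryptographically secure pseudo-random generator, not the true randomness the scheme calls for, so this shows the mechanics and does not claim perfect secrecy. The function names are made up for this example.

```python
import secrets


def generate_pad(length):
    # Assumption: secrets.token_bytes stands in for a true random source.
    return secrets.token_bytes(length)


def xor_bytes(data, pad):
    """XOR each byte with the matching pad byte; the pad must match the length."""
    if len(pad) != len(data):
        raise ValueError("the pad must be exactly as long as the message")
    return bytes(d ^ p for d, p in zip(data, pad))


message = b"attack at dawn"
pad = generate_pad(len(message))      # use once, then throw away
cipher = xor_bytes(message, pad)      # encryption
recovered = xor_bytes(cipher, pad)    # decryption is the same XOR
print(recovered)
```

XOR undoes itself, which is why one function serves for both directions. Reusing `pad` for a second message would let an eavesdropper XOR the two ciphertexts together and cancel the pad out.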
And it's only as strong as the key sharing mechanism. If I were to read off my key to you guys, anybody who was listening in here would have it. That's not very secure. So I need to find a way to get that to you in a way that's safe. And even if the actual encryption mechanism is really strong, it's no better than the way you share the key. So let's talk about modern cryptography. This is a statue that's outside of the NSA, I think. It's some cipher that an artist built. I don't know what all the messages are. Three of them have been decrypted. The fourth message is still a bit of a mystery. So the thing that I wanted to talk about is computational security. Are there any questions about classic ciphers or one-time pads before I go on? So computational security is the way we do modern cryptography. We tie breaking the encryption to solving a hard problem. So when we do a proof of a security scheme to say whether it's secure or not, we say that if you can break this encryption scheme, you can also solve this really hard problem. And because we believe that there's no efficient way to solve this really hard problem, it's a contradiction to say that you can efficiently break this encryption scheme. And the probability of guessing the solution is negligible. Negligible is a technical term. It means that as the key space gets larger, the probability of an adversary guessing the correct solution to your decryption scheme decays faster than any polynomial grows. So that means that as the key space gets larger, it gets harder. So if we want to improve the security of our encryption scheme, we just need to double the key size. And then it takes you far longer than before to break it. Symmetric key cryptography is still used. And the reason for it is it's faster than asymmetric algorithms. So it's preferable to use.
Because if you're sending encrypted messages back and forth, like with WhatsApp, if you're texting someone, you're going to be sending a lot of messages. And if there's a lot of overhead, then you want something that's faster. You're willing to use something that's maybe less secure, because we have to find a way to share that key. But we still have to figure out a way to securely share it. Asymmetric key cryptography uses a public key and a private key. The public key encrypts a message. So I'll share my public key with you, and you'll use that to send me an encrypted message. And I have my secret key. None of you can see it. So I'm the only one who's able to decrypt it. This idea came about, I don't know when, but it was with the Diffie-Hellman key exchange protocol that they figured out they can use this as a secure way to share keys. The public key doesn't actually give you any usable information about my private key, so you can't use it to break the security scheme. One idea in modern cryptography is that we can't keep the algorithm secret, so it's okay for you guys to know what algorithm I'm using to encrypt my messages. Because you could find out. You could overhear me talking about it, you could look on my computer, you could guess. And if my encryption isn't secure anymore once you find out what my algorithm is, then it's not a very good algorithm. So we try to reduce the number of things that we're trying to keep secret. So this public key exchange can then be used for an initial message, an initial transmission of the symmetric key, so we can then use another encryption scheme that's faster. And I now have a secure channel to share that initial key with. So I want to talk about the ElGamal encryption scheme. This is an encryption scheme based on that Diffie-Hellman key exchange. It's going to use a private and a public key. And the implementation is surprisingly easy in this not secure way that I've made it, so we can understand it.
You may not remember logarithms from math class a long time ago. I know I had to look it up. So a log is the inverse of exponentiation. So if I have some number raised to an exponent and that equals some other number, the log is just the inverse of that. So if I say 64, that's x, b is 2, and y is 6, then the log base 2 of 64 is 6. So this gives me the exponent. Any questions? So there's a hard problem. It's the discrete log problem. So given some b mod q, which I'll talk about in a second, and another number that has the structure b to the n mod q, find n. It turns out that's kind of a hard problem. There was a recent improvement on the algorithms we have to solve this problem, I think. This was released in a paper this year. It's still difficult enough that we can use this as a basis for some encryption schemes. So mod, what that means is we're taking the modulus of the result. So if we say something mod 12, we end up getting the numbers 0 to 11. So this is like doing clock math, where 12 plays the part of 0. And if you get 24, once you do that mod, it's 0, which is 12 on the clock. So the modulo operator is the remainder after you do division, like you learned in elementary school. What's up? Yes, maybe. That may be true. But what makes it a hard problem is we think it's a hard problem. This means that nobody has found an algorithm to solve it quickly. So you would have to do something like an exhaustive search to find it, and that's the best we know how to do. I don't know. That's an open question. I have a number of open questions through this because I'm not an expert. So the discrete log problem, it is in NP. So this is a complexity class. Does everyone know what P and NP mean? So P means that we have a class of problems that we can solve in a certain amount of time. So things that are in P are solvable in polynomial time, which means that the time that it takes is described by some polynomial in the input size. So if my input is, say, 12 bits, and I'm doing matrix multiplication, that might be n squared.
So it would be 12 times 12, which I think is 144. There's a larger complexity class that contains P, called NP. NP means non-deterministic polynomial time. So the idea is that a computer that's non-deterministic could solve this in polynomial time. I don't know what a non-deterministic computer is. We don't have any. And for the hardest problems there, we think that we can't solve them in polynomial time. It's an open question as to whether that's actually true, but a lot of cryptography depends on it, so I really hope it is. So things that are NP hard, and discrete log is not quite NP hard, but for the purposes of this, we're going to say it kind of is. So it's hard to compute. It requires an exhaustive search. But the interesting thing about it is it's easy to verify. There's always a witness that you can use to check that the answer that you got is correct. So once I've found this exponent, this n, if I know b, I can type b to the n into my calculator and see really quickly that it's the right answer. So it's important that you can verify these things quickly. That's what makes encryption schemes practical when you put them behind this hard problem. If you use some other problem that's outside of this space, where it's really hard to check the answer, your algorithm is going to be unusable. Sometimes you'll see this in the notes. You'll see the triple bar, like an equals sign with a third line. That means congruent, I think, is the right term. Because we're doing modular arithmetic, the notion of equality isn't quite right, so we use a third bar to say that it's the same digit on the clock. So there's another hard problem called the Decisional Diffie-Hellman assumption. So I'm saying that I have three integers, a, b, and c. The Z just means the set of integers. G is a group. I'll talk about that in a second. If I have triples of this form, g to the a, g to the b, g to the ab on one side, and g to the a, g to the b, g to the c on the other, they're computationally indistinguishable.
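To make the hard-to-find, easy-to-check asymmetry concrete, here is a small sketch using a toy modulus of 11 and base 2. These values, the function names, and the use of `secrets` are assumptions for illustration; with numbers this small the search is instant and the two kinds of triple are trivially told apart, so this only shows the shape of the problem.

```python
import secrets


def discrete_log(base, target, modulus):
    """Exhaustive search: try n = 0, 1, 2, ... until base**n % modulus == target."""
    value = 1
    for n in range(modulus):
        if value == target:
            return n
        value = (value * base) % modulus
    return None  # target is not a power of base


def verify(base, n, target, modulus):
    """Checking a claimed exponent takes one modular exponentiation."""
    return pow(base, n, modulus) == target


def sample_triples(g, q):
    """Return one (g^a, g^b, g^ab) triple and one (g^a, g^b, g^c) triple."""
    order = q - 1  # number of nonzero residues mod a prime q
    a, b, c = (secrets.randbelow(order) for _ in range(3))
    real = (pow(g, a, q), pow(g, b, q), pow(g, a * b, q))
    other = (pow(g, a, q), pow(g, b, q), pow(g, c, q))
    return real, other


q, g = 11, 2
target = pow(g, 7, q)             # 2**7 % 11 == 7
n = discrete_log(g, target, q)    # the search recovers the exponent 7
print(n, verify(g, n, target, q))
print(sample_triples(g, q))
```

The search loop grows with the size of the modulus, while `verify` stays a single `pow` call. That gap is what the scheme relies on once the modulus is thousands of bits long.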
What that means is you can find out the difference between them if you do an exhaustive search. The problem is that exhaustive search is going to take you longer than you want to spend on it. If the key space is big enough, the universe might not exist anymore by the time your computation is done. So it's the last item in each of the two triples that's interesting. So the g to the ab, those are the same exponents as on the first two. That makes some new number, and once this is computed, you will just see three numbers. It's going to take you a really long time to find out whether that last piece is made up of the same exponents as the first two, or whether it's something different. So you have to solve the discrete log problem twice to be able to figure this out. So because discrete log is hard, this is hard. You have to solve a hard problem twice. Are there any questions about this? Is this clear-ish? So I said these are computationally indistinguishable. So let's talk about key generation. So for the ElGamal encryption scheme, this is the key generation algorithm written out in math. So we have a cyclic group of prime order q. I'm going to talk about that in a second. And for the purposes of this demonstration, this group is going to be the numbers from 0 to 10, I think. g is a generator. I'll talk about what a generator is in a minute. So essentially I pick a random integer x between 0 and q. And I raise the generator to that power. And that's my public key. So the actual group that I'm using is public, the order, which is the size of that group, the size of that set, is public, the generator is public, and my public key is public. You see, contained in the public key is my private key, that's x. But because the discrete log problem is hard, I can share that with you, and you won't be able to find x. But I still remember what my secret key is. Do you have a question? So groups. A group is a set with an operation.
The idea is that it's some abstract operation, because groups can be things that aren't numbers. In our case, the operation might be addition or multiplication. The operation is associative, there's an identity element, every element has an inverse, and the order is just the size of the group. So the identity element is the element in the group such that if you multiply a by the identity element, you get back a. And a has an inverse. So if you multiply a by the inverse of a, you get 1. So you can think about fractions: one third is the inverse of 3, so if you multiply one third by 3, you get 1. And the group is of prime order q. So we're saying that it has a prime number of elements. And I think that has some special properties that make generators easy to find. So it's a cyclic group. This means that I can efficiently describe the set. If I take some element in the group that's in fact a generator, I can just keep exponentiating that and I get the elements of that set. So if I have the set from 0 to 10, that's the integers modulo 11, and my generator is 2, it will produce all of the items in that set. And then once you've come back to that first element, it repeats. So it's cyclic. And an interesting thing here is that different generators still sort of scramble the numbers. If you have really large prime numbers, the ordering of the elements appears random. So are there any questions about cyclic groups? It's really just the definition that you need to know. You can go study abstract algebra and learn proofs about it, but when you're reading a cryptography paper, you're going to see this. And for a long time I struggled with it, like what is this? I can say it when I write a proof, but I don't know what I'm doing and I can't implement it until I understand it. So this definition is everything I needed to know to be able to actually do the implementation. So this is the key generation algorithm written out in Python as a transcription of the math.
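The slide itself is not reproduced in the transcript, so what follows is only a hedged sketch of what such a transcription might look like, not the speaker's code. It assumes the toy values from the talk (modulus 11, generator 2) and works with the nonzero residues 1 to 10 under multiplication; that group has order 10, which is not prime, so this is a teaching toy. The names `cycle` and `generate_keys` and the use of `secrets` are choices made for this example.

```python
import secrets

P = 11  # toy prime modulus from the talk
G = 2   # generator


def cycle(generator, modulus):
    """List generator**1, generator**2, ... until the powers come back to 1."""
    elements = []
    value = 1
    while True:
        value = (value * generator) % modulus
        elements.append(value)
        if value == 1:
            return elements


def generate_keys(modulus=P, generator=G):
    """Pick a random secret exponent x and publish h = generator**x % modulus."""
    x = secrets.randbelow(modulus - 2) + 1   # secret key in 1 .. modulus - 2
    h = pow(generator, x, modulus)           # hiding x relies on discrete log
    public_key = (modulus, generator, h)
    return public_key, x


print(cycle(G, P))   # [2, 4, 8, 5, 10, 9, 7, 3, 6, 1]
public_key, secret_key = generate_keys()
print(public_key)    # safe to publish; secret_key stays private
```

The printed cycle visits every value from 1 to 10 exactly once before repeating, which is what makes 2 a generator here, and the order of the values already looks scrambled even at this size.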
When I originally learned about ElGamal, I read it from a paper in a textbook. When I went back to do these slides, I was curious whether Wikipedia had a good explanation of it, and the math is pretty clear, and that's where I stole it from in my previous slide. So you can go on to Wikipedia and look at these. I don't recommend it as an academic source, but if you're curious and just want to play around, it's easy. So I make the group. It's just the range. It has some order. I make a generator. I pick a random number and then I do the exponentiation mod q. And I return the public key as a tuple and the secret key, and then I'll go and publish, not the secret key, the public key on my website. So this is what it looks like if I actually run it and don't tell anyone what my secret key is. We can black this out, right? So now I want to encrypt a message. I have the key. Well, you want to encrypt a message. You want to tell me a secret. So you have my public key and some message that you want to encrypt. Your message has to be a member of this group. So you can only send me the numbers from 0 to 10. I picked that because it fit on the slides when I showed you the cycle. But if your group has 256 elements, you could encrypt ASCII messages and send that to me. You're just encrypting each letter one at a time, and mapping that onto the group. So again, we're going to pick a random value and make what's called an ephemeral key and use that to start to make the encrypted message. So the cipher is in two parts. I take the generator and I exponentiate it to this random value I've picked, which again, cracking it is a discrete log problem. And then I'll use your public key and that same value and exponentiate that again to make a shared secret, and multiply that shared secret by the message to make the second part of the cipher. So the message is hidden here.
The key that we've used, and ultimately my private key, is hidden here as well. And what ends up happening is behind this there's a Decisional Diffie-Hellman triple. So I end up getting a cipher that's a tuple and it has that structure there. So this is that written out as the math. And I think I have one that's actually readable. So the public key, we know what g is from the order, so I've left that out. I unpack the public key. I just pick an ephemeral key, which is a random integer. I generate the cipher. The pow function in the Python standard library actually does modular exponentiation for us, so we can use that. It takes the base, the exponent, and the mod and gives you the answer. And it's a little neater than writing it out as symbols. The second part of the cipher is the message times the shared secret, mod the order, and we return that tuple. So my message that I don't want anyone to know about but me is nine. And so I encrypt it and I get a cipher. The way I've written this slide is a little bit of a lie. The cipher would actually be (8, 2), and I've just shown you what the intermediary values are. So I can make a second cipher and the encryption is different. So in cipher one the encryption is (8, 2), and in cipher two it's (2, 2). So the idea is I can encrypt a message multiple times and every time it has a different ciphertext. So if you figure out what the decryption for some particular ciphertext is, my scheme is still secure, which is not true with the Caesar cipher that I talked about earlier. Yeah, so if you look here, you see c1 is 8 and c2 is 2, and I've just encrypted my message. Now I've encrypted the same message again, and c1 is 2 and c2 is 2. So it has a different ciphertext. So if you've seen a particular ciphertext and somehow managed to guess what it is correctly, that doesn't help you the next time I send that same message. That is exactly why.
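The encryption step described above might be sketched like this. It is not the code from the slide. The public key `(11, 2, 10)` is a hypothetical one (10 is 2 to the 5th mod 11, so the matching private key would be 5), and the `ephemeral` parameter exists only so the example is reproducible; in real use the ephemeral key is always fresh and random. With these illustrative values the output happens to match the two ciphertexts quoted in the talk.

```python
import secrets


def encrypt(public_key, message, ephemeral=None):
    """ElGamal-style encryption of a single group element (toy sketch)."""
    modulus, generator, h = public_key
    if not 1 <= message < modulus:
        raise ValueError("message must be a nonzero residue below the modulus")
    if ephemeral is None:
        # A fresh random value per message is what makes ciphertexts differ.
        ephemeral = secrets.randbelow(modulus - 2) + 1
    c1 = pow(generator, ephemeral, modulus)        # first part: g**y
    shared_secret = pow(h, ephemeral, modulus)     # h**y == g**(x*y)
    c2 = (message * shared_secret) % modulus       # second part hides the message
    return c1, c2


public_key = (11, 2, 10)  # hypothetical key; private exponent would be 5
print(encrypt(public_key, 9, ephemeral=3))  # (8, 2)
print(encrypt(public_key, 9, ephemeral=1))  # (2, 2)
print(encrypt(public_key, 9))               # varies from run to run
```

The same message 9 produces different tuples depending on the ephemeral value, which is the property the talk contrasts with the Caesar cipher.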
Yeah, so that's really important because that means your scheme is still secure. So in modern cryptography, that random number generation is really important. I'm not really going to talk about this, but you've heard the term pseudo-random number generators. There's an idea of cryptographically secure pseudo-random number generators. A pseudo-random number generator uses a smaller space than all of the numbers that a real random process would use. And if it's cryptographically secure, you can't tell the difference between the two in any sort of efficient way. So now I want to find out the message that you sent me. I have a ciphertext and a secret key. I compute s. It turns out, if you want to do the algebra, you'll see that the first piece of the tuple raised to the x will give me the shared secret. And then if I take the inverse of that shared secret and multiply that by the second part of the cipher, it gives me back the message. I was working on these slides late at night and I didn't know how to do the inverse in a group, so I looked it up on the Internet. And somebody said that if you use this, on line six, if you raise it to the order minus two, that somehow does the inverse. I don't know. It worked. But the idea is, when we think about fractions, the inverse is taking one over the thing. When you're doing group arithmetic, the inverse is the other element such that if you multiply them together and then take the mod, it's one. So I think in this group, ten is its own inverse. So if you multiply ten by ten, mod eleven, you get one. So this is what it looks like written out nicely. I'm going to use the secret key together with the public key. I'm just going to extract the order from the public key because I need to know what my group size is. I unpack the cipher. I compute the shared secret by using my secret key and the first part of the cipher. I compute the inverse shared secret and then multiply that by the second part.
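The "order minus two" trick mentioned above is most likely Fermat's little theorem: for a prime p and any a not divisible by p, a to the (p - 1) is 1 mod p, so a to the (p - 2) is the multiplicative inverse of a. A small sketch, assuming the toy prime 11 (the function name is made up for this example):

```python
def inverse_mod_prime(a, p):
    """Multiplicative inverse of a modulo a prime p, via a**(p - 2) % p."""
    return pow(a, p - 2, p)


p = 11
for a in range(1, p):
    # Every nonzero residue times its inverse is 1 modulo p.
    assert (a * inverse_mod_prime(a, p)) % p == 1

print(inverse_mod_prime(10, p))  # 10, since 10 * 10 = 100 = 9 * 11 + 1
print(inverse_mod_prime(3, p))   # 4, since 3 * 4 = 12 = 11 + 1
```

On Python 3.8 and later, `pow(a, -1, p)` computes the same inverse directly and also works when the modulus is not prime, as long as `a` and the modulus share no common factor.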
And mod the order, and that gives me back the original message. So if I decrypt the messages that I did before, you can see that they both come out the same. So this is actually the message that I had encrypted in both cases. If you don't believe me, I have it open in a terminal and I can show you. Can you guys see that? Okay. Legible. So I'm going to say that my message is seven. And so I get a, if I show you c2, oh, c1. So I've run this a couple of times and you see the output is different, but it's the same input. Let me bring that up a little higher. So the original message was seven and I got seven back. So this has a weakness, the ElGamal encryption scheme. It's weak against the chosen ciphertext attack. So the thing that it's secure against is somebody snooping your network traffic. But we have an idea of this adversary who is trying to break your encryption scheme and find out what your messages are. If your adversary can get a decryption for a known ciphertext, so let's say they send you an encrypted message using your public key and you leave your computer unlocked at lunchtime. This is known as a lunchtime attack. And they decrypt that message. They can find out what that encryption-decryption pair is, and if they do that a couple of times, they're able to use that to maybe decrypt a different ciphertext or to find your private key. I didn't actually try this. I don't have code to show it to you. But the idea is that the ElGamal ciphers have some structure, and I think I'm saying this correctly: if you have two messages that are encrypted, the encryption of M1 and the encryption of M2, and you multiply those, that's the same as an encryption of M1 times M2. And because of that, if you know one of the messages, you can divide it out to get the other message. So that's a weakness here. There are other encryption schemes that are stronger. That usually means they're more computationally intensive.
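As a hedged illustration of both the decryption step and the multiplicative structure just described (the speaker notes they have no code for the latter), here is a toy sketch. The key pair `(11, 2, 10)` with private exponent 5 is hypothetical, the ephemeral values are fixed only for reproducibility, and the function names are assumptions.

```python
def encrypt(public_key, message, ephemeral):
    """Toy ElGamal encryption with an explicit ephemeral key (for the demo only)."""
    modulus, generator, h = public_key
    c1 = pow(generator, ephemeral, modulus)
    c2 = (message * pow(h, ephemeral, modulus)) % modulus
    return c1, c2


def decrypt(public_key, secret_key, cipher):
    """Recover the message: c2 times the inverse of the shared secret c1**x."""
    modulus, _generator, _h = public_key
    c1, c2 = cipher
    shared_secret = pow(c1, secret_key, modulus)
    inverse = pow(shared_secret, modulus - 2, modulus)  # valid for prime modulus
    return (c2 * inverse) % modulus


public_key, secret_key = (11, 2, 10), 5   # hypothetical toy key pair

# The two ciphertexts quoted in the talk both decrypt to 9 under this key.
print(decrypt(public_key, secret_key, (8, 2)), decrypt(public_key, secret_key, (2, 2)))

# Multiplying two ciphertexts component-wise gives an encryption of the product.
ca = encrypt(public_key, 3, ephemeral=4)
cb = encrypt(public_key, 2, ephemeral=7)
product = (ca[0] * cb[0] % 11, ca[1] * cb[1] % 11)
print(decrypt(public_key, secret_key, product))  # 6, which is 3 * 2
```

Nobody needed the private key to build `product`; an attacker can combine ciphertexts this way and then ask for a decryption of the result, which is the kind of leverage a chosen ciphertext attack exploits.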
But you'll choose the encryption scheme based on your processing needs and the kind of adversary that you want to protect against. And that is everything I have. Are there any questions? Is it by session? So if I wanted to... Let's say I'm using an SSL certificate to actually connect to a website. Maybe that's not right. Let's say I'm using encrypted email. And so I'll be using a very large prime number. I wouldn't use 11 because we could compute all of the possibilities in five minutes, but I might use something with a thousand bits. I'll get the public key and I'll publish that on my website. And you can then use that to send me a secret message. And because I have my private key hidden away somewhere where nobody else can find it, only I can use that to decrypt the message. So if we're talking about websites talking to each other... So usually if I'm generating SSH keys to go log into different machines, I'll restrict the read access. So you have to log in with my user account. And if you're not using my user account, you can't read the file. So it's with UNIX file permissions or something. Or if it's really secret, I may keep it on a flash drive that I only use on a computer that I never connect to the internet. Or I might write it down on a sticky note and put it on my monitor. Yeah, yes. Yeah, you reuse them. Maybe it makes sense to rotate them every now and then in case you left your computer unlocked at lunchtime and somebody wrote down your key. But the public key will be published somewhere. If I'm using it internally on my system, I may have a key store that lets you look up the public key for a service. But each service knows its own private key. And that's stored in a place where only that service should be able to read. Hi, my name is Edward. I was just wondering about... You said you took a graduate class. You're doing your master's degree in this. Are you presenting exactly what you learned, or did you do some research into it? No.
So I took a cryptography class at the SUNY Graduate Center. It was really good. I highly recommend it. Nelly Fazio teaches it. It was all reading proofs for it. And we never actually implemented any of the encryption schemes. So I was curious, could I actually pull this off? And I wanted to see what it looked like in code. Because that process of writing the code is actually a constructive proof. And if I write out a proof, then I have to get all of you to check it and find out whether it's right or not. Or I can run it on the computer and see that it actually works. So this is something just to scratch my own curiosity. So this is a quick follow-up question then. When you demonstrated you were running the code, was that the demonstration of the El Gamal algorithm? Yes. Do you research how to implement it in code or in Python? Or do you already know how to do that? Because you write all those proofs for your course. I read the mathematical description of the algorithm. And used that to implement it. The mathematical description was straightforward enough that it was do this computation, assign it to this, do this computation, assign it to this. And to talk about it, I need to figure out what X and Y and G and S, what all those things are so I can give them good names. So you already had a background in programming in Python and did that make it easier? Or do you have to learn Python to be able to code this? I could have done this in any programming language that I was familiar with. It doesn't use any advanced features of Python. It's mostly multiplication and assignment, which are some of the first things you learn. So you can do this without a rich background. So if you can do some, if you can multiply numbers together with Python and write maybe a for loop, then you can probably do this. All right, thank you. Yeah. It depends on who your adversary is. So another apocryphal story I heard is that the NSA has cracked one of the groups for a lot of crypto systems. 
It turns out that a lot of crypto systems, they're based on picking some prime number and they all use that same prime number. And so they spent two years taking all their computational power and figuring out what the group was. If you're not worried about the NSA and you're worried about me snooping on you, 2048 is probably a good key size. It depends on the computational power of your adversary. But I don't know what the current recommendation is, but I think that would probably keep your things, keep your keys secure, especially if you're rotating them every six months. You could actually, you could compute that and maybe if you know how long roughly an operation takes on a GPU, you could figure out what the polynomial is and figure out how often you should rotate your keys. Do you know, like, do you specify that when you're actually, when you're running the algorithm, but the default, I think it's 2048. Any other questions? Thank you.