Is it good now? Hello, everyone. Thanks for coming. I'm Amralli. I'm a computer science PhD candidate at Northeastern working on security and privacy. And today we're going to talk about cryptography in Python.

So cryptography is ubiquitous today. We use it everywhere, from your mobile phone apps to full hard disk encryption and even your wireless connection at home. Every respectable programming language has some sort of support for encryption and decryption, or cryptographic libraries. And it's even embedded inside your CPUs. What I mean by that is that right now, for example, Intel and even AMD CPUs have support for AES encryption, which means that with one assembly instruction you can encrypt or decrypt your messages.

And to be honest, it's really not hard to do crypto the right way. But there are a lot of crypto failures in the real world. For example, in the password breaches at LinkedIn, Adobe, and Ashley Madison, they were either not salting their passwords or encrypting them in ECB mode. In Wi-Fi, there was something called WEP, which was completely broken because it reused its IV, or initialization vector. Snapchat was hard-coding the encryption and decryption key inside the app, and it was using ECB mode, which means it's completely broken. And even in the case of Debian Linux, they had weak random number generation, which resulted in a lot of breakable SSH keys.

So before me, there was a talk on cryptography in general, and John gave a very good introduction to the whole of crypto. So I'm not going to go into classic crypto anymore. I'm just going to talk about modern crypto. In modern crypto, we have two forms, symmetric encryption and asymmetric encryption. As you can see, in symmetric encryption, we use the same key for both encryption and decryption. And the hard part in symmetric encryption is to transfer this key in a secure way.
And if you have a way to transmit a very long key securely, then you could just as well use that for sending your messages.

A more modern innovation in cryptography was asymmetric encryption, or what we call public-key encryption. The first instance was in the 1970s, Diffie-Hellman. The way it works, you have two keys, a public key and a private key. The public key can be public, and you announce it to the world. And you have a private key that you have to keep to yourself. What you can do is encrypt a message with the public key and decrypt it with the private key. And if you share your public key with the world, they cannot infer what your private key is. So it's a one-way function. From the private key, you can get your public key. But from the public key, you cannot infer the private key.

The differences between symmetric and asymmetric encryption: symmetric encryption, at the software level alone, is about 1,000 times faster. And right now, when you have hardware support in the CPU, it's going to be even faster. The other difference is that for symmetric encryption, you need to share your secret key. But with public and private keys, you can just announce your public key. And in every real-world system, like when you use SSL or SSH, you have a combination of public-key encryption plus symmetric encryption. You combine both.

The most famous, or the currently used, symmetric encryption algorithm is called AES. It's the Advanced Encryption Standard, also known as Rijndael. It was invented by two Belgian cryptographers. And it was part of the National Institute of Standards and Technology, or NIST, competition to find the next cryptographic algorithm. It had a couple of requirements. One of them was that it should be fast in both software and hardware implementations. For example, something like DES was really fast in hardware, but it was slower in software.
And the other requirement was that the block size should be 128 bits, and the key sizes should be 128, 192, and 256 bits. This algorithm was first published in 1998, and there were a couple of rounds of competition. In 2001, it was finally standardized and published to the public. The other candidates were MARS, RC6, Serpent, and Twofish. They are implemented in the real world, but no one uses them that much.

AES is something called a block cipher, which means it operates on a block of data. And that block is 128 bits. Your data should be 128 bits, and it just works on that block of data. It transforms that data into something that looks like gibberish. And later on, you can convert that gibberish back into something meaningful. If you have a message that's longer than 128 bits, you need to encrypt each block and combine them in some way.

So exactly like making a coffee, you have coffee beans, you have water, and milk. It depends how you're going to combine them together. At the end, you're going to get coffee. But if you just consume everything separately, it's not going to be a real coffee. There is a procedure. Some of them are better, some of them are worse, and some of them are just wrong. In terms of modes of operation for crypto, ECB is like consuming every piece of your coffee separately. It's just wrong. Never use it. If you see it somewhere, it's wrong. CBC and CTR are both secure. CTR, or counter mode, has more functionality: you can do parallel encryption and decryption, so it's going to be faster. You don't need to encrypt or decrypt your messages sequentially.

And to give you a graphical representation of the difference between ECB and CBC encryption: the original here is the Tux Linux image. If you encrypt it using ECB, after encryption you're still going to see the patterns inside your data. So it means it's broken.
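
The pattern leak can be reproduced without an image. What follows is a minimal sketch, assuming the `cryptography` package is installed. The helper name `encrypt_blocks` and the all-`A` plaintext are made up for illustration and are not taken from the talk's slides. It encrypts four identical 16-byte blocks under ECB and under CBC and counts how many distinct ciphertext blocks come out.

```python
import os

from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

BLOCK_BYTES = 16  # AES block size: 128 bits


def encrypt_blocks(mode, key, plaintext):
    """Encrypt plaintext with AES in the given mode; return 16-byte ciphertext blocks."""
    encryptor = Cipher(algorithms.AES(key), mode, backend=default_backend()).encryptor()
    ciphertext = encryptor.update(plaintext) + encryptor.finalize()
    return [ciphertext[i:i + BLOCK_BYTES] for i in range(0, len(ciphertext), BLOCK_BYTES)]


key = os.urandom(16)
iv = os.urandom(BLOCK_BYTES)

# Four identical blocks; the length is a multiple of 16, so no padding is needed here.
plaintext = b"A" * BLOCK_BYTES * 4

ecb_blocks = encrypt_blocks(modes.ECB(), key, plaintext)
cbc_blocks = encrypt_blocks(modes.CBC(iv), key, plaintext)

print("distinct ECB blocks:", len(set(ecb_blocks)))
print("distinct CBC blocks:", len(set(cbc_blocks)))
```

Under ECB every identical plaintext block maps to the same ciphertext block, which is what makes the outline of the image survive encryption. Under CBC each block is mixed with the previous ciphertext block before encryption, so the repetition disappears.
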
I won't be able to tell exactly what the message was from just looking at the data, but I can see the patterns inside it. But if you use something like CBC or CTR, it's going to completely get rid of the patterns in your data. So it's going to look completely random. That's why you should never use ECB.

RSA, invented by Rivest, Shamir, and Adleman, is the first practical public-key encryption system that we have. It was published in 1977, almost 40 years ago. And it was patented until 2000. That's why, for example, something like ElGamal, which was mentioned in the previous talk, and something called DSA, the Digital Signature Algorithm, which is based on ElGamal, were used in many protocols. And RSA is based on the hardness of the factoring problem and on modular arithmetic.

So the definition of RSA: you have a message M, a public exponent E, a private exponent D, and a ciphertext C. The way it works, you raise your message to the power of E and then do a mod operation, and you get the ciphertext. And if you do the same thing with the private key and the ciphertext, you get M back. This encryption is very simplistic. It's called textbook RSA, and it's not secure under something called IND-CPA, or indistinguishability under chosen-plaintext attack.

To generate the keys, you pick two big random prime numbers, P and Q. You get N, which is the product of P and Q, and you calculate phi of N, which is P minus 1 times Q minus 1. We're going to see an example after this. Then you choose your encryption key E, which is coprime to phi of N and between 1 and phi of N. And your private key D is going to be the inverse of your public key E, mod phi of N. And one of the properties of RSA is that if you do the decryption of the encryption of a message, or the encryption of the decryption of a message, at the end you're going to end up with M.
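
The key generation steps just described can be sketched with plain Python integers. This is only a toy under stated assumptions: the primes 61 and 53, the exponent 17, and the message 65 are illustrative values chosen here, not numbers from the talk, and the function names are hypothetical. The call `pow(e, -1, phi)` for the modular inverse needs Python 3.8 or newer. It is textbook RSA with no padding, so it is not something to use for real data.

```python
from math import gcd


def make_toy_keypair(p, q, e):
    """Build a textbook RSA key pair from two primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if not 1 < e < phi or gcd(e, phi) != 1:
        raise ValueError("e must lie between 1 and phi(n) and be coprime to phi(n)")
    d = pow(e, -1, phi)  # modular inverse of e mod phi(n)
    return (e, n), (d, n)


def apply_key(key, value):
    """Raise value to the key's exponent modulo n (used for both directions)."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


public_key, private_key = make_toy_keypair(61, 53, 17)

message = 65
ciphertext = apply_key(public_key, message)
recovered = apply_key(private_key, ciphertext)

print("public key: ", public_key)
print("private key:", private_key)
print("ciphertext: ", ciphertext)
print("recovered:  ", recovered)
```

The same `apply_key` function serves for encryption and decryption, which mirrors the property that applying the two exponents in either order returns the original message.
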
It's based on Euler's theorem: because E times D is equal to 1 mod phi of N, M to the power of E times D is going to be M again at the end.

So to give you a better view, here is an example with P and Q as small numbers. In the real world, these numbers are much bigger. So think of more than 400 digits instead of one, two, or three digits. But for example, imagine we have P as 5 and Q as 11. Both are prime numbers. N would be 5 times 11, or 55. And our phi of N would be 4 times 10, or 40. Then we choose the encryption key, E, something that's coprime to phi of N and between 1 and phi of N, say 3. To calculate D, the private key, you need to find the inverse of your encryption key E mod phi of N. When you do E times D mod phi of N, as we had in the previous talk, you should get 1. So if E is 3, then D would be 27, and you end up with 81. And 81 mod 40 is 1. So that's why 27 is the inverse of E mod phi of N. Your public key is going to be E and the number N. And your private key is going to be D and the number N. And you can represent each message as a number. Everything is bytes, and all bytes are numbers, so you can just do exponentiation on numbers. So if your message is 2, then the encryption of your message is 2 to the power of 3, which is 8, and the decryption of the message, 8 to the power of 27 mod 55, is going to be 2 again. You get the same message.

The other crypto primitive that we have is the hashing function. What it does is take a long message and generate a digest, which is much shorter. One of the reasons that we use this: for example, when you download a file from the internet, you see something called the hash of the file, for when you want to make sure your file is exactly the same. If you make any modification, even to one bit in the file, you're going to get a completely different hash. This way, you can verify the integrity of your file.
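
The one-bit claim can be checked with the standard library's `hashlib`. A minimal sketch follows; the sample bytes and the helper name `differing_bits` are made up for illustration. It flips a single bit of the input and counts how many of the 256 output bits of SHA-256 change.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = b"contents of a downloaded file"

# Flip the lowest bit of the first byte to simulate a one-bit modification.
modified = bytearray(original)
modified[0] ^= 0x01
modified = bytes(modified)

original_digest = hashlib.sha256(original).digest()
modified_digest = hashlib.sha256(modified).digest()

print(original_digest.hex())
print(modified_digest.hex())
print(differing_bits(original_digest, modified_digest), "of 256 digest bits differ")
```

For a good hash function roughly half of the digest bits are expected to change, so comparing the published hash with the hash of the downloaded file exposes even a tiny modification.
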
And a hashing function should have three properties: pre-image resistance, second pre-image resistance, and collision resistance. Which means if I give you the hash of a message, it should not be invertible. You shouldn't be able to tell me what the message was that I produced the hash of. The other thing is, if I give you M and you have the hash of it, you shouldn't be able to find another message M prime that's not equal to M and has the same hash. And the other property is that you shouldn't be able to find any two messages that are not equal and have the same hash. These are the properties that we want. And right now, the recommendation is to use SHA-2 or SHA-3.

So fortunately, you don't need to implement any of these cryptographic primitives or know that much about them. You just need to know the basic idea behind them, what they are, and what properties they provide. In Python, two or three years ago, the support wasn't that great. But right now, we have more than half a dozen cryptographic libraries in Python that do everything correctly. They are created by people who know how to actually implement crypto protocols in a secure way. Because just implementing some basic modular arithmetic is easy. But making sure it's secure against timing attacks, et cetera, is much harder.

So PyCrypto is the oldest and most widely used cryptographic library in the wild right now. But it's not really updated anymore. And the code is kind of ad hoc: the C code that Python calls was written by the author himself. M2Crypto has SWIG bindings to the OpenSSL code, so the C code is secure, but the API and documentation are not as great. The alternative that I really like, and it's a newer project, is called cryptography. It's created by PyCA, the Python Cryptographic Authority. They are a very involved and very active group. Another nice property is that cryptography works on Python 2, Python 3, and even PyPy, the Python implementation of Python.
So you can use it with almost everything. The other benefit is that it has OpenSSL CFFI bindings. CFFI is a newer kind of wrapper that lets you call your C code from Python or other scripting languages. And the code is very readable. They have good documentation with cautions. For example: don't use this cryptographic primitive, it's broken; or here is how to use this one in a secure way. And there are other libraries, like PyNaCl, or NSS, which Firefox uses. But if I had to recommend one cryptographic library, it would be cryptography.

To give you some examples of how to use the library: if you want to make a hash of your messages, you just need to import your default backend. The default backend would be OpenSSL. And you also need to say which of the primitives you want to use. All these cryptographic functions that we talked about are primitives. You have to say, I want to import the hashing functions from my primitives. And then, to create a hash, you create a digest object for your message. You say, from hashes, I want to create a Hash. And what kind of hashing mechanism do you want to use? We are using SHA-2, and the size is 256 bits. And we say, as a backend, use the default backend. Then you call update on your digest with the message that you want to produce the digest of. And you finalize: you say, OK, I'm done with the message that I want to create a digest of. And you get a digest. For example, this is the hash of PyGotham 2016 in Base64 encoding.

Doing something like AES is as simple as a few lines of code. You import os. The os module has a function called urandom for generating random numbers. The random numbers that it generates are cryptographically secure. And it's not blocking. Because if you read from /dev/random, that is a blocking call. It means it waits for the system to gather enough entropy.
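
A minimal sketch of drawing key material from `os.urandom`, assuming a 128-bit AES key and a 16-byte IV. The constant names and the helper `new_key_and_iv` are illustrative, not part of any library.

```python
import os

AES_128_KEY_BYTES = 16  # 128-bit key
AES_BLOCK_BYTES = 16    # AES block size, which is also the IV length for CBC


def new_key_and_iv():
    """Return a fresh (key, iv) pair from the operating system's secure generator."""
    return os.urandom(AES_128_KEY_BYTES), os.urandom(AES_BLOCK_BYTES)


key, iv = new_key_and_iv()
other_key, other_iv = new_key_and_iv()

print(len(key), len(iv))
print("keys differ:", key != other_key)
```

Each call returns independent random bytes, so a new IV can be drawn for every message while the key is kept secret and reused.
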
But urandom just uses the entropy in the pool as a seed and uses a pseudorandom number generator to create random numbers. So for every application, you can just use urandom. It's completely secure. Always use os.urandom to generate your keys or your random numbers. And you can specify the size that you want in bytes. We said AES is 128 bits, so we say 16 bytes.

And you can choose your cipher. You say, I want to use this algorithm with this key. I'm using the CBC mode that we talked about. We had ECB, CBC, and CTR, and CBC was one of the secure ones. We use CBC with this IV. And we say, use the default backend. And then you just call your encryptor with whatever you want to encrypt, and you finalize it. And to decrypt the message, you just pass in your ciphertext and call your decryptor.

For using something like RSA, again, you don't need to delve into the key generation yourself. You can just call the key generation functionality inside the library. RSA is asymmetric, or public-key, encryption, so from your primitives you import rsa from asymmetric. And you say you want to generate a private key with a public exponent. These are standard values. This public exponent is a standard public exponent. Your public exponent can be something that is used many times. The private key is the important one, and it's private. And you determine your key size. Right now, it should be 2048. It should definitely be above 1024, and even 1024 is not that secure. So you have to use something around 2048, which means a 2048-bit key, which gives you the equivalent of about 112 bits of symmetric security. And then, to get a public key after you generate your private key, you just call public_key on it and you get the public key for your private key.

To encrypt a message: if you remember from the talk, textbook RSA is insecure, because if you have the same message, the encryption is always going to be the same.
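
Why identical ciphertexts are a problem can be shown with the talk's toy key, E equal to 3 and N equal to 55. A minimal sketch under stated assumptions: the function names are hypothetical, and the key is far too small for real use. Because the public key is public, anyone can encrypt candidate messages and compare them against an observed ciphertext.

```python
PUBLIC_KEY = (3, 55)  # toy (e, n) from the worked example; not a realistic key size


def textbook_encrypt(message, public_key):
    """Unpadded RSA encryption: message ** e mod n."""
    e, n = public_key
    return pow(message, e, n)


def guess_plaintext(ciphertext, public_key):
    """Encrypt every candidate message and keep those matching the ciphertext."""
    _, n = public_key
    return [m for m in range(n) if textbook_encrypt(m, public_key) == ciphertext]


first = textbook_encrypt(2, PUBLIC_KEY)
second = textbook_encrypt(2, PUBLIC_KEY)

print("same ciphertext both times:", first == second)
print("messages matching the ciphertext:", guess_plaintext(first, PUBLIC_KEY))
```

With a real key the attacker cannot try every number below N, but the same comparison works whenever the set of likely messages is small, such as "yes" or "no".
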
So there is something called OAEP padding, which introduces randomness. If you have two identical messages, the encryption is going to be different each time. You can still decrypt the message, but the encryption each time is going to be different, so you're not going to leak any information. That's called OAEP. You say what your message is. You use your public key to encrypt the message. The padding, OAEP, is there to mitigate this leakage attack. And then you say what hashing function you want to use. These are the standard values. Again, anything after padding.OAEP, you can just copy exactly. You don't need to change anything here; it's standard code. And for decrypting the same message, you just use your private key to decrypt the ciphertext that you just created, and everything else is exactly the same.

If you just want to encrypt, for example, files on your computer, cryptography has a recipe called Fernet. It does AES in CBC mode with a 128-bit key. And it also produces a MAC to make sure your files haven't been modified. So you can just use that. You don't need to implement anything. It's just one line of code. And one of the ways that you can create or remember this key is that you have a simple password, and then with some hashing function you expand your simple password to the key size and use that as your key. So you can just pass it your key. You can either randomly generate it or pass it your own key. And then you encrypt whatever you want. You get a token. And that token is the encryption of your message plus the authentication code. And then you can simply decrypt that token to get back exactly what you had.

So a few takeaways.
Never invent your own cryptographic protocol or algorithm, because there are people who are cryptographers and experts and have already produced these things. Everyone is using them. If they are good enough for top secret documents, they are good enough for us.

Don't implement your own crypto library. Even though all the arithmetic or mathematics is very simple, you need to take care of side-channel attacks, for example timing. One such case was the Keyczar crypto library, which was using the normal Python comparison. If you do an equality check between two strings, it quits comparing the first time the strings differ. But for crypto, you always need to do a constant-time comparison of the strings. Otherwise, someone can decrypt a message without knowing the key by using some attacks. That's why you have to take care of all these subtle things that you won't know about unless you implement cryptographic libraries yourself. So never implement your own crypto library.

To be honest, doing crypto is not hard. It's very easy. As we saw, if you just follow these steps, you will have a secure cryptographic protocol. However, most of the documentation that you find online is either outdated or, most of the time, wrong. For example, for the secret key, they use something like "password" as an example, because they cannot just put some random number there. They use something very simplistic. And if you just copy and paste that code, you're producing wrong code. Or they sometimes suggest using ECB. And right now we know that ECB is completely broken. So don't trust all the documentation that you find online.

There is something called SSL; most of you are familiar with it. For example, whenever you visit a website over HTTPS, it means it's running SSL. So if you have data in transit, meaning real-time communication over a network, whatever program you have, you can wrap it inside SSL. Inside Python, you have support for it.
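
Wrapping a connection can be sketched with the standard library's `ssl` module, which today negotiates TLS, the successor to SSL. This is a minimal sketch under stated assumptions: the helper name `open_tls_connection` and its defaults are made up for illustration, and nothing is sent over the network until the function is called with a real host.

```python
import socket
import ssl


def open_tls_connection(host, port=443, timeout=10.0):
    """Open a TCP connection to host and wrap it in TLS with certificate checks."""
    context = ssl.create_default_context()  # verifies certificates and host names
    raw_socket = socket.create_connection((host, port), timeout=timeout)
    try:
        return context.wrap_socket(raw_socket, server_hostname=host)
    except Exception:
        raw_socket.close()
        raise


# Inspect the default settings without opening any connection.
context = ssl.create_default_context()
print("certificates required:", context.verify_mode == ssl.CERT_REQUIRED)
print("host name checked:   ", context.check_hostname)
```

The returned object behaves like an ordinary socket, so existing code that sends and receives bytes keeps working while the encryption happens underneath.
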
And just wrap everything inside SSL and send it. You don't need to think about the cryptography yourself. And if you want to store some data on your computer, for example hard disk encryption, encrypting your files, this kind of stuff, there is another thing called PGP. Again, it's a secure protocol. You can just use that. You don't need to worry or think about implementing your own cryptographic libraries.

That's it. Thank you. Are there any questions?