So my name is Simo Sorce, I work on the Red Hat crypto team, and this talk is about why you shouldn't write your cryptographic algorithms yourself. Let's get to it. So everyone tells you that, right? You shouldn't write your own crypto. But usually they don't tell you why. And I could just tell you, but instead, this morning we're going to do a little exercise: how would we write the RSA function ourselves? I chose RSA because I recently had to refresh my memory on it, due to a nice little CVE that came in. Let's get started. So this is fundamentally RSA. It's a pretty straightforward function. This is the encryption function: c is your ciphertext, and all you need to do is take your message, exponentiate it with e, which is called the public exponent, and then take the result modulo n. This is also, by the way, how you do signatures. And this is how you decrypt the message: you take the ciphertext, exponentiate it with the private exponent, take the modulus again, and you get back your message. Right? And I assure you, there are no tricks. That's really the basis of RSA: just that simple function. Or is it? So we need to look a little bit at those pesky details people talk about when trying to implement crypto. First of all, we need to figure out if the function we saw is enough. And it isn't. There are mathematical attacks on that kind of function, and we need to be aware of them to be able to build something that is actually secure. Very basic things are attacks like the common modulus attack. When you want to create a private key in RSA, you generate two big prime numbers called p and q, multiply them together to get the modulus, and then you derive your public and private exponents from that. In the end, you create two keys, a public and a private key.
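To make the two formulas concrete, here is textbook RSA with deliberately tiny numbers. This is a sketch for illustration only: real keys use primes hundreds of digits long, and textbook RSA without padding is insecure, as the talk goes on to explain.

```python
# Toy textbook RSA -- illustration only, never use numbers this small.
p, q = 61, 53              # two (tiny) primes
n = p * q                  # the modulus: 3233
phi = (p - 1) * (q - 1)    # Euler's totient of n
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent: e * d == 1 (mod phi)

m = 42                     # the message, must be smaller than n
c = pow(m, e, n)           # encryption: c = m^e mod n
assert pow(c, d, n) == m   # decryption: m = c^d mod n
```

Signing is the same operation with the exponents swapped: the signer exponentiates with d, and anyone can check with e.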
And because, at least back in the day, it was quite time-consuming to generate a key pair, some people thought: maybe if we share some of the parameters, we can speed up key creation; we'll have someone distribute private keys to people, and that will be faster. But that was broken. So never reuse parameters. Always generate fresh keys. Another thing is that people want to be able to decrypt fast. Oftentimes, especially if you store stuff on disk, you write once but read many times, so encryption speed is not that big a deal, but decryption speed matters. You can get that by having a small private exponent, because then the exponentiation is cheaper. Except that if you use a small private exponent, you break the cryptosystem. So you have to keep the private exponent large, and again, when you create your keys, you have to be a little careful. Conversely, there is the small public exponent: if you want fast encryption, maybe I publish a small exponent so that it's fast to encrypt stuff to me. Again, if it's too small, like three, which was used for quite a while, literally the number three, then it's kind of broken. But if you use at least 2^16 + 1, which is 65537, it's fine, and that's what's commonly used in practice. However, that's not enough. Because RSA has a fixed size of message it can encrypt, corresponding to the length of the key, you also have to do some padding. And you'd better randomize it, because if you don't, you break RSA again. So there are a bunch of details, and you can search for the paper "Twenty Years of Attacks on the RSA Cryptosystem"; you'll get all the details you want to know and pointers to other papers to go deep into the mathematics if you want to. So if we follow all of these things, we can do it, right? I'm going to write it. Wait a second.
First of all, the equation is very simple, but I didn't say that it uses really, really, really big numbers. You cannot use the floating-point units in a CPU to deal with them, for example, because you need perfect precision and floating-point arithmetic is imprecise. So you have to write or find an arbitrary-precision library to deal with these huge numbers. And we are talking numbers of 1,000 bits, 2,000 bits, 4,000 bits, or more. They're really big: several hundred bytes for each single number. You also need a prime number generator, because you need to generate keys if you want to do any work. And it has to be a good one, with good primality tests, to make sure you actually get prime numbers, because if p and q aren't prime, nothing works. And you also need a cryptographically secure random number generator, because we use randomness during key generation and padding. So, assuming we do all that, and we cheat a little bit, because we choose GMP, the GNU Multiple Precision arithmetic library. I chose this one because it's the library used by Nettle, which is the project I was working on to fix the CVE. So we cheat a little bit, because the math is handled there, and this is how you decrypt. Pretty simple, right? Just a single function call, done. Right? Good idea. But this is a bit slow, and your users might lose patience if you ship just this. So you go ahead and say, okay, let's try to make it a little faster, so people can actually use it, and use a little bit of math. I'm not going to explain it in detail, but it has to do with the Chinese Remainder Theorem, which lets you work with smaller exponents again, in a non-provably-secure way.
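The Chinese Remainder Theorem speedup the talk alludes to can be sketched like this, reusing the toy key from before (again only an illustration; real implementations do this with a bignum library like GMP). Instead of one full-size exponentiation mod n, you do two half-size ones mod p and mod q and recombine:

```python
# CRT-based RSA decryption: exponentiate mod p and mod q separately,
# then recombine. In practice roughly 4x faster than pow(c, d, n).
p, q = 61, 53
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))

# These three values are precomputed once, at key-generation time.
dp = d % (p - 1)       # reduced exponent for the p half
dq = d % (q - 1)       # reduced exponent for the q half
qinv = pow(q, -1, p)   # q^-1 mod p, for the recombination step

def decrypt_crt(c):
    mp = pow(c, dp, p)           # small exponentiation mod p
    mq = pow(c, dq, q)           # small exponentiation mod q
    h = (qinv * (mp - mq)) % p   # Garner's recombination
    return mq + h * q

c = pow(42, e, n)
assert decrypt_crt(c) == pow(c, d, n) == 42
```

Note that this optimization is exactly what makes the fault attacks mentioned later so dangerous: a fault in just one of the two half-exponentiations leaks the factorization.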
Not provably secure in the sense that nobody has proven it's broken, but there's also no proof that it's secure; still, in 30 years nobody has found a problem with this approach, and so we are fine. And so now our simple function is a little bit longer, ten times as long. It's okay, right? It's still not a big deal. With this, we can still decrypt or sign a message, so I think we're fine, right? Okay, maybe you're fine, but maybe not. I think we should go back to the paper I mentioned and read a little bit more, because we read about the math, but we didn't read about what happens when you actually implement things the way I showed. So, there are a few things you need to consider when writing something like this. Math can be used not only to break the cryptosystem itself, but also to analyze the implementation's weak points, and then use those points to recover keys. So, there are at least three famous attacks that I'm just going to mention very briefly. One is timing attacks: if you just run the function naively, then depending on what the key is, different things will happen inside the CPU, and an attacker can time how long things take and recover your key. Rivest, who is one of the people behind RSA, quickly found a way to defeat this attack using blinding, and we'll see later what blinding means. So that can be taken care of. Another problem is random faults. Believe it or not, computers sometimes have bugs in their CPUs, math libraries have bugs, and people make mistakes in general. The problem with RSA is that if you have a mistake in the math and you get a wrong signature, for example, in the case where you're making a signature rather than decrypting, and you then send out this broken signature, then you've broken RSA: with enough bad signatures, an attacker can recover the private key. So you have to be careful, but that one is easy: you check that the signature is right before releasing it, and you're done.
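The fault-check just described is cheap precisely because the public exponent is small: verifying costs one short exponentiation. A minimal sketch, with the toy key from before (the function name is mine, not from the talk):

```python
def sign_checked(m, d, e, n):
    s = pow(m, d, n)           # expensive private-key operation; a fault here is fatal
    if pow(s, e, n) != m % n:  # cheap re-check with the small public exponent
        # Releasing a faulty signature can leak the private key, so refuse.
        raise RuntimeError("fault detected, signature withheld")
    return s

n, e, d = 3233, 17, 2753       # toy key: p=61, q=53
s = sign_checked(42, d, e, n)
assert pow(s, e, n) == 42      # the released signature verifies
```

The check roughly doubles the number of function calls in the code, but since e is tiny compared to d, the added runtime is small.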
And finally, one of the most famous: the Bleichenbacher attack, I hope I said that right, on PKCS#1. PKCS#1 is basically one of the standard ways of doing padding. As I said before, RSA can only sign or encrypt messages of a fixed size, but you usually want to send arbitrary stuff. So what you do is add some padding to your message to reach the size of the key. Even if your message is small, or if your message is large and you split it into pieces, at some point you have to add padding to fill out the size of the key. Depending on how you do that, and how you handle errors during decryption, you may get into trouble: Bleichenbacher found a way to use the server's error behavior to recover the plaintext of an encrypted message. So you really have to be careful about these things. And our function, which by now is about a page of code, is still not enough. So we have to do a little bit more. We cheat a little bit again: we say we have one function that generates a very good random number to do the blinding. Blinding is a way to avoid timing attacks. How do timing attacks work? If you have a server, like a TLS server, an attacker can keep sending stuff to it, because the server is there to serve requests from clients. That means you have an oracle: the server will keep trying to decrypt whatever clients send it. And if you can repeat the operation over and over, you can time what happens inside the machine statistically, not from a single observation. And because you can always send the same message, you can precisely time the same decryption over and over if you don't do anything about it, or the same signature operation, depending on the protocol. So, in order to always have a different computation, such that it cannot be correlated with statistical methods, you basically multiply your input by a random number, then you do your computation, and then you multiply by the inverse.
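The blind-compute-unblind sequence works because of the multiplicative structure of RSA: (c · r^e)^d = c^d · r^(ed) = m · r (mod n). A hypothetical sketch with the toy key (in a real implementation this sits inside the bignum library and every step must itself be constant-time):

```python
import secrets

def decrypt_blinded(c, d, e, n):
    # Pick a fresh random blinder r for every operation, invertible mod n.
    while True:
        r = secrets.randbelow(n - 2) + 2
        try:
            r_inv = pow(r, -1, n)      # fails if gcd(r, n) != 1
            break
        except ValueError:
            continue
    c_blind = (c * pow(r, e, n)) % n   # blind: this is (m*r)^e mod n
    m_blind = pow(c_blind, d, n)       # the timed exponentiation now sees m*r, not m
    return (m_blind * r_inv) % n       # unblind: strip r to recover m

n, e, d = 3233, 17, 2753
assert decrypt_blinded(pow(42, e, n), d, e, n) == 42
```

Because r is different on every call, repeated measurements of the exponentiation can no longer be correlated with the fixed message the attacker keeps sending.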
And so, a bunch of math later, our code is twice as big as before. But still, okay. Then we also have to check our signatures for random faults and so on. More or less as before, we just do some checking. And luckily for us, because the public exponent is small, we can just do the verification exponentiation without needing any tricks to make it fast, and we simply check that our signature is okay. So we've handled that as well. Let's call it two instructions more, which is a cheat, because underneath the math library does a bunch of work, but let's say it's just plus two. And then for Bleichenbacher, one of the defenses used in TLS is pretty simple. TLS uses RSA encryption basically just to share session keys, so if you get an error in the decryption, you just pretend that the decryption was successful, generate a random session key, and keep using it. The attacker doesn't know whether this is the right key or not, and so cannot gain information from the error behavior. Are we done? No, not even close. Now comes the fun stuff that you have to deal with: side-channel attacks, as also mentioned. You've seen the recent CPU issues with speculation; well, you don't even have to go all the way to speculative execution to get side-channel attacks. With modern CPUs, caches are shared within a core, and you can play nice tricks like forcing the CPU to flush its caches and then inspecting whether a cache line was loaded by trying to load it yourself and seeing how long that takes. Basically, you can measure what another process is doing if you know what to look for. And that kind of measurement, when it comes to RSA, is bad, because it's essentially another timing attack.
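The TLS defense described above can be sketched like this. This is a simplification of what real TLS stacks do (they also fold version-check failures into the same path, and even this logic must ultimately be branch-free); the function names are mine:

```python
import secrets

def rsa_premaster_or_random(ciphertext, decrypt):
    # Generate the random fallback key *before* decrypting, unconditionally,
    # so the work done does not depend on whether decryption succeeds.
    fallback = secrets.token_bytes(48)   # a TLS premaster secret is 48 bytes
    try:
        pms = decrypt(ciphertext)
    except ValueError:
        pms = fallback                   # padding error: silently use the random key
    if len(pms) != 48:
        pms = fallback                   # wrong length: also silently replaced
    return pms                           # handshake continues and fails later, uniformly
```

The attacker's forged ciphertexts now always yield *some* key, and the handshake fails the same way whether the padding was valid or not, so the error oracle Bleichenbacher relied on disappears.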
So we kind of go back to the drawing board a little. A few researchers, late last year, came out with this paper, "The 9 Lives of Bleichenbacher's CAT", which showed that basically most TLS implementations were affected and could be broken locally, on the local machine. You're not going to do this over the network. However, if you use containers, or maybe even virtualization, or if you have a root process running a TLS server while other users are on the same system, you are in the local case, and you can attack other processes with these methods. So how do we go about defeating cache-timing attacks? The problem with these attacks is that your CPU is doing work that can be measured by someone else. So we need to make the CPU do the work such that every time you measure it, it looks exactly the same regardless of what you're doing, meaning regardless of the inputs the attacker is sending you. The attacker keeps sending messages to a TLS server in another process while measuring what happens on the CPU at the same time. So, well, we need to go all the way down to the math library to solve these issues, because if the math library takes one millisecond to do one exponentiation and ten milliseconds to do another, depending on the inputs, that's wrong: as I said, they have to take the same time. Luckily for us, in the sense of me, when I was writing the fixes, the GMP library already had a bunch of lower-level functions, below the more abstract interface, with _sec_ in their names, which hopefully means they are safe in the sense of always taking the same time, on the condition that the inputs are the same size. And for our case that's not a problem, because a TLS server always uses the same key for decryption.
So the size of the key and the size of the exponent are always the same; every time the attacker sends us something, we operate on the same sizes. With that, we should be kind of fine. I'm not going through all the details of that. But what happened is that we went from one function of about ten lines to compute a signature or a decryption, as we saw in the example, to eight functions for a total of about a hundred lines. That's roughly tenfold again. And we had to change the padding function, which was also one of the easiest things to break, into side-channel-resistant functions: from about 20 lines to two functions of about forty lines. This work, which lasted more than a month overall, was part of this CVE. And I just want to show you a small example of what we had to do, because this was actually really hard, I have to say. Really hard, not because the code is necessarily any harder than other code, but because you have to put yourself in a mental state where you consider every possibility. When I say that you cannot run code paths that take two different times, it means you cannot use conditions. You cannot say "if the key is larger than this" or "if this key bit is a one", and then do one thing or the other, because you would take two different branches in the code, and that is measurable. The attacker can force the code of just one of the two branches out of the cache, and then measure which branch you took on the CPU, and so learn whether the first bit was 1 or 0, the second 1 or 0, and so on, until the whole key is recovered. This is simplified, it doesn't work exactly like that, but that's the idea. So you basically have to bend your mind and find ways to do conditionals without doing conditionals. And this is kind of deceiving, and I'm going to show you why. memcpy is conditional here. You have to realize that this is a condition and then fix it.
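"Conditionals without conditionals" usually means replacing branches with bit masks. Here is the basic trick for single byte values, sketched in Python purely to show the logic (Python integers are not actually constant-time; real code does this in C or assembly):

```python
def ct_select(cond, a, b):
    """Return a if cond == 1 else b, for byte values, without branching on cond."""
    mask = -cond & 0xFF                    # cond=1 -> 0xFF, cond=0 -> 0x00
    return (a & mask) | (b & ~mask & 0xFF) # exactly one side survives the masking

assert ct_select(1, 0x5A, 0xC3) == 0x5A
assert ct_select(0, 0x5A, 0xC3) == 0xC3
```

Both `a` and `b` are always read and combined with the same operations, so there is no branch whose presence or absence the attacker could observe through the cache.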
So this is just what happens at the end of the PKCS#1 padding function, or rather the depadding function in this case, when you want to remove the pad and find the message. As I said, because RSA uses this fixed key size, you have to use some padding, and then the rest is the message, okay? And the way PKCS#1 v1.5 does this is really, really bad. Don't use it anymore; there is better stuff. It basically has a little header, then a bunch of padding whose length you don't know, then a terminator, a zero byte, and then you have your message. Now, if you go about it naively like we did before, all the functions that look for the terminator have timing leaks. So you have to fix that. But assume you fixed it. Then you're left with: oh, I need to copy the message into the buffer to return to the application, right? But if you do that naively, it's measurable. Because if the attacker forces the decrypted message out of the cache before the depadding is done, then this memcpy will load only the last part of the buffer back into the cache. And then, again, the attacker can find out the length, and from the length derive a bunch of information. So we have to copy that buffer into the destination buffer with always the same number of operations, regardless of the length of the message. The message could be the whole thing, or just one byte. And so we end up with this thing, which I don't know how to describe, but it's really not nice. Notice that I'm cheating: I'm also using things called cnd_memcpy, special memcpy functions that, depending on the first parameter, fill the destination buffer either with what's already in the destination buffer or with what's in the source buffer, in a way that an attacker cannot tell which of the two you copied from. So that's already cheating here.
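The idea behind a `cnd_memcpy`-style helper is the same masking trick applied to whole buffers: every byte of both buffers is read and every byte of the destination is written, whichever way the condition goes. A sketch of the logic in Python (again, only illustrative; Python itself gives no constant-time guarantees, and the real helper lives in the bignum library):

```python
def cnd_memcpy(cond, dst, src):
    """cond == 1: dst becomes a copy of src. cond == 0: dst is rewritten with itself.
    The memory access pattern is identical in both cases."""
    mask = -cond & 0xFF
    for i in range(len(dst)):
        # Every iteration reads src[i] and dst[i] and writes dst[i],
        # regardless of cond, so a cache probe sees the same accesses.
        dst[i] = (src[i] & mask) | (dst[i] & ~mask & 0xFF)

buf = bytearray(b"\x00\x00\x00\x00")
cnd_memcpy(1, buf, b"ABCD")
assert buf == bytearray(b"ABCD")   # copied when cond == 1
cnd_memcpy(0, buf, b"WXYZ")
assert buf == bytearray(b"ABCD")   # unchanged when cond == 0, same work done
```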
But again, in the end, what you get is a logarithmic-time function where you end up copying the whole buffer: one byte, two bytes, four bytes, sixteen bytes, and so on until you've copied it all. And hopefully, if you got it right, it always takes the same time regardless of the length of the buffer. And so, yeah. Just look at this one. This used to be an if-then-else statement, and now it's hopefully non-conditional and still produces the result of the original condition. So, yeah. It's bad. So, from a naive to a reasonably secure implementation, we went from just the basic RSA function all the way to this error handling: on the order of a hundred times more code, which is quite a bit. And the takeaway for me is, you know the saying: good, cheap, fast, choose two? In the security case, it's almost choose one. It's either very, very fast, or very, very secure, or very, very simple, but you can't even have two. Or rather, you have to strike a compromise between speed and security, and every time the security is broken, you have to give up speed to fix it. You just have to deal with it. And so, that's it for me. Questions? [Audience member:] Thank you very much for your presentation. Just a small remark: although OpenSSL did not issue a CVE for this same shared-cache attack, they fixed it immediately. If I'm not mistaken, the commit was on the same day the paper was published. [Speaker:] So, the remark was: although OpenSSL did not publish a CVE, they actually fixed it the same day. We were in contact with the same researchers as well, and your statement is true. I will say, though, that OpenSSL decided not to raise a CVE by changing the definition of their threat model, saying local attacks are not part of the threat model for OpenSSL. I reviewed their changes.
They were lucky that they had already done a bunch of side-channel-resistance work in their math library before, also in RSA, so they had less code to write. However, I'm not 100% confident that they got their patches completely right. I didn't look at the assembly; I looked at the patches. And they basically ended up doing, to some degree, what I did in my first revision for Nettle, but we did a second revision for Nettle before we published. So, it's really hard to get this right. I don't think we got it entirely right in Nettle either. There's probably some compiler optimization that we were not able to defeat. Maybe we defeated them on x86_64, but not on other CPUs. It's really, really hard. And one of the problems is that we don't have any test to figure out whether timing attacks like this are really defeated or not, because it's very hard to write attacks like this, too. So yes, that's right, but they changed the definition. I think we're out of time. Thank you.