Please welcome Daniel and Tanya for the talk about high-assurance crypto software.

Thank you. So why is high-assurance crypto software a thing? Why do we worry about the correctness, the quality, of software? Here are some recent news results about, well, crypto getting broken so badly that the private keys leak. These are just reports from October and November, and this one has a nice pun, saying "timing is everything". These were timing attacks which completely broke elliptic-curve-based cryptography, so badly that the private keys came out.

Timing attacks are not a new thing. Back in the day, when you logged into a server and it looked at your password, it might be doing a character-by-character check. So you start by, well, let's find out what the first character is: you send AAA, BBB, CCC. Now, of course, none of them will actually work; it would be very surprising if one of these were the right password. But you do observe the time it takes to say "this password is wrong", and then you notice that CCC takes a little bit longer to fail.
Yay, CCC. Then you try CAA, CBB, and so on and so on; they still fail, and they all take about the same time to fail, until you get to CC-something. Then you try again for the third character, and so on and so on, and eventually you get the password. This was actually a thing in 1974, on the TENEX system: an operating system where, back in 1974, before most of us here were born, you could log in and it would do exactly this character-by-character checking. So timing attacks to break cryptography, or break security, have been known for a long time.

Of course, things get more subtle when you go into cryptography rather than just character-by-character comparisons. For instance, if you're implementing your favorite cryptosystem, be it Diffie-Hellman in finite fields or be it RSA, then each time you have to compute an exponentiation. And then you go back to your Crypto 101 lectures, and you don't compute C times C times C, D times; you remember that there was a square-and-multiply algorithm. It tells you: OK, you look at the bits of the exponent. So here I'm going for the representation of D in bits; I take the length in bits, and then I start by initializing; you can just run this, this is just Sage code. I initialize with the ciphertext, which is finally going to become the message in RSA, and then for each step I do a squaring, and if the bit is set, a multiplication. This runs from, well, the first bit we've dealt with already, so from the second-highest bit all the way down to 2^0, which, OK, in Sage, like in Python, the end of the range is not included. And then you output the plaintext.

Now there are some problems with this. If you're an attacker and you know that your user is using this to compute an RSA decryption, then you observe that this loop length gives you some information on D, namely the length of D in bits. This L was defined as, well:
how many bits does D have. And some D's look rather short: this D is much shorter than N, much shorter than phi of N. So this would be an unusual D, and the timing would leak that it's a bit shorter. Also, here we have a branch: if this bit is one I do the multiplication, otherwise I just continue. So depending on the level of fine-grained access, somebody could even see whether I'm going for a multiplication or just moving on to the next squaring. In the worst case, somebody could read out the whole pattern of zeros and ones just by knowing whether I entered this branch or not.

Now, if you're a remote attacker, you only see the overall time. I have a picture of something similar; it's not an exponentiation for RSA, it's an elliptic-curve scalar multiplication. This is from a recent paper, TPM-FAIL, and they observe how long it takes to compute the elliptic-curve equivalent, a scalar multiplication, where you do double-and-add instead of square-and-multiply. You can nicely see that the bulk of the computations, for most exponents (or rather scalars), take this long, and if you have some leading zero bits, it's somewhat faster. Now, there's some variability: it could be faster because your D is very sparse; it could be just a one, a whole bunch of zeros, and then another one. That would be as fast as something which is a few bits shorter but more dense. So you don't know exactly whether it was short and therefore fast, or whether it was sparse, so few multiplications, and therefore fast. But if it's a lot faster, it's probably both: the top bits are missing, and therefore you don't have multiplications there either. So if it's very fast, you have a good guess that this scalar was actually short. So there's a strong dependency on the length.
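For reference, the leaky bit-by-bit loop just described, transcribed from the Sage version on the slide into plain Python (variable names are mine, not from the slides):

```python
# Left-to-right square-and-multiply for RSA-style decryption.
# This is the leaky version: both the loop length and the branch
# depend on the secret exponent d.
def square_and_multiply(c, d, n):
    bits = bin(d)[2:]          # binary representation of the exponent d
    h = c % n                  # initialize with the ciphertext: top bit of d
    for b in bits[1:]:         # second-highest bit down to 2**0
        h = (h * h) % n        # always a squaring
        if b == '1':           # branch on a secret bit: this is the leak
            h = (h * c) % n    # multiplication only when the bit is set
    return h
```

Note that `len(bits)` fixes the number of loop iterations, which is exactly the length leak discussed above, and the `if` is the branch leak.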
There is also a dependency on the density. Now, typically your implementation will not actually be doing it bit by bit; I was surprised that this paper actually found an example where you really go bit by bit, because most of the time our time is very precious, and so we want to speed things up. So you take your favorite number, like 1419, you write it in binary, and then you do your scalar multiplication starting from the top bit all the way down, similar to the RSA loop, but you want to save on the number of multiplications (or squarings), so you grab two bits at once. You precompute a few values: you can precompute C, C squared, and C cubed, and then to compute this exponentiation you handle two bit positions at once. The first window you can skip because it's both zeros; the next window is one-one, which gives you a cube, so we start with C cubed, which we conveniently have precomputed. Now we move on by two bits, so instead of squaring once we square twice; that's where the fourth powers come in. And depending on the value of the next window, we multiply by C, C squared, C cubed, or nothing. Everything is like in the previous loop, except we're doing two bits at a time, and instead of saying "oh, if the next bit is set we do a multiplication", we look at the value of the next two bits and select from C, C squared, C cubed, or no multiplication.

This definitely reduces the number of multiplications: this example would normally have taken seven multiplications, and this way we only need four. It doesn't change the number of squarings. Now, this smooths out the effect of having a sparse integer, because as long as there is a single one in the window (we call this thing a window here), it costs a multiplication, whereas before it would have been one multiplication here and no multiplication there.
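A sketch of the windowed variant in the same style; this is my reconstruction of what's on the slide, with `w=2` matching the two-bit example (real code would also precompute and access the table in constant time):

```python
# Fixed-window exponentiation: precompute c**0 .. c**(2**w - 1),
# then consume the exponent w bits at a time.
def windowed_pow(c, d, n, w=2):
    table = [1]
    for _ in range(2**w - 1):
        table.append((table[-1] * c) % n)    # table[j] = c**j mod n
    bits = bin(d)[2:]
    bits = '0' * (-len(bits) % w) + bits     # pad to a multiple of w bits
    h = 1
    for i in range(0, len(bits), w):
        for _ in range(w):                   # w squarings per window
            h = (h * h) % n
        j = int(bits[i:i + w], 2)            # value of this window
        if j:                                # all-zero window: no multiplication
            h = (h * table[j]) % n
    return h
```

With four-bit windows (`w=4`), only one window value out of 16 skips the multiplication, which is the smoothing effect described above.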
So when you look at the traces from this, you get much more accentuated boxes for whether something is zero at the beginning or not, because most of the other steps contain a multiplication; in particular for larger windows, and this implementation seems to take four bits at once. I mean, if you have four-bit windows, then there are 15 cases out of 16 where you have to do a multiplication, and only one out of those 16 where you don't need a multiplication. So it accentuates the length issue, and you have kind of no idea how many bits were set.

Now, how much do these few bits matter? Of course you don't want to leak your secret RSA exponent, you don't want to leak your Diffie-Hellman exponent, but it's not too bad: the attacker would need a lot of knowledge; you would need an extremely short or extremely sparse exponent to actually break something with this. It's somewhat worse if you're doing RSA decryption with the Chinese remainder theorem, because then you can do some combination tricks; there's a paper we did a few years ago on how you can combine information if you know a few bits of D mod (p-1) and of D mod (q-1), and by combining these you learn more.

Where it really goes bad is signatures: DSA signatures, or ECDSA signatures. These are systems which are extremely fragile, and it's a strange thing that it happens by just looking at the scalar multiplication for a number which you will use only one single time. Your signature generation starts with: you pick a random number, you do an exponentiation or scalar multiplication, and then you do something else with this number. These one-time numbers, these one-time exponents: if they are somewhat biased, for instance if you know that the top four bits or top eight bits are always zero, you're getting the secret key. It is a very, very strange property of the system that this is possible, and this was exploited in the two papers whose news coverage I showed you.
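To see how fragile these one-time numbers are, here is a toy sketch: a miniature DSA with made-up toy parameters, where an attacker who somehow learns a nonce k recovers the long-term key immediately. The real attacks need only a few biased bits of many nonces plus lattice techniques; full knowledge of one k, shown here, is just the simplest case, and all the numbers below are illustrative toys, not real parameters.

```python
# Toy DSA: q divides p - 1, and g = 2**((p-1)//q) mod p has order q.
p, q, g = 607, 101, 64

x = 57                              # long-term secret key
y = pow(g, x, p)                    # public key

h = 42                              # "hash" of the message, already mod q
k = 23                              # the one-time secret nonce
r = pow(g, k, p) % q
s = (pow(k, -1, q) * (h + x * r)) % q   # the signature is (r, s)

# An attacker who learns k solves s*k = h + x*r (mod q) for x:
x_recovered = ((s * k - h) * pow(r, -1, q)) % q
assert x_recovered == x             # the long-term key falls out
```

This is why the bias matters: the equation s*k = h + x*r is linear in the secrets, so every partial leak about k translates into information about x.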
So there was TPM-FAIL this November, by Daniel Moghimi, Berk Sunar, Thomas Eisenbarth, and Nadia Heninger, who showed that doing this to typical implementations of TPM cryptography works: you have your TPM in your computer, in charge of doing the signatures, say for a VPN connection, and they could get the keys out of the TPM remotely. There was another paper, the Minerva attack; I haven't seen the full paper yet, but they have a very nice, informative web page, by Jan Jancar, Petr Svenda, and Vladimir Sedlacek, where they do the same for smartcards, which were actually certified. So this is a really bad attack, and it's something where all these implementations, both the TPMs and the smartcards doing signatures, should have been tested for this before. But apparently people didn't test, or didn't realize that such a small bias has such a huge effect.

There are more attacks. This is just the basic one, where you see the overall timing, but this already broke lots of libraries, smartcards, and TPMs. If you're getting more detailed information, if you are on the same processor, like running a hyperthreading attack or a cache-timing attack, then you might even be able to figure out, when the lookup of the precomputed values happens, where this lookup went. You're kind of booby-trapping the table where the lookup is going to happen, and depending on whether the processor fetches this entry, which is in cache, or that entry, which is not in cache, you actually learn something about the exponent, and you can even recover the exponent from these things.

Now, this should be a constructive talk; we said this is like 80% constructive. So let's jump in: how would you fix this? One thing is that for all our crypto implementations, we actually know kind of an upper bound. This is not an arbitrary exponentiation, it's an RSA decryption. We know a bound on N: people pick 2048-bit or 4096-bit keys.
So we have a good upper bound on how long this D will be. Why don't we just use that? So we make this loop length independent of what D actually is, by just saying: well, we take the number of bits in N rather than in D.

All right, the problem then is: before, we started by initializing with the ciphertext and then kept squaring and multiplying. But there's an easy way out: we just initialize at one, and, well, if you square one it stays one, until you reach the bits that are actually occupied by D. So we pad D to the full length, and we initialize at one. Now we have a fixed-length loop, and this takes care of not modifying our values. Then we do the normal thing: we square and we conditionally multiply, except we don't want to have this if/else. I mentioned cache-timing attacks, and in general, well, you're definitely leaking how many bits there are, and you might be leaking more depending on how different your multiplication looks from the squaring. So what we typically do is give up a bit of performance: we do a multiplication for every bit, not just when the bit is set, and then we conditionally select which of the two to take, the one we just computed with the multiplication (the H here), or the one without the multiplication. And we don't want an if and an else there, because an attacker could observe the branch, could see whether we're taking it or not, so we do this whole selection by arithmetic.

OK, let's briefly run through this. If the bit is zero, then I have (1 - 0) times M, so I'm getting M, great; and then for this zero bit I'm adding 0 times H. So for the zero bit I'm computing M, which is what I wanted. Now for the one bit I should be grabbing H, and yes, of course, if I have one minus one:
this is 0 times M plus 1 times H, so I get H. This small modification to the code comes at a bit of a cost: it's as slow as the worst case, both in the length, so our loop has gotten longer, we're doing more multiplications, more squarings, and in that we're doing a squaring and a multiplication for every bit, which otherwise would only have happened in the worst case.

Now, I've been saying elliptic curves, and I've been showing RSA. If you're doing Diffie-Hellman in a finite field, you can do exactly the same thing: you can initialize at one and then multiply by your generator. So that's cool. Now, if you're doing the same thing with elliptic curves, things get a little bit more iffy. If you were here five years ago, at 31C3, we gave a talk about elliptic curves, and we ranted a little bit about Weierstrass curves. One of the things that makes Weierstrass curves nasty to implement is that you always have to worry about this extra point. You have a nice curve, and then there's an extra point, called the point at infinity, and this point at infinity messes up most of our arithmetic. We don't have nice formulas for it; we just say, hey, well, this is our neutral element. So we would want to start at this point and then keep on doubling, but most of the formulas have exactly this point as an exception, so we can't initialize there. There are ways around it, but the default is not as easy.
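Going back to the RSA / finite-field case for a moment, the fixed-length, branch-free loop we just walked through can be sketched like this. This shows the structure only: Python's big integers are not actually constant time, so a production implementation would do this in carefully written low-level code.

```python
# Fixed-length square-and-multiply with arithmetic selection.
# nbits is a public bound on the exponent length, e.g. the bit length of N.
def fixed_length_pow(c, d, n, nbits):
    h = 1                                # start at 1: squaring 1 changes nothing
    for i in range(nbits - 1, -1, -1):   # public loop length, independent of d
        b = (d >> i) & 1                 # bit of the zero-padded exponent
        m = (h * h) % n                  # squaring only: the "bit = 0" result
        hm = (m * c) % n                 # squaring plus multiplication
        h = (1 - b) * m + b * hm         # arithmetic select, no if/else
    return h
```

Both candidate results are always computed, and `(1 - b) * m + b * hm` picks one without branching, which is exactly the countermeasure described above.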
There are some nicer curves; we've been advertising Edwards curves and Montgomery curves. On Edwards curves the neutral element is nice: this was the starfish shape, and you have this point at 12 o'clock, and you can just use it. So for Edwards curves everything is nice. And I should also do a shout-out to Montgomery curves, for instance the famous Curve25519, which you might have in your browser or in your cell phone. That is using formulas due to Peter Montgomery which are very happy about this data flow, and you even get a discount: you're doing basically one doubling and one addition for each bit, the same as a squaring and a multiplication, just it's called addition here. Dan would comment that this is probably just to make sure that the mathematicians keep lifetime employment, because nobody could possibly understand it otherwise. So it's all the same; it's called addition and doubling here, sorry for that, not my fault, it came before me. But you're getting this combo of the addition and the doubling for less than it would cost otherwise, so it's not as bad as it looks like for RSA or Diffie-Hellman. In fact, the reason that Curve25519 gets used with these Montgomery formulas is that for that bit size it's the fastest way you can implement scalar multiplication. So it has the nice feature of being constant time and being cheaper.

If you needed some more motivation for wanting constant time, there is an additional benefit of constant time: you have figured out how long something should take, and so certain things should not happen.
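The ladder pattern behind those Montgomery formulas can be written multiplicatively, as an analogue of the elliptic-curve version: one "doubling" and one "addition" per bit, with the same data flow for both bit values via an arithmetic conditional swap. Again, this is just a structural sketch in Python, not real Curve25519 code:

```python
# Montgomery-ladder-shaped exponentiation. Invariant: r1 == r0 * g (mod n),
# so each step does exactly one squaring ("doubling") and one
# multiplication ("addition"), regardless of the bit value.
def ladder_pow(g, k, n, nbits):
    r0, r1 = 1, g % n
    for i in range(nbits - 1, -1, -1):
        s = (k >> i) & 1                 # bit of the scalar/exponent
        # conditional swap by arithmetic, so both bit values do the same work
        r0, r1 = (1 - s) * r0 + s * r1, (1 - s) * r1 + s * r0
        r0, r1 = (r0 * r0) % n, (r0 * r1) % n
        r0, r1 = (1 - s) * r0 + s * r1, (1 - s) * r1 + s * r0   # swap back
    return r0
```

The swap-compute-swap shape is why the ladder is such a good fit for constant-time implementations: the instruction sequence never depends on the key bits.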
So if you know that your arithmetic finishes in time X, you will not enter an infinite loop, as the Microsoft cryptography library did recently for Windows 10. Oops.

Yeah, "constructive talk", you said. So now we've had bug, bug, bug and a little bit of fixing, and maybe you believe that, yeah, OK, these timing attacks are a problem. And sometimes people do believe it, like OpenSSL: they've got a few subroutines which are labeled, claimed, to be constant time, and there are some other crypto libraries which say, yeah, we've got some constant-time code. It's not pervasive by any means, but some people say they're trying to do things in constant time and avoid these attacks. And then, well, is that true? I mean, people make claims about crypto all the time. Like RC4, which was used until pretty recently. Sorry, that's bad crypto; it is bad crypto, but it was introduced with the claim that it's great crypto: RC4 is the crypto you want to use, and somebody gives you RC4 software. Well, originally it was proprietary, but eventually it was leaked, and then everybody had RC4 software, and this software, yeah, it's a strong cipher and it's constant time. Well, how do we check these things? The strength of the cipher, that's outside the scope of this talk. It's not a good cipher; don't use RC4. But how about the constant-time property of RC4?

Well, this example actually uses RC4. This code is maybe a little bit weird, so let me first look at what this code does and then say what Valgrind is doing with it. This code is sort of the start of code to encrypt using RC4, and what it does: you see there's some key running around, there's some space allocated for the key, 32 bytes. Maybe you want to use a different length; you can try this for different key lengths, but 32 is a reasonable length of key. And then, OK, the key is allocated, the program will abort if the malloc failed, and at the end it frees up the key properly.
It doesn't ever initialize the key. So this is maybe a work-in-progress program, but OK, there's some space for a key, and then it expands the key; RC4 has a key-expansion process, into what OpenSSL calls this RC4_KEY structure. And then, oh, there's supposed to be some encryption after this; well, that didn't quite happen yet either. But this program is still something you can compile, and the compiler, with normal optimization options and without link-time optimization and so on, won't realize that this program is doing nothing. I mean, maybe RC4_set_key is, I don't know, crashing your system or producing some output or something. So the compiler will call malloc and RC4_set_key, and then you can try running this program under Valgrind.

Now, I'm sure lots of people here have used Valgrind, or AddressSanitizer, for checking for memory problems. Valgrind has the advantage that it works on binaries, and maybe some people were in room C for the previous talk about fuzzing; it's really helpful to have tools that work on binaries, so you don't have to worry about getting into the whole software-engineering process, the compilation process. You just take the code that you're going to run, like the RC4_set_key from OpenSSL; you don't have to recompile or redo anything with OpenSSL. You can just run Valgrind on your compiled code, and it will run this program, allocate space for a key, then call RC4_set_key, all running through Valgrind. Valgrind interprets every machine instruction, and while it's doing that it keeps track of, all right, which memory is actually memory that, you know, you're allowed to touch. For instance, the malloc is setting aside this amount of space on the heap, and, well, you're not supposed to go before or after that, and Valgrind tries to keep track of what your pointers are pointing to; and then, when you read and write data, it's going to say, oh, you're not allowed to do that. And one of the things that it
checks is: suppose you have some uninitialized data, and you use that as a pointer, or you use it as a branch condition, you try to do an `if`, or an `x[i]` where `i` is not initialized. Then Valgrind will give you an error, and that actually happens in this code: the malloc'd array, well, that's uninitialized, and then Valgrind will track all of the uninitialized data, like taint tracking; it will figure out all the other computations on uninitialized data inside RC4_set_key, and then it will complain. So you get an error message that looks like this, the usual kind of Valgrind error for the people who've used this tool, and it says there's some use of an uninitialized value. Which is not just any use: this means you've been doing some array access `x[i]` where `i` is uninitialized. And that's where Valgrind, I mean, it makes sense that Valgrind would say, oh, I don't want to continue past that `x[i]` and just guess. It's trying to figure out whether you're accessing some wrong spot in memory, and if `i` is uninitialized it just says: that's an error, you're not supposed to be doing that. And similarly, if you do an `if` based on an `i` that is uninitialized, then Valgrind will say, no, you're not allowed to do that, and it will produce, not exactly the same error, but something else, another complaint, about a branch based on uninitialized data.

And hey, that's exactly what we want to do to check whether this RC4_set_key is constant-time code. This is checking: is any of the key information being used for a branch?
Is anything derived from the key being used for a branch, or being used for an `x[i]`, which could be getting into one of those cache-timing attacks? And so, if Valgrind says there's a problem, then you investigate, and you fix it by throwing away RC4, or whatever you have to do to fix your software. And if Valgrind says there's no problem, then hey, cool, all done.

All right, so it is a happy talk after all. We're in a situation where we have constant-time exponentiation and scalar multiplication, we have constant-time RC4, under a few conditions. So one of the conditions is about your arithmetic: in the end, you have to implement your computations on the elliptic curve, or you have to implement your arithmetic modulo the RSA modulus, and you have to do this long-integer arithmetic in constant time. But you can again check this: Valgrind is going to follow through every single machine instruction. So, Valgrind: awesome tool.

Well, there's another condition: that the processor doesn't screw you over. The processor tells you, well, OK, I'm going to do a multiplication, I'm going to do a division, I'm going to do an addition. Now, if you have a processor which says, I do this in one single clock cycle, then that's probably OK. So you check the manual, sorry, the reference manual, and you look up how long one multiplication takes, how long one addition takes, and so on; that's fine. But how about other processors?
So I had a poor student look at the Cortex-M3, a low-end ARM processor, and we asked him to do a nice implementation of elliptic-curve cryptography. If you have a multiplication which takes two 32-bit words and produces a 64-bit output, twice as long, as a multiplication should be doing, and you look at the cycle counts, then you see: 3 to 7 cycles, footnote c. Footnote c tells you that this might be terminating early depending on the size of the source values, with some extra worst-case latency, up to seven cycles. So, all right: a student with lots of time. It's only two 32-bit inputs, right? No, you can't test all of them, but you can have a student test a whole bunch, and he came up with this flow chart. This Cortex-M3 gives you values between three and seven cycles, actually every number between three and seven except four, depending on which of those branches you are taking: whether an operand is special, yes or no, whether both are special, whether one is zero. In that situation Valgrind won't help you, and the fix in that situation is basically: buy a different processor.

Yeah, this is what happens when we decide to do a constructive talk, 80% positive, this is going to be defense; and then we talk about it and realize, yeah, it's all broken. It's kind of sad. It's actually even worse: it's not just that the tools don't do what we want them to do for checking that things are constant time, they also don't check that the code is correct. I mean, we spent half the talk now talking about timing attacks, and there are all these vulnerabilities to timing attacks appearing in the news, and lots more that don't appear in the news. And then suddenly there are more CVEs where it's not just the timing that's going wrong. This one is, like, OK.
Let me explain this one. First of all, CRYPTO_memcmp: this is a function inside OpenSSL which exists because it would be really embarrassing if OpenSSL had that one-byte-at-a-time password checking, or authenticator checking, inside the code. So CRYPTO_memcmp has been there forever in OpenSSL, and it does a supposedly constant-time comparison of two byte strings. Now, it's not that OpenSSL does everything in constant time, but it does this in constant time. And the PA-RISC implementation, because of an implementation bug, the PA-RISC CRYPTO_memcmp function only compares the least significant bit of each byte. This is from May 2016: somebody thought that they knew what PA-RISC processors are, and that it's a good idea for OpenSSL to have assembly code for comparison of two byte arrays on the PA-RISC. I don't know how many people here have actually ever used a PA-RISC processor; can I see? OK, I see at least ten hands raised; I'm impressed. This is not the world's most popular processor, but it exists, and you can write assembly code for it, and maybe you can even find some machines where you can run this code. And actually it's not crazy that OpenSSL is doing assembly code for things, because compilers screw things up pretty frequently. On the other hand, as this particular implementation illustrates, humans also screw things up pretty frequently.

So what does this bug do, what is the impact? Well, let's look at the advisory. It says this "allows an attacker to forge messages that would be considered as authenticated in an amount of tries lower than guaranteed by the security claims of the scheme". OK, let's figure out what this means.
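Before going through the impact, here is a toy Python model of what the broken comparison effectively does. The real bug was in PA-RISC assembly inside OpenSSL; this is a simplification for illustration only, not OpenSSL's code:

```python
import os

# Model of the broken comparison: only the least significant bit of
# each byte contributes to the result. Returns 0 for "equal".
def buggy_memcmp(a, b):
    diff = 0
    for x, y in zip(a, b):
        diff |= (x ^ y) & 1              # only bit 0 of each byte is checked
    return diff

tag = os.urandom(16)                     # a 128-bit authenticator
forgery = bytes(x ^ 0xfe for x in tag)   # flip bits 1..7 of every byte
assert forgery != tag                    # a genuinely different value...
assert buggy_memcmp(tag, forgery) == 0   # ...which the buggy check accepts
```

Only 16 of the 128 bits are ever checked, so a random forgery passes with probability 2^-16 instead of 2^-128, which is exactly the impact discussed next.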
So you've got a message, and it's got, typically, a 16-byte, 128-bit authenticator at the end of the message, and then this CRYPTO_memcmp is being used to check it. It recomputes the authenticator with whatever mathematical function, and then checks: is the result equal, with CRYPTO_memcmp, to what's coming in from the network? And if somebody has modified the message, then hopefully they can't compute that same 128-bit result. And now we try this with the PA-RISC CRYPTO_memcmp on your PA-RISC server. We try comparing the authenticator that's computed, 128 bits, to the 128-bit correct authenticator, and it's only comparing the bottom bit of each of the 16 bytes. So it compares 16 bits, which means that instead of a 2^-128 chance of forgery, it's a 2^-16 chance of forgery. So the attacker just tries 2^16 messages, and one of the forgeries is going to work. It's "lower", sorry; it is lower, yes: 2^16 is a lower security level than 2^128. This is like classic British understatement: your security level is perhaps not quite what you wanted.

OK, PA-RISC. Let's forget about PA-RISC, just like the whole computer industry has, and focus on Intel. Now, OK, I mean, PA-RISC, it was a nice idea at the time; you know, Sun used to have these SPARC processors, and Fujitsu still makes SPARC processors; I don't think anybody's trying to preserve PA-RISC. But people have these ideas of making new instruction sets, and, you know, at some point Intel made a new instruction set, and it got really popular, and they kept extending it and kept selling nice fast processors. So here's one using AVX2, the 256-bit vector instruction set on current Intel and AMD processors. There's an implementation of 1024-bit modular exponentiation in OpenSSL, and the code, from July 2013, was discovered in 2017, or announced in CVE-2017-3738, to have an overflow bug. Now, what are the consequences of this bug? Let me guess.
It's a lower security level. Well, they say attacks against DH1024 are "considered just feasible". Now, yeah, you shouldn't be using DH1024 in the first place, but if you are, then if the attacks are getting "just feasible", does that mean 2^16 computations, does it mean a day on your laptop, does it mean a year on a cluster? I want to know how hard the attack is, and, well, this isn't really answered. And there's more that's not really answered in the advisory, because they say, well, you're probably not using DH1024, but maybe, back when this was announced, you were still using RSA-1024, or maybe DSA-1024, and those would also use the same subroutine. And then the advisory said that attacks against RSA-1024 as a result of this bug "would be very difficult to perform and are not believed likely".

Now, what's happened here? There's the original cryptosystem, and then there's a bug, which basically makes a new cryptosystem doing a wrong computation. And, I mean, this notion of "wrong", well, it's a different cryptosystem, and then is that new cryptosystem, this slightly different RSA-1024... we shouldn't criticize it, it's just, you know, differently-abled RSA-1024, and maybe it's secure, and they say it's going to be fine. Well, has anybody really looked at this? Normally, if we have a new cryptosystem, we want it to go through a whole lot of review, because there are lots of things you can do wrong, and it's really important to get these things right. Lots of people were using RSA-1024; you shouldn't, of course, you should use something much bigger or switch over to elliptic curves. But if people were using RSA-1024 and they had these wrong computations happening, what are the consequences? Where are the papers analyzing this widely deployed cryptosystem?
It's like saying: yeah, we've deployed this cryptosystem, and we've looked at it ourselves, and we've decided attacks aren't likely; just believe us. Well, people say this sort of thing: if you can't break it, it's going to be fine. Yeah, OK. Yeah, you're right, you're right, everything's fine. And similarly, a few weeks ago, OpenSSL put out another advisory for another bug like this, CVE-2019-1551. I mean, it's not that we knew this was coming and figured we would have a talk about bugs in crypto software just to make fun of this; these things happen so often, there are so many of these bugs, and we don't know what the consequences of all these bugs are; they just keep happening. Here's an example of just one piece of the patch for the 2017 bug. Yeah, the code before already says "correct"; oh, you're right, the patch adds some lines saying "correct". OK, that's clear enough; next slide.

How about post-quantum crypto? You know, you've got these top hundreds of cryptographers around the world fighting against the threat of quantum computers and putting together the next generation of cryptographic software, protected against quantum threats. That's going to be carefully evaluated, right? Everything's going to be just fine. Well, no. Falcon, for example, is one of the round-2 signature candidates in the NIST post-quantum cryptography competition, well, post-quantum cryptography standardization project. And for this one there was an announcement in September 2019, with a quote from the author of the software, saying: well, there are some bugs, and the consequences are that the signatures leak information on the private key, and they also make the software look faster than correct software would have been. And, interestingly, all of the latest Falcon round-2 implementations that were released had the same bug. They all had the same test vectors.
You could cross-check them, and they were all doing the same leaking, wrong, oops, sorry, different cryptographic computation, which presumably has lower security. And then, are we going to spend a lot of time figuring out how low the security level is? The author also commented that the fact that these bugs existed shows that the traditional development methodology, being super careful, has failed.

All right, so what can we take from here? Mathematical complications in cryptography: I'm thinking of, like, elliptic curves, where I mentioned that if you have these special points that need extra treatment, you have to watch out. Are you even allowed to add those? Can you even represent this point in your software? Those do make your software more complex. Something that we saw in the Falcon implementation: it was a new system, and it hasn't been studied for long. Even something which has been studied for a long time, like RSA and ECC, we still see issues with it. Things get worse if you have side-channel countermeasures. As one example, you saw how I was trying to avoid leaking whether a bit is one or zero by introducing arithmetic. Now, if you have to review that code: before, it just said "if the bit is one, do this; if the bit is zero, don't do anything", versus the code which has the arithmetic instructions. It's more complicated. So side-channel countermeasures also add more complexity. Or, Dan was mentioning this comparison bug on the PA-RISC; this was also because people were trying to make the comparison not leak information through timing.
So they made a new implementation and introduced new bugs. And then, of course, with post-quantum cryptography we're getting a whole bunch of systems like Falcon which have been less studied, where we have less experience in how to implement them securely. We don't even know all the pitfalls. That adds to the complexity we have to watch out for. So it is a problem to review cryptography.

Another problem with crypto is that we have this drive for speed. Crypto runs through large volumes of data, so we have to be really careful. We have very small code, we have to run it many, many, many times, and we will optimize the hell out of it. Doing, like, one squaring for each step and one multiplication per set bit is annoying, so we're trying to squeeze it. That makes it more error-prone as well. And also you're getting a huge amplification of implementations. You will have an implementation for your x86 architecture; you're going to see some with AVX2 instructions, some without AVX2 instructions; and you might go all the way down to platforms with their own special instructions. So for each CPU there is a dedicated implementation, not just the reference code.

If you look at, for instance, Keccak, which was the winner of the SHA-3 competition — so it's a relatively new hash function, which has only had to exist for, like, new platforms — still they have more than 20 implementations for different platforms in their library. Or Google, in order to get speed for disk encryption on some lower-end smartphones.
So if you have a cheap smartphone, it might have just an ARM Cortex-A7 rather than the newer architectures which have AES support. And so they were like: okay, well, maybe we should add full-disk encryption, but what we have sitting around is just too slow — doing AES without the hardware support would drain too much battery. And so they even went down the road of taking a cipher specified by the NSA, the Speck cipher, and they put it on there, because it seemed to be the only thing that was satisfying the speed requirements. Now, they did switch after some public outcry — in particular Jason Donenfeld did a lot of work there to make them switch to something better — and they then did another implementation, so yet another code base, with a recently designed combination which they call Adiantum, based on XChaCha. So there's a lot of code to review and a lot of places to get errors.

So how do we deal with this? Well, maybe if the problem is all these complicated implementations of all this complicated math, maybe math is the solution. So here's help — sorry, that seems to help. Yeah: "1 + 1 = 2". That's the comment at the bottom here, from this book from 1910. From this proposition, which is proven in very comprehensible language here, it will follow, after we've defined addition, that one plus one equals two. And this is on page 379 of this book called Principia Mathematica, principles of mathematics. I don't know if they do one plus two equals three; that sounds more complicated. But somehow they find it important to prove something like this, in all this incredible detail. It's some sort of, like, machine language for proofs — and then people complain that their machine language has bugs. Well, people have worked on this over the last more than a hundred years, and there are actually fans of this who would recommend that, yeah, you should be going through this kind of pain.
You should spend 379 pages proving that your code works. So you should take your software and write down a formal proof in incredible detail — not just to convince yourself and some friends, but to convince a computer that's doing automated checking of your proofs. And that computer program says: yes, you have a correct proof that your software is computing the right thing. And what is the right thing? Well, you have to carefully define that. Specify your language for your software, specify what the input–output relationship is supposed to be, and then prove that you have that input–output relationship for that software. Assuming, of course, that you've gotten all of that right, then if your proof is checked, yeah, everything should be fine.

These tools work. But — just to give some context — there's something about these tools that mathematicians, who do proofs all the time, don't like. Occasionally they'll use them, but it's really not a popular set of tools, because they're such a pain to use. Nevertheless, they have enough fans that some amazing results have happened.
So, EverCrypt. This is something — there are like 15 authors on the paper — and they have a crypto library which has implementations of all the crypto you need, as long as you don't care about any of those NIST curves, or post-quantum crypto, or — well, here's the list of what they do support. It's got some public-key stuff and some signatures and some symmetric functions. It's got arguably enough to do, you know, like, HTTPS — you can do that with these primitives, and that's what EverCrypt supports. In the case of AES, you need your CPU to have AES instructions, so maybe that's not so portable. But okay, it does support some other ciphers on any platform. And you can use this, and there are proofs. The EverCrypt papers report how they took some standard proof tools and actually formally went through this software: it does exactly the right calculation. Which is, like, okay, that's some serious guarantee. The good thing about this is that the code really has the maximum assurance of any code that we've seen for cryptography. It's really saying: yes, this code is computing the right thing. If you use EverCrypt, then it will compute the right output for every input, exactly as specified. Assuming the specification is correct, assuming the processor is correct, assuming the compiler is correct — because a lot of it is written in C, so you need your C compiler to be correct. But, well, okay, you can deal with those problems separately: how do you verify CPUs, how do you verify compilers, et cetera, and have people review the cryptographic specifications?
So it's actually feeling like something serious is being accomplished here. The only problem is that it's such a pain to do. For every implementation you have to do quite a bit more work. People are getting lots of practice building better tools, but it's still a ton of work to do these formally verified pieces of cryptographic software. An example of how hard it is is illustrated by that list of what EverCrypt supports. They've got some implementations of these functions; they even have, for Intel chips, some fast implementations of some of the functions. But if you want something that's fast on your smartphone or smartwatch — where maybe performance is more of an issue than on, say, your big laptop — then no, EverCrypt doesn't give you fast implementations. It does have something that works, but it's going to be several times slower, nowhere near what Tanja was mentioning about trying to squeeze out all the last bits of speed. It's really far behind the state of the art in speed, because it takes a lot of human time to take a new implementation and prove something about it.

So what do you do when you don't have proofs? Well, of course, you test stuff. Now, we could spend the whole hour on how cool testing is. I mean, let me actually see a show of hands here: how many people here have ever had this feeling of "oh, I wish I had done some more tests of this code" — like, you know, for this bug that you had? Let me see a show of hands. I see pretty much everybody in the audience raising their hands. Okay, now how many people have ever had this feeling of "oh, I did too many tests"? I see — okay, maybe about 20 people out there have this feeling, out of about, I don't know, a thousand-something. Sorry — who is in both camps, people who thought they tested too much and too little? Yeah, yeah. Yeah, of course, testing is fantastic.
You should test everything. And if you find that testing is hard to do, then it's probably because you screwed up in your factoring of the software that you're trying to test. You have too much coupling; you should be modularizing it more, and have pieces that you can test. If you do test-driven development, then you will basically have working code all the time, and that's a really nice feeling. Except, well, there's a little problem, which I'll get to. But take something like the CRYPTO_memcmp bug, for example, in OpenSSL for PA-RISC, where it was only testing, like, the bottom bits — seeing, do the bottom bits match. That's something which gives the wrong results for one out of 2^16 inputs. Just random inputs will give the wrong results. So if you try 2^16 — or try millions, billions of tests, it doesn't take that long — then it's going to catch that. Or, instead of trying just totally random tests, the whole fuzzing philosophy says: let's try to smartly choose what you're going to test. You try, for instance, a string, and then try flipping a few bits here and there — does it give the right result for the comparison?
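A toy model of that bottom-bits bug — our reconstruction in Python, not the actual OpenSSL PA-RISC code — shows how the bit-flipping style of test catches it immediately, while purely random 16-byte pairs only hit it roughly once in 2^16 tries:

```python
def buggy_memcmp(x, y):
    # model of the bug: only the bottom bit of each byte survives the
    # comparison, so bytes differing only in higher bits look "equal"
    acc = 0
    for a, b in zip(x, y):
        acc |= (a ^ b) & 1
    return acc                      # 0 is supposed to mean "arrays match"

# fuzzing-style test: take a string, flip single bits, check the result
x = bytes(range(16))
for i in range(16):
    for bit in range(8):
        y = bytearray(x)
        y[i] ^= 1 << bit            # flip exactly one bit
        claims_equal = (buggy_memcmp(x, bytes(y)) == 0)
        assert claims_equal == (bit != 0)   # every high-bit flip exposes the bug
```

Flipping any bit other than bit 0 of a byte makes the buggy comparison report "equal" for arrays that differ, which is exactly the kind of thing a handful of structured fuzz inputs finds in milliseconds.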
And that very quickly finds that the crypto memcump doesn't work And these are not just things that you you could think of doing retroactively This was actually implemented in the super cop crypto test framework Before the bug was introduced in open SSL and it's just well though the crypto code wasn't plugged into that framework So well we didn't catch the the bug but it's it's like we almost could have if there'd been a bit better organization of the testing effort in General if you see a bug then at least retroactively you should find yourself thinking all right how can I make tests that will catch that bug and Usually there's a pretty easy answer to that and then you add that to your regression test suite and make sure it Never happens again, and then if these regression test suites are shared enough then it never happens to anybody again And that's really effective except well the main thing that goes wrong with testing is that you're not Testing all of the possible inputs, and so we've seen time and time again millions of security holes Which are the attacker is finding some input which nobody thought of testing for it wasn't party or random testing It wasn't even caught by fuzzing It's just some obscure kind of input where the attacker says haha If I try an input of exactly that length after setting up the following condition Then the following weird thing is going to happen, and then I can take advantage of it like this There's some input that behaves in the wrong way Which you're never going to find through testing or even with the most advanced fuzzing that we have so How do you deal with this? Well? It's not so easy, and it's something which definitely affects crypto for instance November 2019 Nath and Sarkar said The fastest code for one of the standard elliptic curves out there curve 448 High security modern elliptic curve crypto. 
This is bigger than Curve25519, if you want something at a higher security level. And there's a bug that randomly happens with probability one in 2^64. Well, that's a lot of tests. Yeah, I'm not going to do 2^64 tests. I mean, we've done some computations at that scale, but it takes a lot of effort to set that up, and it's not something you do with lots of different pieces of software again and again. Could an attacker find those inputs? Well, in the paper announcing this, Nath and Sarkar say: all right, here are some inputs which make this operation fail — inside, like, a subroutine. That doesn't mean the attacker can find inputs to the whole crypto operation that make it fail. Should there be more analysis of how devastating this bug is, or should we just get rid of the bug in the first place? They say: all right, for certain kinds of inputs the code gives wrong results, but it's very low probability, so known-answer tests are not going to find this. And so they say you have to prove correctness. And this is the dichotomy that people often present: you have to either, well, do some tests, which will find the common bugs, or do proofs — a lot more work, really painful, but that will find all of the bugs.

There's another approach, which is symbolic testing. And this is something where I'll take the memory-comparison example — not for PA-RISC, but on the left side here is CRYPTO_memcmp for normal Intel chips, x86-64, inside OpenSSL. Let's now think about testing the code, auditing it. And you can read through that — well, maybe you can't, because the font size is too small, or maybe you're not fluent in assembly — but you read through, and eventually you say: here is the computation that it's doing for some particular size of input.
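Stepping back for a second to those Curve448 odds: a quick back-of-the-envelope (the test counts here are our illustration) shows why random and known-answer testing are hopeless against a probability-2^-64 bug.

```python
# probability of the bug per random input, as reported: one in 2^64
p = 2.0 ** -64

# expected number of failing inputs among n random tests is n * p
for n in (2 ** 20, 2 ** 32, 2 ** 40):
    print(n, n * p)
# even ~10^12 random tests (2^40) expect only 2^-24, about 6e-8, failures
```

So a test suite that would reliably surface the bug needs on the order of 2^64 inputs, which is exactly the "that's a lot of tests" problem.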
Let's take, for example, three-byte inputs. Of course, you have to try every length that you care about — like, 16 bytes is very common. So for each length you figure out what this code does. For instance, for three-byte inputs, it's taking x0, x1, x2 and comparing them to y0, y1, y2. And how is it doing that comparison in constant time? Well, it's XORing x0 and y0 together, XORing x1 and y1 together, XORing x2 and y2 together. So now it has three byte XORs, and it ORs them together bitwise. That's a number between 0 and 255, where if the arrays matched you would get 0, and everything else — if there's any difference, then the XORs will be between 1 and 255, and the bitwise OR will also be between 1 and 255. And then you convert that to a 64-bit integer, with 56 extra zeros on it, negate that integer, and shift right. If you think about this for a moment, you see that you always get one if there are any differences in the inputs. That's the logic you can go through as a code reviewer to say: yes, this does work correctly. You start from the assembly, and you figure out what it means for each size you care about. That's this graph, this computation graph showing the arithmetic from these inputs — a DAG, a directed acyclic graph, that shows how the inputs give you some output through a series of computations. And then, well, you analyze this graph and say yes, this works correctly — and do it again for each length.

All right: there are tools which make this really, really easy to do. Let me highlight angr. It's not the only tool out there, but it has the big advantage of working on binaries. It starts from Valgrind — from libVEX inside Valgrind — and then builds a lot of extra cool stuff on top of that. What angr does is that whole red arrow on the previous slide: it does it for you automatically. It will take your binary, and it will run through the instructions and tell you: all right.
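(As an aside: the reviewer logic just walked through, transcribed into Python — our transcription; the real thing is x86-64 assembly inside OpenSSL:)

```python
MASK64 = (1 << 64) - 1              # model 64-bit two's-complement arithmetic

def ct_memcmp3(x, y):
    # XOR corresponding bytes, then OR the results: 0 iff the arrays match
    t = (x[0] ^ y[0]) | (x[1] ^ y[1]) | (x[2] ^ y[2])
    # widen to 64 bits, negate, shift right by 63: top bit is set iff t != 0
    return ((-t) & MASK64) >> 63    # 0 for equal inputs, 1 otherwise
```

Spot checks like `ct_memcmp3(b"abc", b"abd") == 1` confirm individual inputs; the point of the symbolic step described next is to upgrade that to a statement about all inputs.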
Here's what that did for your input arrays x and y, and here's how the output is a graph from those inputs. Now, this makes your code review easy, because you don't have to think about memory-access pointers, and you don't have to deal with the complications of the assembly instruction set — the output of angr is a much simpler instruction set. There are no jumps; it's just completely unrolled, this DAG that you get as output. There is a constraint, which is that if angr reaches a branch based on one of the inputs x and y, then it's going to say: oh, there are two possibilities — maybe you take the branch, or maybe you don't. And if you do an array access based on any of those variable inputs, it'll similarly split that and say: well, there are all these possibilities for which array index you're accessing. But hey, we were getting rid of that. That was the first part of the talk: we don't want to be doing these variable-time instructions. We're just going to have straight-line computations. Maybe you have some loops, but they're based on public data, not on the secrets that you're trying to work with. So we get rid of this blow-up in crypto code anyway — we want to do that to protect against timing attacks. And that means that angr runs really fast, and it gives you the unrolled code, and then it can even sometimes check the correctness of that code for you. So, for instance, this CRYPTO_memcmp for three-byte inputs: it will tell you, yes, this works. Now, we have only nine minutes left.
So maybe I'll just very, very quickly show you what some code looks like, and I'll skip a bunch of details. This is a simple call to CRYPTO_memcmp on some arrays of size n. You define n to be three or sixteen or whichever length you care about, and do this again and again. This takes x and y, those arrays, compares them, and puts the result into z. The compiler won't get rid of this code with normal optimizations, because who knows what happens after main — maybe the exit is going to look at the z value, because it's a global variable, and maybe it's going to do something with it. So this CRYPTO_memcmp will be called. And then angr — well, okay, there's some setup: grabbing the binary, telling angr that the memory is all filled with zeros to begin with. You tell angr: all right, you're going to run the code, but instead of having zeros coming in for x and y, let's replace those spots in memory with some variables. Let's say that x[0] in memory is going to be the x0 variable, where we don't know what that is — it could be anything from 0 through 255 — and the same for x1, x2, y0, and so on. And then you run the program, and you extract some z's out of all the possible universes you get at the end. And all the magic is happening in the last lines here: after angr has run through the code, you can just ask it whether any of the following classes of bugs can possibly happen. And then there are some automated tools called SMT solvers which can sometimes answer this question. They might run for a very, very long time, but for this example it runs in under a minute and tells you: yes, the code always works.

All right, last slide here. What is missing? What are the people who are trying this approach working on? Well — if you have constant-time code:
You can always do this angr translation. Another tool which will do this for you — which I haven't used personally — is Manticore; it supposedly can do the same thing from binaries, and comes with a lot of the same kinds of analyses. But I've worked with angr, and angr works just fine. Also, it has this cool GUI called angr-management. Um, so angr — yeah, it'll always convert your code into this DAG; the red arrow there, it'll always give you the results of that. And then all of the interesting problem at that point is: if the SMT solvers aren't smart enough to see that the resulting code works, then you have to build some new tools. These will look at, for instance, one DAG you get for your reference code, which you've reviewed and you're sure works, and another DAG for your complicated, fancy, vectorized assembly implementation, and then you want to see: are those doing the same computation? And you have to kind of match up those DAGs, and people will give these arguments saying, yeah, yeah, this is why this is the same. And the whole game here is to build tools which are doing this. And — maybe I'll skip the sorting example, aside from giving the URL — just one example of doing this is for sorting code, where right now the fastest Intel-chip sorting code for integer arrays in memory is some new sorting code which is constant time. It's like three times faster than Intel's Integrated Performance Primitives library, where they were trying to optimize sorting. And it is verified to produce the right results, with some tools which look at the DAGs coming out of angr and say: yes, that is a correct sorting program. If you're interested in doing this sort of thing, then you should say: all right, here's some crypto code where nobody's claimed that it's verified — which is most crypto code. You can just take random examples and then say: all right, let's make it constant time first. If it's not, then forget it; it's going to be vulnerable to timing attacks — throw
it away. But if it is constant time, then use angr, get this DAG out of it, and then say: well, okay, why does that match some other code which is supposed to be doing the same computation — like assembly versus some reference C code, for example? And then figure out how you can match up those DAGs. Usually a little bit of Python scripting will do that matching for you, and it'll tell you if there are any problems. Sometimes you have to get into the details of how the crypto computations are done, but it's actually fun. That's the great thing about this approach compared to all the proving tools: doing symbolic testing — symbolic execution with angr, followed by matching up the DAGs, analyzing the DAGs — it's fun. It's actually a fun way to analyze crypto software. At this point, we'll be happy to take questions. So thank you for your attention.

Thank you. All right, we will do a very high-speed Q&A session, so please limit yourselves to questions, not comments. Microphone number one, please.

I think it's a very short question: is constant time really the only mitigation against timing attacks?

Well, it's a mandatory thing. I mean, you can do more — you can do randomization and such things — but you want to have that those don't depend on the secret data in a reproducible way.

All right, mic number two, please.

Thanks for the talk. Can any approaches from real-time operating systems be applied to cryptography?

So, real time is usually trying to say you want to make sure the operation finishes in at most this amount of time. But the thing that happens in fancier timing attacks, like hyperthreading attacks, is that before you've even reached near the end of the computation, the attacker has already extracted the secret data. So it's a different game that real-time operating systems are playing. If you have constant-time code, that can be useful in the real-time context, but it's a stronger constraint.

Okay, thanks. All right, back to one, please.

Thanks for this amazing talk.
Um, I just had a question regarding EverCrypt. You said that the Curve25519 implementation was slower than the other stuff that's available. From my knowledge, the Curve25519 implementation in EverCrypt is actually one of the fastest ones.

On Intel, yes; on ARM, it's slower. On Intel it's at the state of the art, but on ARM it's a few times slower.

Okay, thanks.

All right, internet, go.

Did the formal verification of EverCrypt also check for the lack of side-channel attacks, or just for functional correctness?

It checked for functional correctness and a lack of timing attacks. It's actually a constructive way of producing code which is constant time.

All right, mic number seven.

You mentioned compiler bugs. How probable would it be that a bug in angr covers up a bug in your code?

It's possible, definitely. The whole situation for testing is that you've got your original code — maybe you made a mistake there — and you've got your test framework — maybe you made a mistake there — and you're trying to have these be sort of independent tests. And then ultimately, of course, if everything's proven, then yeah, you can imagine proving that angr works correctly: do that once, and then it works for everything. But since it includes some kind of complicated stuff, and all of Python, and SMT solvers, and so on — um, it's going to be some time before we're at that point. So yes, there's definitely a possibility of bugs in angr. It's something to worry about. As long as they're kind of independent of the bugs that you'll make in your crypto code, then you're reducing the risk of errors.

All right, mic — the signal angel, please.

Is there progress in the formal proof of true randomness?

That's enough of an answer, I think.

Mic number one, please.

What's the status of doing pairings, and arithmetic on pairing-friendly curves, in constant time?

So this is again a little more complicated.
So if you just want the scalar multiplication: the TPM-Fail paper was also attacking the BN curves, so just the same issue appeared there — the implementations were not constant time, even though they were inside the TPM code, which should have been validated. The pairing on top of it looks a little bit like an exponentiation computation, so the same tricks will work. At this moment, the best available code apparently was not constant time.

All right, microphone number one, last question.

What about superscalar processors? Do they mess up your carefully crafted constant-time algorithms, or is that — at least from a certain distance — not relevant anymore?

In short, the way to think about it is that you want to have, like, some isolated data which is holding all your secrets, and then there's never anything which is copied out of that sort of safe environment into the metadata which is controlling timing. Now, if you have a superscalar processor, you've got multiple instructions happening in a cycle. But as long as the decision of which instructions are being executed is not based on the data that you're working with, then you're good. And that's where you have to be sure that the processor is handling things so that the time each instruction takes is only based on the other metadata, outside the secure environment.

Okay, thank you for this great talk, and please thank our speakers again.