Okay, let me start. Welcome to the first session of AsiaCrypt 2016. It's my great pleasure to introduce our first invited speaker, Professor Nadia Heninger. She's an assistant professor in the Computer and Information Science department at the University of Pennsylvania. She received her PhD in computer science in 2011 from Princeton University. She works on mathematical and theoretical crypto and has produced many deep results, but at the same time she also works actively on applied and practical security. Her research fills the gap between theoretical crypto and practical security. Her joint work on analyzing weaknesses of TLS and SSH in practice won the best paper award at USENIX Security 2012. In 2015, she won another best paper award at ACM CCS for analyzing the practicality of the perfect forward secrecy of Diffie-Hellman. Please join me in welcoming Professor Nadia Heninger.

Thank you very much for inviting me to speak. I'm going to talk about the reality of cryptographic deployments on the Internet. I'm going to start with a little bit of scene setting. You already mentioned that I work in between mathematical theory and reality, so let's start by setting the ground stage for what reality actually looks like on the Internet. If you look at the papers that appear on ePrint or the papers that appear in a crypto conference, there's a huge variety of fabulous public-key crypto schemes that have been in development for decades, but if you actually look at what people use on the Internet now, we've got three major options for key exchange and three major options for signatures, and that's kind of it. I'm going to focus on public-key crypto because I guess that's what I do; you'll have to invite somebody else if you want to talk about the symmetric world. I'm also going to focus on the sort of protocols that we can view at a global scale by scanning the entire Internet, so things that are publicly visible to us as researchers without privileged positions. The main crypto protocols we've thought to look at are TLS, mostly in the form of HTTPS but also on some other ports; SSH; and then IPsec, in particular the IKEv1 and IKEv2 key exchanges.

If you actually go and look at what kind of crypto people are using in these: for HTTPS, RSA key exchange is still widely used, elliptic curve is becoming more common, and RSA signatures are basically universal. For SSH, it turns out that Diffie-Hellman, in particular finite field Diffie-Hellman, is very common for sort of historical reasons. IPsec is much harder to measure, so take these numbers with a huge grain of salt: this is the number of hosts that preferred to speak finite field Diffie-Hellman given the cipher ordering that we offered in our last scan, so these numbers are highly dependent on what the client asks for. But just to give you an idea. Unfortunately, you know, I think the recent experiments with Chrome are the only example of post-quantum crypto that has ever seen the light of day in a real-world deployment. So right now the world looks very much like people might have expected 15 years ago, 20 years ago, 30 years ago. There's not much new crypto.

So the methodology of my work has sort of... well, I'll lay out the methodology of my work, and then you guys can all take this and start producing the papers that I would have written. The inspiration for this is a quote from Adam Langley.
He gave an invited talk at Crypto 2013 on why the web still runs on RC4, and he had this refrain, which is much better in his British accent, but essentially: the internet is vast and filled with bugs. And this kept coming up over and over and over again as he talked about the difficulties in getting Chrome to deploy fixes for the major TLS vulnerabilities that had been coming out over several years, things like the BEAST and CRIME attacks, where the only fix seemed to be to fall back to RC4, which, as we all know, is terribly broken.

So the methodology that I've been using is: look at the entire global scale of what cryptographic deployments look like. Scan everything on the internet. This is possible now with a number of high-speed scanners; ZMap is the one that my collaborators and I have been... well, they developed it and we've been using it. There's also masscan and several other options. Then download every cryptographic artifact: say, public keys (not secret keys, unless you manage to compute them), the cipher options, nonces, key exchange values, things like that, and look at what happens. (A minimal sketch of this pipeline follows below.) If you think of Murphy's Law, anything that can go wrong will go wrong; well, the cryptographic version of this is that if it's possible for an implementation to have made a mistake, someone on the internet has done it. So all we have to do is go fishing for these mistakes, find someone who screwed it up, and then write a paper about it. And if OpenSSL screwed it up, then we have impact and then we can publish the result. So that's kind of the research project for the past many years.

In this talk, I'm going to use Diffie-Hellman as a sort of case study, not because the other protocols or the other primitives that we have are not vulnerable to things, but just because I've been digging around in Diffie-Hellman a lot for the past several years. We'll walk through all the various things that can go wrong with Diffie-Hellman, and you can extrapolate to the other kinds of crypto that you are developing.

So just to set the stage: there's a divide between what practitioners believe about cryptography and what we believe about cryptography, and some of these beliefs are right and some of them are not. Diffie-Hellman got a lot more attention starting a few years ago, when people became much more concerned about, say, NSA-scale surveillance of the Internet and how you could protect against it. At the time, say in 2011 or 2012, most of the Internet, HTTPS and TLS at least, was using RSA key exchange. And so there was a lot of noise from the applied crypto practitioners about how we should move to forward-secret cipher suites. This is a collection of quotes from some people; some of them are random people on the Internet, and some really know what they're talking about and I respect them greatly. Perfect forward secrecy provides better security; anyone possessing the private key and a wiretap of the Internet activity can decrypt nothing; 1024-bit Diffie-Hellman is better than 2048-bit RSA; Diffie-Hellman is much safer than any RSA cipher suite. So there was a lot of movement towards everybody moving to Diffie-Hellman key exchange and away from RSA.
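Coming back to the scanning methodology for a moment, here is a minimal sketch of the collect-then-analyze pipeline. This is a reconstruction, not the authors' actual tooling: the real studies enumerated the whole IPv4 space with ZMap and used purpose-built scanners, while the host list and `grab_certificate` helper here are hypothetical stand-ins using only the Python standard library.

```python
# Minimal sketch of the survey methodology: connect to each host, record
# the public key material it presents, and mine the corpus offline.
# The host list is a hypothetical stand-in for a ZMap-style enumeration.
import socket
import ssl

socket.setdefaulttimeout(5)  # unreachable hosts are the common case

def grab_certificate(host, port=443):
    """Fetch the PEM-encoded certificate a TLS server presents, or None."""
    try:
        return ssl.get_server_certificate((host, port))
    except (OSError, ssl.SSLError):
        return None

hosts = ["example.com"]  # placeholder for a full-internet scan
corpus = {h: grab_certificate(h) for h in hosts}
# ...then look for shared primes, repeated keys, weak parameters, etc.
```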
And a couple years after that, there started to be some pushback, especially against elliptic curve cryptography, which has been subject to, I don't know, people being worried about the safety of the curves that people are using. So here's a quote from Bruce Schneier, who wrote this after seeing some of the documents leaked by Snowden in 2013. He said: prefer conventional discrete log-based systems over elliptic curve systems, because there's some risk of backdoor. And you can see that in the hacker practitioner community, people are really worried about backdoors in elliptic curve cryptography. And this has really slowed the adoption of elliptic curves for a lot of the protocols that we're using.

So, okay. What is Diffie-Hellman? I hope that we have all seen this before many, many times. This is the textbook version of finite field Diffie-Hellman key exchange that we all show our undergraduates, and this is what makes them love Diffie-Hellman key exchange so much. We've got our typical crypto protagonists, Alice and Bob, who are going to share a key somehow in the presence of our adversary, Eve, the eavesdropper; in this case, she's passive. They get some parameters out of the sky: there's a prime p and there's a group generator g. Two is a good number; just think of g = 2, everybody else does. Then Alice computes some secret a and sends g^a mod p. Bob computes some secret b and sends g^b mod p. They each compute the shared secret g^(ab) mod p. And poor Eve is stuck: she's seen all the traffic that's gone back and forth, but she can't compute the shared secret. This is very friendly-looking. Your undergrads all understand it; they think crypto is awesome after they've seen this. And this looks very clean, very elegant. It looks safe because it's so easy to understand. Of course, there are dragons swimming underneath the placid surface of this beautiful mathematical lake. So we'll get into those.

Okay. So I talked about Alice and Bob and Eve, but I think there's another very important adversary who's not typically talked about in crypto proofs, which is the person who is implementing your cryptography. And you have to think of this person as an adversary, because their goals are not necessarily aligned with yours. So I'm going to call this person Cody the coder, who is implementing your crypto. I found a nice picture from XKCD for you. Okay, so Cody the coder, who has taken your undergraduate crypto class, is going to see this beautiful Diffie-Hellman diagram that I just showed you, and his version of implementing Diffie-Hellman is going to look something like the Python-inspired pseudocode on the right here (a reconstruction follows below). Generate a prime, pick a group generator, compute a secret a, and send off our key exchange value. Looks good, solid. Excellent, we're done. That's the crypto for his product.

Okay, so what's wrong with this? Well, as a warm-up, I'll go through the elementary cryptanalysis of discrete log-based cryptosystems, and we'll see how this ends up failing. Lagrange's theorem tells us that the order of the multiplicative group generated by g divides p-1, and there are subgroups of order any divisor of p-1. That's nice. Okay, so then we have some nice generic discrete log algorithms that can solve the discrete log problem in time the square root of the order of any subgroup mod p; in any generic group, really.
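Backing up to Cody's slide for a moment: the pseudocode itself isn't reproduced in the transcript, so here is a guess at what his naive implementation might look like, in the talk's Python-inspired style; `randprime` from sympy is just a convenience for the sketch.

```python
# A reconstruction of Cody's naive Diffie-Hellman: a random prime,
# g = 2, and no attention to the subgroup structure of (Z/pZ)*.
import secrets
from sympy import randprime

def naive_dh_keygen(bits=1024):
    p = randprime(2**(bits - 1), 2**bits)  # random prime, so p-1 will
                                           # likely have many small factors
    g = 2                                  # "two is a good number"
    a = secrets.randbelow(p - 2) + 1       # secret exponent a
    return p, g, a, pow(g, a, p)           # publish (p, g, g^a mod p)
```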
Okay, so square root time to solve discrete log in a group of, say, prime order q, or not prime, it doesn't matter. Then, once you know something about the structure of your subgroups, you can use the Pohlig-Hellman algorithm to solve discrete log: you factor the group order, and if, say, it has a bunch of small prime factors, you can solve the discrete log in each subgroup in time square root of the size of the subgroup, and then use the Chinese remainder theorem to reconstruct the log in your big group. This is a fun exercise; I actually make my grad students in crypto implement this.

Okay, so if, on the previous slide, Cody generated just a random prime, p-1 is likely to have many small prime factors, and our passive adversary can then learn many bits of the exponent by applying this algorithm to the small prime factors that she was able to discover. But p-1, for, say, a 1024-bit prime, is likely to also have some pretty large factors, say 300 or 500 bits, and solving the discrete log in those subgroups is not going to be feasible, so she can't learn the entire exponent. But she can learn a lot of information about it, much more than she should be able to.

Okay, so this is the basic elementary idea of things that can go wrong with Diffie-Hellman. However, in practice, things aren't quite so easy. Modular exponentiation is really expensive, so many implementations use short exponents. I was actually surprised about this when I first heard about it, but this is pretty standard, so the standards say that you should use, say, 160-bit exponents with 1024-bit primes. This is to match the square-root running time of attacks against the exponent length. Nevertheless, some implementations use 128 bits for some reason. Okay, so Cody's modified implementation, which will be more efficient, will look something like this: it generates a 160-bit exponent a. That's good. Except that now we are in seriously dangerous territory. There's a cute observation made by van Oorschot and Wiener in 1996 that if you have a short exponent, then for a random prime you're likely to have enough small prime factors of p-1 to reconstruct your exponent using the Chinese remainder theorem, because you only need enough small prime factors to get up to the size of your exponent, and then you can uniquely reconstruct it.

Okay. So this has been known for 20 years. Who screwed it up on the Internet? Some people. In 2015, this is a small footnote that was contained in the Logjam paper, which has a very large number of authors, so I sort of alphabetized them; I'm doing this sort of for fun. Okay, so in 2015, we scanned the entire HTTPS space, port 443, for Diffie-Hellman key exchange. We found 3.4 million HTTPS servers that supported Diffie-Hellman. Among them, they used 70,000 distinct primes p; I'll talk about the implications of that a little bit later. Of those, there were 4,800 primes that were not safe primes, where (p-1)/2 was not prime. Those are our candidates for possible attack. So then we opportunistically factored (p-1)/2 using the elliptic curve method, and looked at whether we found non-trivial factors of the order of the group generator that was being used. And for 750 groups, we learned some non-trivial prime factors of the order of the group. These were used in 40,000 connections across our different scans.
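To make the Pohlig-Hellman-plus-CRT attack just described concrete, here is a minimal sketch. It assumes the small prime factors of p-1 have already been found (say, by ECM, as in the scan above), and it brute-forces each tiny subgroup rather than using a fancier square-root algorithm.

```python
# Recover a mod each small prime factor q of p-1 by projecting into the
# order-q subgroup, then glue the residues together with the CRT.
from sympy.ntheory.modular import crt

def dlog_in_subgroup(p, g, h, q):
    """Discrete log of h base g in the order-q subgroup of (Z/pZ)*,
    by brute force; fine for the very small q this attack targets."""
    gq = pow(g, (p - 1) // q, p)  # generator of the order-q subgroup
    hq = pow(h, (p - 1) // q, p)  # h = g^a projected into that subgroup
    x = 1
    for k in range(q):
        if x == hq:
            return k              # a is congruent to k mod q
        x = (x * gq) % p

def recover_partial_exponent(p, g, h, small_factors):
    residues = [dlog_in_subgroup(p, g, h, q) for q in small_factors]
    a, modulus = crt(small_factors, residues)
    return a, modulus             # a mod (product of the small factors)
```

With a 160-bit exponent, once the recovered modulus exceeds 2^160 the CRT residue is the exponent itself, which is exactly the van Oorschot-Wiener observation.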
Then we tried to opportunistically apply the Pohlig-Hellman algorithm to reconstruct the exponents, counting it as a win when we could check that the exponent was correct. And running this on just a few hundred cores for a couple weeks, we were able to compute the secret exponents for 159 exchanges. This is sort of, I don't know, mass fishing for vulnerabilities. Most of these implementations, I mean, it's random people implementing Diffie-Hellman. But this is to illustrate what goes wrong.

However, this has been known for decades. We know how to protect against this. Basically, the countermeasure is that you should always do Diffie-Hellman in some group of large prime order q, and if you want to maximize the size of your group order, you should use a safe prime p = 2q + 1 and work in the subgroup of order q mod p. (A sketch of this appears below.) Okay, this has been known for decades; every standard for Diffie-Hellman says that you should do something along these lines. So our developers are not reading the standards.

Okay, so what do the primes that people use actually look like? If you look a little bit more closely at this sample of primes and groups used for Diffie-Hellman, you'll see that people totally got the message that they should use safe primes. But they didn't get the message that they should use a prime-order subgroup inside of that safe prime. So in fact, most of the people on the internet are actually using groups of composite order, in particular groups that have a subgroup of order 2. So that means that the Decisional Diffie-Hellman assumption is not true in practice for TLS: the adversary can almost always learn one bit of information about the secret exponent. This is not really a vulnerability, it's only one bit, and I can't think of how to exploit it, but it's a little bit interesting that there's this huge divergence between practice and theory.

You'll also notice that there's actually a quite large number of non-safe primes in use, but most of the people using non-safe primes are actually pretty good about having a group order that is prime. So what's going on with that? Well, it turns out that there are some standards, here's one, that say not that you should use a safe prime, but that you should use a smaller-order subgroup mod that prime, and match the length of your exponent to your desired size of subgroup. So for, say, a 1024-bit prime, we want, say, a security strength of 80 bits; that means that we should use a 160-bit subgroup of order q. This is fine. Okay, and for 2048 bits, use a 224-bit subgroup. So this is what the standard says, and a large number of people on the internet have followed the standard.

So actually, I have a question. I've been trying to figure out why this is the recommendation. Why not just use a short exponent mod a safe prime? And I've gotten various answers. Some people seem to think that a small subgroup is a safer assumption than a small exponent mod a safe prime. It's certainly a more mathematically beautiful assumption to say you have this generic group structure, but I don't actually know why that recommendation is in place, and nobody's been able to give me a good answer. But people in this room maybe know a good answer, so let me know if you know how this came about.

Okay. So the interesting thing about this is that using a smaller subgroup, particularly if you don't put any other restrictions on what's going on with p-1, opens you up to a whole variety of attacks. And these have been known for, again, decades.
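Before getting into those attacks, here is what the safe-prime countermeasure mentioned a moment ago might look like as code. A sketch only: sympy's primality routines make this far too slow for production 1024-bit parameter generation, but the structure is the point.

```python
# Generate p = 2q + 1 with q prime, and a generator of the order-q
# subgroup; then use a short exponent, as the standards allow.
import secrets
from sympy import isprime, randprime

def safe_prime_params(bits=1024):
    while True:
        q = randprime(2**(bits - 2), 2**(bits - 1))
        p = 2 * q + 1
        if isprime(p):              # p is a safe prime
            g = pow(2, 2, p)        # a square, hence in the order-q subgroup
            return p, q, g

def keygen(p, q, g, exp_bits=160):
    a = secrets.randbits(exp_bits)  # short exponent, matched to 80-bit security
    return a, pow(g, a, p)
```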
So there's a classic attack of Lim and Lee, where, okay, what if the adversary, instead of using a value that is in the correct group for Diffie-Hellman key exchange, chooses a different subgroup and sends that instead? So in the Lim-Lee attack, we have Eve, who is, well, now malicious, so I guess it's Mallory, and we have Bob. And Eve, instead of sending a proper Diffie-Hellman key exchange value that's in the right group that Bob thinks he's working in, is going to, say, send a generator g_3 of the subgroup of order three mod p, assuming that p-1 has a factor of three. And then Bob, when he computes... So Bob is going to send a normal Diffie-Hellman key exchange value, g^b. (Does this work? No, it doesn't work. Okay.) So his shared secret is going to be the value that Eve sent, g_3, raised to the b-th power, and this is going to be contained in the subgroup of order three. So many protocols will then have some behavior that depends on the value of the negotiated shared secret. And if the behavior of the protocol, or a value that Bob sends over, depends on this value and leaks some information about it to Eve, Eve might be able to compute the value of Bob's secret exponent mod three. And this is going to take at most, you know, time O(3). If Eve can then repeat this for many small factors of p-1, she might be able to learn Bob's secret exponent over time. This makes the assumption that Bob is going to use the same exponent for many key exchanges, which is often true in practice. So this is a classic attack, well-known; many protocol specifications mention it and provide protections against it.

There are other variants of small subgroup attacks, these small subgroup confinement attacks. For example, a man in the middle might be able to simultaneously force Alice and Bob to both confine their shared secrets into some small subgroup, and the secrets might line up, say, with probability one-third. And in that case, they have sort of synchronized their state, and Eve can easily brute-force this state and maybe convince them that they actually have a real connection, which she can now decrypt entirely. So there's a whole variety of these attacks that have been known for decades. (A toy illustration follows below.)

Okay. There is a well-known countermeasure against this, which is that all parties in the Diffie-Hellman key exchange should verify that the values they receive are contained in the correct subgroup. So you should verify that the values you receive are not confined to a subgroup of order two, and if you're supposed to be working in a subgroup of order q, you should verify that the value you receive is, in fact, in the subgroup of order q. So now you're good; you're protected against this. Every standard says this. However, if you actually go and ask people on the Internet who have implemented Diffie-Hellman whether they are doing this, the answer is no. And many of them, in fact, don't believe that these attacks are real vulnerabilities. And in some cases there are good reasons for not implementing these checks, which is that most of the protocols out there have no way of actually specifying what the order of the subgroup is to the other parties. You know, say, in TLS, the server is the one who chooses the Diffie-Hellman values, and they just send over a generator and a prime, but there's no space for the order of the subgroup, so the client has no way of actually validating the values that they receive.
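As a toy illustration of the confinement step in the Lim-Lee attack described above, assuming a small prime factor r of p-1 and some protocol behavior that tells Eve which of the r candidate secrets Bob derived:

```python
# Eve sends a generator of the order-r subgroup instead of a real key
# exchange value; Bob's "shared secret" then takes one of only r values,
# and anything that leaks it reveals b mod r.
def order_r_generator(p, r):
    """Find an element of order r in (Z/pZ)*, assuming prime r | p-1."""
    h = 2
    while True:
        g_r = pow(h, (p - 1) // r, p)
        if g_r != 1:
            return g_r
        h += 1

def eve_learns_b_mod_r(p, r, g_r, bobs_shared_secret):
    for k in range(r):                       # at most r guesses: "time O(3)"
        if pow(g_r, k, p) == bobs_shared_secret:
            return k                         # b is congruent to k mod r
```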
The other sort of response to this is that these checks are perceived as unnecessary: like, okay, who cares if the adversary can learn a bit of the key? With a number of my students and other collaborators, Antonio Sanso, and then Alex Halderman and some of his students from the University of Michigan, we actually went and measured the implementation behavior of a bunch of major protocols with respect to these kinds of attacks. It turns out that a number of hosts are using primes that are not safe (okay, not a large fraction, less than 15%, but still fairly significant), and essentially no one is validating group order, not even for well-known groups where the group order is specified as a parameter.

Yeah. Yeah, so for HTTPS and TLS, there is no way for the client to verify q. The server, however, presumably generated the group that they're working in, so they should know q and they should be able to validate it; so this is measuring server behavior, where we're taking the perspective of the client and measuring what the servers are doing. So, yeah, for clients, there's often no way to know. For IPsec, the groups used for Diffie-Hellman key exchange come from a small list that has, I guess, 15 or so groups on it at this point, and that means that the subgroup orders are well-known by both the clients and the servers. But we found that almost everybody was perfectly happy to accept the values that we sent in malicious small subgroups. Somewhat surprising is the number of hosts that are willing to accept values of tiny order, say 1 or -1, or even 0, as a key exchange value; that means that they're not doing any validation whatsoever. At least one implementation said that they were worried that adding the checks would break clients, and they didn't want to break clients.

Okay, so among the things that we found: OpenSSL, when they first implemented support for RFC 5114, which is a standard that defines some of these extra groups used for IPsec, messed up the group validation and failed to do it, and so they were vulnerable to the textbook, classic Lim-Lee attack. And the Amazon load balancer was vulnerable for similar reasons, because of failing to validate. They got a little bit luckier with the prime factors of p-1: a full exponent recovery attack was not quite feasible, because the next largest factor of p-1 had too many bits. And I just want to note, in terms of bad implementation behavior, we didn't send zero as a key exchange value to IPsec hosts, because it was known to cause a crash for some implementations. It's a little bit sad what we learn when we do these scans from the responses that we get, like: you caused our oil pipeline to shut down for three minutes. Sorry.

So I want to back up a little bit. I just went through these attacks that are, to be honest, cryptographically boring. They're multiple decades old; I can't publish this stuff in a crypto conference because this is all super, super well known. Why is everybody still vulnerable to these things? Is there some way that we can actually, on a meta level, protect the cryptographic protocols and primitives that we design against the implementers who are actually implementing them? Are some things more fragile than others? Is there some systematic way to compare the fragility of different schemes against bugs introduced by implementations? I don't know.
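For reference, the validation step that these measurements show almost nobody performs is only a few lines when the subgroup order q is known, as it is for the fixed IPsec and RFC 5114 groups:

```python
# Reject any received key exchange value outside the intended subgroup.
def validate_dh_value(y, p, q):
    if not (1 < y < p - 1):      # rejects 0, 1, and p-1 (orders 1 and 2)
        return False
    return pow(y, q, p) == 1     # y must lie in the order-q subgroup
```

In TLS the server only sends p and g, so unless the client recognizes p (or p is a safe prime, in which case q = (p-1)/2 can be computed), there is no q to check against; that's the implementers' excuse mentioned above.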
So is there some way to formalize the behavior of our semi-malicious implementer as a character in one of these security games? I don't know what he's doing, but he has some motivations, which are to, say, run efficiently, not crash users, not break behavior; and then, I don't know, he thinks he understands Diffie-Hellman pretty well, so he's not going to do stuff that's not necessary. Can this be modeled as some kind of character, and should we be modeling these things in our security proofs, so that we have some kind of robustness against the actual implementers who are going to introduce bugs? And maybe this has already been done in some of the work on kleptography. I mean, if you start modifying protocols to introduce, say, non-visible but malicious behavior... well, the kleptographic adversary is very clever, and our Cody the implementer is not necessarily very clever; maybe he's only going to, say, omit instructions or randomly modify instructions rather than trying to proactively leak keys. So I don't know if there's some way to formalize this kind of problem.

And on a more practical note, we do have the problem that a lot of the crypto that we're using is extremely brittle to implementation flaws. So an example of a brittle design is DSA. Every signature is randomized: you have your long-term private key, and then you have a random ephemeral key that's generated per signature, both for finite field DSA and elliptic curve DSA. And with this design of requiring randomness for every single signature, the way DSA works, if the adversary knows the randomness that was used to generate one signature, they can compute the long-term secret key from just that one signature. So this means that if you ever use your perfectly well-generated key on some system that has a bug, then you can lose the security of your long-term private key, and this has come up over and over and over again in practice.

So in 2008 there was the famous Debian OpenSSL disaster, where the random number generation used for DSA was compromised: they failed to add any entropy to it, so there were only a few hundred thousand possible outputs. So that meant that if you had a perfectly good DSA key, say for an SSH server, and you used that key on a server running the vulnerable version of Debian OpenSSL, then your perfectly good SSH key was compromised, even if it had been generated years beforehand on a perfectly good system. In 2012 we did a study, Mining your Ps and Qs, looking at random number generator failures. Most people focused on the impact of the random number generator failures on RSA, but similarly for DSA, we were able to compute 1% of SSH host private keys just by finding people who reused the same randomness across two different signatures in two scans that we did of SSH hosts. More recently, there was a random number generator flaw on Android that resulted in people getting large amounts of Bitcoin stolen, because they had repeated ECDSA nonces in signatures that were posted to the Bitcoin blockchain; people searched the Bitcoin blockchain, found the repeated signatures, computed the private keys, and stole all the bitcoins. This is a real vulnerability. This could have been avoided by, say, having some kind of deterministic nonce generation, but it wasn't.
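The repeated-nonce key recovery behind the Mining-your-Ps-and-Qs result and the Bitcoin thefts is a few lines of modular arithmetic. A sketch from the textbook (EC)DSA equations: two signatures (r, s1) and (r, s2) on message hashes h1 and h2 that share a nonce k give away the long-term key x.

```python
# From s_i = k^-1 (h_i + x*r) mod q with the same nonce k:
#   k = (h1 - h2) / (s1 - s2) mod q, then x = (s1*k - h1) / r mod q.
def recover_dsa_key(q, r, s1, h1, s2, h2):
    k = (h1 - h2) * pow(s1 - s2, -1, q) % q  # the shared nonce
    x = (s1 * k - h1) * pow(r, -1, q) % q    # the long-term private key
    return x
```

The deterministic nonce generation the talk alludes to derives k pseudorandomly from the private key and the message (RFC 6979 is the standard construction), so a broken system RNG can no longer repeat nonces.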
Okay, so what kind of countermeasures do we have? Can you formally verify every single implementation on every possible platform? Often we have these very complicated interactions between different architectures and different libraries, and constant and non-constant running times of microarchitectural instructions that you need to understand in order to protect against, say, timing attacks; so that's probably not possible. Maybe we should have Dan Bernstein just implement all cryptography, because he's the only programmer in the world who's known to write bug-free code. Or, I don't know. I don't know what the right solution is, but clearly we need some kind of more robust system in place for developing our cryptography.

So I want to move on to a slightly more advanced adversary, who is our policymaker: Paul the policymaker. We're in the midst, certainly in the United States, but I think also in Europe, and I actually don't know what the political situation is in Asia, of another round of crypto wars, where law enforcement wants access to encrypted data, and they want the tech companies and the crypto community to build in backdoors so that law enforcement can get access to encrypted data. We've been through this before. We went through this in the 90s in the United States, with arguments about law enforcement backdoors and export-grade cryptography. So here's a quote from 1997: the government must be wary of suffocating the encryption software industry with regulation in the new digital age, but we must be able to strike a balance between the legitimate concerns of the law enforcement community and the needs of the marketplace. This sounds just like the things that are being said today. And here's a quote from President Obama this year: everybody is walking around with a Swiss bank account in their pocket, so there has to be some concession to the need to get into that information somehow. So even Obama is calling for balance. With our new administration, I have no idea if these arguments will work anymore, but we can at least make them to ourselves.

So, alright. We can at least go back to the situation of United States export controls on cryptography from the 1990s and understand what the security impacts are now, in order to understand what it might look like in another 20 years if we build similar kinds of backdoors into our infrastructure today.
So, a super brief history of United States export control. Pre-1994, cryptography was regulated as a munition: you needed the same kind of license to export cryptography from the United States as you needed for tanks and grenade launchers. Then these regulations were amended, and 40-bit symmetric cryptography was understood to be allowed. This was at the very dawn of the internet, so during the development of the early versions of SSL, they built in 40-bit export-strength ciphers, and unfortunately for the rest of the world, the United States policy ended up weakening cryptography for the entire rest of the world. In 1996, cryptography regulation was moved to the Department of Commerce, and the key size restrictions that were put in place there are actually still there. So there were a series of export cipher suites for SSL/TLS that used 512-bit RSA and 512-bit Diffie-Hellman. Then in 2000, the regulations in the United States were lifted for mass-market and open source software, so export cipher suites for SSL/TLS were no longer needed, and everybody could use all the strong cryptography that we wanted. So we won. Good. Are we done?

Of course we're not done. Here's another character for us: Backwards Compatibility Beowulf. Backwards Compatibility Beowulf says: only old browsers will negotiate export cipher suites, so there's no harm in keeping them enabled on servers, even though the political environment has changed. Some customer might come to your e-commerce site using a 1996 version of Netscape, and they might really want to buy your products, so you don't want to lose customers. So of course you're not going to turn these things off, and there's no harm in keeping them enabled. Well, except that it turns out that the mere fact that servers maintained support for these weakened export-grade ciphers 15 or 16 years after they were no longer politically relevant resulted in a number of devastating attacks against perfectly modern, secure clients that didn't even speak these export ciphers.

So there's been a series of these. The first one was the FREAK attack, an attack on 512-bit export RSA. It depended on an implementation flaw that was almost universal among browsers: a man-in-the-middle attacker could force a perfectly secure browser, except for this implementation flaw, to downgrade to weakened 512-bit export RSA, and then decrypt the contents of the connection. Later, a large number of collaborators and I developed a version of this attack for export Diffie-Hellman; even multiple months after the FREAK attack was released, there was still quite a lot of support for export cipher suites among popular websites. And then earlier this year, another very large number of authors (the number of authors keeps increasing) developed a much more complicated attack that allows passive decryption of TLS connections, using the fact that a server might support SSLv2 with export cipher suites. So this is a slightly different kind of attack, but there's been a whole series of devastating attacks on modern implementations.

I'm going to zoom in a little bit on the Logjam attack. Here is a diagram of what's going on in the protocol. The main idea behind the Logjam attack is that there's actually a protocol flaw, and the protocol flaw is that the server key exchange message looks the same for export cipher suites as for normal cipher suites. So the client has no way of telling whether the server sent their Diffie-Hellman key exchange using a
weakened export cipher suite or whether they legitimately just wanted to use a 512-bit Diffie-Hellman key for key exchange; and there was still a fraction of a percent of hosts that were using 512-bit Diffie-Hellman for normal key exchange in 2015. So what the man-in-the-middle attacker is going to do is rewrite various values in the TLS handshake to downgrade the connection, but leave the key exchange value alone, because it's signed by the server's strong RSA key. And then, in order to not get caught rewriting the connection, they need to compute the discrete log of one of the key exchange values online; that is, they need to be able to compute 512-bit discrete logs in real time, before the connection times out.

So this attack works. It should never have worked, because it relies on a number of super unlikely circumstances. The 1990s-era crypto front door assumed that only the NSA had the power to break a 512-bit RSA or Diffie-Hellman key exchange; this is no longer true. The supercomputer of yesterday is the commodity computer of today: you can rent a supercomputer by the hour on Amazon EC2. There was a protocol flaw in SSL/TLS Diffie-Hellman key exchange that went unnoticed for 20 years, despite the amount of importance and attention placed on this protocol. And then, finally, the actual attack required computing a 512-bit discrete log in real time.

So what actually enables this last bit? We need to understand a little bit more about what the number field sieve algorithm actually looks like. This is the best algorithm if you want to generically compute discrete logs mod, say, primes, and you don't have any of the other stupid vulnerabilities we talked about in the first section. Okay, so the number field sieve algorithm has multiple steps; we don't necessarily care what these steps are, but here is a diagram of what the algorithm flow looks like. It can be parallelized, but generally relation collection and linear algebra are the most expensive parts, and then we have this individual log step that actually computes the real log that you want.

So how long does this algorithm take? It takes sub-exponential time in the size of the prime, usually written in L-notation as L_p(1/3, 1.923), where the constant 1.923 is the cube root of 64/9; that is, the running time is exp((1.923 + o(1)) (log p)^(1/3) (log log p)^(2/3)). So it takes sub-exponential running time. However, this doesn't mean we have to run the whole thing for every single individual log we want to compute. It turns out that only the final step depends on the actual target, so if you knew the prime that somebody was going to use beforehand, you could do the whole precomputation step up front, and then you only have to do the final stage online. And it turns out this final stage is much cheaper: the constant in the exponent is smaller.

So, to understand the feasibility of these kinds of computations, here's a series of records for mod-p discrete log. The current record was announced earlier this year: a 768-bit discrete log, a massive computation. The really relevant part is that, okay, a constant in the exponent is a big deal, but it turns out that this constant is the difference between two weeks of computation on thousands of cores and a minute of computation on a few dozen cores. So the final stage is actually fast enough that it can be done in real time, and if somebody does the precomputation once, it can be amortized across all the different key exchanges that use the same prime.
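Written out in standard notation, the precomputation/descent split discussed above looks like this; the 1.923 constant is from the talk, and the individual-log constant 1.232 is the one cited in the Logjam paper.

```latex
% Sub-exponential L-notation for NFS discrete log mod p:
\[
  L_p(1/3, c) = \exp\!\big( (c + o(1)) \,(\log p)^{1/3} (\log\log p)^{2/3} \big)
\]
% Precomputation (relation collection + linear algebra), once per prime p:
\[
  c = 1.923 \approx \sqrt[3]{64/9}
\]
% Individual log (descent), once per target, performed online:
\[
  c = 1.232
\]
```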
So this is what enabled the Logjam downgrade attack to work, but it actually has much more interesting implications for the rest of, sort of, more secure Diffie-Hellman. Okay, so some lessons from the export cipher downgrade attacks. The big one is that obsolete cryptography should be considered harmful, because it's bad for everybody; we should be proactively removing bad options. In response to the export attacks, browsers have raised minimum Diffie-Hellman sizes and proactively disabled export ciphers, and so we're making progress, but it was too bad that we had to demonstrate concrete attacks in order for these changes to be made. There was a general problem with the protocol designs, which is that they were too complicated: the extra complexity from implementing export cipher suites alongside non-export cipher suites seems to have caused more implementation flaws and protocol vulnerabilities than there would have been otherwise. And more generally, for policy, we have this problem with backwards compatibility: technical backdoors in the infrastructure don't go away even when the political environment changes, because people don't want to turn things off and they don't want to break their customers. And we can't assign cryptography based on nationality; it's really closed-minded for one government to say, okay, we need to backdoor the internet so that we can do something to our own citizens. And finally, the technological evidence shows that backdoored cryptography has a lot of unintended consequences.

Okay, so I want to talk about compatibility. Export cipher suites are 20 years old; what are people up to now? The really interesting implications of the precomputation property of the number field sieve algorithm come in for 1024-bit Diffie-Hellman. Still, even now, 1024-bit Diffie-Hellman is really common, and one of the reasons is backwards compatibility: Java 7 and earlier versions hard-coded a maximum 1024-bit size for Diffie-Hellman parameters, so a lot of sites on the internet did not want to increase their Diffie-Hellman size because they would break their customers. There were also a number of 1024-bit groups that had been hard-coded into standards and implementations, and nobody saw any reason to change them. These defaults have since been changed.

A 1024-bit precomputation is certainly within the range of governments, and it likely has been for a while. These are back-of-the-envelope estimates, and this one is particularly sketchy; the estimates that we had in the Logjam paper were too conservative, so this is a slight update, but still, take this with a massive grain of salt. I got it from asking Pierrick Gaudry and Emmanuel Thomé for help not that long ago, but add an extra asterisk. This is a scarily small number: even though it's beyond the capabilities of any computer that we have now, it's scarily small because with a small algorithmic improvement, or with special-purpose hardware, it becomes well within the range of a government with millions of dollars of budget a year to compute 1024-bit discrete logs. These are back-of-the-envelope estimates, so an open problem is to make them rigorous.

So there have been persistent rumors for many years, even predating the Snowden leaks, that the United States government, the NSA, has made some kind of massive cryptanalytic breakthrough. Here's an article from 2012 by James Bamford in Wired, repeating rumors that the NSA made some kind of enormous breakthrough several years ago
in cryptanalysis: everybody's a target, and this computing breakthrough is going to give them the ability to crack current public encryption. He speculated that this was something having to do with AES, which seems a little bit unlikely. We now know from the Snowden leaks that the NSA has billions of dollars a year in computing resources, and that they naturally have priorities of investing in groundbreaking cryptanalytic capabilities.

Finally, I mentioned that many parameters for Diffie-Hellman are widely shared. Two prominent examples: Oakley Group 2, which was generated in the 90s, is baked into the SSH spec, so a large number of SSH servers, and in fact a majority of VPN servers, prefer to use this group for Diffie-Hellman key exchange. For TLS, Apache 2.2 generated a 1024-bit prime that was just baked into a large number of servers.

So I want to just highlight a couple of documents from the Snowden release showing, or at least suggesting, that the NSA does have some kind of passive decryption capability, particularly for VPNs. They're very clear that they do have the ability to passively decrypt VPN traffic, and they're happy about it; I feel this way when I decrypt traffic too, academically. There are a number of requirements that they have for being able to do this decryption; I'm going to skip through these. The summary is that there are a few different technical explanations for what might be going on. One of them is that computing a 1024-bit discrete log is possible, and the requirements for doing decryption via a 1024-bit discrete log computation match up very well with the requirements that they state they have for decryption. There are a few other options, like, say, the Dual EC DRBG or some other random number generator vulnerabilities, but those can't account for all of it, though they would make decryption a little bit easier on yourself if you were doing this.

So the big issue here is that 1024-bit Diffie-Hellman should have been deprecated years ago, and it wasn't, for multiple reasons. One of them was a disconnect between theory and practice: the practitioners were unaware of the impact of precomputation on breaking discrete logs, and the mathematical community was unaware of the implementation reality that everybody was just using a handful of pretty small Diffie-Hellman groups in the real world. And I mean, the backwards compatibility thing is a real issue. So after browsers increased minimum limits on Diffie-Hellman key sizes, this broke the internet for many users. Here's an example of someone complaining about a super obscure-sounding error message, like: oh, your Diffie-Hellman key exchange size is too small. I mean, these poor users, I feel bad for them.

Okay, so I should probably finish now, right? So maybe I will skip the last thing that I was going to talk about. I want to end on a slightly positive note, which is that there is a bright spot: we do know how to implement these things properly. So TLS 1.3 is making many good choices; they are proactively removing insecure things like RSA key exchange, and making good choices for Diffie-Hellman, if it must still be done. And of course, this is causing complaints from people on the internet. So here's an email to the TLS mailing list from a couple months ago saying: no, you can't turn off RSA key exchange, it will break our products, from the banking industry. And I want to highlight Kenny Paterson's response, because I think it's fantastic.
My view concerning your request: No. Rationale: We're trying to build a more secure internet. So, it's great that he did this, and it's great that we know how to do these things right, so we should force people to do the right thing even if it's going to be painful. So I will end with that. Here are all the papers that I mentioned.

Question from the audience: A comment. You kind of jokingly commented that we should have Dan Bernstein implement everything, but part of the problem is we shouldn't have thousands of people implementing cryptography on the internet. Is there a real avenue for making crypto implementation more of a valued exercise for people in this community, something they get credit for, so that you don't have this huge disconnect?

That would be good. There is a little bit of a trade-off, which is that if everybody on the internet is using the same implementation, which is kind of the case for OpenSSL, then when there's a vulnerability in OpenSSL, everybody is vulnerable to the same thing. So it's nice to have... (But we're already in that state.) Yeah, we're in that state. I don't know. I mean, we have free speech in the United States; are you going to tell people they can't implement crypto? We tell people that they shouldn't, and then they sort of believe us and sort of don't. I don't know if, sort of, militaristically banning implementations until they've been vetted is a good thing or not. But yeah, at least we're in the state now where there are a few really dominant implementations and the rest is small fry. I think the real long-term problem is that there are a lot of implementations that will never be updated. Most of the problems that we're finding... okay, I'm saying, look, 10% of the internet is vulnerable; but that 10% of the internet that's vulnerable is not, like, banking sites, because they actually keep OpenSSL up to date. It's little routers that are never, ever going to be updated. The problems are never going to be fixed for those guys.

Any other questions? (Actually, I have one. Now we are preparing the next standard for post-quantum crypto. What do you think is important for the secure use of post-quantum crypto?)

So I think focusing on the problems that we've had in the past with the non-post-quantum crypto that has been in deployment, and crystallizing the different issues: implementers are going to make mistakes, and there will be random number generation vulnerabilities. We don't want to have opaque parameters whose generation is unexplained; we don't want to allow the possibility of building in backdoors; you want to just avoid that entirely. And we want to be able to compensate for the possibility of precomputation attacks, and set parameters so that any kind of precomputation is entirely out of the question. So I tried to crystallize a few observations about what went wrong here that we can use moving forward. And I think at least the papers proposing practical versions of post-quantum schemes are taking a lot of these properties into account when trying to design them. But these should be, sort of, formal design criteria.

(So is there a CDC for the internet?) I think the closest thing that we have is the CERT organizations, the Computer Emergency Response Teams.
So if there's some kind of security vulnerability on the internet, CERT can often coordinate between different vendors and provide a central clearinghouse for information for users. This is a somewhat imperfect role, and obviously they have no way to force people to do anything. I've been involved in a large number of security disclosures that have gone via a CERT, and often CERT doesn't get a response, just like we don't get a response from the people who have problems. I mean, there is a solution, which is to have the kind of driver's license for the internet that our governments would love to have so much, and I'm a little bit hesitant to ask for that kind of thing. So it's great to have an open internet; it comes with some serious trade-offs. And I don't know where the right place is to draw the line between authoritarianism when it comes to doing the right thing cryptographically, and avoiding authoritarianism when it comes to doing the wrong thing cryptographically. Okay. Oh. Yeah, sure.

Comment from the audience: Put up Kenny Paterson's quote. It sounds great, but I'm afraid that it is going to be counterproductive. The internet was not constructed in order to be secure; it was constructed in order to be useful. In this community, we are trying to change the order of usefulness versus security: we think that the most important thing is to be absolutely secure, and that usability issues are secondary. This is what is implied by Kenny Paterson's response. If we continue along this line and recommend things that implementers will find impossible to use, we are going to lose the battle, not win the battle. People are going to ignore our advice if we go too far in recommending security measures which the rest of the world cannot really accept. So while I like it a lot, and it sounds great, I would recommend being more accommodating to requests by real-world users.

Yeah. I mean, ideally there shouldn't be a trade-off between security and usability; it would be great if there weren't. We have to argue amongst ourselves in order to find the right trade-off in the real world, because obviously we are having a bad impact on users. As for these particular guys, I mean, if you're man-in-the-middling RSA key exchange, then, like Kenny, I don't have a huge amount of sympathy for you; but this does mean that they will probably stick with TLS 1.2 forever. Yep.

Thanks to the speaker, Nadia, again for an insightful talk.