Hello, everybody. I'm Nick. I'm the security engineering lead at Cloudflare, and I'm going to talk to you today about Heartbleed, if you aren't sick of it already after this long year. Specifically, I'm going to talk about my personal experience at Cloudflare: what happened during Heartbleed when it came out, and what happened after the fact. So, has anybody read this article? This is a snippet from The Verge, and it's a semi-fictional account of what happened during the disclosure of Heartbleed. I can paraphrase a little bit how that conversation went; it went something like this. "So, do you use DTLS?" No, not that I know of. Does anybody use DTLS? For websites, nobody used Datagram TLS. "How about TLS Heartbeats?" Well, my answer was: what's a TLS Heartbeat? At this point barely anybody knew what this was; it was a very obscure feature. And the answer was: "Oh, they're stupid and there's a bug. You should turn them off." Right, so recompile OpenSSL with OPENSSL_NO_HEARTBEATS. And that was that. So Heartbeats were off for Cloudflare. We have a pretty simple architecture, so we deployed it really quickly, and the answer was: "OK, looks good. Public disclosure should be around April 9th." So I guess a lot of people had this question: why tell Cloudflare? Well, let me go real quick and describe what Cloudflare does. It's a reverse proxy: if you have a website, Cloudflare can sit in front of it and block malicious traffic. That's the red X up there. It also serves cached, static content; that's the bright orange. So it keeps anybody malicious from getting to your website, and it reduces the load on your website. For this to work, this cloud has to be closer to the visitor than it is to your website. So we have a global network; our nodes are closer to the visitors, and that's how it works.
And there are over a million sites on Cloudflare, including banks, government websites, Bitcoin exchanges (almost every Bitcoin exchange is on Cloudflare), the IETF's website, Reddit; I could go on and on. So lots of sites, but what Cloudflare does is very simple. It's essentially three services: DNS, HTTP and HTTPS, the last of which is powered by OpenSSL and Nginx. The architecture is very simple in that every machine that we have can serve every site. So thinking back to Heartbleed, you can see why this would be a really, really bad situation for Heartbleed to hit. Anyways, it happened early on April 7th. At 10:27 OpenSSL published their advisory, and it hit Hacker News really quickly after that. Within half an hour it hit the front page, and about an hour later we posted our standard Cloudflare "customer sites are patched, you don't have to worry about it" sort of post. And it was a thing. It was a bug, and it was starting to gain some steam, and then about an hour after that there was a tweet from Codenomicon, and I think everyone knows what this is; that's the next slide. Heartbleed itself was branded, and it came out to the mass media. So this became a really big deal. Heartbleed.com had a logo, it hit the mainstream press. The "Heartbleed virus", I don't know if you remember that, but people were saying, oh, there's a Heartbleed virus out there, and I knew it got really bad when my mother called me and asked what was going on. So this was a big deal, and, well, we were finished patching, so we had some time to kill. What were we gonna do? At this point we decided on three things. One was to help keep this scanner that Filippo had from falling over. The second was to turn our network into a large honeypot to see what type of attacks, or what type of scans, were happening. And the third was to figure out what we were gonna do about our certificates.
We have quite a few websites that use our service, and many of them use SSL: we had about 100,000 certificates, and at this point, the day after disclosure, it wasn't absolutely clear that revoking them was something you had to do. So first let's talk a little bit about this Heartbleed scanner. Filippo, who's now a Cloudflare engineer, wrote this server in Go: you type in your host name and it checks whether the host answers malformed Heartbeat requests. These are small ones, around 100 bytes, so that the response doesn't leak anything beyond the standard frame; it shouldn't leak any information. He put it on AWS and then put it behind Cloudflare, and shout out to Kyle Isom from our team for helping keep this up. This is what it looked like from Filippo's server: on April 8th, up to 2,000 requests per minute. So this was a very heavily used tool, and that's nothing, because this is the next two weeks. 2,000 is the bottom tick right there. So 10 to 20,000 scans per minute for the next two weeks, and it held up to 200 million tests in the first two weeks. So, with the scanner up and running... Yeah, and thank you to Filippo, wherever he is; he's somewhere in the room. Stand up, this is thanks for the tool. I don't see him; he's somewhere. In any case, this is what he found. In terms of domains, it was really bad at first. This is the ninth, two days after Heartbleed was originally announced: up to 30% of the sites that he scanned were vulnerable. Luckily this dropped, and a lot of people used this in their automated testing to validate sites, but yeah, it got down to a low number pretty quickly. So now that that was up and running, back at Cloudflare we decided: well, what can we do? Well, we can log every heartbeat we see that comes in with a bad size, put that data on a shelf until now, and that's what we did.
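For context on what "a bad size" means: per RFC 6520, a heartbeat message carries a one-byte type, a two-byte claimed payload length, the payload itself, and at least 16 bytes of padding. Heartbleed is exactly the case where the claimed length exceeds the payload that actually arrived. Here is a minimal detection sketch in Python; the function name and check are my own illustration, not Cloudflare's actual logging code:

```python
import struct

HEARTBEAT_REQUEST = 1
PADDING_LEN = 16  # RFC 6520 requires at least 16 bytes of padding

def is_malformed(heartbeat: bytes) -> bool:
    """Return True if the claimed payload length can't fit in the
    bytes that actually arrived -- the Heartbleed condition."""
    if len(heartbeat) < 3:
        return True
    msg_type, claimed_len = struct.unpack("!BH", heartbeat[:3])
    # A well-formed peer sends 3 + claimed_len + padding bytes.
    actual_payload = len(heartbeat) - 3 - PADDING_LEN
    return claimed_len > max(actual_payload, 0)

# A benign request: claims 4 bytes and really carries them.
good = struct.pack("!BH", HEARTBEAT_REQUEST, 4) + b"ping" + b"\x00" * PADDING_LEN
# A Heartbleed-style request: claims 16384 bytes but carries none.
evil = struct.pack("!BH", HEARTBEAT_REQUEST, 16384)

print(is_malformed(good))  # False
print(is_malformed(evil))  # True
```

Logging any message where this check fires is enough to build the honeypot dataset described here, since a correct implementation never sends a length larger than its real payload.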
So here are logs from the ninth, and 69% of them had a message size of 16384, which you might know as the largest power of two you can fit into a signed 16-bit integer, but you might also recognize it as the hard-coded value in the ssltest Python tool that came out the first day. 20% were 121, and that was actually from Filippo's site; and University of Michigan, maybe, were you guys scanning on April 9th? In any case, there were a lot of requests that used the zero-length packet, which is another way to just check whether your site is vulnerable. About a week later, it was around the same. So if there were people mass-exploiting this against sites, they were probably using just the basic ssltest Python script, and around 20% were still Filippo's tests. If you do the math here, it seems that a lot of people were just scanning with Filippo's tool; there wasn't a lot of mass exploitation. And flipping the numbers around, you see that 1% of Filippo's scans were actually against sites on Cloudflare. So this is what the map looks like for where the attacks were coming from. I know IP maps are not really that interesting, they don't tell you that much, but this is where it is. There are some strange spikes, like in Iceland, but don't read too much into it. Now the question is, and this is what we were thinking when it first came out: why was it really so dangerous? Why was Heartbleed so bad? Well, it's kind of a layer-six request that doesn't necessarily get logged. People don't log parts of handshakes very often, unless you have a specific IDS rule or something like that. And it's really bad in that 64K of server memory can be exfiltrated by one request, and this can hold login info, session cookies and perhaps TLS private keys; we didn't know. And if you look at the diagram here, that's the heap right there.
When a new request comes in, it gets put on the heap, and anything that was previously freed from the heap is still sitting there above it. So we knew there were passwords and cookies; people were finding those right away. And the question was: would the key be there? Is the key gonna sit above one of these requests? So we looked at the code, right? And what did the code say? Well, it said this can't happen, at least not in Nginx: the key gets loaded right away and therefore ends up at the bottom of the heap, and any allocations for incoming requests are gonna be higher up, so they're not gonna be able to read the original key, right? And Nginx itself was single-threaded, so a request isn't gonna catch another thread halfway through an operation. And OpenSSL has a big-number library that it uses, and it clears the memory when it's done. So if you're doing a handshake for TLS, all of the cryptographic material is gonna be cleared by OpenSSL. At least that's what we thought, right? And we weren't sure. I mean, I just looked at some code, and what do I know? So we launched the Cloudflare Heartbleed Challenge. This was something we did to crowdsource an answer. We set up a standard Nginx, which was outside of Cloudflare on a third-party VPS, and it had the vulnerable version of OpenSSL, and we said to all of you guys: come and find it, see what you can do. And to show proof, give us a message signed with that private key. What did we find? Well, for the first couple of hours there was trolling, because basically anything that you post onto the page is going to be put into memory in Nginx. So people were posting private keys in there, and they were posting what looked like a passwords file; you can see my name there, Nick.
So everyone was getting really confused: "there's a private key in my Heartbleed response!" But nobody was actually getting the key, until we saw this tweet from Fedor. We took a look; this is the Cloudflare office, that's me pointing at a television screen, and yeah, he solved it. So congrats to Fedor. And he wasn't the only one. In the first 24 hours or so there were 12 people; in the first 48, about 25 people had solved it, got the real key and sent proof. So can you steal private keys? Absolutely yes. It was solved in under 10 hours, and private keys can definitely be vulnerable. But another thing we did was log where in memory the Heartbleed requests were landing, and we compared that to where the private key was initially allocated, and they never overlapped. So how did it actually get solved? Well, there was a second bug in OpenSSL. Who would have known, looking through that code? If you dump the memory of the request, all the places in red are where private keys did exist at one point, and it turns out some temporary variables were not wiped. This is the code to clean up the mess: there's BN_free versus BN_clear_free, and in certain cases in the Montgomery multiplication they didn't clear the partial pieces. We can do a little bit of math just to show how people actually solved it with this. With RSA you have a couple of different things: a public exponent e, two primes that multiplied together make the public modulus, and a private exponent d. If you get any one of p, q or d, you get the whole private key.
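With those pieces, the most common recovery method is literally a divisibility test. Here is a toy Python sketch of the idea; the primes and the helper name `recover_key` are my own illustration, not anything from the challenge:

```python
# Toy sketch of how most challenge winners recovered the key: if any
# leaked chunk of heap, read as an integer, evenly divides the public
# modulus, the key is factored. The primes here are tiny, made-up
# values for illustration; the real keys were 2048-bit RSA, so people
# tested 128-byte chunks.
e = 65537                      # standard small public exponent

def recover_key(n, e, leaked_chunk):
    """Try one leaked chunk as a candidate prime factor of n.
    Returns the private exponent d on success, None otherwise."""
    if leaked_chunk < 2 or n % leaked_chunk != 0:
        return None            # not a factor; try the next chunk
    p = leaked_chunk
    q = n // p
    phi = (p - 1) * (q - 1)
    return pow(e, -1, phi)     # modular inverse (Python 3.8+)

# Demo with two small, well-known primes standing in for leaked data.
p, q = 104729, 1299709
n = p * q

d = recover_key(n, e, p)
assert d is not None
# Round-trip: decrypting with d undoes encrypting with e.
m = 42
assert pow(pow(m, e, n), d, n) == m
```

In practice you would run `recover_key` over every aligned chunk of every Heartbleed response until one of them happens to be a leftover copy of p or q.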
So what people did was take every 128-byte block that they saw in exfiltrated Heartbleed data and just try to divide the modulus by it, and if it divided evenly, there you go, it's factored. This is how nine out of the 10 people solved it: it turns out that one of the prime factors is just sitting there on the heap, and after tens of thousands of requests you might come upon it. But one enterprising gentleman, Rubin Xu, who's at the University of Cambridge, used a much cleverer method, which is Coppersmith's attack: a lattice reduction attack where you only really need about 60% of one of the prime factors to recover it, and this depends on the fact that the public exponent is small. For performance reasons the public exponent in RSA is small, so that any public-key operation is really fast. He solved it in only 50 requests, so that was actually really interesting. So private keys are gone, right? What does that mean? Revocation time. You know, the internet was built for this, right? I mean, the people who designed the PKI said, yeah, you know, people are gonna revoke 100,000 certificates in 24 hours; this is how we designed the system, right? And this is what SANS reported. That's mostly us, actually; that's the whole internet, but that's mostly Cloudflare, and this is what it looked like. The blue line is revoked certificates, and as you can see, that's April 7th. This is after Heartbleed, this is when everybody was revoking, and then that green spike is Cloudflare revoking. But once we revoked all these certificates, we found out that it didn't really mean that browsers wouldn't accept these certificates anymore, and I can go into that really quickly. There are three methods built into X.509 for handling certificate revocation. The first is the certificate revocation list, and this is just a flat file with a list of certificates that are revoked. Did us revoking 100,000 certificates break CRLs? Heck yeah, it did.
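A back-of-envelope calculation shows why mass revocation and CRLs don't mix. The per-entry size below is my rough assumption (a DER-encoded serial number plus a revocation timestamp; real entries vary with serial length and extensions), not a measured figure:

```python
# Rough model: each CRL entry adds on the order of ~50 bytes
# (assumed average; actual DER-encoded entries vary).
BYTES_PER_ENTRY = 50

def crl_growth_bytes(revoked_certs: int) -> int:
    """Approximate bytes added to a CRL by mass revocation."""
    return revoked_certs * BYTES_PER_ENTRY

added = crl_growth_bytes(100_000)
print(f"{added / 1e6:.1f} MB added")  # 5.0 MB added
```

And every browser that checks the CRL has to download the whole file, so a few megabytes multiplied by millions of clients is a serious amount of traffic.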
So the CRL from GlobalSign grew from 22 kilobytes to about five megabytes. Yeah, this basically DoSed the CRL server, and lucky for GlobalSign, Cloudflare was in front of their CRL server, but unlucky for us. We're used to that kind of traffic, but you can see here that every three hours there are waves, and this had to do with the cycles in which CRLs were updated and Microsoft Internet Explorer downloading them, and yeah, it was pretty rough. Anyways, CRLs: broken. How about OCSP? OCSP is the Online Certificate Status Protocol. It's a question-and-answer protocol: is this certificate revoked, yes or no? Well, it's really broken, and Chrome has known this for a while and stopped checking. If you hard-fail, it keeps you from using certain networks, especially with captive portals, and if you soft-fail, somebody on the network can just drop the OCSP response and voila, the site is no longer revoked. So plain OCSP, requested from the browser, really doesn't work or scale to the degree that we'd want it to. How about CRLSets? This is Google's proprietary method: they collect all the CRLs that they can from different certificate authorities, put them together, and install them in the browser, so you get updates of these sets of CRLs with browser updates. I shouldn't say all the CRLs; it's specific certificates. And this is what we found out: they only do EV certs and specially chosen certs, and if the browser doesn't get updated, then you're not gonna get an update to your CRLSet, which is kind of bad. So cloudflarechallenge.com: once it was solved, we revoked it, and while it was not an EV cert, Chrome did mark it as revoked, because, well, they added it manually into a JSON file. But none of the 100,000 Cloudflare certificates were being marked as revoked by Google Chrome. So we basically made a hack, and I don't know if you can read this, but it's the most efficient four lines of C++ revocation.
This is in Chromium; it revokes all of our certificates. This is a hack, it's not scalable, you shouldn't do it this way, but it was how it had to get done, because there was no valid way of doing revocation. So yeah, revocation is pretty much broken. What can we do? Well, shorter certificate expiration periods could help; that would at least help with the CRLs, because you won't have to hold on to old certificates for very long, so you can shrink the size of the sets pretty quickly. OCSP Must-Staple is an extension that requires the server to send the OCSP response in the handshake; that can help too. Certificate Transparency is something else that's been thrown out there to solve this. But none of these are widespread, and none of these have been fully implemented, so something has to change for revocation to work. So I guess in summary, there are three things that we did after Heartbleed: we kept the scanner from falling over, we turned our network into a honeypot, and we definitively answered that, yes, you have to revoke your certificates; there's no excuse. There are a lot of takeaways from Heartbleed, right? I don't want to be the one to tell everybody what to learn from this, but open source disclosure is hard, and this really was the first of what turned out to be many open source disclosures this year, and we learned a lot of lessons about how to do that correctly. Another thing this pointed out, which seems obvious but wasn't done in OpenSSL, is that features should be disabled by default. Nobody who installed OpenSSL 1.0.1 necessarily wanted heartbeats, so turn off features by default. Another thing is, well, expect the unexpected; that's sort of obvious in computer security. We didn't really learn that from Heartbleed, but it was definitely a shock when it came.
Other things: a lot of the attacks on real sites that Cloudflare saw, and we saw quite a few as I mentioned, were just scans from people trying to see if their sites were vulnerable, so that's a reassuring sign. Crowdsourcing was effective for the Heartbleed challenge: I couldn't find the private keys, and luckily there are very smart people out there who were able to. Is anybody here in the auditorium a winner of the Cloudflare challenge? Anybody? I don't see any hands, but anyways, I don't have my glasses, so congratulations wherever you are. And the last thing is that revocation needs a solution. The final conclusion from this is: really, support OpenSSL. It's, I messed up my microphone there, but thanks. No, really, this is part of critical infrastructure for the world, not only for websites but, as Zakir here was telling us, for embedded devices, and these guys need support. So please support OpenSSL. And I'm done, thank you.

Okay, quick announcement before we start the Q&A: if you are going to leave the room, please get up now and go out quietly without talking so we can do the Q&A nice and quickly. Also, you will only be able to leave the room at this point. If you have any questions, please line up at the microphones. We'll start with microphone number one.

Thanks; not really a question, thanks for the talk. We really followed the progress even before the talk, with the paper you published, and we just want to contribute a piece of missing information. The first attacks, as you called them, 24 hours after the disclosure: that was us. We weren't really attacking. We started working on a scanner about 20 minutes after the public information. So it was a scanner, and it used multiple different methods. As we went, we came to the conclusion that there may be firewalls deployed in between which may stop some of the packets.
So we used many different packets in one, in every request. So that was also a scanner.

Is your mic not on? No. Quick interruption: if you're leaving on the ground floor, please only use the front left and the front right door. If you're up there, you can use any door you want, but if you're down here, please only use this door and this door to leave the room. Thank you. Does your mic work yet? Hello? Yes, great.

Yep, thanks for the comment. I don't think I saw anything specific to your account, but... yeah, it's great to know. I mean, it's one of those things that's really hard to tell from our perspective: whether someone's malicious or whether someone's just scanning. A lot of the hosts that we see don't have any information that identifies what the intent is.

Microphone number two, please. Hello? I have a question for Cloudflare. Could you stop blocking Tor users, please? You make the internet more central every day, all the fancy homepages... and come on, the Tor network is so small, you could handle all of its bandwidth with a finger snap, and you're blocking all the exit nodes, all the time.

We're looking into solutions for Tor. The main problem is that there is a lot of spam coming over Tor toward websites, and right now filtering the good from the bad is something that we're working on; we are focusing on that this year. So look for something coming up this year for Tor users. Thank you.

Next up we have a question from our signal angel, asked on IRC. Yeah, thank you. Given that the maximum TLS record length is 16 kilobytes, how is it possible to even get back 64 kilobytes? You can split a heartbeat message over multiple records. I believe, I'm pretty sure, that's how it works.

Microphone number one, please. What is your opinion on LibreSSL? Go for it. I'm not sure if I'm the most qualified person to answer that, to be honest.
I think it's well intentioned. I think the community acknowledges that there are a lot of problems in OpenSSL in terms of its maintainability. It's one solution; I'm not sure if it's the correct one.

Yeah, I think LibreSSL works for its goal, which is OpenBSD. They have a portable version by now, as they did with all their other forks, and it's much cleaner code. And OpenSSL still tries to support way-old systems, like 16-bit Windows, and bullshit you really don't need. So the unmaintainability will remain for a while.

Yeah, I agree that OpenSSL supports quite a lot of different platforms, and some people need that. But the different forks of OpenSSL that have come up recently, LibreSSL and BoringSSL, are taking patches from each other, so I think the more people looking at this project, the better.

We have time for one more question. Microphone number three, please. Hello. In both your talks, you showed numbers of vulnerable sites decreasing and then increasing again afterwards; can you explain that?

So, I think there were small bumps that were just people coming and going. I don't think we saw a lot of websites that became vulnerable again; I think a lot of it is just noise in the measurements, websites that came and went between different scans. I don't think there were any large jumps that we saw. When it comes to the data that I was showing, this is from Filippo's scanner, and not everybody was scanning the same domains every day, so that's just standard variance.

Thank you very much, Zakir and Nick. Please give our speakers a warm round of applause.