Thanks. I'm Jesse Burns from iSEC Partners, and I'm Peter Eckersley from the Electronic Frontier Foundation, and we're here to talk about observing the SSLiverse. So this is a project that both EFF and iSEC have been working on, thanks to some very generous assistance from the NLnet Foundation. What we've done is we've downloaded a set of all the publicly visible HTTPS SSL certificates, we've built a big database of them, and we're looking through to see what interesting things we can find. Here's just a little overview of what we're going to talk about: why we need the observatory, and then some of the results we found. We'll talk about conclusions and future work at the end. Surprising, that. So as you've probably realized at this point, HTTPS is a pretty important protocol. We do almost everything over the web, and the only hope that the web has to not get completely owned by everything is to encrypt itself and use SSL. And HTTPS relies on certificates in order to ensure that the website you're talking to is actually the website you want to be talking to, and not some person in Bulgaria who's just put up a fake site that looks a lot like the real one. And the key component of this infrastructure for checking who you're talking to is certificate authorities. They're these people who go around and sign certificates saying this public key really belongs to Google.com. This public key really belongs to Microsoft.com. This public key really belongs to your blog. And as you should imagine, any time you hear the word authority, you should question it. The best answer to authority is transparency and accountability. So we decided to apply some of that to the existing state of the CA world. Now there's a reason why we're doing that as well. And that is there's been a growing amount of concern that the way that SSL uses certificate authorities is inherently problematic.
There's a growing amount of work of this kind; Chris Soghoian and Sid Stamm recently did some. But then last year there were three instances in which we observed attacks against HTTPS that were a result of mistakes or oversights by certificate authorities. Now one of those was as simple as saying we'll give a certificate to anyone who can show that they answer administrative email at a domain. And here's a list of 20 email addresses at that domain: root, admin, administrator, postmaster, et cetera, et cetera. If you can answer any one of these, then we'll give you a certificate. And of course out there on the web there are some webmail providers who haven't reserved all of those names. So... But there are more sophisticated attacks. So there's the attack by Sotirov et al., where they observed that some CAs use MD5 as part of their certification process and then fail to randomize the serial number. And so they were able to find an MD5 collision and also predict the serial number, so that they could make the MD5 collision work in practice against the CA. And when that happens it completely breaks HTTPS for everyone. There was another example where certificate authorities were found to be signing certificates that had \0s, zero bytes, inside the names of websites. And of course the certificates use Pascal strings, so it's okay to have a zero byte in the middle of the string somewhere, but browsers use C. And so they'd interpret that \0 as the end of the string and had a different semantics. These attacks, unfortunately, may pertain only to one CA. But there are a lot of them, and the security you get for HTTPS is only as good as your weakest CA. And so it starts to matter how many of them there are and what they're really up to. And in a sense our project, which we're going to report on today, or at least initially report on, is to try and find out who these CAs are and what they're up to.
So we download everything and see what we can learn from the results. All right, so our observatory has a little bit of infrastructure. Actually, we didn't need a lot of stuff to do this, but we did need a little patience. So we had three low-end Linux servers with only two gigs of RAM each, a good shared 100 megabit network connection with people that were supportive, Nmap with poor timing settings, some Python code, and some time. So we ran this thing and we did all the analysis, or most of the analysis, on a single year-old i7-920 CPU with a fast disk and a lot of RAM, and two little laptops. So, tying together OpenSSL, a MySQL database, and lots of little Python scripts. Currently, unfortunately, our community contribution is very poor, but we intend on making a distribution of our database. We have a little problem where we've uncovered some vulnerabilities and we need to finish all our disclosure before we can do that. Yeah, so to put it another way: hopefully in a few weeks, we'll have a page up at EFF where you can download the entire data set that we got here and see if you can improve on our findings. It'll be a big BitTorrent download. And also some little web forms where you can enter a domain and say, okay, these are all the certificates that were observed in other weird, shady parts of the internet that claim to answer for that domain. So the first thing we did, we said, well, we need to know where the SSL servers are. And there have been some pre-existing projects that have gone looking for them by, say, compiling a list of the top one million names and resolving them and then going to see if they answer over HTTPS. We took a more brute-force approach. We contacted every IPv4 address on port 443 using Nmap and sent it a SYN. And if it came back with an ACK, we just jotted down that fact.
We did this with these little work units of the form number.*.*.number; it's a bit like the work units used for SETI@home, except instead of observing the sky, we're observing SSL. You need to do that so that if something crashes halfway through, you don't have to start the whole scan again at the beginning. So you have the work units you've done and the ones that are still remaining. And then once we've seen an IP address reply, we jot that down and come back later with a little Python script that will send an SSL hello message and just record the reply that it gets. One of the things that's kind of nice about this moment in time: it's 2010, and it's perhaps the last time that you can reasonably take this approach. We're running out of IPv4 address space, IPv6 is 128 bits, and we're not going to be sending SYNs to each of those. And there are these new technologies that are improving the internet, like Server Name Indication (SNI) and SPDY and other protocols that extend SSL and make it so that when you say hello to a server and ask it for a certificate chain, you have to know who it is that you want to talk to before it tells you who it is. That will make our lives harder. Yeah, another way of saying that is currently virtual hosting doesn't really work for HTTPS. It does work for modern browsers, but anyone who's deploying a website needs to support IE7, IE6, all these legacy browsers. Until those are gone, you can't do virtual hosting and SSL. So in a sense, we're relying on that in order to make this work. Extraction of certificates. So maybe what we did would have been easier if we'd used something off the shelf, but what we decided to do was write a custom client with Python and Construct. And part of the idea there was that we might have bugs in our understanding of the TLS protocol, but we wanted the bugs, the parsing errors, to be bugs that we could understand. That was really important to us.
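The work-unit bookkeeping described at the start of this passage can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the JSON progress file, and the choice to enumerate all 256 x 256 units are hypothetical, not the observatory's actual code.

```python
import json
import os


def all_units():
    """Every work unit of the form A.*.*.D, identified by the pair (A, D)."""
    return [(a, d) for a in range(256) for d in range(256)]


def unit_addresses(unit):
    """Yield the 65,536 IPv4 addresses covered by one A.*.*.D work unit."""
    a, d = unit
    for b in range(256):
        for c in range(256):
            yield f"{a}.{b}.{c}.{d}"


def load_done(path):
    """Read the set of finished units from a progress file (hypothetical format)."""
    if not os.path.exists(path):
        return set()
    with open(path) as fh:
        return {tuple(u) for u in json.load(fh)}


def mark_done(path, done, unit):
    """Record a finished unit so a crash does not force a full rescan."""
    done.add(unit)
    with open(path, "w") as fh:
        json.dump(sorted(done), fh)


def remaining_units(path):
    """Units still to be scanned, given what the progress file says is done."""
    done = load_done(path)
    return [u for u in all_units() if u not in done]
```

A scanner loop would take a unit from `remaining_units`, probe each address from `unit_addresses`, and call `mark_done` once the unit completes, so a restart resumes where the last run stopped.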
So we didn't have to implement all of TLS in order to do this, just these little parts I've outlined here. And that allows us to detect failures. We went off the RFC definitions, and then we just had to kind of debug it with Wireshark and some test cases. Turns out that the RFC is a little ambiguous here and there, and you really have to dig around to find some places that work. But once we did that, we were able to collect SSL certificates without actually having to do the key agreement part. So that's very efficient. You just say hello, and it sends back some bytes, and you drop the connection when you see that they've given you everything you need to reconstruct the whole certificate chain. You don't even need to do any key agreement stuff at all. So we can't. No crypto math, basically. Yeah, no crypto, so it's nice and quick. And then if we make a mistake, we can go back after the fact and reanalyze. So at the end of this process, we are left with a big pile of X.509 certificates. And as an aside or a reminder, it's worth dwelling on what X.509 actually is. It's a standard, or at least a recommendation, that was promulgated by the International Telecommunications Union, sort of the international club of telcos, in the 1980s, back before the web existed and SSL existed. And the general advantage that the protocol has is it's amazingly flexible and extensible. It's completely abstract, completely general. You can do anything with it. These are also its disadvantages. And so the fact that it was able to be kind of retrospectively bolted onto the web is maybe a testament to its flexibility. And also, we might have been better served with a lean, narrowly defined way to do encryption for the web. But instead, what we have is this complicated infrastructure that has a lot of security features that are hard to understand, especially when you start to take the intersection of all of them. So we have these things. We need to parse them.
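To make the "say hello, read the chain, hang up" idea concrete, here is a minimal sketch of pulling the certificate list out of recorded server bytes with no cryptography involved. It assumes the plaintext handshake layout used from SSL 3.0 through TLS 1.2 (5-byte record header, 4-byte handshake header, 3-byte length prefixes in the Certificate message) and uses plain `struct` rather than the Construct library mentioned in the talk. All function names are hypothetical.

```python
import struct

HANDSHAKE_RECORD = 22   # TLS record content type for handshake messages
CERTIFICATE_MSG = 11    # handshake message type carrying the chain


def _u24(data, offset):
    """Read a 3-byte big-endian length field."""
    return int.from_bytes(data[offset:offset + 3], "big")


def handshake_payload(stream):
    """Concatenate the payloads of complete handshake records.

    Handshake messages may be split across records, so the records are
    reassembled first. Parsing stops at the first other or truncated record.
    """
    out = bytearray()
    pos = 0
    while pos + 5 <= len(stream):
        rtype, _version, length = struct.unpack("!BHH", stream[pos:pos + 5])
        if rtype != HANDSHAKE_RECORD or pos + 5 + length > len(stream):
            break
        out += stream[pos + 5:pos + 5 + length]
        pos += 5 + length
    return bytes(out)


def _parse_certificate_list(body):
    """Split a Certificate message body into individual DER blobs."""
    total = _u24(body, 0)
    certs, pos, end = [], 3, 3 + total
    while pos + 3 <= end:
        clen = _u24(body, pos)
        certs.append(body[pos + 3:pos + 3 + clen])
        pos += 3 + clen
    return certs


def extract_chain(stream):
    """Return the DER certificates from the first Certificate message, if any."""
    payload = handshake_payload(stream)
    pos = 0
    while pos + 4 <= len(payload):
        mtype = payload[pos]
        mlen = _u24(payload, pos + 1)
        body = payload[pos + 4:pos + 4 + mlen]
        if len(body) < mlen:
            break  # message not fully received yet
        if mtype == CERTIFICATE_MSG:
            return _parse_certificate_list(body)
        pos += 4 + mlen
    return []
```

Because the raw bytes are kept, a parsing mistake found later can be fixed and the stored replies reanalyzed without rescanning.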
There's no real right way to parse X.509 certificates. I mean, there's ways that they say to do it in the specs. But one of the things that we're really interested in is how maybe two different implementations might look at an X.509 certificate and see something different in there, like the great example of those nulls. So we didn't want to try and write it the right way. What we did was we decided to do it one of the wrong ways that was fairly easy and gave us a lot of data fast. And that was to use OpenSSL's X.509 text pretty-printer and then to parse that. And we've actually experimented with some other ways of doing this, because obviously there's information that shows up in certificates that your C client might not be the best thing to see. You can use a certificate parser. You can write your own code very similar to the OpenSSL verify command or the OpenSSL x509 command. All that source is available. It's easy to see how they're doing that and make your own little tweaked versions. But ideally, we want to have different views of the certificates. And then we can agree on fingerprints, which is just the hash of the DER-encoded bytes of the certs, which should never change. And we can join all that together and conduct our analyses to see maybe which views are different. So in order to analyze this data that we just parsed, we put it all in a gigantic set of MySQL tables. And then as we noticed interesting questions that we wanted to ask or interesting properties we wanted to examine, like the relationship between domain names, where there might be many certs that in different ways claim to be relevant to a particular domain, you want to build separate tables to capture those mappings. And then once you've built this big MySQL infrastructure, basically a question that you want to ask about the state of SSL turns into a fancy query you can write. And we have a couple of examples of these later on in the slides.
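The fingerprint idea mentioned above, a hash over the DER bytes that every parser can agree on, is easy to show. The sketch below computes a SHA-1 fingerprint and then uses it as the join key between two hypothetical "views" of the same certificates, to list the ones where two parsers disagree. The function names and the dictionary layout are assumptions made for illustration.

```python
import base64
import hashlib


def pem_to_der(pem_text):
    """Strip the PEM armor lines and decode the base64 body to DER bytes."""
    lines = [ln.strip() for ln in pem_text.strip().splitlines()]
    body = [ln for ln in lines if ln and not ln.startswith("-----")]
    return base64.b64decode("".join(body))


def fingerprint(der_bytes):
    """SHA-1 over the DER encoding; stable regardless of which parser is used."""
    return hashlib.sha1(der_bytes).hexdigest()


def differing_views(view_a, view_b):
    """Fingerprints present in both views whose parsed values differ.

    Each view maps fingerprint -> whatever that parser extracted
    (for example, the subject common name).
    """
    shared = view_a.keys() & view_b.keys()
    return sorted(fp for fp in shared if view_a[fp] != view_b[fp])
```

For instance, if one parser stops a name at an embedded zero byte and another does not, the two views hold different strings under the same fingerprint, and `differing_views` reports that certificate.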
And then once we're ready to publish this data, hopefully you can all go out and write your own queries against this data set. And in a way, this approach gives us the flexibility that you need to handle all the crazy internal complexity you've got inside X.509. And then the first thing that we did with this set of tables was try to work out which of the certificates were valid. And that turned out to be quite tricky. It's very important, and there are some more, but perhaps not all, of the gory details later. But what we found in general was, OK, there are 16.2 million machines on the internet, at least when we scanned, that replied on port 443 with an ACK. And then of those, if you actually looked at what they said back to you, 10.8 million, or thereabouts, gave an SSL response. The others, who knows what they were. They were random bytes that they came back with. And then of the 10.8 million SSL handshakes, 4.3 or so million had a valid SSL cert chain. The others were self-signed, expired, invalid in some way. And then if you look at the valid ones, a lot of them are the same chain repeated on multiple IP addresses. So you get down to 1.3 million valid leaves if you throw away the repeated chains. Which is very close to the number of valid certificates. There's a few extra ones in there we'll talk about later. So a little crash course on what it means to be a valid X.509 certificate chain. When you put up a website running HTTPS, and you just serve up the certificate you got signed by a CA, that will often work. Because the trust root chain is already there: the subordinate CA that signed your certificate is already present in the browser. Usually though, you send back a trust chain. So you send a couple of certificates back, and then the browser orders those and figures out if this chain really maps back to a trust root. The trust root is what grants you the authority.
So this is a really important process, so I just want to give you the dirty version of it. When you check a link in the chain, you check that the issuer on the certificate you're looking at matches the subject of its parent. If there are these authority and subject key IDs in either of them, then they must match. You check that the dates are valid, and you check that the key usage is correct. So this isn't a code signing thing that I'm doing. This is an SSL chain that I'm verifying. I don't want to see your code signing cert in here. Thanks. And then you also have to look and make sure that there's no critical property, something flagged in that certificate that you don't understand what it means. Because maybe that means, hey, this certificate isn't valid for you. And that stops us from missing important extensions that we need to be able to grok. That's a great question. And it would be a really easy bunch of SQL queries to do. So we're not going to finish this talk early, probably. But if we finish it early, we could do the SQL queries for you, showing you live on the screen. Failing that, we can go and do it in the breakout session, or once the data set is public, it'll be pretty easy for you guys to just go and look at those properties yourself. He's pretty good at quickly turning those into pictures, too. So there's a big partition out there between valid and invalid certs. And there are a few that were on the gray line in between that we had to struggle with, as we learned a bit more about this. One example that we discovered is that sometimes whether a certificate is valid in Firefox or not depends on what other sites you've been to beforehand. That was non-intuitive. But what happens is whenever Firefox sees a subordinate or intermediate CA that it considers valid, it caches it. And that's part of the system. Those things are valid if they're signed by a root you trust, so you cache them.
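The per-link checks listed at the start of this passage can be written down as a small function. This is a deliberately simplified sketch, not a real path validator: signature verification is left out, the `Cert` record and its field names are hypothetical, the key-identifier rule is interpreted as "if both are present they must match", and the set of understood extensions is an arbitrary placeholder.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Extensions this toy verifier claims to understand (an illustrative choice).
UNDERSTOOD = frozenset({"basicConstraints", "keyUsage", "extendedKeyUsage"})


@dataclass(frozen=True)
class Cert:
    """Hypothetical, already-parsed view of a certificate."""
    subject: str
    issuer: str
    not_before: datetime
    not_after: datetime
    skid: Optional[str] = None            # subject key identifier
    akid: Optional[str] = None            # authority key identifier
    purposes: frozenset = frozenset({"serverAuth"})
    critical: frozenset = frozenset()     # names of critical extensions


def link_problems(child, parent, now, purpose="serverAuth"):
    """Return the reasons one child-to-parent link fails, or [] if it passes."""
    problems = []
    if child.issuer != parent.subject:
        problems.append("issuer/subject mismatch")
    if child.akid and parent.skid and child.akid != parent.skid:
        problems.append("key identifier mismatch")
    if not (child.not_before <= now <= child.not_after):
        problems.append("outside validity dates")
    if purpose not in child.purposes:
        problems.append("wrong key usage")
    if child.critical - UNDERSTOOD:
        problems.append("unknown critical extension")
    return problems
```

Returning a list of reasons rather than a single boolean makes it easier to study the certificates that sit on the boundary between valid and invalid.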
And so sometimes a chain will depend on an intermediate or subordinate CA that it doesn't contain. And so if you've already cached that middle cert when you go to the site, it's valid. If you haven't already cached it, it isn't. So there's crazy stuff like this. But aside from that thin gray line in the middle, you've got a world of black and white. And in the black world, the invalid world, there's all sorts of crazy stuff going on. There are lots of certs out on the internet that claim to be Microsoft, claim to be Google, claim to be star. There are things that look like telcos in Southeast Asian countries. Maybe they're man-in-the-middling their users. More likely, what it is is that they have a WAP gateway, which is the old mobile internet thing. And those require you to impersonate the server because they didn't have proper infrastructure for TLS in there. So there's all this crazy stuff out there. But most of this talk, unless otherwise noted, is about the white certs, the certs that are valid, that check out when you go to them with a modern Firefox or IE, and they give you the little lock icon. And what kind of questions do we want to ask about them? How many are there? Who are they? What do they sign? Is there anyone impersonating anyone? So the number of trusted CAs is kind of a tricky question. So if we're going to figure out who browsers trust, we want to know what the trust roots are. And if you just fire up those browsers, maybe you look in your Mozilla trust roots, you see 124 trust roots representing around 60 organizations. That's some indication. When you fire up Windows 7 and open up IE, you see 19 trust roots. And you think, oh, wow, those guys have really tightened that down. Except what Microsoft actually does is, when you're going to a website and it doesn't instantly check out because the root isn't in your list of 19, IE actually goes and pings Microsoft and says, so should I really trust this root CA?
And Microsoft has a list of about 300 of these certs that they might give you. And they come from 100 or so controlling organizations. So one really nasty thing about this user interface is if you care about who you trust and you go and look at the IE list, you don't get the real list. You get this little sanitized version of it. And so we had to go and get the real, sordid, complete IE list and use that as the... Although, to be fair, they do document this whole process and actually provide a lot of insight as to what they'll do if, say, a break like the one that just happened last year happens to SHA-1. They have a whole plan for what they're going to do with eliminating certificate authorities that, say, don't bother to randomize their serial number. So they'll invalidate them long before they'll invalidate people that properly randomize them. I didn't mean to call Microsoft sordid. So how many of these things are there? I mean, obviously you've got this set of roots that Microsoft and Firefox trust explicitly. But then of course, because so many of these are delegated and have the power to create new CA certificates, there's actually a larger set that we found. There are 1,482 that were valid for either Windows or Firefox. Now, remember these are often multiple certs controlled by one organization. So the large number in and of itself isn't necessarily damning. In fact, it's good practice to keep your most powerful keys off the internet and then have one that has a shorter expiry date, perhaps on a server somewhere where you're actually signing things. But if you ask how many issuer strings there are, which is the kind of magic thing that you look up the cert by, there are 1,100. And if you look inside the issuer, there's a field that says organization, sometimes. If you count those plus some fallback options, you get about 650 organizations that have the power to sign a cert that your browser will trust to be Google.com.
Now, some of those organizations might be owned by some of the others, and the jurisdictions that they work in might overlap; it's complicated. But the general principle that you should remember underlying all of this is that if a CA cert can sign one domain, it can sign every domain on the internet. So, each CA has many certificates, but the combination of the issuer and the authority key ID, which is actually the subject key ID in the issuer cert, that's a unique combination. So, you can look at the top CAs that we found in use on the internet; these are valid signatures. We found one GoDaddy cert, this is its fingerprint here. It's signed 300,000 certificates. The next biggest one was Equifax; they signed 244,000. When you get down to Thawte, you find one that doesn't have a subject key ID in it, which means that when you're looking it up, you have to look for every certificate in your trust roots that has that issuer name, and that can take a lot of time, because if there's multiple keys in there, you have to do expensive PKI or public key operations for each of those to validate that it's true, and if one isn't correct, maybe the next one is, so you have to keep trying. And then, 85,000 plus from UserTrust, four certs, all with the same SKID, which is kind of a strange thing. It's possible that when you're doing certificate revocation, you might want to have kind of different domains. It's really a weird kind of world to see which ones have unique keys and which ones have unique SKIDs. So one of the things about these validity assumptions here is that we're doing this with OpenSSL, and a version of OpenSSL that isn't too fussy about MD5. It's a little bit older. And these are ones that are valid based on the Firefox trust roots or on all the IE ones that you get, and you can just download those, and we have a contributor who did that. Oh, SKID. SKID is the subject key identifier. It's an X.509v3 option. They showed up in like 1998.
I'll talk about them a little more later, but it's one of the things that, if it's present, they have to match. If it isn't present, they don't have to match, and it should be the key's identifier. In fact, there's a suggestion that maybe it's the SHA-1 hash of the key, but it doesn't have to be. Yeah, that's a great question actually. So yeah, that's a really great question. The question was, how do you know that we have all the Microsoft ones? Maybe they have secret ones. So there's a lot of stuff going on here where trust roots sign subordinates, and those subordinates have the same powers, basically, as the root, right? They can sign anything they want. They can sign other subordinates. You can see how sometimes trust roots will actually say, when you ask them, oh, could you tell us all of the subordinate CAs you've signed? Oh, that's private information. You know, that's sort of... So maybe a different interpretation of what you might have meant to ask. What we did is, actually an iSEC intern got a version of XP and installed all the service packs, and this was before Windows had completely rolled out its magic update to the certs, and so you actually got all the certs, or a very large list of certs. We got a very large list of certs that Microsoft would push as updates to Windows XP, but we can't be sure we've got that entire set. And it's almost certain that there are some certs in private domains that we can't see, and also that Microsoft could indeed, if they got a court order and decided it was valid, which is a separate question, silently add another cert to their trust roots and we might never know. Actually, any of these trust roots can do this though, that was my point, right? You don't have to worry about Microsoft. You can worry about this funny company in the UAE that has one, right?
They can do the same thing, and they don't need to add anything to your trust roots, and it's even kind of better because you don't see it in the trust roots there. If Microsoft added a root cert, as soon as someone started using it, it'd become visible in your browser and you'd be able to see that's the chain, right? These intermediate ones are a little bit sneakier. I think we better not take any more questions right now because we've got a lot of material to cover and not a lot of time. Just pop this for one second. This is an example of the top 14 certs that are actually used to sign leaves on the internet, and you can see that a lot of them are signed by the first two, and then it kind of trickles off, and then that big gray area there is the other, and it gets kind of messy at the end. So a different way of breaking that down: that was literally which certs were immediately signing things. This is counting by root CAs. So not just what is signed directly by the root CA but what is signed by the entire tree of things following, for the 340 or so, 343, that exist between Firefox and Windows. And so you see this extremely skewed distribution. In fact you can't really see what's going on in this graph, so we made a log version of it and cheated a little bit; actually that's zero, not one, but you can't have zero on a log scale. And so what you can see here is that about half of these root CAs aren't signing anything. That might be because they're code signing certs that aren't used for signing SSL leaf nodes, and so when you look at the SSL leaves that's not what you see, but some of them are definitely SSL signing certs. Yeah, so some of them might be new certs that are about to be deployed, but generally speaking half of the certs we trust don't sign anything, and then there's another small group that sign one, two, three, four, up to 10, and then gradually you start to get certs that are significant.
So we don't want to provide a specific interpretation for this graph yet, but it is quite interesting that it has this shape. We just actually talked about some of these. Yeah, I'll blow through this quick, but there's this good case for a root to be unused, which is that when you have a more secure certificate you need to push it out there. So I've got this new trust root, it's signed with a bigger key than the old one, it's got a nice late expiration date. I'm gonna put it out there, and I'm not gonna sign anything with it until it's been widely adopted by all the different browser vendors. That's totally a legitimate case and it actually improves the overall security of the system, so you can't just say, oh, this isn't used, it's bad. Also you might need a backup root. Perhaps where it gets a little less legitimate is when you're thinking about subordinate CAs. I'm not sure why these would be unused. It seems like if you've got subordinate CAs, those are pretty easy to get reissued, as evidenced by the hundreds of them that have been issued. It seems like backup planning and revocation should be an easier story for those. So if anyone has any good arguments for that, I'd love to hear it in questions. So do you want to actually do it? Yeah, sure, sure. So many valid CA certs share keys. So this is a funny thing: you've got this cert, it gives some authority, maybe it's a CA cert, it's good for a period of time and it's got some power, like it can be used for signing SSL certificates. And you know you've got another one right beside it and, oh, that's funny, they have the same key, which means if you break certificate A you broke certificate B, right? You can impersonate one with the other. So fortunately we didn't see any valid CA keys that were used in non-CA certs, because that would be terrifying.
It would imply that, you know, some little leaf sitting on a web server somewhere was sitting with a real CA cert, and all you'd have to do is pop that thing and you'd be in a good spot to do some harm. We identified 80 distinct keys that were used in multiple CA certs, and the most widely reused one was this VeriSign key from 2006; it's a strong 2048-bit key. The certs all share a subject and they lack subject or authority key IDs. Four of them all expire simultaneously in 2021, but one of them doesn't expire until 2036. So when you look at that you think, what is going on here? Why is it okay for these first ones to last only until 2021 but that last one until 2036? And you see a lot of interesting behavior like that. So here's just another example of this phenomenon. These keys were shared between organizations, so we see certificate one and certificate two. Certificate two is a Comodo cert from the UK and certificate one is an OptimumSSL cert from the United States, and these two separate entities, or two separate organizations, have the same key in their certs. So maybe they own each other. They have simultaneous expiration, so it seems like they're designed to be considered valid for the same period of time. The business interests of these entities are obviously intimately aligned because they're sharing key material. So they have the same SKIDs and AKIDs and key usages; it's a really weird thing. Whenever you have two countries or two organizations with one public key, it's kind of a fun example to look at because you know that there's something going on here, and the issuing authority here is UserTrust. So obviously whoever that is, they're cool with those two people sharing their private keys. So here's another example where we've got three different certificates all sharing a key.
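Finding CA certificates that share a key, as in the examples above, amounts to grouping by public key and keeping the groups with more than one member. Here is a minimal sketch, assuming each CA cert has already been reduced to a hypothetical `(fingerprint, organization, key_hash)` tuple; the data in any real run would come from the certificate database, not from this code.

```python
from collections import defaultdict


def shared_keys(ca_certs):
    """Group CA certs by public key and keep keys used by more than one cert.

    ca_certs: iterable of (cert_fingerprint, organization, key_hash) tuples.
    Returns {key_hash: [(cert_fingerprint, organization), ...]}.
    """
    by_key = defaultdict(list)
    for cert_fp, org, key_hash in ca_certs:
        by_key[key_hash].append((cert_fp, org))
    return {k: v for k, v in by_key.items() if len(v) > 1}


def cross_organization(shared):
    """Of the shared keys, keep those that appear under more than one organization."""
    return {
        k: v for k, v in shared.items()
        if len({org for _fp, org in v}) > 1
    }
```

The second function isolates the more surprising case from the talk, where one public key appears in certificates naming different organizations.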
The two SHA-1 fingerprints of the certificates in question, oh yeah, sorry, so those SHA-1s are just the SHA of the DER encoding, so you can look them up. You've got two different countries here and two organizations with one public key. The issuing authority is, interestingly enough, UserTrust again, and so we've got Positive Software and Comodo now sharing stuff across countries with their LiteSSL. The neat thing about this is that certificates one and two expire in 2020 and number three expires 10 months earlier. Two and three actually share start dates, but over 44,000 certs are using this key ID. So some CAs are using these shared keys to delay the expiration of their certificates. So just looking from number two here you can tell that this short key has been used before and how long it's been available to attackers. VeriSign had a 1024-bit key shared by four certs with a total lifetime of 7,000 days. When extensions were created for that key, the new one was kept at the old start time, so that examining trust roots would give the user an accurate understanding of how long the attackers have had to brute force this key. It's not a very strong key, it's only 1,024 bits. The key start date is from 1997, which is before AKIDs and SKIDs started showing up, those key identifiers that make validation more efficient. So this little trick here is adding 2,392 days to this key's lifetime, which was only supposed to be 6,000 days originally. So, I don't know if that's bad, but it's interesting, and it's a policy decision that you can only start thinking about when you look at all the different keys that are out there and all the subordinate keys, and the subordinate keys aren't always visible to you, because when you look in the trust roots it doesn't tell you what all the subordinates are. All right, I wanna do this one too.
So one of my little bugbears is that when you're using SSL you've got this idea, you've got a user that wants to connect to something, and they should know kind of what that is, like maybe I wanna talk to Google or maybe I wanna talk to EFF or to Peter, right? So there's a meeting of minds though, between me, the certificate authority that asserts that Peter is who he says he is, and Peter, right? So I connect to him. RFC 1918 defines our nice little reserved IP addresses like the 10 and 192.168 space. There's CAs out there that are handing out certs for these addresses. So they're saying, oh yeah, you're the real 192.168.2.1 or .1.2. So US Equifax asserts that that address is in Texas, and Belgian GlobalSign actually puts it in the US too. Oh, but also in the UK and in Switzerland and in Belgium. Oh, and it also says that that's also 77.76.108.82. So you might not have known all that. Anyway, I think that's a terrible thing when you see people signing IP addresses that are in reserved spaces. They also sign unqualified names, and that's kind of cute. We found one CA, Comodo, signed like 6,000 of these localhost certs. And this is actually the most common valid certificate name on the internet, localhost. And some CAs like Cybertrust, Entrust, Equifax, Microsoft and VeriSign actually only signed one cert that we saw with the name localhost. Almost like they kept track of the names that they were signing. Which means these other ones are obviously... Yeah, of course, in addition to localhost, you've got mail and lots of other kind of natural LAN names that shouldn't really be here. The subjective set. So yeah, this is... Oh yeah, some countries aren't even using their own CAs. So we found that Macau has its own 2048-bit CA that's in XP, but it doesn't use it. It doesn't even use a Chinese or a Portuguese CA. It signs its government's websites with commercial certificates from US and UK companies.
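Flagging the kinds of names complained about here, RFC 1918 addresses and unqualified hostnames, is straightforward with Python's standard `ipaddress` module. The sketch below is illustrative only: `classify_name` and its category labels are hypothetical, and "unqualified" is approximated simply as "contains no dot".

```python
import ipaddress

# The three private IPv4 ranges reserved by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def classify_name(name):
    """Classify a certificate subject name.

    Returns one of: "rfc1918", "other-ip", "unqualified", "qualified".
    """
    try:
        addr = ipaddress.ip_address(name)
    except ValueError:
        # Not an IP literal: treat dot-free names like "localhost" as unqualified.
        return "unqualified" if "." not in name.rstrip(".") else "qualified"
    if addr.version == 4 and any(addr in net for net in RFC1918_NETWORKS):
        return "rfc1918"
    return "other-ip"
```

Run over every subject name in a certificate table, a classifier like this separates names that can only be meaningful on a local network from names that identify a single host on the public internet.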
So there are some funny weak certs out there. There are some that have 508-bit RSA keys. Just a couple of those, but they were valid. Those are the fingerprints, or at least portions of them. And there are some vulnerabilities and other weird things. So the biggest thing we found, what we did is we got a table of the blacklist of keys that resulted from the Debian OpenSSL bug. Debian had this bug that was present from 2006 to 2008, where the random number generator in OpenSSL wasn't functioning correctly because of an attempt to fix a bug that wasn't really there. The end result was that there were only 15 to 17 bits of entropy in OpenSSL's keys on that platform at that time, which means that those private keys are not private. You can just go and make a list of them all. They're public private keys. So if you see one of these keys in use on the internet, it means that anyone can just go and jump into that SSL session and start messing around. And so here's a little query, effectively, that says: select subject from certs, join against the blacklist where the SHA-1 of your cert is the same as the hash in the blacklist. And what do you find? Well, there are 28,000 certificates on the internet that have this bug in them. 28,000 certificates where the private key isn't private. Fortunately, only 500 of those 28,000 are actually valid. So we're not talking about quite as much of a problem, but of the rest of those 28,000, 12,000 are CA certs. So unfortunately, probably what's going on with some of those is that they're being used for private PKIs inside organizations. And those organizations really need to replace their infrastructure, well, two years ago. It's sort of bad that this is still happening. And really, okay, people can be excused for messing up and using an old version of something on their machines, but CAs cannot be excused for signing certificates for known weak keys.
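The blacklist join described in the talk can be sketched with Python's built-in `sqlite3` module. The table and column names (`certs.subject`, `certs.key_hash`, `blacklist.hash`) and the sample rows are assumptions made for this illustration; the observatory's real schema, and exactly which hash it compares, are not shown in the talk.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE certs (subject TEXT, key_hash TEXT)")
    conn.execute("CREATE TABLE blacklist (hash TEXT)")
    conn.executemany(
        "INSERT INTO certs VALUES (?, ?)",
        [
            ("CN=weak.example", "aaaa"),    # hash appears in the blacklist
            ("CN=strong.example", "bbbb"),  # hash does not
        ],
    )
    conn.executemany("INSERT INTO blacklist VALUES (?)", [("aaaa",), ("cccc",)])
    return conn


def find_blacklisted_subjects(conn: sqlite3.Connection) -> list:
    """Subjects of certs whose hash matches an entry in the weak-key blacklist."""
    query = """
        SELECT certs.subject
        FROM certs
        JOIN blacklist ON certs.key_hash = blacklist.hash
        ORDER BY certs.subject
    """
    return [row[0] for row in conn.execute(query)]


if __name__ == "__main__":
    print(find_blacklisted_subjects(build_demo_db()))
```

The same lookup is cheap enough that a CA could run it before signing, which is the point being made about known weak keys.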
And they can't be excused for failing to revoke the certificates they've signed when they have a straightforward list that they could check against. Now, there are some CAs that did okay by this measure. Starfield revoked five out of five of the certificates that had the blacklisted keys. Comodo got 29 out of 30. I don't know what happened to that 30th one. It's actually kind of an important server. I'm not gonna say anything about what it is until it's fixed, but it's kind of bad that that 30th is there. UserTrust, similarly. And some CAs did a really bad job. Equifax, none out of 140 of the weak keys were revoked. Cybertrust got four out of 125, which presumably means they were revoked for reasons other than the weakness of the key, et cetera, et cetera. So we just discovered this a day or two ago as we were finally cramming our slides together. Once we've had time to contact the people who need to switch out these keys, then we'll be ready to actually publish the database. We don't want to do that before that's done. There are also some other weird kinds of certificates out there. You can go looking with all sorts of other queries, and you get something strange. So here's an example of one. This is a slight stylization of the SQL query, but it's approximately how it works. And what it's asking is, do valid certificates agree with themselves? There are two parts of a certificate that pertain to whether it's a CA or not. Actually more, I'm simplifying when I say there are only two parts, but there's one part that says, am I a CA? True or false? And there's another part that says, what is this key used for? And one of the flags that can be in the key usage field is signing certificates. And so these two fields should agree. There is precisely one valid certificate on the internet where they do not agree. It's a certificate that's marked as not a CA, and yet when you look inside the key usage field, it says it's allowed to sign certificates.
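The consistency check described here can be sketched in a few lines of Python. The `CertRecord` type and its fields are hypothetical stand-ins for rows in a certificate database; `keyCertSign` is the X.509 key usage flag for signing certificates, and `is_ca` stands for the CA flag in the basic constraints extension. The sketch only looks for the specific mismatch mentioned in the talk: not marked as a CA, yet allowed to sign certificates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CertRecord:
    """Hypothetical, simplified view of one parsed certificate."""
    subject: str
    valid: bool           # chains to a trusted root and is within its dates
    is_ca: bool           # basic constraints: "am I a CA?"
    key_usage: frozenset  # names of the key usage flags that are set


def find_non_ca_cert_signers(records):
    """Valid certs marked 'not a CA' whose key usage still allows cert signing."""
    return [
        record
        for record in records
        if record.valid and not record.is_ca and "keyCertSign" in record.key_usage
    ]


if __name__ == "__main__":
    sample = [
        CertRecord("CN=Some Root", True, True, frozenset({"keyCertSign", "cRLSign"})),
        CertRecord("CN=www.example", True, False, frozenset({"digitalSignature"})),
        CertRecord("CN=odd.example", True, False, frozenset({"keyCertSign"})),
    ]
    for record in find_non_ca_cert_signers(sample):
        print(record.subject)
```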
Now, I have no idea whether all clients process the certificate correctly. Perhaps they do, but it should never have been signed in the first place. It was signed by QuoVadis, the CA based in Bermuda, and I guess they do things liberally in Bermuda. So we decided also to create some pretty pictures of the set of CAs and which ones sign subordinate and intermediate CAs for each other. What does that look like? And so you feed all of that data set into Graphviz and you get this. That's not pretty. It's not pretty and it's kind of hard to see on this screen. So maybe we can try to zoom in. That's not even a root CA. That's an intermediate CA that signed a lot of things. Mostly universities in Germany, 247 of them. But you still can't see very much. You couldn't see what that name was from this graph, but you keep zooming in. It's hard to see what's going on. So if we have time, which we probably won't, maybe in the breakout session, we can let you zoom in and fly around this graph and see all the gory details. Yeah, there's a lot of complicated things here and it's not very instructional, unfortunately, but we can sit back and take a look at some of the more interesting subordinate CAs. One of the ones that we saw that we thought was kind of cool: I didn't know that the DHS had a subordinate CA cert. And it operates an organization, I'm sure they signed up all the paperwork and we didn't see them doing anything abusive with it, for sure, but it's interesting that they had that. I didn't know that I was trusting them. CCNIC, I mean, I know that I trust them, but I didn't know I was trusting them in my browser. Yes, yes, that's what I meant to say. Then there's also the CNNIC, which there was a, sorry. Maybe CNNIC, that's Ms. Todd. That's right. That had this subordinate CA, and in 2009 it got added to the trust root and a lot of people got excited about it.
And it turns out that we found it had been signed as a subordinate CA in 2007, so you already trusted it before you had the whole fight over whether or not you were gonna trust it, so that's good. And then there's this company, Etisalat, which is a company in the United Arab Emirates. This is not a discovery at all, thanks at least to Chris Soghoian, and maybe other people before that, for observing. This company was known to have installed malware on the BlackBerries of 100,000 of its customers, and yet it's trusted by all of the world's browsers to sign any domain. And it's not trusted explicitly, it's trusted indirectly because other people have signed it. So you sign a little malware, and then people keep bugging you about it for years. It's terrible, poor guys. Anyway, UAE doesn't otherwise have any CAs operating in its country, so that's its courts' little wedge on signing stuff, so great. Booz Allen Hamilton, I like them because I met a cool guy there at a conference and he was giving a great talk about how you can use web attacks, because he does that for his work, and that's good, but don't worry, he's doing it for the government so it's all legal. Gemini, they're an observatory, and hey, I want a CA cert, that'd be cool. And then lots of... We're an observatory too. Yeah, yeah, totally, come on. So companies, Dell, Ford, Google, Marks and Spencer, Vodafone, some of those, it makes a lot of sense that Google has a CA cert. They have a lot of stuff on the internet, but Ford, I was a little surprised by that one. And then there's hundreds more, that's kind of fun. So subordinate CAs, countries with valid CAs, we saw about 46 countries with valid CAs. The most prominent by CAs were the United States, South Africa because of Thawte, the UK, Belgium, Japan, Germany, the Netherlands, and Israel. Those top 10 countries each had over 10K valid certificates that we saw that were signed with their certs.
Through subordinate CAs, the following countries didn't appear to have a root CA, but they kind of gained a CA. So the UAE doesn't have a root CA for its country, but it has a subordinate CA in its jurisdiction, so maybe its courts can ask them to give them a little spying permission or something. Same with Iceland, Luxembourg, Macedonia, Malaysia, and the Russian Federation. Yeah, so 64 roots didn't include a country. Most of those are probably US based. If you have a big company in the US, maybe they don't feel like they have to say what country they're in because they're big. And then a little bit about the unwashed self-signed masses. So some people choose to use self-signed certs, and it's arguably quite reasonable: if you don't need a trusted introducer, why pay for it and then adopt all that complexity of the CA infrastructure? The cost and complexity of the model should then be lower, and it's already widely used for SSH, which is what we use to install our certs. So in principle, it kind of reduces your attack surface by eliminating the risk of random subordinates or trust roots asserting that they're your web mail server. In practice though, it's a lot trickier. For one, certificates can have multiple names, and we wouldn't want to take one from one website and have it used on another. Modern browsers like IE, Firefox, and Chrome all track what sites self-signed certs were approved for. So even if the subject alternative names also want to approve *.google.com or some other sensitive domain, the cert won't be accepted for those. Firefox even provides this nice UI that lets you go through and see those, and that might make you think that you could start implementing some of this TOFU or persistence of key, sorry, TOFU is trust on first use. It's kind of like the SSH security model. But unfortunately, even in that browser, when you go to a site, you say, yeah, this is the cert I want to use to identify it.
I accept that, permanently store it, and then someone serves up a different certificate with a trust name that validates to a trust root. It lets that replace it. So it means you better not have self-signed certs for mail or 192.168.1.2, because attackers with real CA-signed certs, which they can get anywhere they want, could easily impersonate them. So there's a big picture question here, which is, okay, we've seen that there is far too much trust, far too much promiscuous trust out there in the SSL universe. And it obviously leads you to ask, so is HTTPS fundamentally broken here? Like what's going on? And I don't want to reach for that easy answer and say yes. Actually, despite all the baroque complexity, this system seems to be working surprisingly well for how bad it is. And in particular, we used one method to look for man in the middle and other kinds of server impersonation attacks. The method we used was, okay, if you see a name that has multiple certs signing for it, like Google.com, is there sometimes a case where one of the certs signing for it is an obscure one that hardly signs for anything? And certainly some of those exist, but none of them look obviously like they're malicious. They might be, but we haven't tested that. So if there is CA-certified server impersonation stuff going on, either it's being done with the same kinds of widely used CA keys that everyone is using, that small tail that signs a lot of stuff, and certainly there is gonna be some that's like that, or it's being deployed in a non-public way. You can't just go and see it happening by pinging port 443 on a public IP address. You have to be on the right private network, and then it happens so you see the magic cert at that moment. And we'll talk more about what we're planning to do to try and get a window into some of that stuff in the future. But the question is, can we do any better?
And it's not clear yet that we can. In particular, there are other systems that seem to work pretty well, like SSH. But the fundamental thing about SSH is when a server's key changes, if you have a shell on that machine, you should have some way of finding out why the key changed and whether the new one's correct. Like maybe you're the admin, or maybe you can call the admin at least and say, hey, is that key changing? Or you know that the machine crashed. So that model, the TOFU model, works for SSH and other kinds of small world deployments. It's not clear that it works if you're trying to visit a website on the other side of the world. You know, you're in Nigeria and you're a legitimate user of an online shopping site in the United States, whatever. And similarly, there are other protocols, like OTR instant messaging, which does great key authentication, but it's hard to see how you could use that for the web. So what we want to do in the future, certainly as soon as we've done disclosure, we're going to be releasing the data set we've got this far. We're thinking about adding functionality to various Firefox extensions to spot unusual or unseen certificates. In particular, something like Torbutton or something like HTTPS Everywhere could have an option you can turn on that says, hey, in the background, check all my certs through Tor. Go through Tor and see if I'm seeing the publicly visible cert or not. And if I'm not, then go back through Tor again and send an anonymous report with this to our observatory. So we're thinking about implementing stuff like that, and who knows what we'll find. Maybe that'll be a DEF CON talk next year. And then maybe we could put together some metrics to get a better picture of CA importance, right? There's a few legitimate reasons to have minimally used CAs, but I'd like to be able to maybe explain to people why they can cut one or why they can cut many. So do we have time for like one or two questions?
All right, so questions will be in room 113.