Okay, so our next talk is Cat Joyce with "Who watches the watchers in WebPKI?"

Hi! Yeah, I'm Cat. It's great to be here. Today I'm mostly going to be talking to you about something called certificate transparency. Before we get on to what certificate transparency itself is (oh, wait, how do I get it to change? Hold on a second. There we go), I want to start off by talking a little bit about the problem that certificate transparency was intended to help with when it was dreamt up.

So I want to start with an example, and this is where I have to put my hands up and say, I do work at Google. The example is a Google example. It was the only way I could get my slides approved. Please bear with it. Say you are a Gmail user and you want to log into your email. What are you going to do? You're going to go to mail.google.com, hit enter, and type in your login details, and a web page is going to appear that probably looks a little bit like this. Now, the more observant of you will notice that up in the top left-hand corner, the beginning of the URL has gone green, and there's this very reassuring little padlock symbol and the word "Secure". What this means is that you have connected to Google's web servers securely, so everything is encrypted and anyone listening in on the connection can't tell what's being said.

Okay, great. So you've got this secure connection. Yay. But how do you know that it's actually Google you're talking to on the other end of this connection? How do you know it's not someone just pretending to be Google? How can you be sure that when you type in your login details, you're not actually sending them to an impersonator? Well, this is where we start talking about certificates.
A certificate is a thing that a web server sends to a web client to prove that it is who it says it is. A certificate contains two things: a public key, and a domain name or web address. What a certificate means is that if you use this public key to set up a secure connection, you can be sure that you're talking to the real version of this web address. Now, for a certificate to be accepted by a browser, it has to be signed by one of the trusted entities that we call Certificate Authorities, or CAs.

So going back to our example, say you're Google and you've got this website, mail.google.com, that you want your users to be able to connect to securely. You're going to need a certificate to present to your users when they try to connect. So you go along to a Certificate Authority and you say, "Hi, I'm Google. I've got this website, mail.google.com, and here's my public key. Please can you issue a certificate to me?" The Certificate Authority will take this information and run a bunch of checks to convince itself that, first of all, this is actually Google it's talking to, and that all of the information provided is both correct and truthful. If the CA is happy with this information, it'll sign a certificate, issue it, and send it back to Google for them to use.

Now, this is the point in WebPKI where there's trust. We trust Certificate Authorities to get this right. We trust CAs not to issue certificates to the wrong people. Okay, so now you're on your browser, you're trying to check your email. When you go to mail.google.com, Google will respond with the certificate that it's had issued.
And your browser will check the certificate, see that it's been signed by a trusted Certificate Authority, and set up a secure connection so that you can then check your email. Okay, so that's all fine. That's how the system was intended to work.

But then what happens if a bad guy comes along, goes to the Certificate Authority and says, "Hi, I'm Google. I've got this website, mail.google.com. Here's my public key. Please can you issue a certificate for me?" Like I said, we trust Certificate Authorities to do the right thing here. We trust CAs to be able to tell that this isn't actually Google they're speaking to, and not to sign anything. But what if, for whatever reason (say the CA is hacked, or they make a mistake), they issue a certificate that they shouldn't? Well then, when you go to connect to mail.google.com, the attacker could respond with their certificate instead. And your browser has no way of telling that this is a fraudulent certificate. It's still signed by a trusted Certificate Authority. So your browser will set up a secure connection. But this time, when you send your login details, you'll actually be sending them to an attacker.

Okay, so this sounds bad, right? You might be thinking, okay, this isn't great, but is it something that really happens? Is this really something we have to worry about? Unfortunately it is. There are a number of examples that I could have used here. The one I've chosen is one of the real inspiring incidents that set off certificate transparency, so it's one of the more relevant ones to us. Back in 2011, there was a Certificate Authority called DigiNotar. What happened here was they were hacked, and the attackers managed to issue a wildcard google.com certificate.
DigiNotar apparently detected the breach when it happened, but being hacked is bad for business, so they decided not to tell anyone. They thought they had contained it. It was something like three months later that a user noticed this fraudulent certificate floating around and being used, and reported it. After some investigation, it turned out that the attackers had got hold of not just one, but something like a hundred fraudulent certificates for a number of big-name domains. And, well, DigiNotar's no longer a CA. Let's put it that way.

So there's clearly a problem here. The problem is this blind trust we have in CAs to just do the right thing and not misissue certificates. But what this example really shows is that the problem is actually more than that. If a CA does misissue a certificate, we have no way of telling that that's happened without them owning up and saying, "Look, we did this." And there's no guarantee that they necessarily know it has happened. So what can we do about that?

Well, this is where I bring up certificate transparency. What is certificate transparency? It's an ecosystem to detect certificate misissuance. I emphasize the word detect because we're not preventing certificate misissuance; it turns out that's actually a much harder problem. Within certificate transparency, there is this core concept, the idea of a certificate transparency log. What that is, is a publicly auditable, append-only certificate store. So it's essentially a big repository of certificates that anyone can access. Anyone can add to it; anyone can go and see what's in there. There are certificate transparency logs already in existence.
They can be run by anyone with the technical capacity to keep them up and running. Currently they're mostly run by large tech companies and certificate authorities, but really they could be run by anyone who has a vested interest in the security of the internet. In fact, there was a guy who we met up with. He was super enthusiastic about certificate transparency, and he told us, "Yeah, I'm running this CT log. It's called Behind the Sofa." We were like, what? And it was literally running behind the sofa. So anyone who wants to do it, you can run one. How reliable they are, that's a different question.

So the goal with these logs is to get all of the certificates that are used on the web into certificate transparency logs, so that if you're a domain owner, you can monitor these logs. I mentioned they're publicly auditable, so you can monitor them and keep an eye out for certificates for your domain. And if you see a certificate for your domain that either you didn't ask for or that doesn't contain your public key, you can take the necessary steps to get it revoked.

Okay, so before, we had this system where CAs were just blindly trusted not to misissue certificates. Now we have a system where certificate transparency logs publicly publish all of the certificates that certificate authorities are issuing. But what if a certificate transparency log lies about what's in it? What if a CT log says it's going to store and then publish a certificate, and actually doesn't? What if a CT log does store and publish a certificate for a short amount of time, but then deletes it and stops publishing it? Haven't we just moved the trust from CAs to certificate transparency logs? No.
It actually turns out we can verify that certificate transparency logs are behaving correctly. The reason we can do that is the cryptographic data structure that certificate transparency logs use to store certificates, and that data structure is a Merkle tree.

The Merkle tree that you can see up on the screen represents a CT log containing seven certificates, so very, very small. What a Merkle tree is, is very simple. It's just a binary tree where the leaves are the hashes of the things it contains, in our case certificates. So if you look at the slide, A is the hash of certificate one, B is the hash of certificate two, C is the hash of certificate three, etc. The intermediate nodes of the tree are the hash of the concatenation of the two children of that node. So again, looking at the slide to get your head around that, G is the hash of A concatenated with B, and K is the hash of G concatenated with H. That trickles all the way up to the top, where you get the root of the tree, or what we call the tree head.

Now, upon request, certificate transparency logs are required to publish these tree heads. Those published tree heads, in conjunction with this data structure, mean that we can cryptographically check that certificate transparency logs are behaving. One example I mentioned of the dodgy things a CT log could do was: what if a CT log accepts a certificate, says, "Yeah, I'm going to add it, I'm going to publish it," and then actually doesn't? Take that example. Say we've submitted certificate three (C3) to a certificate transparency log, and we want to be able to check that C3 has actually been included in the log.
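The tree construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the code any real log runs. It follows the hashing scheme in RFC 6962 (SHA-256, with a 0x00 prefix on leaves and a 0x01 prefix on interior nodes, a detail the talk glosses over as plain concatenation). The certificate contents are placeholders, and the node names I and J for the nodes not named in the talk are assumptions made so the letters line up with G, H, K and L.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    # RFC 6962 prefixes leaf data with 0x00 before hashing.
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Interior nodes hash the 0x01 prefix plus the two child hashes.
    return hashlib.sha256(b"\x01" + left + right).digest()


def tree_head(entries: list[bytes]) -> bytes:
    """Compute the Merkle tree head (root hash) over a list of entries."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(entries[0])
    # Split at the largest power of two strictly smaller than n.
    k = 1
    while k * 2 < n:
        k *= 2
    return node_hash(tree_head(entries[:k]), tree_head(entries[k:]))


# Seven placeholder "certificates", labelled like the slide in the talk.
certs = [f"certificate {i}".encode() for i in range(1, 8)]
A, B, C, D, E, F, J = (leaf_hash(c) for c in certs)
G, H, I = node_hash(A, B), node_hash(C, D), node_hash(E, F)
K, L = node_hash(G, H), node_hash(I, J)
head = node_hash(K, L)

# The recursive definition and the hand-built tree agree.
assert tree_head(certs) == head
```

Changing, removing, or reordering any certificate changes the tree head, which is what makes a published tree head a commitment to the exact contents of the log.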
So how could we check that C3 is in the log? Well, we know C is just the hash of C3, so we can calculate C. Then if we could calculate H, then K, then the tree head, we could compare that with the tree head that the log is reporting. If they're the same, we know the log is behaving: it has incorporated our certificate into its tree. Wonderful. But in order to get H, we need D. In order to calculate K, we need G. And in order to calculate the tree head, we need L. So for a CT log to prove that it has included a certificate, all it has to do is provide those three intermediate nodes.

Now, as a CT log grows, as more certificates are added to it, this internal Merkle tree is going to grow as well, and so the value of the tree head is going to change. So there's a second proof mechanism that's similar to this one but a bit more complicated, which I'm not going to go over right now for time, that proves that two of these tree heads are consistent. What I mean by consistent is that the larger of the two tree heads was obtained only by appending certificates to the tree behind the previous tree head. I mentioned CT logs are meant to be append-only. This is how you can check it, given two tree heads with their associated tree sizes. Those two proof techniques are actually all you need to be able to check that a certificate transparency log is behaving correctly.

Does it work? Yes, it does. The wider certificate transparency ecosystem has successfully detected a number of misbehaving certificate transparency logs using these checking mechanisms. In one case, the operators used the same public key to run both a production log and a test log.
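To make the inclusion check from earlier in this passage concrete, here is a hedged Python sketch of verifying that C3 is in the seven-certificate tree using only D, G and L. It assumes the RFC 6962 hashing scheme (SHA-256 with 0x00 and 0x01 prefixes). Labelling each sibling with the side it sits on is a simplification made for readability: a real CT client derives the sides from the leaf index and tree size. The function name and certificate contents are illustrative only.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def root_from_audit_path(entry: bytes, path: list[tuple[str, bytes]]) -> bytes:
    """Recompute the tree head from one entry plus its sibling hashes.

    `path` lists (side, sibling_hash) pairs from the leaf up to the root,
    where side says whether the sibling is the "left" or "right" child.
    """
    current = leaf_hash(entry)
    for side, sibling in path:
        if side == "left":
            current = node_hash(sibling, current)
        elif side == "right":
            current = node_hash(current, sibling)
        else:
            raise ValueError(f"unknown side: {side!r}")
    return current


# The log's side: build the seven-leaf tree from the talk by hand.
certs = [f"certificate {i}".encode() for i in range(1, 8)]
A, B, C, D, E, F, J = (leaf_hash(c) for c in certs)
G, H, I = node_hash(A, B), node_hash(C, D), node_hash(E, F)
K, L = node_hash(G, H), node_hash(I, J)
published_head = node_hash(K, L)

# The proof for certificate three is just D, G and L.
proof = [("right", D), ("left", G), ("right", L)]

# The verifier's side: only the certificate, the proof and the head.
assert root_from_audit_path(certs[2], proof) == published_head
assert root_from_audit_path(b"some other certificate", proof) != published_head
```

The proof holds one hash per level of the tree, so its length grows with the logarithm of the number of certificates rather than with the size of the log.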
Now, in the certificate transparency world, CT logs are identified by their public key. So by using the same key for two logs, any tree heads issued by either of those logs were considered to be from a single log. What happened in this case was that their production log issued a tree head, and their test log issued a tree head for a different set of certificates, because it contained a different set of certificates. Then, when asked, the production log couldn't produce one of these consistency proofs between the two tree heads. That was when we started figuring out something funny was going on.

In the second instance, the log was running, I believe, on AWS S3 when the outage happened towards the beginning of last year. What happened here was they issued a tree head, and then S3 went down, and in the panic of "oh no, all our data, everything's down", they restored from an older backup that didn't contain all of the certificates. Then a bunch of new certificates came in and were added to this older version of the tree. They issued another tree head, and those two tree heads were then inconsistent, because there were some certificates in the first one that were missing from the second one.

Now, in this case, although they messed up and the tree became inconsistent, it was really, really impressive how the log operators dealt with it. The inconsistent tree head was only being reported for something like two minutes, and the log operators very, very quickly produced a very detailed post-mortem of everything that had gone wrong. There was a lot of public discussion of everything that had happened here. So the way that they dealt with this failure was absolutely fantastic.
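The backup-restore incident above can be illustrated with a small Python sketch. Note the assumptions: this is not the logarithmic consistency proof mentioned in the talk, but the simpler replay check that a monitor holding every entry can run (rebuild the old tree head from the first entries of the newer log). The tree sizes, certificate contents and the `is_append_only` helper are all hypothetical, and the hashing follows RFC 6962.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def tree_head(entries: list[bytes]) -> bytes:
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(entries[0])
    k = 1
    while k * 2 < n:  # largest power of two strictly smaller than n
        k *= 2
    return node_hash(tree_head(entries[:k]), tree_head(entries[k:]))


def is_append_only(old_size: int, old_head: bytes, new_entries: list[bytes]) -> bool:
    """Replay check: do the first old_size entries still give the old head?"""
    if len(new_entries) < old_size:
        return False  # the log shrank, so entries were dropped
    return tree_head(new_entries[:old_size]) == old_head


certs = [f"certificate {i}".encode() for i in range(1, 8)]

# The log publishes a tree head covering five certificates.
old_size, old_head = 5, tree_head(certs[:5])

# Healthy log: certificates six and seven are appended.
healthy = certs[:7]

# Faulty log: restored from a backup holding only three certificates,
# then certificates six and seven are added on top of that.
restored = certs[:3] + certs[5:7]

assert is_append_only(old_size, old_head, healthy)
assert not is_append_only(old_size, old_head, restored)
```

The restored log still holds five entries, so a size check alone would not catch the problem; only recomputing the hash over the earlier entries shows that the two tree heads cannot belong to the same append-only history.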
One of the things with certificate transparency is that we pride ourselves on being transparent. So whenever something does go wrong, all of the discussion happens in public. And because CT is still quite new, still quite young, we're still learning. So with every single one of these war stories, although something did go wrong, we learned something from it, something good came from it, and we learned how not to mess up in the future. So, pros and cons.

Okay. So certificate transparency sounds like a great idea, right? I'm biased, but yay. But how do we actually incentivize people to use these logs once we've got them spun up? How do we incentivize people to put certificates into certificate transparency logs? Why would they want to do it, other than for the good of the internet? Well, the answer to that is browsers. As of April 30th this year, Chrome started requiring that all newly issued certificates be present in multiple trusted certificate transparency logs to be accepted. Safari currently requires this for a specific subset of certificates, called extended validation certificates, or EV certificates. EV certificates are basically the ones that make your URL bar look like this. But as of October this year, Safari will also require that all newly issued certificates be present in multiple trusted certificate transparency logs to be accepted. So certificate transparency has basically become part of the WebPKI. And there have been a number of discussions with the other major browser vendors about supporting certificate transparency too. So that's great.
Now, on the previous slide we had that these certificates have to be present in trusted certificate transparency logs. What do we mean by trusted? I said you don't have to trust them, so what does that mean? What trusted really means is CT logs that are behaving correctly. Chrome specifically has a log policy that dictates what its perception of behaving correctly is. It includes things like conforming to the certificate transparency RFC, and maintaining 99% uptime, so that you can actually access the log to monitor it and check for your domains. There's a handful of other things in there, but one particular one is that when a certificate is submitted to a log, the log has to incorporate it into its Merkle tree within 24 hours.

The point of this requirement is to minimize the amount of time between when a certificate is submitted to a log and when it's publicly accessible for monitoring. At the same time, it's trying to balance that with operationally allowing enough time to incorporate the certificate into this structure. Now, 24 hours seems quite a long time, right? How long can it take to actually add a certificate to a Merkle tree? That doesn't sound that hard. But in reality, what this 24 hours is, is the amount of time a log operator has to recover, or to handle things, when something goes wrong with the log. We saw from the Amazon S3 outage in that war story that with certificate transparency logs, recovering from failure isn't as simple as restoring from a backup. Usually there's some manual intervention needed. And trust me, if the thing goes wrong at 3 a.m. on a Sunday, your log maintenance team are going to be really thankful they have 24 hours to handle it.
That being said, 24 hours is not long enough for all logs. One log did actually manage to incorporate a certificate into its tree within 24 hours, but it was having problems with its external APIs. Although the certificate was technically in the tree within 24 hours, it wasn't publicly accessible until 36 hours after being submitted. So that was counted as going against the log policy.

Another log, which was actually one of our logs, simply failed to incorporate certificates within 24 hours. Turns out it's not that long after all. What happened in this instance was that a specific person from the University of Michigan, who we are good friends with now, had been accumulating a large corpus of certificates, and realized that a large number of them, we're talking about 6 million, weren't present in any certificate transparency log. They thought, "Oh, these guys' goal is to get all of the certificates used on the web into these logs. Let's help them achieve their goal by submitting these to their logs." And they submitted all 6 million of them at once. Pretty safe to say this was the largest load test we have had to date. What this resulted in was a huge backlog of certificates to be sequenced and then stored. And because of the level of consistency required when storing certificates in certificate transparency logs, 24 hours later this was still chugging along and some of the certificates hadn't been incorporated. So that was the end of that. But most logs don't have a problem with this 24-hour limit and are still very happily pottering along.

The big question: does certificate transparency actually work? Well, I like to think that I wouldn't go around doing talks like this about it if we'd built the system and it was kind of failing at life. So yes, it does. Proof that it works.
So we run a monitor over all of the certificate transparency logs that are out there, looking for Google-owned domains, and we discovered that a specific certificate authority had issued a certificate for www.google.com that we hadn't asked for. We went and spoke to this CA, and it turns out one of their test engineers needed a certificate for testing, and they were like, "Oh, I need a domain for my certificate, what shall I use? Oh, I know, www.google.com." No, bad! Do not issue certificates for domains you don't own. Just not okay. It was a testing certificate, so it was only valid for something like two days, so the risk was much lower. They did say, "Oh, no, it's okay, it never left our test environment." We found this in the Google certificate transparency log, and we were like, "Yes it did, yes it did, you're wrong." Anyway, they are no longer a CA.

Another example of the benefit of certificate transparency is from a Facebook blog post. Facebook also run a CT monitor, for Facebook-owned domains I assume. This wasn't a malicious thing: the benefit they got was that one of their internal teams had used a CA to issue certificates that wasn't one of their accepted CA vendors, and CT helped them detect that and get the certificates revoked and reissued by an accepted CA. So we got a thumbs up from them, which is always nice. There's the blog post, if you want to know more.

Okay, so before I can tell you what the current state is and what's in the near future for CT, I need to start out by saying how things started. Going back to the beginning, we had this idea for CT, and we wanted to show that it was possible to run reliable certificate transparency logs, because we wanted to convince everyone to adopt this new idea, and we needed to show them that it can be done. And so the initial
implementation of CT logs was designed to be extremely reliable and extremely robust to infrastructure failure. What that resulted in was a design that essentially stored the entire Merkle tree in memory. That was kind of fine back when logs were maybe a couple of million certificates. But recently everyone's switching to HTTPS, which is great, and there's this rising popularity of much shorter-lived certificates, which is also great, but that does mean that there are a lot more certificates out there. Big shout out to Let's Encrypt at this point. The current largest certificate transparency log is now sitting at about 400 million certificates. That's not so feasible to store in memory.

We saw this coming; it wasn't a case of "ah, what are we going to do now?" What it led to was me and a bunch of the guys that I work with developing this thing called Trillian. What Trillian is, is a scalable Merkle tree implementation, and we use it to back our certificate transparency logs. The difference with Trillian is that the Merkle tree contents are stored in a database instead of in memory. Trillian is open source, so you can go and check it out if you're interested.

We've tested it in a bunch of different ways. We've seen that roughly 2000 QPS is heading up towards its top limit, so potentially up to 2000 certificates being submitted per second. That's quite a lot, right? We tested the size of it too. Obviously size was an issue; 400 million certificates seemed like a lot. A single log has now been tested to 13 billion certificates, so that's way in the future. Trillian is also multi-tenanted, which means that one Trillian instance can contain multiple Merkle trees. There are a few different benefits to that, the main one
being that for each new log you want to spin up, you don't have to go through all of the setup process for that log. You can just provision a new log in one Trillian instance, and it's much, much simpler and much faster. So there's an ease thing with the multi-tenancy. It says it on the slide, you can read it, but we've tested it with 10 tenants, and shared between those 10 tenants there are 42 billion entries or more. Big numbers, that's fun.

One of the really key things with Trillian is that it's not specific to certificates. It's more of a general transparency concept. It's just the Merkle tree implementation: you can give it blobs of whatever and it will store them. This is kind of great because it gives you all those verifiable properties, with the two proofs that I mentioned, potentially for other applications. Trillian provides both the log mode, which is used for certificate transparency, and what we call a verifiable map mode. The verifiable map mode is essentially a verifiable key-value store. From a Merkle tree point of view, this is a Merkle tree with 2^256 leaves, most of which are empty, which is how it's doable. If you want to add a key-value pair to the Merkle tree, you use the hash of the key to find which leaf to store the value in. This is great because you can then use the proof mechanisms that Merkle trees provide, again for a key-value store, to prove either existence or non-existence of specific values, which is quite fun and has a use that I'm going to mention in a minute.

Okay, so Trillian, I mentioned it's not just for certificates; it can be used for wider things as well. It is used for certificates: that's one thing we're using it for, to back CT logs. I think a couple of CAs are also using it to back their CT logs, which is great. It's
being used at the moment for something called key transparency. Key transparency actually uses both modes, the log mode and the map mode, and the idea with key transparency is to target secure end-to-end messaging. So that's fun.

It's also being used by DeepMind. DeepMind is doing a lot of work at the moment with NHS health data; you may have read about it in the news. Obviously health data is very sensitive, and people accessing health data when they shouldn't be is bad. So DeepMind are going to be using Trillian to keep a verifiable log of all the data accesses that they have. Then if someone does access the data in a way that they shouldn't, at least they'll know, and that person can't go back in and tamper with the access log.

There have been a number of other ideas for what to do with Trillian that have been floated. There has been some chat about binary transparency, and there have been discussions about firmware transparency. There was some discussion in the early days of Trillian with the UK government about using it to help prove the integrity of a bunch of the registers that they maintain: things like the register of countries, or the register of land that the Land Registry keeps. There's also apparently a register of all the cows in the UK, which was news to me. This actually made me laugh a little bit, because a few years ago someone was telling me about the old-school tech interview questions that were really abstract, like, "If you had to calculate how many cows there were in the UK, how would you do it?" There's a register. That's how I would do it.

And so if any of you have any more ideas for uses of Trillian or anything like that, come and chat to us. I'm here, and a bunch of the guys from the team I'm on are here. We would love to talk to you about your ideas. That would be great. And that's it. Thanks very much.

We've got time for really
brief questions, I think. One in the middle; he was first, I think.

Hello, that was a fantastic talk. I was just wondering: when you've got billions of things in a tree and I want to calculate whether the tree head is actually accurate, how fast is that? Because you were saying that six million going in was quite slow to calculate. If I want to prove that the CT log is accurate, is that something I can do?

So what do you mean? Do you mean check that the whole tree is...?

I guess so, yeah. If I've downloaded a certificate transparency log from Symantec or whoever, how hard is it for me to prove that it is accurate, to build it all the way up?

So, basically, currently in certificate transparency there are a number of monitors that are doing exactly what you're saying: downloading the whole tree and building it up, calculating every single in-between hash to check that the final hash matches the one that the log is reporting. Those monitors are very good: if it doesn't, they will get in touch (well, it's all on a public mailing list) and say, look, it doesn't match. It depends what you want to do; it depends what your use case is. Site owners maybe don't care as much about that. They're just like, other people can be checking the integrity of it; if Chrome trusts it, then I do, kind of thing. But in terms of the two proofs that I showed, say you're a CA and you submit a certificate to a log, and then you want to check that that certificate has definitely been included. The number of nodes you need is log n, because it's a tree, so that's very quick. Verifying the whole tree can take a bit more time, but actually the challenge there is downloading the whole tree. So they're different things.

Okay, this chap, and then I think other questions will have to be offline, I'm afraid.

Yeah, come
and find us after, happy to talk.

Yeah, so I was just wondering: you said the tree is currently at 400 million certificates. How big is that in terms of gigabytes?

I don't know. Do you guys know how big that is in terms of gigabytes? I'm getting a [inaudible] from my boss, so we don't know. No, we haven't looked. I can look, we can look, and we can tell you.

Okay, thanks very much. Let's show our appreciation.