All right, hello everyone, let's get started. Today I want to talk about a system called Certificate Transparency. This is a bit of a departure from most of the topics we've covered. So far we've talked about distributed systems that are really closed systems, where all the participants are trustworthy and are run by the same mutually trusting organization. Raft is that way: we just assume the Raft peers do what they're supposed to do. But there are also plenty of systems out there, particularly systems built at Internet scale, that are open: anyone can be an active participant. If you build systems that are completely open in that way, there's often no single authority that everybody is willing to trust to run the system or to protect it. That is, everybody is potentially suspicious of everyone else. In that situation you have to be able to build useful systems out of mutually distrusting pieces, and this makes trust and security top-level issues when you're designing an Internet-wide open distributed system. The most basic question when you're building an open system is: when I'm talking to another computer or another person, am I talking to the right one? Am I talking to the right website? This problem is actually close to unsolvable. Lots of solutions have been tried, and none has worked all that well. But it is the problem that Certificate Transparency, today's topic, is trying to help with. The material today ties backwards in the course to consistency: a lot of what Certificate Transparency is doing is ensuring that all parties see the same information about certificates, and that's really a consistency issue.
This material also ties forward to blockchain systems like Bitcoin, which is what we'll be talking about next week. Certificate Transparency is among the relatively few non-cryptocurrency uses of a blockchain-like design. By way of introduction, I want to start with the situation on the web, with web security at any rate, as it existed before 1995, before certificates. In those days there was a kind of attack people were worried about called a man-in-the-middle attack. That's the name for a whole class of attacks. The setup is that you have the Internet, and you have people running browsers on computers attached to the Internet. Maybe I'm sitting in front of my computer and I want to talk to a specific server; suppose what I want to do is talk to gmail.com. As a user I'd type gmail.com, so I know what it is I want to talk to. My browser would ask a DNS server, "What's gmail.com?", and it would reply with an IP address. I'd connect to that IP address, and since I need to authenticate myself, I'd type my password into Gmail's website, and then Gmail would show me my email. Without some kind of security story, this system is quite easy to attack, and it turned out to be easy to attack in practice. One style of attack is the man-in-the-middle attack, where some evil person sets up another web server that serves pages that look just like Gmail's, including a page that asks for your login and password. Then the attacker would intercept my DNS packets, or just guess when I would send a DNS request, and come up with a fake reply that, instead of providing the real IP address of the real gmail.com server, provides the IP address of the attacker's fake computer.
Then the user's browser, instead of talking to Gmail, would unknowingly be talking to the attacker's computer. The attacker's computer provides a web page that looks just like the login page, and the user types in their login and password. Now the attacker's computer can forward that to the real Gmail and log in for you; of course you don't know that. Gmail sends your current inbox back to the attacker's computer, which presumably records it along with your password and then sends your inbox on to the browser. So if someone can execute this kind of man-in-the-middle attack, the attacker's computer can record your password and your email, and you'll be none the wiser. Before certificates, SSL, and HTTPS, there was really no defense against this. So that's the man-in-the-middle attack, and the attacker here is the man in the middle: it looks just like Gmail to the browser, and it pretends to be the user when talking to Gmail so that it can get the information from Gmail required to trick the user into thinking it's the real thing. In the mid-90s people came up with certificates and SSL, which is also called TLS; it's the security protocol you're using when you use HTTPS links. The game here was that gmail.com was going to have a public/private key pair, with a private key that only Gmail knows, sitting in its server. When you as the user ask to connect to Gmail, then to verify that you're really talking to Gmail, your browser is going to demand that the server prove it really owns Gmail's private key. Of course, where does your browser find out Gmail's public key, which is what you need to check that the server really has the private key? That's where certificate authorities and certificates come in.
When Gmail set up its server, it would contact a certificate authority, maybe on the phone or by email, and say, "Look, I want a certificate for the DNS name gmail.com." The certificate authority would try to verify that whoever's asking for the certificate really owns that name, that they really are Google or whoever owns gmail.com. If so, the certificate authority would provide a certificate back to gmail.com. What a certificate contains is basically the name of the web server, the web server's public key, and a signature over the certificate made with the certificate authority's private key. So it's a self-contained assertion by the certificate authority, checkable by checking the signature, that the public key of gmail.com is really this public key. The gmail.com server would just keep a copy of the certificate. If you connect to the gmail.com server with HTTPS, the first thing it does is send you back this certificate. Since gmail.com is willing to give it to anybody, the certificate itself is not at all private; it's quite public. Then the browser would send some information, like a random number, to the server and ask it to sign it with its private key. The browser can check, using the public key in the certificate, that its random number was really signed by the private key that's associated with the public key in the certificate, and therefore that whoever it's talking to is really the entity that the certificate authority believes is gmail.com. Now, the reason this makes man-in-the-middle attacks much harder is this. You can set up a rogue server that looks just like gmail.com, and maybe you can even hack the DNS system (indeed you still can, if you're sufficiently clever or powerful) to tell people's browsers that they should go to your server instead of gmail.com.
But if somebody's browser contacts your server, you're presumably not going to be able to produce a certificate that works. You can produce Gmail's certificate, but Gmail's certificate has Gmail's public key, and your server doesn't have the matching private key, so you can't sign the challenge the browser sends you. And presumably, since you're not the real Google and not the real Gmail, you're not going to be able to persuade a certificate authority to give you a certificate associating gmail.com with a public key whose private key you know. So this certificate scheme made man-in-the-middle attacks quite a bit harder, and indeed they are quite a bit harder now because of certificates. It turns out, though, that people now have a lot of experience with the certificate scheme, almost 25 years of it, so we now know the kinds of things that go wrong. It was originally imagined that there would be just a couple of trustworthy certificate authorities who would do a good job of checking that each request really came from who it claimed to come from: that if you asked for a certificate for gmail.com, the certificate authority would actually verify that the request came from the owner of gmail.com, and not hand out certificates for gmail.com to random people. But that turns out to be very challenging. For Google, maybe a certificate authority can convince itself that a request comes from Google, but for just some x.com, it's hard for a certificate authority to reliably say, "Oh yes, this request really came from the person who really does own the DNS name x.com." A worse problem is that, while originally only a few certificate authorities were envisioned, there are now literally hundreds of them, and any certificate authority can generate a certificate for any name.
If you're a website owner, you're allowed to change certificate authorities to whoever you like. So there's no sense in which certificate authorities have limits on their powers: any certificate authority can produce any certificate. Since there are a couple hundred certificate authorities, each browser, like Chrome or Firefox, has built into it a list of the public keys of all couple hundred certificate authorities, and if any of them has signed the certificate produced by a web server, the certificate is acceptable. The result is that there have been multiple incidents of certificate authorities producing bogus certificates, that is, certificates that said they were for Google or Gmail or some other real company but were actually issued to someone else entirely: a certificate for one of Google's names, not issued to Google. Sometimes this happens just by mistake, because the certificate authority doesn't realize it's doing the wrong thing, and sometimes it's actually quite malicious. There have certainly been certificates issued to people who just wanted to snoop on people's traffic and mount man-in-the-middle attacks, and who did mount them. Today's readings mention a couple of these incidents. They're particularly troubling because they're hard to prevent: there are so many certificate authorities, and not all of them are reliable. Sorry, the question was: what's the last line in the certificate box? It's a signature over the certificate by the certificate authority, using the certificate authority's private key. Okay, so there have been incidents of bogus certificates, certificates for real websites like Google issued to totally the wrong people, and those certificates have been abused.
And it's not clear how to fix the certificate authority system itself to prevent them, because there are so many certificate authorities, and you just can't expect that they're going to be completely reliable. One possibility would be to have a single online database of all valid certificates, so that when a browser contacts a website and the website hands it a certificate, which might or might not be valid, the browser could contact the global database and ask: is this really a valid certificate, or is it a bogus certificate issued by a rogue certificate authority? There are many problems with that approach. One is that it's still not clear how anybody can distinguish valid, correctly issued certificates from bogus certificates, because typically you just don't know who the owner of a DNS name is. Furthermore, you need to allow certificate owners to change certificate authorities, or renew their certificates, or replace an old certificate because they lost their private key and are using a new public/private key pair. So people's certificates change all the time. And finally, even if it were technically possible to distinguish correct certificates from bogus ones, there's no entity that everybody would trust to do it. Among everybody in the world, the Chinese, the Iranians, the Americans, there's not any one outfit that they all trust. That's the reason why there are so many certificate authorities. So we really can't expect there to be a single clearinghouse that accurately distinguishes between valid and invalid certificates. However, what Certificate Transparency is doing is essentially trying to do the best that it's possible to do, taking the longest step it can towards a database of valid, trustworthy certificates.
Now I'm going to give an overview of the general strategy of Certificate Transparency. Its style is that it's an audit system. Because it's hard, or even impossible, to just decide whether a given person owns a name, Certificate Transparency isn't building a system that prevents bad things from happening, which would require being able to detect right away that a certificate was bogus. Instead, Certificate Transparency is going to enable audit. It's a system that causes all the information to be public so that it can be inspected by people who care. That is, it'll still allow people to issue bogus certificates, but it's going to ensure that those certificates are public and that everybody can see them, including whoever owns the name that's in the bogus certificate. This fixes the problem with the pre-Certificate Transparency system, where certificate authorities could issue bogus certificates and no one would ever know. They could give them to just a few victims, whose browsers would be tricked by them, and because certificates aren't generally public, a certificate authority could issue a bogus certificate for anybody, for Google or Microsoft, and Google or Microsoft might never realize it. The incidents that have come to light have generally been discovered only by accident, not because they were foredoomed to be discovered. Instead of relying on accidental discovery of bogus certificates, Certificate Transparency is going to force them into the light, where it's much easier to notice them. So again, it has an audit flavor, not a prevention flavor. Okay, so the basic structure: again we have gmail.com or some other service that wants a certificate. As usual, it's going to ask one of the hundreds of CAs for a certificate when the web server is first set up.
So it asks for a certificate, and the certificate authority sends the certificate back to the web server, because of course it's the web server that gives the certificate to the browser. At the same time, though, the certificate authority is going to send a copy of the certificate, or equivalent information, to a Certificate Transparency log server. In the real system there are multiple independent Certificate Transparency log servers; I'll assume there's just one. This is a service that, it turns out, we're not going to have to trust. The certificate authority sends the certificate to the log service, which maintains a log of all issued certificates, or at least all the ones that certificate authorities have told it about. When it gets a new certificate, it appends it to its log, so the log might have millions of certificates in it after a while. When some human with a browser wants to talk to a website, they set up an HTTPS connection to Gmail, Gmail sends them a certificate back, and the browser sends that certificate to the log server and asks, "Is this certificate in the log?" The log server says yes or no. If it is in the log, the browser will go ahead and use it. Now, the fact that it's in the log doesn't mean it's not bogus, because any certificate authority, including the ones out there that are malicious or badly run, can insert a certificate into the log system and therefore perhaps trick users into using it. So, so far we haven't built a system that prevents abuse. However, it is the case that no browser will use a certificate unless it's in the log. At the same time, Gmail is going to run what the CT system calls a monitor. For now we'll just assume that there's a monitor associated with every website.
This monitor periodically talks to the certificate log servers and asks, "Please give me a copy of your log," or really, "Please give me a copy of whatever has been added to your log since I last asked." That means the monitor is going to be aware of every single certificate that's in the log. But also, because the monitor is associated with Gmail, the monitor knows what Gmail's correct certificate is. So if some rogue certificate authority issues a certificate for Gmail that's not the one that Gmail itself asked for, Gmail's monitor will stumble across it in the certificate log and recognize it as bogus, because Gmail's monitor knows Gmail's correct certificate. Now of course the rogue certificate authority doesn't have to send its certificate to the certificate log system. But in that case, when browsers accidentally connect to the attacker's web server and the attacker's web server gives them the bogus certificate, the browser won't believe it and will abort the connection, because it's not in the log. So because browsers require certificates to be in the log, the log forces all certificates to be public, where they can be audited and checked by monitors who know what the proper certificates are. Some monitors are run by big companies, and companies know their own certificates. Some monitors are run by certificate authorities on behalf of their customers, and again those certificate authorities know what certificates they've issued to their customers, so they can at least alert their customers if they see a certificate they didn't issue for one of their customers' names. In addition, there are some totally third-party monitor systems, where you give the third-party monitor your names and your valid certificates, and it checks for unexpected certificates for your names. All right, so this is the overall scheme.
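To make the division of labor concrete, here is a minimal sketch of the browser rule and the monitor scan described above. Everything in it is an illustrative assumption rather than part of the real CT protocol: the `Cert` record, the names `browser_accepts` and `monitor_scan`, and the use of a plain Python list as the log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    name: str        # DNS name the certificate covers
    public_key: str  # stand-in for real key bytes
    issuer: str      # the CA that signed it


def browser_accepts(cert, log):
    # Browser rule from the lecture: never use a certificate that is not in the log.
    return cert in log


def monitor_scan(log, start, my_names, my_valid_certs):
    # Look only at entries appended since the last scan (index >= start).
    # Flag any certificate for one of our names that we do not recognize.
    suspicious = [c for c in log[start:]
                  if c.name in my_names and c not in my_valid_certs]
    return suspicious, len(log)  # len(log) is where the next scan starts


real = Cert("gmail.com", "google-key", "GoodCA")
bogus = Cert("gmail.com", "attacker-key", "RogueCA")
log = [Cert("x.com", "x-key", "GoodCA"), real]

alerts, next_start = monitor_scan(log, 0, {"gmail.com"}, {real})
print(alerts)  # nothing unexpected yet

log.append(bogus)  # a rogue CA logs its certificate so that browsers will accept it
alerts, next_start = monitor_scan(log, next_start, {"gmail.com"}, {real})
print(alerts)  # the bogus gmail.com certificate shows up here
```

Note that `browser_accepts(bogus, log)` is true once the bogus certificate is logged: the log does not prevent the attack, it only guarantees that the monitor's next scan will see the evidence.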
It depends very much on browsers seeing the very same log contents that monitors see. And remember, we're up against the problem that we're not sure we can trust any component in this system. Indeed we found that some certificate authorities are malicious, or have employees who can't be trusted, or are sloppy and don't follow the rules. We have to assume that the same will be true of the certificate log servers: that some of them will be malicious. Some of them may conspire with rogue certificate authorities and intentionally try to help them issue bogus certificates. Some of them may be sloppy. Some of them may be legitimate, but maybe some of their employees are corruptible: you pay them a big enough bribe and they'll do something funny to the log, delete something or add something to it. So what we need to build is a log for which, even though the log operator may not be cooperating or trustworthy, we can still be sure that browsers are seeing the same log contents as monitors, or at least find out if that's not the case. Then if a browser uses a certificate that was in the log, the monitor for the owner of that name will eventually see it. So we need to build a log system that is append-only, so that it can't show a certificate to a browser and then delete it before monitors see it. We need no forks, in the sense that we don't want the log system to keep two logs, one of which it shows to browsers and one of which it shows to monitors. And we need all this with untrusted log servers, since we can't be sure that the log servers are correct.
So just to back up a bit: these are the critical properties we need from the log system, meaning not just the log servers but the entire system of log servers plus the various checks. We have to prevent deletion, that is, we need the log to be append-only. If a log server could delete items from its log, then it could show a bogus certificate to a browser and claim it's in the log, and it might really be in the log at the time the browser uses it. But then the log server could delete that certificate from its log, so that by the time the monitors came to look at the log, the bogus certificate wouldn't be there. So we need a system that either prevents deletion or at least detects if deletion occurs; that's the sense in which the system needs to be append-only. We also have to prevent what's called equivocation. A log server could be implementing append-only logs, but if it implemented two different append-only logs and showed one to browsers and the other to monitors, then we could be in a position where the log shown to browsers contains the bogus certificate but the log shown to monitors doesn't. So we have to rule out equivocation too, all without trusting the servers. So how can we do this? Now we're getting into the kind of details that the last of the assignments was talking about. The first step is this thing called a Merkle tree. This is something that the log servers are expected to build on top of the log. The idea is that there's the actual log itself, which is a sequence of certificates, certificate one, certificate two, and so on, presumably in the order in which certificate authorities asked for them to be added. There are probably millions; I'm just going to assume there are a couple.
Now, it's going to turn out that we don't want the browsers to have to download the whole log. So we need tools that allow the logging system to send the browsers unambiguous summaries of what's in the log; we'll talk in a bit about exactly what those summaries are used for. The basic scheme is that the log servers use cryptographic hashes to hash up the complete set of records in the log and produce a single cryptographic hash, which these days is typically 256 bits long. That cryptographic hash summarizes the contents of the log. The way that's done is basically as a tree structure in which we're always hashing together pairs of numbers. I'm going to write H for hash. At the base level we have the hash of each log entry, that is, of each certificate. Then we hash up pairs: at the next level we have the hash of the first two leaf hashes concatenated, and the hash of the next two leaf hashes concatenated. At the top level we have the hash of the concatenation of those two hashes. This single hash at the top is an unambiguous stand-in for the complete log. One of the properties of cryptographic hashes like SHA-256 is that it's not feasible to find two inputs to the hash function that produce the same output. That means that if you tell somebody the output of the hash function, there's only one input you're ever going to be able to find that produces that output. So if the log server hashes up the contents of its log in this way, only this sequence of log records will ever be able to produce that hash; we're effectively guaranteed that the log server is not going to be able to find some other log that produces the same final tree hash as this sequence of log entries.
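As a rough sketch of the pairwise hashing just described, the following computes the top hash for a log whose length is a power of two, using SHA-256 from Python's standard `hashlib`. The function names `H` and `tree_head` and the sample entries are illustrative assumptions. It follows the simplified description here; the real protocol (RFC 6962) additionally prefixes leaf and interior hashes with distinct bytes, which this sketch omits.

```python
import hashlib


def H(data: bytes) -> bytes:
    # SHA-256 produces a 256-bit (32-byte) digest.
    return hashlib.sha256(data).digest()


def tree_head(entries):
    # Simplification from the lecture: the log length is a power of two.
    assert entries and len(entries) & (len(entries) - 1) == 0
    level = [H(e) for e in entries]  # base level: hash of each log entry
    while len(level) > 1:
        # Hash together each adjacent pair to form the next level up.
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


log = [b"cert1", b"cert2", b"cert3", b"cert4"]
head = tree_head(log)
print(head.hex())
```

Changing any single entry changes the top hash, which is what lets the 32-byte value stand in for the whole log.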
All right, so this is the Merkle tree: a tree of hashes whose top hash summarizes the entire log. We'll actually call the top a signed tree head, because in fact the log servers take the hash at the top of the tree, sign it with their private key, and give that to clients, meaning browsers and monitors. The fact that they signed it means they can't disavow it later; it was really them who produced it. That's so we can catch lying log servers. The point is that once a log server has revealed a particular signed tree head to a browser or monitor, it's committed to some specific log contents, because it will never be able to produce different log contents with the same hash. So these hashes really function as commitments. Okay, so this is what the Merkle tree looks like for a particular log. Now, the third reading today outlined how to extend the log, how to add records to it, for arbitrary numbers of records. I'm just going to assume that the log always grows by factors of two, which is impractical but easier to explain. That means that as certificate authorities send in new certificates, the log server waits until it has as many new records as it has old records and then produces another tree head. So to extend the log here, the log server waits until it has another four records, hashes them pairwise just as before, and then produces a new tree head that is the hash of the concatenation of the old tree head and the top hash of the new records. That's the new tree head for the expanded log. So as time goes on and a log server's log grows longer and longer, it produces a sequence of higher and higher tree heads. Okay, so this is the structure that we're expecting log servers to maintain.
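Under the doubling assumption above, extending the log never requires rehashing the old records: the new head is just the hash of the old head concatenated with the top hash of the new batch. A minimal sketch, with `H`, `tree_head`, and `extend_head` as illustrative names, and with the signing step left out:

```python
import hashlib


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def tree_head(entries):
    # Top hash of a power-of-two-sized list of entries.
    assert entries and len(entries) & (len(entries) - 1) == 0
    level = [H(e) for e in entries]
    while len(level) > 1:
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def extend_head(old_head, new_entries):
    # Only correct under the lecture's assumption that the batch of new
    # records is exactly as large as the existing log.
    return H(old_head + tree_head(new_entries))


old = [b"cert1", b"cert2", b"cert3", b"cert4"]
new = [b"cert5", b"cert6", b"cert7", b"cert8"]

head4 = tree_head(old)
head8 = extend_head(head4, new)
print(head8 == tree_head(old + new))  # True: same head as hashing all eight
```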
Of course, who knows what they're actually doing, especially if they're malicious, but the Certificate Transparency protocol is written as if the log server were actually doing this. All right, so the point of these Merkle trees is to use them to force log servers to prove certain things about the logs they're maintaining, and we want to know what those proofs look like. The first kind of proof is what I'll call a proof of inclusion. This is what a browser needs when it wants to find out whether a certificate it's just been given by a web server is really in the log. It's going to say to the log server, "Look, here's a certificate; is it in your log?" and the log server is going to send back a proof, not just that the certificate is in the log, but of where it is, what its position is in the log. The browser wants this proof because it doesn't want to use a certificate that's not in the log: if it's not in the log, then monitors won't see it, and we have no protection against the certificate being bogus. And it needs to be a proof because we can't afford to let a malicious log server change its mind. We don't want to just take the log server's word for it, because a malicious log server might just say yes. If a log server does lie, these proofs are going to help us catch the fact that it lied and produce evidence that the log server is malicious and should be ignored from now on. The ultimate sanction against log servers is that browsers have a list of acceptable log servers, and these proofs would be part of the evidence that causes a log server to be taken off that list if it was malicious.
Okay, so we want the log server to produce a proof that a given certificate is in its log. Actually, the first step is that the browser asks the log server for the current signed tree head. So what the browser is really asking is: is this certificate in the log that's summarized by this signed tree head? The log server may lie about the signed tree head; we'll consider that later. For now let's assume that the browser has the correct signed tree head and is demanding a proof. For simplicity I'm just going to explain how to do this for a log with two records; extending it to a log with a higher power of two records is relatively easy. So the browser has a particular signed tree head. Let's suppose the correct log that sits under that signed tree head is the two-element log A, B, for particular certificates A and B. That means the correct Merkle tree has the hashes H(A) and H(B) at the bottom, and the signed tree head is the hash of H(A) concatenated with H(B), that is, H(H(A) || H(B)). Let's suppose this is the signed tree head that the log server actually gave to the client. The client only knows this final hash value; it doesn't actually know what's in the log. If the browser asks for a proof that A is in the log, then the proof the log server can return is simply A's position in the log and the hash of the other element in the log: so, zero, and H(B).
That is enough information for the client to convince itself that A really is at position zero. It knows the certificate it's interested in, so it can hash it to get H(A). The proof gave it the hash of the other element at this lowest level, H(B). So the browser now knows H(A) and H(B), can hash them together, and can see whether the result is the same as the signed tree head that it has. If it is, it means that the log server has produced a valid proof that certificate A is at position zero in the log summarized by the signed tree head. In larger logs, if you need a proof that A is really at a given position, all you need is the sequence of hashes of the other branch at each level on the way up to the signed tree head that you have. So for a four-element log you need the hash of A's neighbor and then the hash of the other pair, and if the log is bigger, say eight elements, you also need the hash of the other half of the tree. Assuming that you have a signed tree head, you can take the element you know, hash it together with each of these other hashes in turn, and see if the result is equal to the signed tree head. Okay, so suppose the browser asks whether X is in the log at position zero. Well, X isn't in the log, so hopefully there's no easy way for the log server to produce a proof that X is in the log at position zero. But suppose the log server wants to lie. It's in the position where it has already exposed a signed tree head for a log that contains A and then B (the browser doesn't know that A and B are what's in the log), and the log server wants to trick the browser into thinking that it's really X at position zero. Well, consider what it would take to do that.
For this small log, the log server has to find some y such that the hash of x, hashed together with y, is equal to the signed tree head, that is, H(H(x) || y) equals the head the client holds. We're assuming the client already has a signed tree head, so we need to find a value y that, when hashed together with the hash of x that the client is asking about, produces that same signed tree head. Well, by assumption the signed tree head was actually for some other log, because we're trying to rule out the possibility that the log server can give you a signed tree head for one log and then convince you that something is in that log that's not there. So the signed tree head really was produced from the hashes of the records that really were in the log. Since x is definitely different from a, the hash of x is different from the hash of a (barring a collision), and that means that the log server needs to find two different inputs to the hash function that produce the same output. The assumption, widely believed to be true for practical purposes, is that that's not feasible for cryptographic hashes. So if the signed tree head was produced by hashing up one log, it will not be possible to find the other hash values that would be required to produce a proof that some other element was in the log that wasn't really there. Any questions about this? A nice thing about this is that the proofs consist of just the other hashes on the way up to the root. If there are n certificates, there are only log n other hashes, so the proofs are reasonably concise; in particular they're much, much smaller than the full log. And since every browser that needs to connect to a website is going to need one of these proofs, it's good if they're small.
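The inclusion proof described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any real log server is implemented: it assumes SHA-256, a log whose length is a power of two, and an unsigned tree head, and it leaves out details that real certificate transparency logs add on top. The names `build_levels`, `inclusion_proof`, and `verify_inclusion` are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    """The hash function H (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).digest()

def build_levels(records):
    """Return every level of the Merkle tree, leaf hashes first, root last."""
    assert len(records) & (len(records) - 1) == 0, "power-of-two logs only"
    level = [h(r) for r in records]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def inclusion_proof(levels, index):
    """Log server side: the 'other' hash at each level on the way up."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])  # sibling of the current node
        index //= 2
    return proof

def verify_inclusion(record, index, proof, tree_head):
    """Browser side: rehash up to the root and compare with the tree head."""
    node = h(record)
    for sibling in proof:
        # Even index means we are the left input, odd means the right input.
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == tree_head

records = [b"cert-a", b"cert-b", b"cert-c", b"cert-d"]
levels = build_levels(records)
tree_head = levels[-1][0]
proof_a = inclusion_proof(levels, 0)

print(len(proof_a))                                        # 2, i.e. log2(4)
print(verify_inclusion(b"cert-a", 0, proof_a, tree_head))  # True
print(verify_inclusion(b"cert-x", 0, proof_a, tree_head))  # False
```

The last line is the x-at-position-zero case: the same sibling hashes cannot make a different certificate hash up to the same head unless the server finds a hash collision.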
Okay, well, this whole discussion was assuming that the signed tree head that the browser had was the correct signed tree head. But there's no immediate reason to believe that. If the log server is malicious and wants to trick a client, why would it give the client the correct signed tree head? Why doesn't it just give it the signed tree head for the bogus log that it wants to trick the client into using? So we have to be prepared for the possibility that the log server has cooked up a different log for this browser, one that's not like anybody else's log and contains the bogus certificates that the malicious log server wants to trick this client into believing. So what do we do about that? Well, it turns out that, at least in the first instance, this is totally possible. Usually the way this would play out is that we'd have some browser that was seeing the correct log until some point in time when somebody wanted to attack it. You want the browser still to be able to use all the websites that it ordinarily sees, so the log server shows just that victim browser a different log, one with the usual contents plus the bogus certificates. This is a fork attack, or more broadly, equivocation. The reason people call this kind of attack a fork attack is this: never mind the Merkle tree for a moment and just consider the log. Usually the log already has millions of certificates in it, and everybody has seen the beginning part of the log. Then at some point in time we attack: we want to persuade our victim to use some bogus certificate b, but we don't want to show b to anybody else, certainly not to the monitors. So we cook up this other log that continues as usual and contains new submissions but definitely doesn't contain the bogus certificate b.
What this looks like is a fork: the main log that monitors are shown is off on one fork, and this log we're cooking up specially to trick a victim is a different fork. This is the construction that the malicious log server would have to produce if it wants to trick a browser into using a bogus certificate. And again, forks are possible, at least briefly, with certificate transparency. Luckily, though, that's not the end of the story, and certificate transparency contains some tools that make forks much more difficult. The basic scheme is this; this is the way certificate transparency is intended to work, though it doesn't quite yet. What's going on here is that the monitors and the people who aren't being attacked are going to see a particular signed tree head, say signed tree head one, which of course is going to change as the log extends. The victim, we know, must see some other signed tree head, because a signed tree head that is hashed over this bogus certificate is guaranteed to be different from the signed tree heads that the malicious server is showing the monitors. If only the browsers and monitors could compare notes, they might instantly realize that they were seeing different trees. If we play our cards right, all it takes is comparing the signed tree heads that they've gotten from the log server to realize: wait a minute, we're seeing different logs, something's terribly wrong. So the critical thing we need to do is have the participants in the system be able to compare signed tree heads, and certificate transparency has a provision for this called gossip. And the way it's intended to work is that browsers...
Well, the details don't really matter, but what it amounts to is that all the participants drop off the recent signed tree heads they've seen into a big pool that they all inspect, and they figure out if there are inconsistent signed tree heads that clearly indicate divergent logs, that is, a fork. So we're going to gossip, which really means exchange signed tree heads and compare. It turns out that current certificate transparency implementations don't do this, but they ought to, and they'll figure it out at some point. All right. Okay, so the question is: given two signed tree heads, how do we decide if they are evidence that the log has been forked? The thing that makes this hard is that even if the log hasn't been forked, as it's appended to, new signed tree heads will become current. So maybe signed tree head one was the legitimate signed tree head of the log at this point, and then some more certificates are added and signed tree head three becomes the correct head of the log, and then signed tree head four, etc. So really what this gossip comparison needs to do is distinguish two situations. In one, a signed tree head describes a log that's a prefix of the log described by another signed tree head; this is the legitimate situation, where the two signed tree heads are different but the second one really does subsume the first one. We want to distinguish that from two signed tree heads that are different where neither describes a log that's a prefix of the other one's log. Telling these two cases apart is the purpose of the Merkle consistency proof that the readings talk about. So this is the log consistency proof. The game here is that we're given two signed tree heads, H1 and H2, and we're asking: is H1's log a prefix of H2's log? These are hashes, so it's really asking about the logs that the hashes represent.
And we're hoping the answer is yes. If the answer is no, that means that the log server has forked us and is hiding something from one party or the other. Okay, well, as I mentioned before, as the log grows, the Merkle tree also grows, and what we see is a sequence of signed tree heads. Each time the log doubles in size, the new signed tree head has the old one as its left input. Let me draw in the actual hash functions: this hash function is hashing up two things, and the result of this hash function is one of the inputs to the next signed tree head. The result of that hash function is one of the inputs to the next signed tree head after that. So you get this kind of tree of signed tree heads. For any two legitimate signed tree heads, maybe this one's H1 and this one's H2, if H1's log is a prefix of H2's log, then they must have this relationship where H2 was produced by taking H1, hashing it with some other thing, and maybe hashing that with some other thing, until we get to H2. What that means is that if a browser or monitor challenges a log server to prove that H1's log is really a prefix of H2's log, what the log server has to produce is this sequence of the other side of each of the hashes on the way from H1 to H2, and this is the proof. Again, this is reminiscent of the inclusion proofs. Then, to check the proof, you take H1, hash it with the first other thing, hash that along with the second other thing, and so on until you get to the last one of these, and the result had better be equal to H2. If it is, it's a proof that H2's log is an extension of H1's log; otherwise the log server has evidently tried to fork you. And again, the basis of this is that there's no other way to get there.
Suppose H1's log really isn't a prefix of H2's log. Then, since H2 was created from some actual log that doesn't start with H1's log, there's no way that the log server could cook up the values that are required to cause this repeated hashing of H1 to equal H2, because we're assuming that cryptographic hashes prevent you from finding two different inputs that produce the same output. Okay, so this is the log consistency proof. Okay, so the question is who usually challenges the log server. I'll talk about that in a minute, but it turns out that both browsers and monitors challenge the log server; usually it's the browser challenging the log server that's the most important. There are two points in time at which you need to challenge the log server to produce these proofs, and I'll talk about both of them. All right. So, the first point at which these proofs are used is as part of gossip, as I outlined. The scheme that's intended for gossip is that browsers will periodically talk to some central repository, or some set of central repositories, and just contribute to a pool of signed tree heads the signed tree heads they've recently seen from the log server. The browsers will also periodically pull random signed tree heads that other browsers have seen out of the pool. And there would be multiple of these pools run by different people, so that if one of them is cheating, the others are protection against that. Then, for any random signed tree heads that it pulls out of the pool, the browser will ask the log server to produce the log consistency proof for that pair of signed tree heads.
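The log consistency proof that the browser asks for can be sketched as follows. This follows the simplified picture from the lecture, where the old and new log sizes are both powers of two, so the proof is just the sequence of right-hand subtree hashes between H1 and H2. SHA-256, the unsigned heads, and the function names are assumptions of the sketch; a real log has to handle arbitrary sizes and needs a more involved proof.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def tree_head(records):
    """Merkle root of a log whose length is a power of two."""
    level = [h(r) for r in records]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def consistency_proof(records, old_size):
    """Log server side: the 'other side' hashes on the way from H1 to H2."""
    proof, size = [], old_size
    while size < len(records):
        # Head of the subtree that sits to the right of the current node.
        proof.append(tree_head(records[size:2 * size]))
        size *= 2
    return proof

def verify_consistency(old_head, new_head, proof):
    """Checker side: hash H1 with each other thing in turn, expect H2."""
    node = old_head
    for other in proof:
        node = h(node + other)
    return node == new_head

log = [b"cert-%d" % i for i in range(8)]
h1 = tree_head(log[:2])          # head when the log had 2 records
h2 = tree_head(log)              # head now that it has 8
proof = consistency_proof(log, 2)

print(len(proof))                                  # 2
print(verify_consistency(h1, h2, proof))           # True

# A head for a 2-record log that is NOT a prefix of the 8-record log.
forked_h1 = tree_head([b"cert-0", b"bogus"])
print(verify_consistency(forked_h1, h2, proof))    # False
```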
And if nobody's cheating, it should always be easy for the log server to produce any consistency proof that's demanded of it. But suppose that the log server has forked somebody and given them a signed tree head that describes a totally different log, or even a log that differs in one element from the log that everybody else is seeing. Eventually that browser will contribute that signed tree head to the gossip pool, then eventually somebody else will pull that signed tree head out of the pool and ask for a proof against some other signed tree head that presumably is on a different fork, and then the log server will not be able to produce the proof. And since the signed tree heads are signed by the log server, that's just absolute proof that the log server has forked two of its clients, presumably with intent to reveal a bogus certificate to one of them and hide it from the other. Okay, but it turns out you need these consistency proofs in another place: not just during gossip, but also during the ordinary operation of the browsers. The difficulty is this. Suppose a browser is seeing a consistent version of the log, the same as everybody else. But then the log server wants to trick it into using a bogus certificate. The log server makes a signed tree head that is different from everybody else's and that refers to a malicious log containing this bad certificate, and sends it to the browser. For everybody else, since it doesn't want other people to notice, and certainly doesn't want the monitors to notice, it cooks up this other log, which is what everybody else is seeing.
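Here is a rough sketch of how a gossip pool could expose a fork. Everything in it is an assumption made for illustration: the pool is a plain list of (size, head) pairs, the auditor checks every pair rather than random samples, log sizes are powers of two, and the malicious server is modelled as answering from whichever of its views it likes. The names `server_proofs` and `audit` are hypothetical.

```python
import hashlib
from itertools import combinations

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def tree_head(records):
    level = [h(r) for r in records]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def consistency_proof(records, old_size):
    proof, size = [], old_size
    while size < len(records):
        proof.append(tree_head(records[size:2 * size]))
        size *= 2
    return proof

def verify_consistency(old_head, new_head, proof):
    node = old_head
    for other in proof:
        node = h(node + other)
    return node == new_head

main_log = [b"cert-%d" % i for i in range(8)]   # what monitors see
fork_log = main_log[:3] + [b"bogus"]            # shown only to the victim

# Heads dropped into the pool by different browsers.
pool = [
    (2, tree_head(main_log[:2])),
    (8, tree_head(main_log)),
    (4, tree_head(fork_log)),                   # contributed by the victim
]

def server_proofs(views, old_size, new_size):
    """The server may try to answer from any log it has ever shown anyone."""
    return [consistency_proof(v[:new_size], old_size)
            for v in views if len(v) >= new_size]

def audit(pool, views):
    """Return the pairs of pooled heads the server cannot prove consistent."""
    unprovable = []
    for (m, old), (n, new) in combinations(sorted(pool), 2):
        candidates = server_proofs(views, m, n)
        if not any(verify_consistency(old, new, p) for p in candidates):
            unprovable.append((m, n))
    return unprovable

print(audit(pool, [main_log, fork_log]))   # [(4, 8)]
```

The 2-record head is a genuine prefix of both forks, so the server can answer those challenges. The only pair it cannot answer is the victim's 4-record head against the main 8-record head, and that failed proof is the evidence of the fork.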
All right, so now the browser asks for an inclusion proof, and the log server will be able to produce the inclusion proof, because the signed tree head that the browser has really does refer to this bad log. The browser will go ahead and use this bogus certificate, and maybe get tricked and give away the user's password, who knows what. Depending on the details of how the browsers work, we're at risk of the following. The next time the browser, which doesn't realize anything's gone wrong, talks to the log server, the log server might say: oh, there's a new log with a bunch of new stuff in it, here is the signed tree head of the current log, why don't you use that as your signed tree head? If that were allowed to happen, then the browser has now completely lost the evidence that anything went wrong, because now the browser is using the same tree as everybody else. It's going to contribute this signed tree head to the gossip pool, and it's all going to look good. We had this brief evil log fork that was revealed, but if the browser is willing to accept any new signed tree head, then the log server can basically make the browser forget about it. So what we want is that once the log server shows a particular log to the browser, it can't trick the browser into switching away from that log. That is, we want to be able to enforce that the browser sees only strict extensions of the log that it's seen already, and doesn't simply get switched to a log that is not compatible with the log the browser has seen before. So the property that we're looking for is actually called fork consistency.
What that name refers to is that if the browser has been forked onto a different fork from other people, then it must stay on that fork and should never be able to switch to the main fork. The reason is that the browser needs to preserve this bad signed tree head, so that when the browser participates in the gossip protocol, it's contributing signed tree heads that nobody else has and that cannot be proved compatible using the log consistency proof. Okay, so how do we achieve fork consistency? Well, it's actually easy with the tools we have now. Every time the log server tells a browser, oh, here's a new signed tree head for a longer log, the browser will not accept the new signed tree head until the log server has produced a log consistency proof that the new signed tree head describes an extension of the old one, that is, that the log of the old signed tree head is a prefix of the log of the new signed tree head. Of course, if the log server has forked the browser and is keeping the browser on that same fork, it can produce the proofs, but it's digging its grave even deeper, because it's producing more and more signed tree heads for the fork, which will eventually be caught by the gossip protocol. Whereas if the log server tries to cause the browser to switch to a signed tree head that describes the same log everybody else has been seeing, the browser will demand a consistency proof and the log server will not be able to produce it, because indeed the log described by the first signed tree head is not a prefix of the log described by the second signed tree head. Okay. So these log consistency proofs provide fork consistency, and then we have fork consistency plus gossiping, with log consistency proofs required for the signed tree heads found by gossiping.
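The browser-side rule for fork consistency can be sketched like this. The `Browser` class and its `offer` method are hypothetical names; the sketch again assumes SHA-256, power-of-two log sizes, and unsigned heads, and it is only meant to show the accept-only-proven-extensions rule, not a real browser's logic.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def tree_head(records):
    level = [h(r) for r in records]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def consistency_proof(records, old_size):
    proof, size = [], old_size
    while size < len(records):
        proof.append(tree_head(records[size:2 * size]))
        size *= 2
    return proof

def verify_consistency(old_head, new_head, proof):
    node = old_head
    for other in proof:
        node = h(node + other)
    return node == new_head

class Browser:
    """Remembers one tree head and only moves to a proven extension of it."""

    def __init__(self, size, head):
        self.size, self.head = size, head
        self.seen = [(size, head)]      # heads to contribute to gossip later

    def offer(self, new_size, new_head, proof):
        """Accept a new head only if it provably extends the current one."""
        ok = (new_size >= self.size
              and verify_consistency(self.head, new_head, proof))
        if ok:
            self.size, self.head = new_size, new_head
            self.seen.append((new_size, new_head))
        return ok

main_log = [b"cert-%d" % i for i in range(8)]
fork_log = main_log[:3] + [b"bogus"] + [b"later-%d" % i for i in range(4)]

# The victim has already been shown the 4-record fork with the bogus cert.
victim = Browser(4, tree_head(fork_log[:4]))

# The server tries to move the victim back onto the main log: rejected.
switched = victim.offer(8, tree_head(main_log), consistency_proof(main_log, 4))
# The server keeps the victim on its fork: accepted, and more evidence piles up.
extended = victim.offer(8, tree_head(fork_log), consistency_proof(fork_log, 4))

print(switched, extended)   # False True
print(len(victim.seen))     # 2
```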
The two of them together make it likely that all the participants are seeing the same log, and if they're not seeing the same log, they'll be able to detect that fact by the failure of a log consistency proof. Any questions? Okay, so the question is how many log servers there are. That is a great question. I described the system as if there was just one log server; it turns out in the real system there are lots of log servers, at least dozens. This is a deployed system, which you can poke around in, that is actually used by Chrome and I think Safari. Certificate authorities are actually required by Chrome to submit all their certificates to multiple log servers. The different log servers don't actually keep identical logs. The convention is that a certificate authority will submit a new certificate to, say, a couple, maybe five different log servers. And the certificate information that a website gives your browser includes the identities of the certificate transparency log servers that have the certificate in their log, so your browser knows which log servers to talk to. The reason why there's more than one of them is, of course, that some of them may go bad. Some of them may turn out to be malicious, or go out of business, or who knows what, and in that case you still want to have a couple more to fall back on. They don't have to be identical, because as long as the certificate is in at least one log that, as far as anybody knows, is trustworthy, that's sufficient. The issue here is not really the fact that the log has the certificate in it, because that's not proof that the certificate is good. All we're looking for is log servers that aren't forking the monitors and browsers that use them.
So it's enough for a certificate to be in even a single log server that's not forking people, because then the monitors are guaranteed to see it, since the monitors check all the log servers. So if a bogus certificate shows up in even a single log server, the monitors will eventually notice, because all the monitors look at all the log servers that the browsers are willing to accept. Another question: what prevents a log server from going bad and serving bogus certificates before it gets caught? Nothing, actually. That's definitely a defect in the system: at least for a while, a malicious log server can trick browsers into accepting bogus certificates. So if you have a certificate authority that has become malicious and is issuing bogus certificates, which look correct but are bogus, and a log server that's willing to put these certificates in its log (and of course they all are), then at least for a while browsers will be willing to use them. The thing is, though, that they will be caught, and this system's intent is to improve on the situation in the pre-certificate-transparency system. There, if somebody was issuing bogus certificates and browsers were being tricked into using them, you might never find out, ever. In the certificate transparency world, you may not find out right away, and so some people may use them. But then relatively quickly, in a few days or something, the monitors will start to notice that there are bad certificates in the logs, and somebody will go and track it down and figure out who is malicious or who is making mistakes. Yeah, so I guess a certificate transparency log could refuse to talk to the monitors. I'm not sure. I think we're now treading into a kind of non-technical region, namely what to do if there's evidence that something's gone wrong.
This is actually quite hard, because much of the time when something seems to go wrong, even with bogus certificates, the reason is just that somebody made a legitimate mistake. Somebody blew it. There's no evidence of malice; it's just that somebody made a mistake. I think what would happen if a log server was misbehaving in almost any way, like not answering requests, and was doing it consistently, is that people would notice and either ask them to shape up or stop using them. The browser vendors would take that log server out of the list of acceptable log servers after a while. But yeah, there's a gray area of bad behavior that's not bad enough to warrant being taken out of the acceptable list. The next question is: what happens if a log server has been found to fork? I think what would happen is that the browser vendors would talk to the people running the log server and ask them what happened. Maybe they come up with a convincing explanation that they made a mistake. Maybe, I don't know, their machine crashes and loses part of their log, and they restart from a prefix of the log and start growing a different log. If it seems like an honest mistake, then, well, it was a mistake. But if the log server operators can't provide a convincing explanation of what happened, then I think the browser vendors would just delete them from the list of acceptable log servers. Okay, but these are problems with the system, because the definitions of things like who owns a name, or what's acceptable, or whether it's okay for your server to be down or not, are very hard properties to pin down. I think the system's not foolproof; you could definitely get away with bad behavior, at least for a while.
But the hope is that there's strong enough auditing here that if some certificate authority or log server was persistently badly behaved, people would notice; the monitors would notice. They may not do anything for a while, but hopefully they would decide that you're either too much of a pain or too malicious to be part of the system, and delete you from the browser lists. Of course, this puts the browser vendors in a position of quite strong power. While the system is in general pretty decentralized, in that there can be lots of certificate authorities and lots of certificate transparency log servers, there's only a handful of browser vendors, and because they maintain the lists of acceptable certificate authorities and log servers, they do have a lot of power. That's the way it is, unfortunately. Okay, so things to take away from the certificate transparency design. The key property, and it's super important, is that everyone sees the same log, even if some of the parties are malicious. Either everyone sees the same log, or they can accumulate evidence from failed proofs that something funny is going on. Because both the browsers that are using the certificates and the owners of the DNS names, who are running monitors, see the same log thanks to these proofs, the monitors can detect problems. So even though the browsers can't actually detect bogus certificates themselves, they can at least be confident that if there are bogus certificates out there, monitors will detect them and possibly put them on revocation lists. Actually, that's something I didn't mention. Suppose a monitor spots what must be a bogus certificate, like MIT sees somebody they don't know about being issued a certificate for MIT. It turns out there's a pre-existing revocation service, which the browsers check, that you can put bad certificates on.
So if a monitor sees a bogus certificate, it can be effectively disabled by putting it in the certificate revocation system. That's not part of certificate transparency; it's been around for a long time. Okay. So the key property is that everyone sees the same log of certificates. Another thing to take away from this is that if you can't figure out a way to prevent bad behavior, maybe you can build something usable that relies on auditing instead of prevention, that is, something that can detect bad things after the fact. That might be good enough, and it's often much easier than preventing the bad things. There are also some technical ideas in this work. One is the idea of equivocation: a big danger is the possibility that a malicious server will provide split views, one view to one set of people and another view to another set of people. It's usually called a fork or equivocation, and it's an important kind of attack. Another is this fork consistency property. It turns out that when you're worried about forks, it's often valuable to build a system that forces the malicious server, once it has forked somebody, to keep them on that fork, so it can't erase evidence by erasing a fork. The final technical trick is the notion of gossiping in order to detect forks. In general, if the participants don't communicate with each other, it's typically not possible to notice that there has been a fork. So if you want to detect forks, there has to be, one way or another, some kind of gossip, some kind of communication between the parties, so they can compare notes and detect forks. We'll see most of these things again next week when we look at Bitcoin. And that's all I have to say.