Okay, all righty everybody, welcome to the afternoon of day two of Real World Crypto 2020. Today we have two talks in this session, the Privacy-Preserving Primitives session. So first we have Privacy-Preserving Firefox Telemetry with Prio by Henry Corrigan-Gibbs. So everybody give Henry a nice round of applause, thanks. So I'm really happy to tell you about some work that I've been doing with Dan Boneh at Stanford and this wonderful group of people at Mozilla on the new privacy-preserving telemetry system in Firefox. It's built on top of a system called Prio. So the running example that I want to use throughout this talk to explain how the system works comes from the Firefox browser, and it has to do with measuring the effectiveness of a new browser feature called enhanced tracking protection. So if you're a Firefox user, you may have noticed that there's this new feature that tries to detect third-party trackers and other kinds of cookies that advertisers use to follow you around the Internet, and this tracking protection feature prevents those things from loading. So it's a useful privacy feature, and when tracking protection is active you'll see this purple shield in your address bar. So what Mozilla engineers want to know are the answers to questions like this: how many Firefox users blocked the tracking cookie from Facebook.com? And the engineers need to know the answers to these types of questions to better debug the feature, make it work better, and tune the heuristics that they use to implement it. So the way that software vendors today often answer these types of statistical questions is by collecting sensitive data directly from their users. So you can imagine each Firefox user has this bit indicating whether or not it's blocked the tracking cookie from Facebook.com, and the user could just send this data to Mozilla, and then Mozilla could compute this aggregate statistic of interest on their own. The problem with this type of non-private aggregation solution is that you end up sending sensitive information to Mozilla. So it turns out that the set of trackers that you've seen as you browse the internet says a lot about which websites you're visiting. It's sort of like a fingerprint of your browsing activity. So you would not want to send this data to Mozilla in the clear. In particular, Mozilla would become kind of a single point of privacy failure. An attacker could come steal the data, malicious insiders could abuse it or resell it, and governments could come asking for it. So the goal of this system called Prio that we've designed and built into Firefox is to allow Mozilla to collect aggregate usage data about how people are using the browser and using this tracking protection feature without ever having to see any individual user's sensitive data. There's a new cryptographic tool that makes this system practical, called a proof on secret-shared data, and I'll explain to you what that is and how it works. And this is the basis for the new telemetry system I'll describe. For reasons I'll mention in a minute, this system is still in a pilot phase, and it's enabled by default only in the Nightly build of Firefox, so kind of the developer edition of the browser. But even so, this is, as far as I know, the largest deployment in the wild of technology based on probabilistically checkable proofs. So it's an example of a seemingly theoretical cryptographic tool being applied to solve a practical privacy problem.
And that's one of the reasons why it's exciting to me. So to dig into this a little bit more, I want to explain what kind of data Mozilla is collecting and how they're collecting it with the system. So in your browser there's a block list of known tracking domains, about 2,500 of them. And for each blocked domain, so each domain on this tracking protection block list, each Firefox user has a bit. And this bit will be set if your browser ever blocked cookies from that tracking domain. So you can think of each user as having a big vector of dimension something like 2,500: for each of these tracking domains, did I ever block a cookie from Facebook or Orchid or RU4 or the rest of these tracking domains? And as I mentioned, these bits are sensitive because they reveal which websites you've been visiting. They reveal information about your browsing history. So what Mozilla wants is the sum of these vectors over the set of all Firefox users within, say, a particular reporting period. And the reason they want that is because it gives them useful information about how the feature is working. So for example, if they learn that no users have ever blocked a tracking cookie from Facebook.com, it indicates that maybe the feature is not working well and needs to be tuned. So to keep things simple, for the next few slides, I'm going to pretend that we're only interested in collecting data about one tracking domain. So I'm going to pretend that each user only has an individual bit and Mozilla wants the sum over these bits. And I'll call these bits: user one's bit is x1, user two's bit is x2, and so on. And then if we want to collect information about many tracking domains, we can just run the system in parallel many times. So this is kind of without loss of generality. The architecture of this telemetry system looks like this. So we have these millions of Firefox users, each with their sensitive bit, indicating whether or not they've blocked the tracking cookie from this particular website. And the system consists of two or more infrastructure servers. So think of Mozilla as running one server and some second organization running a second server. And there could be even more than two servers in a large deployment. The way users interact with the system is each user takes their sensitive bit and splits it into multiple encoded pieces using a cryptographic secret sharing scheme I'll describe. And they send one encoded piece to each of the two servers. And then the servers can jointly compute the aggregate statistic of interest, in this case the sum of all the users' bits. And the privacy property that we're after is we'd like to say that an attacker should have to compromise all of the servers to learn any user's sensitive data. Let's see if this comes back. So there are these multiple infrastructure servers. And the property that we're after, I'm wondering if I should say something different so it doesn't happen again. Privacy is never as easy. Oh, no. It's like I push the next button and somehow it, oh, OK. So I won't go back to that slide. Yeah, privacy is never as easy as it seems to get. So the properties that we're after are: first, if everyone does the right thing, the servers should get the right answer. So the servers should get the sum of all the users' bits. And we can even extend the system to provide a stronger notion of correctness, like if some of the servers fail, we still get the right answer, but I'm not going to talk about how that works.
The second property is the privacy property I just mentioned. An attacker should have to compromise all of the servers to learn anything more than the sum of the users' bits. There's a complementary privacy notion called differential privacy that we can layer on top of this privacy notion I just mentioned, but again, I'm not going to talk about that extension here. The third property, which is a little bit more subtle, is about defending against malicious clients. So what we'd like to say is that the worst a malicious client can do to mess up the system is to lie about the value of her input bit. So a client can always submit a 1 or a 0, but shouldn't be able to do anything else to corrupt the statistic of interest. And then finally, we need this to work at really large scale. We'd like to handle, say, millions of submissions per server per hour. It should work at Mozilla scale. So there are a bunch of really nice systems out there to solve this type of private aggregation problem. But as far as we know, each of them requires relaxing one of these four properties I just mentioned. So for example, there's a beautiful line of work on what are called randomized response protocols to achieve local differential privacy. These protocols relax correctness in the sense that they give a rough approximation of the aggregate statistic of interest. And in cases where you're looking for rare events in telemetry data, the noise that these systems introduce blows away the signal that you would have seen. There are systems that relax the privacy model, that rely on the security, say, of the Tor network or some type of hardware enclave. There are systems that relax the disruption resistance property, that don't protect against malicious clients. And there are systems that relax efficiency, that make heavy use of public key cryptographic primitives or general purpose multi-party computation. But it turns out that with this new cryptographic tool I'll describe, we don't need to make any of these relaxations. So to explain how the system works, I want to introduce a strawman scheme that gets us part of the way there, but that doesn't protect against malicious clients. And then I'll explain how we patch up the scheme to get to the thing that we've really built. So think of there now as being three telemetry servers; there could be any number more than two. And each server will start out at the beginning of the day with an accumulator initialized to zero. And each of the clients, each of the Firefox instances, will come online one at a time, and each has its sensitive bit. And the way it will interact with the servers is as follows. It'll split its bit into multiple random shares, one per server, by picking three random numbers modulo, say, a big prime, that sum up to its bit. And the properties of the secret sharing scheme say that if an adversary only controls two of the three servers, the adversary learns nothing about the client's private bit. So the client splits its data into shares, sends one share to each of the three servers. The servers aggregate their shares, so they add these shares into their local accumulators. And then that client can go offline. The next client comes online, does the same thing. It splits its data into shares, sends one share to each of the three servers, and then goes offline. So at the end of the day, after millions of clients have participated in the system in this way, what the servers can do is publish the contents of their accumulators. OK, cool.
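To make the strawman concrete, here's a minimal sketch of it in Python. Everything here is illustrative: the prime, the number of servers, and the helper names are made up for the example, and in a real deployment the three parties obviously run on separate machines.

```python
import secrets

P = 2**61 - 1          # a Mersenne prime; a stand-in for the large field
NUM_SERVERS = 3

def share_bit(x):
    """Additively secret-share the bit x: three random-looking numbers mod P
    whose sum is x. Any two of the shares alone are uniformly random."""
    s1 = secrets.randbelow(P)
    s2 = secrets.randbelow(P)
    s3 = (x - s1 - s2) % P
    return [s1, s2, s3]

# Each server keeps one accumulator, initialized to zero at the start of the day.
accumulators = [0] * NUM_SERVERS

# Clients come online one at a time, split their bit, and go offline.
client_bits = [1, 0, 1, 1, 0, 1]
for bit in client_bits:
    for i, share in enumerate(share_bit(bit)):
        accumulators[i] = (accumulators[i] + share) % P

# At the end of the day the servers publish their accumulators; the sum
# reveals only the aggregate, never any individual bit.
total = sum(accumulators) % P
assert total == sum(client_bits)
print(total)  # 4
```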
So the servers will learn the sum of the clients' values and nothing else. So for example, they'll learn that 58,000-something users blocked trackers from Facebook.com, but they don't learn which users did, which is exactly what we wanted. So this super simple scheme already gets us what seems like most of the way there. It's correct in the sense that the servers get the statistic that they're after: they get the sum of these clients' private bits. It provides the privacy notion that we wanted, that an attacker has to compromise all of the servers to learn anything more than the sum of the users' bits. And it's very efficient. There's essentially no cryptography going on here; it's simple secret sharing. The problem is that one malicious client can completely corrupt the output of the system. So this is why this simple scheme doesn't work in practice. To see what I mean, you can imagine that there's some evil ad network out there that's out to corrupt the statistics that Mozilla is collecting on this tracking protection feature. So instead of sending shares of a zero or a one value, this evil ad network can send shares of a large value or a small value or a negative value. And the servers won't notice that this attack is in progress; the privacy of the secret sharing scheme prevents the servers from learning that this has happened. And at the end of the day, they'll get garbage data. So you can protect against these kinds of corruption attacks using general cryptographic techniques like multi-party computation or traditional zero-knowledge proofs. These tools are extremely powerful and will definitely solve this problem. But in some sense, they're too powerful; they end up being more expensive than we need. So the idea here is to design a new cryptographic tool, which we call a proof on secret-shared data, that's tailored to solve this exact problem, and it ends up being much more efficient in this setting. So just to give you a quantitative sense of why that is: imagine the client is sending a packet of n bits encoded in the way I described. If you use dishonest-majority multi-party computation, which is what we need for the threat model we have, or general zero-knowledge techniques, these require either a lot of public key crypto on the client or the servers, and a lot of communication, either between the client and the servers or between the servers themselves. And as a result, these systems impose something like one to three orders of magnitude slowdown over Prio, which requires no public key crypto for the disruption resistance part of the scheme, and essentially the minimum communication that you could ask for between the clients and the servers. So let me explain what this proof on secret-shared data is, and then I'll try to give you a flavor for how we construct them. So I'm going to change the problem slightly. Now think of the client as having a vector. So this is the bit vector I mentioned at the start, where for each tracking domain, the user has a 1 or a 0. So there are n bits in the client's data packet. And the servers each have additive shares of this vector, modulo some big prime. So what the servers need to be convinced of, to know that the client has sent them a valid encoding of a data packet, is that if they summed up their three shares modulo this big prime, they would get a 0-1 vector.
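Prio's actual proof machinery, which the talk gets to in a moment, is built from randomized polynomial identity tests, but the shape of the problem can be illustrated with a toy: to check that a shared value x satisfies x*(x-1) = 0, the servers need one secret-shared multiplication, and the client, who knows x, can supply the correlated randomness (a Beaver triple) that makes that multiplication cheap. This sketch is a deliberate simplification under stated assumptions: in particular, a malicious client could lie about the triple here, which is exactly the gap the real proofs on secret-shared data close.

```python
import secrets

P = 2**61 - 1
N = 3  # number of servers

def share(v):
    """Additive shares of v mod P."""
    s = [secrets.randbelow(P) for _ in range(N - 1)]
    return s + [(v - sum(s)) % P]

def reconstruct(shares):
    return sum(shares) % P

# --- Client side ---------------------------------------------------------
x = 1                      # the client's private bit
a = secrets.randbelow(P)   # Beaver triple: a, b, c = a*b
b = secrets.randbelow(P)
c = (a * b) % P
x_sh, a_sh, b_sh, c_sh = share(x), share(a), share(b), share(c)

# --- Server side ---------------------------------------------------------
# Servers hold shares of x and of the triple. To check x*(x-1) == 0 they
# open the masked values d = x - a and e = (x-1) - b; since a and b are
# uniformly random, d and e reveal nothing about x.
d = reconstruct([(x_sh[i] - a_sh[i]) % P for i in range(N)])
e = reconstruct([((x_sh[i] - (1 if i == 0 else 0)) - b_sh[i]) % P
                 for i in range(N)])

# Standard Beaver step: these shares sum to x*(x-1) mod P.
z_sh = [(c_sh[i] + d * b_sh[i] + e * a_sh[i] + (d * e if i == 0 else 0)) % P
        for i in range(N)]

# Broadcasting the constant-size z shares and checking they sum to zero
# convinces the servers that x is 0 or 1 (here, that x*(x-1) == 0).
assert reconstruct(z_sh) == 0
```

The point of the toy is the shape of the protocol: the all-knowing client does the expensive part up front, and the servers only exchange a constant number of field elements to check it.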
So we can think more generally about a setting in which the servers want to check that some validity predicate, I'm not going to say what it is, but some validity predicate, holds on the secret-shared data that they're looking at. So in Prio, the way this works is the client sends not only a share of its data packet to each server, but also a share of a proof. And this proof convinces the servers that the client has sent them well-encoded shares of a data packet. By exchanging a constant number of bits of information, the servers can check this proof and ensure that a client isn't corrupting the statistic, so essentially ensuring this disruption resistance notion. So this is, at a very high level, what a proof on secret-shared data is. I want to now give you a taste for how we construct these things. So again, the setting is that the servers each have a share of a vector, and they want to check that some validity predicate holds on this vector. So think of this validity predicate as being expressed, say, as an arithmetic circuit. The first thing that you might think of if you're a cryptographer is, well, the servers could just run a multi-party computation to check that the secret-shared values satisfy this validity predicate. But the problem with this is that it requires a lot of communication between the servers, and in the dishonest-majority setting it also potentially requires a lot of public key crypto. But what we can take advantage of in this setting is that we actually have a client that knows all of X. So the servers only have shares of X; they don't know X itself, but this client is kind of all-knowing. It knows exactly what its data packet was. It knows what shares it gave to each of the servers. So we're going to have the client help the servers do kind of a multi-party computation. The idea is that the client imagines these three servers running a multi-party computation and writes down a transcript of the messages that each server would have sent and received in this MPC if they had run it. So the client imagines this MPC, writes down these transcripts, and then sends the transcript of each server's messages to the respective server. And these transcripts amount to the proofs; this is the proof on secret-shared data. So now what the servers have to do is just check that their transcripts are valid, in the sense that they match the shares of the data packet that the client sent, and that they're consistent: the messages that server A sent to server B in server A's transcript match the messages that server B received from server A, and so on. And it turns out that checking a transcript of an MPC in this way is much, much, much easier than generating one without the help of a client. So again, at a very high level, the way this works is the servers generate some kind of randomized digest of the transcript, so they hash down the transcript using a randomized hash function, and these digest values leak nothing about the client's secret input X. And these digests are structured in such a way that if the client's data packet is well-formed, the digest values will sum to zero, and if the client's data packet is ill-formed, they'll sum to something non-zero with very high probability.
So what the servers can do to check the proof is, once they've computed these digests, they just broadcast them to each other, they check that this predicate holds, and they accept the client's data packet if so. So you notice there's no public key crypto; this hashing step is actually extremely efficient. In concrete terms, it looks like this. On the y-axis here is the throughput of a five-server cluster, with Prio servers in five different data centers, and how fast they can process client data packets, and on the x-axis is the number of tracking domains on the block list. The red dashed line here is a scheme with no privacy, where the client just sends its data to the servers over TLS, and the blue line here is a scheme that uses general purpose zero-knowledge proofs to provide disruption resistance. Prio ends up being something like 50x faster, and this is because we're not making heavy use of public key crypto, and it's something like 10x shy of the baseline. So you still pay something for privacy, but you don't pay nearly as much as you would have with the standard techniques. OK, so that's what the system is. Now let me talk a little bit about how it's built into Firefox. So Firefox ships today with libprio, which is a C library we wrote that implements a subset of this Prio system. And one of the nice things about the fact that we're using only information-theoretic crypto and very simple primitives is that the library is not too big; it's something like 4,000 lines of C. And it's also pretty fast: encoding a length-1,000 data packet in my browser, with all the extra middleware and stuff, takes something like 35 milliseconds. And this is without a bunch of optimizations that we could still add but don't really need because it's fast enough already. The Mozilla people wrote some really nice Python bindings to simplify the server-side data analysis and make it friendlier for their data scientists to interact with. And since late 2018, the library ships with every version of Firefox: stable, beta, and Nightly. But since Mozilla is running both of the Prio servers, it's enabled by default only in the Nightly build of Firefox, where users have opted into extra data collection. So the next step, which I'll talk more about in a second, is to move the second server out to an external organization so that this can be used in production. The way this is built into Firefox is as follows. Your browser has two public keys hard-coded into it, one for each of the two Prio servers. And periodically, when your browser generates a telemetry ping, it sends one packet to Mozilla and one packet to the second server, proxied through Mozilla's existing telemetry infrastructure. One nice feature of the secret sharing scheme we use is that you can swap out the information-theoretic secret sharing scheme for a computational one. And this has the result of shrinking the size of the packet to the second server to just the size of an encrypted AES key. So the bandwidth requirements on the second server are actually quite modest. If you're a Firefox user, you can set a preference that enables your browser console, and I'll post the slides if you're interested in trying this at home, and you can run this Prio encoder yourself. So you can generate some proofs on secret-shared data in your browser and be convinced that this is actually not too computationally expensive.
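The seed trick mentioned a moment ago is easy to sketch: instead of sending the second server a full random vector, the client sends a short seed and lets that server expand it with a PRG. A minimal sketch, assuming SHAKE-128 as a stand-in PRG (libprio's actual choice of PRG and field may differ):

```python
import hashlib
import secrets

P = 2**61 - 1
DIM = 2500  # one entry per tracking domain on the block list

def prg(seed, n):
    """Expand a 16-byte seed into n pseudorandom field elements.
    SHAKE-128 here is just an illustrative choice of PRG."""
    stream = hashlib.shake_128(seed).digest(8 * n)
    return [int.from_bytes(stream[8*i:8*i+8], "big") % P for i in range(n)]

x = [secrets.randbelow(2) for _ in range(DIM)]   # the client's bit vector

# Server B's "share" is just a 16-byte seed; server A gets x minus B's
# expanded share. The packet to B shrinks from ~2,500 field elements to
# roughly the size of one (encrypted) key.
seed_b = secrets.token_bytes(16)
share_b = prg(seed_b, DIM)
share_a = [(x[i] - share_b[i]) % P for i in range(DIM)]

assert all((share_a[i] + prg(seed_b, DIM)[i]) % P == x[i] for i in range(DIM))
```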
If you use Firefox Nightly, which you can download if you don't have it, and you set a preference that enables this telemetry UI, you can see which tracking domains your browser is blocking and what data is getting fed into the Prio system, which is kind of fun. As you browse the web, you'll see that these numbers change and the telemetry data that you're sending back changes. So Mozilla has published a number of things about this deployment. If you Google "origin telemetry," you'll see the documentation that they have for how this is built into Firefox. And then they also have a couple of blog posts. So Rob Helmer and others wrote a blog post in late 2018, when this pilot first started, about how Mozilla was planning to use this telemetry system. And earlier last year, there was an update after the system was rolled in with this tracking protection feature that explains how they're planning to use it. In the initial deployment, they're collecting data on these 2,500 block list rules, which include tracking domains you may be familiar with. And what we did is talk to a data scientist at Mozilla and ask: what's the minimum amount of data they would need to collect to get statistically significant answers to the statistical questions they wanted to answer? And what they told us is they need data on something like 0.015% of page loads for 1% of clients. So when this rolls out to stable, that's what it's going to look like. And at this deployment size, we're expecting to process something like 200 million telemetry submissions per day from all Firefox users. So this happens, I guess, every 24 hours: the browser will take the aggregated data from the local Prio clients and send it up to Mozilla servers through their telemetry system. And this amounts to pretty modest infrastructure costs for both Mozilla and the second server. For the second server in particular, all this data from all Firefox users, when it's used in production, will only require something like tens of gigabytes of network bandwidth per day. So I've been mentioning this second server a few times. The real question is: who is going to be this second server? So Prio, as I mentioned, requires two non-colluding servers. The privacy property says users' privacy is protected as long as an attacker doesn't control both servers. So if Mozilla controls both servers, this is not very interesting. Ideally, we'd have independent organizations on independent infrastructure, so not both on Amazon VMs, and in different countries, perhaps, to get some independent legal jurisdictions controlling these servers. So this has proved to be a pretty serious non-technical challenge. Technically, it's not rocket science to make it work, but from a business perspective, it's turned out to be kind of sticky. But I think there are some reasons to be optimistic. The first one I already mentioned: infrastructure costs are not actually that big for the second server. The second thing is that there are multiple candidate organizations that Mozilla has been talking to over the past few months that have a privacy-centric mission and are interested in seeing this kind of advanced privacy technology work at a larger scale. And the third thing is that Mozilla has looked at a quid pro quo arrangement with the second server, where if there's some other organization that wants to use Prio, they can act as the second servers for each other. So Mozilla can act as the second server for this organization.
So I'm optimistic that this is going to happen in the next few months. And Mozilla, the people I've talked to there, say that they're really hoping to do it in early 2020. So before I wrap up, I want to mention first that the library is open source, and I'd be happy for you to take a look at it. There's a bunch of things that we still would like to do and don't have the cycles to do, things like adding support for a wider range of data types. Prio can actually support computing more interesting functions than just sums of bits, but we haven't implemented those in the library yet. There are a bunch of optimizations, too, that we'd like to implement, and we'd like to add these differential privacy features I mentioned that we haven't built into the library. The Firefox team is really interested in moving from C code to Rust code, so that's something that's going to be happening with libprio soon, I hope. And of course, if you're interested, or you know someone who really wants to run the second server, please come talk to me or talk to the Mozilla folks who are here. I'll also say some people have already found bugs in the C code, and it's eligible for the bug bounty. So if you are interested in making some money, I'm sure that there are more bugs in the library, so please take a look. Great, so Prio is this new system for privacy-preserving telemetry that Firefox is using to collect data on this tracking protection feature, and the deployment is ongoing. I hope we'll have more blog posts coming as we make progress, but please come talk to us if you're interested in helping out or interested in privacy-preserving telemetry more generally. Thanks. OK, we have time for a few questions. Small point, maybe, but you're arguing that these two servers are independent and are not going to collude. I just wondered if you felt that your offer of "if you'll be the second server for us, we'll be the second server for you" isn't compromising the independence of those two entities? Yeah, that's a good question. I think the question is whether users will be able to trust that these two organizations will not come together and collude. I think it depends on who exactly the organizations are, but I can imagine there are some nonprofits, for example, or even privacy-first tech companies, that I think users would be right to trust not to collude in this way. The non-collusion requirement is only interesting if the client can make sure that it's talking to the two non-colluding servers. So how is server identity established to the client? Is there a public key identity that's baked in that would have to be changed as part of an update? Right. So the public keys of the two servers are in the browser. And so you can compile the browser from source if you want, or you can inspect the Firefox source and see what those public keys are. So presumably the second server would publish on their website: this is what our public key is, and this is what should be in your Firefox browser. Can you talk about the scalability when you have multiple servers, in particular the proof part? Other things look scalable, but the proof, from the client perspective, will it scale to more servers? Yeah, so the cost to generate the proof scales essentially linearly with the number of servers. So it's not quadratic or anything, but essentially what the client has to do is generate this proof and then split it into k secret shares for k servers.
So as the number of servers grows, the number of shares that you're going to have will increase. Thanks for the talk. What I wanted to ask is, if we're talking about a very, very large number of users, even local differential privacy might get relatively accurate because of the large numbers, and you could include a much larger number of users, more than 1% for example, because the bandwidth requirements would be much lower. So I don't know if you considered a different trade-off between those options. Yeah, absolutely. I mean, local differential privacy is a super interesting way to solve this type of problem. I think the issue that comes up is that the standard deviation that you're going to get in the statistic is going to grow something like the square root of the number of users, times some constant that depends on your differential privacy parameters. And if you actually run through the constants and calculate how big the noise is going to be, it's often much bigger than the expected sum of the statistic that you're actually interested in. For rare events in particular, the noise really will wash out the data. So in cases where you're expecting a large sum and the square-root error is OK, I think local differential privacy is a great solution; more generally, it's not clear to me that it will always be applicable. Thanks for the talk. In this Prio ecosystem, is there any space for more advanced functionalities, as opposed to just aggregation? Like, does Firefox need those? So you're asking about more interesting aggregate statistics than just sums? Yeah. Yeah, absolutely. So we've looked at things like computing linear regressions over data held by clients. And the system can actually support that without too much work. I think the really interesting question is whether you can do more sophisticated things, like training machine learning models over data that's held in distributed fashion by the clients. And there's been a lot of research on this, but nothing that seems quite ready yet. I was thinking of it from the Mozilla point of view. Like, do they need those functionalities? Because, yeah, there is a lot of academic literature, but is there a real use case there? So what surprised me, actually, was that, at least for the first set of statistics that they came to us asking about, just sums actually gets you 95% of the way there, if not more. So even a very, very simple system like this captured a lot of the telemetry statistics that they're interested in. One that it doesn't capture is if each client, say, has a URL, and you want to know the most popular URL. It's not easy to do that with this type of system. So that's an example of a statistic that they'd really like to be able to collect, but without doing more work, this type of system won't do it. Thank you. Hi, I'm up here. Can I ask a question from the top? So one thing is that you did a lot of work to make sure that a client with a single request can't mess up your statistics, but you also did a lot of work to make sure that clients are anonymous. So it seems like an important aspect of the system is that one client can't make lots of requests to mess up the statistics. Is that out of scope for this project, or did you do any work on that? Yeah, so the question is essentially about Sybil attacks, if I understand it. Exactly, yeah, yeah. Which are super important.
I think the thing is that Mozilla already has to worry about Sybil attacks, even if you don't care about privacy, even if you don't care about Prio, for their existing telemetry infrastructure. And so they already have a system in place to detect when one client is submitting a huge number of telemetry packets from a single IP address in an attempt to manipulate the telemetry data. And essentially we can inherit that existing system. So it was kind of out of scope from our perspective, but it's certainly in scope for the telemetry engineers at Mozilla. Thanks. How much money have you paid out so far? I think there have been at least two bugs. I don't actually know the dollar amounts, but yeah. There were bugs; writing C code without memory errors is not so easy. Hence the move to Rust; actually, this is what prompted me to start rewriting in Rust. Okay, thank you, Henry. Thanks. As the next speaker gets set up, there are two lost items. So I have a key card to a hotel. If you know which hotel you are staying at and are missing your key, come talk to me. And the other is a very specific amount of cash. So if you know how much cash you lost (as a clue, it's a k-smooth number with k equal to three), come up to me and we'll find it. There'll be just one more moment. So please enjoy the logos of the sponsors. All right, technical difficulties. We'll get there soon. Our next speaker is Aïda Diop, who is going to be talking about Pay as You Go, which is a project that is a practical implementation of direct anonymous attestation. So just a second, we'll get some HDMI issues resolved. Okay, let's please, yeah, welcome the speaker. Thank you. Hello everyone. So I'm going to present this talk, our work entitled Pay as You Go, whose goal is to address the privacy breaches in the current deployment of transport passes in public transport networks. And this is joint work with my colleagues at Orange, Nicolas Desmoulins and Jacques Traoré. All right, so transport operators have been working with mobile operators and standardization bodies for a while now to deploy commuters' credentials directly onto their mobile phones. So it's a standard authentication architecture where a user goes to register with a transport operator, and then the user obtains their credential directly onto their mobile phone. And during the registration process, for those of you who have actually registered, you might know that you provide a number of personal data items that are stored remotely in servers by the transport operator. And then you can use your credential to validate on the network really easily. So it's a technology that's been deployed in many cities across the world for a while now. But as it stands, the protocol allows users to be completely traceable across the whole network. And that's what we try to solve with our solution. So you might already be familiar with the new strict regulations regarding privacy in Europe. And regarding transport operators, the privacy definitions have changed a little bit to include a second notion of untraceability. So this shouldn't look like this, but that's okay. So you have the usual anonymity, so that when a user validates their pass, you shouldn't be able to trace back to their identity from that validation. But you also have the notion of untraceability.
So if you validate at a number of stations multiple times, the transport operator or the mobile operator or anyone on the chain of command shouldn't be able to link all those validations back to the same pass, and therefore to the same identity. Now, the international standard for NFC-enabled transit authentication is called Calypso. And the transit passes deployed across the world, from the London Oyster card to the Paris Navigo pass and the Octopus card in Hong Kong, are all based on this specification. Basically, when a user wants to validate their card, the reader detects the card and then prompts for the card ID. And then the card sends its ID in the clear, and the reader is able to compute the authentication key from that ID and another key called the master key. This is standard for any type of two-way authentication based on symmetric key encryption algorithms, and that's how the protocol works at the moment. So as you might have guessed, the privacy problem arises in the first two steps here, because if the same pass gets validated at multiple stations by providing its ID, and the transport operator stores all these validations in the clear, then it's really incredibly easy to generate itineraries for the same user, and this leads to privacy problems, because transport operators are then able to infer, in fact, a lot of information from people's daily habits. When you go from your home, for example, to your work every day, they are actually able to identify you really precisely using the system. So we wanted to leverage some cryptographic solutions that already exist, and that's what we did, because in trusted computing they had the same problem in terms of the linkability of transactions, and, well, we had the idea of using that, and that's what we did. So the initial anonymous public-key signature scheme is a group signature, where you have a group manager, a user, and a verifier, and the user obtains this group signing key and can anonymously sign as a member of a group, and, oh wow, this shouldn't look like that, sorry. You can anonymously sign as part of the group, and any verifying entity doesn't have any information about the user's identity. So it provides anonymity, and non-frameability, which means that even if the other group members collude with the group manager, they cannot generate a signature on behalf of an honest user. But it also provides traceability, meaning that the group manager (or sometimes a different entity called an opener, but to simplify things I'll just say the group manager) can actually lift the anonymity of the user. And group signatures are also unlinkable. So if the same user generates two signatures, there's no way to link them back together, and this is a property that we'll need. But group signatures weren't quite enough for what we wanted to do. Yeah, so about two decades ago, like I said, in trusted computing they needed, oh well, okay. So they had this, well, this shouldn't look like that, I'm sorry. They needed a variant of a group signature where, basically, some signatures should be linkable. In order to attest to the validity of a platform which embeds a TPM, and therefore remotely attest to the validity of the platform, they needed a group signature scheme that could allow a user to generate signatures that could be linked together but also remain anonymous with respect to the group manager, called here an issuer.
So in 2004, Brickell, Camenisch and Chen proposed this scheme called direct anonymous attestation, where basically this is exactly what you have. So you have a platform that can obtain a group signing key anonymously and generate anonymous signatures that can be linked using this tag here, called the base name. It's a linkability tag, so basically, when the platform generates two signatures using the same base name, they can be linked. And the DAA schemes over the years, since the first construction in 2004, which was based on RSA, the subsequent schemes were based on elliptic curve cryptography, and they basically divide into two families depending on the underlying security assumption: those based on the LRSW assumption, which is an interactive assumption that tends to give more efficient schemes, but the problem is that the assumption is an interactive one, obviously, which is not ideal from a cryptographic point of view; and those based on the q-SDH assumption, which is a non-interactive assumption but tends to give schemes that are less efficient. So what we did is propose a new DAA scheme where we kind of get the best of both worlds, because our DAA scheme is based on q-SDH, so we have a fairly standard underlying assumption, but it's also more efficient than all the existing DAA schemes so far. This is going to be our underlying cryptographic construction for the mobile transit pass protocol. Well, I said our DAA scheme, but it's not quite a DAA scheme, actually, because in the traditional DAA model the TPM doesn't have enough computational capability to perform certain operations, like a pairing computation. So it delegates part of the computation to the host, which can be a mobile phone or a PC. But in 2013 there was this new model, called the pre-DAA model, where the host and the TPM are considered to be a single entity, so all the signature generation is done by the TPM. And this is the model that we were interested in for our use case, because for our mobile transit pass, we wanted the SIM card to be able to generate all the signatures in a standalone manner, and I'll justify in a little bit why we wanted that property. So our scheme is actually a pre-DAA scheme. All right, so if we go back to our use case, how do we use our underlying cryptographic construction? Our issuer is the transport operator. So the user registers with the transport operator and gets their group signing key, which is basically a pre-DAA signing key. And the group in our case is obviously all the users that have a valid subscription plan. And each user can then use their mobile phone to generate the DAA signatures, which can be verified by the reader present in every turnstile. And we use the base name in the pre-DAA scheme in kind of an innovative manner, because we use the base name as a time slot. So if we set the base name to be, for example, a time slot of 15 minutes, and a user generates two consecutive signatures in those 15 minutes, they're going to be linked. So we enforce a property that's important for mobile transit passes, called anti-passback: you can't validate the same pass twice consecutively without being detected. Okay, so what properties do we actually achieve with our protocol? Our pre-DAA scheme, like I said, is efficient enough to be computed solely by the SIM card.
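To give some intuition for the base-name mechanism, here's a toy sketch of the classic DAA-style pseudonym: the signer includes a tag of the form H(bsn)^sk, so two signatures under the same base name carry the same tag, while tags under different base names are unlinkable under DDH. The group, modulus, key, and slot length below are made-up toy parameters; the real scheme works in pairing-friendly elliptic curve groups.

```python
import hashlib
import time

# Toy group: arithmetic mod a Mersenne prime, for intuition only.
p = 2**127 - 1

def hash_to_group(bsn):
    return int.from_bytes(hashlib.sha256(bsn).digest(), "big") % p

def link_tag(sk, bsn):
    """DAA-style pseudonym: tag = H(bsn)^sk mod p. Two signatures carry the
    same tag exactly when the same key signs under the same base name."""
    return pow(hash_to_group(bsn), sk, p)

def current_basename(slot_seconds=900):
    """Use the current 15-minute time slot as the base name."""
    return str(int(time.time()) // slot_seconds).encode()

sk = 0x1234567890ABCDEF        # the user's (toy) secret key
bsn = current_basename()
assert link_tag(sk, bsn) == link_tag(sk, bsn)        # same slot: linkable
assert link_tag(sk, bsn) != link_tag(sk, b"other")   # new base name: new tag
```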
And being computable solely by the SIM card allows users to use their mobile transit pass even when their phone runs out of battery, for example, or the phone becomes compromised. Also, there is a specific timing requirement in mobile transit pass specifications, which is that authentication should be done in less than 300 milliseconds, and we achieve that property as well. Our mobile transit pass protocol inherits the anonymity and untraceability properties of our pre-DAA scheme. And we're also able to detect transport pass clones, by generating an extractable commitment on the user's public key. And like I said, we use the base name of the DAA scheme to enforce the anti-passback property. So in practice, we did a prototype implementation of our protocol on a standard Java Card, where we basically just asked the embedded card company to give us access to the mathematical API so that we could do elliptic curve cryptography. And as you can see, we obtain some interesting timings here, where the SIM card can generate the signature in under 200 milliseconds, and the overall authentication time is well within the threshold specified in the standard. Okay, so to summarize: with transport pass protocols as they stand, each and every one of you is actually completely traceable. So what we wanted to do is leverage the techniques that we already had in trusted computing to provide the anonymity, but also the untraceability properties that we needed. That's why we use a DAA scheme, but more importantly a pre-DAA scheme, so that the SIM card in every mobile phone can generate the signatures in a standalone manner, so that you can validate your pass even when you run out of battery or the phone is turned off for one reason or another. And we also enforce users' responsibility and accountability, because obviously you can't just use your pass twice to let other people in and things like that. So the real goal of this work was to push for the standardization of pre-DAA models, because most of the time, for our use cases, we need cryptographic protocols that can be implemented on the SIM card alone, without any delegation. And this is a step forward for those kinds of applications. Also new applications, such as the remote attestation of IoT devices, for example, where devices don't have much more computational capability than a SIM card, so it can be useful for that as well. All right, so on that note, I'll conclude my talk, and thank you for your attention. Okay, questions. Let's start up top, actually. Hi, thanks for the talk. Sounds like an interesting application. My question is about how revocation is supposed to work in this protocol. Normally the issue with deployment of DAA is that you don't actually have an identity to revoke. So I'm curious if you're gonna have to redistribute the cards to everybody who's using this mass transit system in order to re-key after every period, or something like that. Well, you can revoke with a DAA scheme: when, for example, a clone of an anonymous group signing key is detected, there's actually a way for the issuer to put that group signing key on what they call a revocation list, and then it's the verifier's job after that to check that the anonymous signature wasn't generated by a key that's present on that revocation list.
So it might not be the most scalable method at the moment, but that's what's being done. Cool, let's keep it quick because we're a little bit over, so Richard. Hi, thanks for this presentation, it's interesting. Some transit systems require the identification of the same user on their entry and their exit to the system, which seems like it requires a degree of linkability. So can this scheme accommodate that kind of small-scale linkability without opening the door to much broader-scale tracking? Yeah. Well, I said 15 minutes because I wanted to give an example, but we could have, say, two types of base names, where one type of base name has a much smaller time slot than the other, and one could be used to detect passback while the other lets a user through. So we haven't gotten that far yet, but we think that with a pre-DAA scheme it's something that can actually be done. Thanks. So I have a related question, which is: can you accommodate per-ride payments, or does this only work for a sort of monthly pass sort of thing? This, as it stands, only works for long-term subscriptions, so monthly or yearly. I have also a related question, which is that a lot of systems use registration to permit refunds in case someone has, for example, tagged in and out incorrectly, or to permit a user to say to their employer, I took this particular ride, I need reimbursement. Do you think your system, your protocol, will support either proving you took a given ride or supporting refunds in case of delays or incorrect usage in the future? Yeah. I mentioned the clone detection before, because there's actually a way to revoke anonymity and trace back to the user. What we do is use an extractable commitment, and we share that trapdoor between two entities with completely different incentives. So one would be the transport operator, and one would be a union for user privacy, for example. So if a user actually brings proof that they need a reimbursement, for example, these entities can come together, revoke the anonymity, and actually check the transactions of that user and verify that. All right, let's thank the speaker again. Thank you. That brings the privacy-preserving primitives session to an end. We are starting a brand new session right now. So we're gonna have two talks on side channels. So I'd like to introduce the first speakers. Do you have slides and everything? Or is this, okay. And the slides are up. Okay, Pseudorandom Black Swans. Please welcome the speaker. Is the mic working? Oh, we're good. There we go. Afternoon, everyone. I'm Shaanan Cohney, and I'm gonna be talking to you today a little bit about CTR_DRBG and some attacks that were found against it. Clicker is not, there we go. Yeah, so in recent times, we've seen plenty of attacks against pseudorandom number generators, which form a critical part of any cryptosystem and, if compromised, can result in devastating attacks. Some recent examples from past history: we have the Dual EC backdoor, the work of my co-authors and me on the Juniper incident, which was presented at RWC just a couple of years ago, and the DUHK attack on ANSI X9.31, which was another example of how state compromise can lead to fatal protocol flaws. Many of these designs are standardized in the NIST SP 800-90 series, which lists approved designs.
Among them: Dual EC, which was deprecated following the incident, HMAC-DRBG, Hash-DRBG, and the focus of today's talk, CTR_DRBG. Up until recently, there was limited formal analysis, with that changing a little with Woodage and Shumow's work at RWC two years ago. CTR_DRBG, the focus of today's talk, is highly popular. We surveyed the set of NIST certifications and found that over 67% of certified implementations use this design. It's been integrated into major libraries, operating systems, and even, in hardware, into CPUs. Its design consists of a state composed of a key K and a counter V. And generally, the way it works is you increment the counter, you encrypt it under some block cipher, and you use that as the output. There's an option for the user or the implementer of the DRBG to provide additional entropy, which gets mixed in through the block cipher in order to provide protection against state compromise. The generator works in a three-stage process. In the first and third stages, the counter is incremented and encrypted under the block cipher, along with any additional entropy if the implementer has chosen to use any, and the output of this is used as the new key and the new counter. In the next phase, the same thing happens again, with the counter being incremented, only this time the output is used as the output of the pseudorandom number generator, which is provided to the client library or use case. The third stage is again similar to the first: you mix in the counter and the additional entropy, and you update the state to a new key and counter. This all looks great, but Woodage and Shumow pointed out a subtle flaw. Problem number one: the key is not rotated until after the encryptions are done. So there's a loop through which the second phase is repeated over and over again until as much output is generated as the user requested, which means that during this period, the same key is reused. If the key is compromised at any point during this process, you may be able to recover the counter and thereby compromise the DRBG. The second problem in the design is that the additional entropy is optional and implementer-chosen. The standard does not require any particular type of entropy or any amount of entropy, and so it is up to the implementer to choose whether they want to ignore it altogether. Going into more depth on these two problems: with the first one, if the attacker compromises the key using something like a side-channel attack, the attacker might then be able to decrypt the PRG output to learn the state. So what they would do is use the key K, decrypt the output, and that would give them the last value of the counter that was encrypted, which they can then increment or decrement as needed to get to the next state of the DRBG. The second problem is the lack of entropy. So once the attacker has the key and counter value, if they're able to guess the value of the additional input, they can then wind the generator forward to the next state. And this is if the additional entropy is used at all, which it might not be. If it's not used, incrementing the counter state is pretty trivial: you just increment the counter value and then run it through the update procedure. So given that this is Real World Crypto, the important question is: is this attack realistic? There are a couple of preconditions that are necessary for this to be the case. The first is asking whether our CTR_DRBG implementations are actually vulnerable to a state compromise attack or a key compromise attack.
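Before getting to that question, here's a stripped-down sketch of the generate path just described, and of the wind-forward it enables. This is a minimal model under stated assumptions: AES-128 in ECB mode via the Python `cryptography` package as the block cipher, no additional input, and none of the standard's derivation-function or reseeding machinery.

```python
# pip install cryptography
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def aes_ecb_encrypt(key, block):
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    return enc.update(block) + enc.finalize()

def inc(v):
    return ((int.from_bytes(v, "big") + 1) % (1 << 128)).to_bytes(16, "big")

def update(key, v):
    """Derive a fresh (key, V) by encrypting successive counters."""
    t = b""
    while len(t) < 32:
        v = inc(v)
        t += aes_ecb_encrypt(key, v)
    return t[:16], t[16:32]

def generate(key, v, nbytes):
    """CTR_DRBG generate, no additional input: the same key encrypts every
    counter in the loop, and (key, V) is only rotated afterwards."""
    out = b""
    while len(out) < nbytes:
        v = inc(v)
        out += aes_ecb_encrypt(key, v)
    key, v = update(key, v)
    return out[:nbytes], key, v

# State compromise: if a side channel leaks the key used inside the loop,
# decrypting the last output block recovers the counter V...
key, v = b"K" * 16, b"V" * 16
out, key2, v2 = generate(key, v, 64)
dec = Cipher(algorithms.AES(key), modes.ECB()).decryptor()
v_recovered = dec.update(out[-16:]) + dec.finalize()
# ...and with no (or guessable) additional input, the attacker runs the
# same public update() and predicts the generator's entire future output.
assert update(key, v_recovered) == (key2, v2)
```

So, back to the first precondition: are deployed implementations actually vulnerable to key compromise?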
Well, it uses AES, which is known to have some problems if implemented using software lookup tables. But we've known about this for a very long time, so what? Wouldn't everyone already know to use AES-NI or hardware AES? Well, yes, most implementations do use those for encryption by default, but it turns out that they don't use them for the AES used as part of CTR_DRBG, in a surprising number of instances. One that I'd point out amongst the different implementations that we surveyed is OpenSSL when used in FIPS mode. So, surprisingly, OpenSSL in FIPS mode has some vulnerabilities that OpenSSL outside FIPS mode does not. This is an old version of OpenSSL at this point, but it serves to illustrate the point that, at various times, when using AES in a random number generator context, the lessons that we thought we'd learned from AES in an encryption context appear not to have been learned. This is not entirely surprising given the way that FIPS models side-channel vulnerabilities. In prior versions of FIPS, so 140-2, which up until this past September was the most recent version, they actually allow cryptographic implementations to be vulnerable to side-channel attacks while still remaining compliant. The language that they use is that cryptographic modules may be susceptible to other attacks, and they go on to list a set of attacks, but they say that these are outside the scope of the standard. The second precondition for mounting a successful attack against CTR_DRBG is discovering a scenario where we actually observe the output in a real-world protocol, which we can then decrypt to go back to the state of the PRG. So one question that we needed to answer is: is this a problem for real-world protocols like TLS? The scenario being that the attacker would compromise the PRG using their side-channel attack during protocol execution, then observe the output, and then be able to go back from the output to the state of the PRG and wind it forward. For an AES cache attack, though, you actually need to see a lot more output than just one or two blocks of the state. In this case, empirically, you need to see about 2,000 bytes of AES output to perform key recovery. The Juniper and DUHK attacks, which I mentioned before, used nonces for state compromise, but the nonces in TLS are a little too short, or significantly too short, actually, to mount our cache attack; they tend to be about one or two blocks in size. So we exhaustively went through the entire TLS spec looking for places where the output of the random number generator might be used, trying to see if we could get the protocol into a state where we could observe this output. The first thing we looked at was the randomized salt used as part of RSA-PSS. No luck there: RFC 8446 restricts the length of the randomized salt, even when PSS is used with TLS 1.2. What about extended random? This was great for the Juniper attack, but it's a non-standard proposal and there aren't really any functional implementations out there, which makes it a little less exciting. PKCS #1 v1.5 padding? Well, this is a bugbear that we've seen before. Maybe this is the solution to our problems. Well, it turns out that if the server uses a really large RSA modulus, the client actually will generate enough randomized padding, which the server can then observe. So if you get a malicious server to serve a really large certificate, they'd be able to see this randomized padding. Let's run with this idea and see where it goes.
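The arithmetic behind that trick, as a quick sketch (the exact byte count depends on accounting details; the talk's figure for a 16K-bit modulus is 1,996 bytes):

```python
def pkcs1v15_random_pad_len(modulus_bits, message_len):
    """PKCS #1 v1.5 encryption: EM = 0x00 || 0x02 || PS || 0x00 || message,
    where PS is nonzero random bytes, i.e. fresh PRG output the server sees."""
    k = modulus_bits // 8          # length of the modulus in bytes
    return k - 3 - message_len     # everything that isn't header or message

# TLS RSA key exchange encrypts a 48-byte premaster secret.
print(pkcs1v15_random_pad_len(2048, 48))    # 205: too little for the cache attack
print(pkcs1v15_random_pad_len(16384, 48))   # 1997: roughly the ~2,000 bytes needed
```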
So what's the full attack scenario we're looking at? Well, first you have to have an attacker able to pull off the side-channel attack. So they're co-located with the victim TLS client using CTR_DRBG. One interesting thing I'd point out here is that a lot of prior work of this type actually targets TLS servers. This one is a little different: we're targeting a client, where co-location might be a little less realistic. The client also has to connect to a malicious TLS server with a 16K RSA modulus. The server must request an RSA cipher suite and mutual authentication, and the client must also authenticate with an ECDSA signature. This is really quite the list of requirements that we're putting on our attacker, but the key point here is that we're instantiating a theoretical weakness, and this teaches us something about the way we design and model PRGs that's actually quite useful, despite the fact that the attack scenario might seem a little far-fetched. This is what the protocol flow looks like, and I won't tire you by going through all the details, but I will point out two key points. The client key exchange is where all the magic happens. So there, the client connecting to the server sees that it needs to encrypt to a really large modulus and generates a large amount of padding, 1,996 bytes, which the attacker then uses as its opportunity for the side channel. And the very next thing that's generated is the nonce used for ECDSA signing. And as is well known, if you know the nonce that was used to generate an ECDSA signature and you have the signature, you can then go backwards to recover the long-term private key (since s = k^-1 * (h + r*d) mod n, knowing the nonce k gives d = (s*k - h) * r^-1 mod n), which is exactly what we're going to do. How did this process work out? Well, we used Flush+Reload to attack the T-table AES implementations used across the various libraries. We brute-forced any additional entropy that was used, as a number of them didn't use sufficient additional entropy to keep them safe, and then, from the ECDSA signature nonce, we computed the victim's ECDSA private key, gaining the ability to impersonate them in any future connections. The complexity of this attack was totally reasonable with the parameters chosen by the various implementations. At a maximum, it took us 30 minutes to brute-force through the entropy, and in the case of FortiOS, we didn't need to do any brute-forcing at all, because it didn't use any additional entropy. We also discovered a bonus attack where CTR_DRBG was operating inside a secure enclave in SGX. There are several attacks by which a malicious OS can single-step through the victim and thereby obtain really high-resolution cache traces. With these in hand, we could do a similar attack requiring only two encryptions for state recovery, which actually happens every time you call the random number generator, and this could be performed blind, so you wouldn't even need to see any of the output from the PRG in order to mount this attack. What are our overall takeaways from this piece of work? Well, random number generators can be side-channeled too. It's not just encryption, it's not just user behavior; random number generators need to be thought of when thinking about side-channel mitigations. We don't always learn the lessons we think we've learned. We should never use T-table AES anywhere; it's not just for encryption that it matters. We can say this lesson time and time again, but it still appears not to have sunk in.
When using a pseudorandom number generator, implementers and designers need to ensure that sufficient entropy is always, always incorporated. CTR_DRBG is not provably secure, and that should be taken into account when using it. FIPS 140-3 actually updates the threat model, but when creating standards and designs we should be sure to consider a variety of potential threat sources, not just pure theoretical models that don't incorporate hardware and side-channel attacks. And finally, when choosing which random number generator to use: if you are going to choose something from SP 800-90A, use Hash_DRBG. It turns out that HMAC_DRBG also has problems; see Woodage and Shumow for their beautiful analysis. CTR_DRBG has the theoretical problem we've discussed, and, as we all know, Dual EC has its fair share of issues. And with that, thank you very much. All right, questions. It seems to me like your bonus attack is actually much more powerful than the base attack, so can you give a little more detail? Can you explain the bonus attack, and which implementation was it targeting? Sure, so that was targeting a version of TLS implemented as a client library in SGX called mbedTLS-SGX, which is not really widely used, but it turns out there aren't many client implementations out there. The way it works is a differential attack exploiting the fact that in CTR_DRBG the plaintexts you encrypt through AES are successive counter values, so consecutive blocks often differ only in the last few bits. So you have more information than you would in an ordinary differential attack; it's similar to fault-based differential attacks. I see, and this was reported to ARM and fixed? Yes, we reported and disclosed these issues to the various vendors and got responses, though sometimes things weren't fixed, but we tried. Thanks. Do you have any suspicions about how many of these attacks were known before they were published? I think it's probably a stretch to say that we have any evidence this one was used. I think we could probably wager that at least the Dual EC backdoor was known ahead of time. For the others I'd hesitate to speculate, but this definitely teaches us lessons about the way we design not just random number generators but cryptography in general: to think very carefully about the way we standardize it, so as not to leave room for malicious parties, or even inadvertent flaws, to get through the process. Okay, thank you very much. Thanks. We have one more talk in this session and then off to a break. So, speaker, please come on up. Okay, everybody, please welcome Chandler Carruth, who's going to be talking about cryptographic software in a post-Spectre world. Hi, everybody, I'm glad to be here. So the first thing I have to tell you: I'm Chandler Carruth, I work at Google, and I'm actually not a cryptographer. In fact, I'm actually really bad at math. I was going to get a math degree when I was in college, and I had a very polite conversation my second year with a professor who encouraged me not to pursue that any longer. I actually work on programming languages and compilers, C++ and LLVM specifically, and I'm going to be talking about how that world intersects with Spectre and cryptographic software.
I got involved in cryptographic software because someone you probably know better than me, Adam Langley, who works on BoringSSL, really wanted to understand why compilers and cryptographic software were so frustrating when used together, and he asked me to come help people understand that. About halfway through figuring that out, Spectre happened, and I ended up helping Google respond to Spectre. I'm one of two leads at Google for responding to Spectre, and I handle most of the application software response. It's taught me a lot about how computers work, how CPUs work, but also how cryptography works, and sometimes doesn't work. I want to try to give you all a summary of the lessons we've been learning over the last couple of years, really diving into Spectre and side channels for our cryptography libraries at Google. In order to do that, we have to talk a little bit about side channels. When I talk to people about mitigating cryptographic side channels, they're often surprised that we care, which always confused me; this seemed super important. But when I dug into it, they made some really good points. There are so many side channels: timing side channels, but also power and EM and acoustic side channels, every side channel you can imagine. And it can be kind of daunting to think: are you really going to mitigate all of this? Does this actually matter in practice? Are real users impacted? These attacks tend to be incredibly difficult to exploit and very expensive, and software is full of normal bugs. A colleague of mine has a great talk he keeps giving about how many bugs there are in the Linux kernel, and you can go to some other piece of software and find plenty of bugs there too. And humans are also pretty inexpensive to compromise; if you really want to compromise a human, it's not too expensive. So why do we care about side channels? Why do we invest so much energy in mitigating them? Well, side channels do have some unusual properties. One of my favorites is plausible deniability. It's very, very hard for me to go look at a system, no matter how good a forensic expert I am, and prove: aha, someone compromised this system with a side channel, and not only that, they stole this particular cryptographic key. Historically speaking, when people get a bug in the kernel and take over a system, we have good forensics, and we tend to figure out who did it, and why, and what they were after. But we don't get that with side channels, and that makes them really, really alarming. So our side-channel threat model is about secrets which are very long-lived, not short-lived; secrets which are extremely valuable if compromised, where each bit you extract from the secret is incredibly valuable on its own, even if you don't get the rest of the bits; and where deniability, or confidentiality of the compromise itself, is highly valuable. And the fun thing is this, of course, means that side channels are perfect for attacking cryptography. But that doesn't address the other concern about mitigating side channels: we're never going to fix all of them, right? There will always be physical side channels that we can't address. And that's a really reasonable thing. I work on software, on compilers and programming languages; I can't fix the physical side channels.
So we needed to come up with a principled way of deciding how we address side channels in our software. And we came down on this idea that, from the software's perspective, we have to defend against software-observable side channels. If it requires hardware to observe it, then we're willing to rely on hardware observation, or on paying security guards a decent wage; there are good ways to address hardware and physical attacks outside of the software. But we have to be able to defend against the software-observable side channels. Now, one side note I want to encourage everyone to think about: it's really scary to me every time we add new telemetry to our hardware with an API, because that can transform hardware side channels that we don't know how to mitigate into software-observable ones. So I'm really nervous about hardware telemetry APIs. But this is the model we use when we're trying to mitigate side channels. Now, Spectre drastically expanded our side-channel risk, and I want to talk about why Spectre is different from all the other side channels we've tried to address. With most side channels, there's an actual cryptographic primitive operating on a particular secret that you know about: you know there's a private key or a password in your system, and you're doing a cryptographic operation on it. And it's reasonable to look at that code, at that part of the system, and analyze in tremendous detail whether it can leak the secret through a side channel. But that's not how Spectre works. Spectre is about speculative execution, and the fact that all of the side channels you might want exist behind speculative execution, while none of the software guarantees you expected hold there. That means you don't have a bug in the source code that you can go and patch. And it also means you can't just look at the one or two lines of code where you're manipulating an actual key; you have invisible bugs that you can't see, and they're everywhere. So that's kind of a scary thing. And when I say Spectre, I just mean speculative execution side-channel vulnerabilities generally, because I know there's Spectre, but there's also Meltdown, there's Foreshadow, and then we got the MDS ones, RIDL and Fallout and ZombieLoad. And these are just the ones with the really cute logos, and I think we can all agree that the ZombieLoad logo is the winning logo here. Really appreciate those logos, but we have even more that didn't get logos: things called speculative store bypass, bounds check bypass store, and NetSpectre. And that's just in two years, and I skipped a bunch. Okay, so this isn't one particular kind of issue; we are definitely seeing a class of issues, and it's very, very extensive. Now, how does this actually impact cryptography and cryptographic software? Well, the good news is most of these don't, at least not directly. You don't have to change your cryptographic software to mitigate many of these; you have to change your operating systems and your hardware. And that's actually really nice; it means we get to keep a bit more distance from those. But I want to talk about some that very directly impact cryptographic software. The first category of them are fundamentally CPU bugs. I say MDS because that's the official name for the category, but this includes ZombieLoad and RIDL and Fallout.
MDS is really, really scary to me; it's probably the most dangerous CPU-based side channel we have. And we also have branch target injection, the original Spectre variant 2. Now, a lot of people think these are mitigated. They are mitigated for certain parts of your system, but they're very often not mitigated for cryptographic software, and that's very, very concerning. Mitigating these for cryptographic software has extra challenges, and these are probably the most dangerous ones. We do hope hardware will fix them eventually, but it's not going to happen anytime soon. The other thing I want to be very clear about: MDS is an Intel-CPU-specific set of issues, but branch target injection, Spectre variant 2, is not Intel-specific. It is very viable on a wide range of CPUs; we have reproduced it on a very wide range of CPUs. But while those are CPU bugs that will eventually get fixed, so we may only need to mitigate them in the short term, there's another thing that seems to have passed a lot of the industry by. People have forgotten about the original Spectre bug, variant 1. And Spectre variant 1 is here for decades. It's not here for a year or two until a CPU gets fixed. There is no operating system patch coming. There is absolutely nothing we are ever going to do to today's CPUs that will fix this, and I've talked to every single CPU vendor I can find, and none of them have any ideas about how to fix it in future CPUs either. So we have to be prepared for this to be here for the rest of software, and variant 1 is very frustrating in that regard. So let's do a quick refresher on the parts of variant 1 that apply to cryptographic software, the things we end up caring about when trying to mitigate it. I'm going to use some example code that demonstrates Spectre variant 1. We have some large array of memory that we're going to use as our side channel; when we access it, it leaks cache timing information. We also have a bound. If you remember, Spectre variant 1 is often called bounds check bypass, so we have some bound that we're going to use to enforce a bounds check later on, and it's somewhere in memory. And we arrange for it to be evicted from the cache, so it's memory that will be very slow to access; that means when we try to access the bound inside the bounds check, it will take a long time to read it out of memory. Now, we wait for the system to stabilize. We compute some safe offset into a buffer, one that is never going to go out of bounds, unlike the actual offset the function is called with. Then we compute a local offset. This local offset is fun because almost always it is the safe offset; the safe offset varies, but it's always in bounds. Then, every couple of thousand iterations, we use the user-requested offset instead, which is not necessarily in bounds. That's okay; our software does not have a bug in it. We have a correct bounds check implementation. But it doesn't matter, because speculative execution sees that the out-of-bounds case is rare, predicts that we are in bounds, and proceeds right on to speculatively execute the next statement, which loads memory from an out-of-bounds offset, uses whatever was loaded to index some other array in memory, and we have a cache timing side channel. This is the classic Spectre variant 1 in a nice little code example.
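In code, the pattern just described looks roughly like this hedged C sketch; it is modeled on, but not copied from, the SafeSide-style examples, and all names here are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

extern volatile uint8_t side_channel[256 * 512]; /* probe array: leaks via cache    */
extern uint8_t data[];                           /* the buffer being bounds-checked */
extern size_t bound;                             /* evicted from cache: slow load   */

void victim(size_t offset) {
    /* Call this with a safe, in-bounds offset a few thousand times to
     * train the predictor, then once with an out-of-bounds offset. */
    if (offset < bound) {   /* architecturally correct bounds check...   */
        /* ...but while the slow load of `bound` resolves, the branch
         * predicts "in bounds" and the CPU speculatively executes: */
        uint8_t secret = data[offset];        /* speculative OOB read    */
        (void)side_channel[secret * 512];     /* encode into the cache   */
    }
}
```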
The attacker needs one other thing here: some code to actually read the value back out of the side channel. The attacker code can look something like this, where we loop over all the possible byte values. We have to mess with the index so that the prefetcher doesn't make it hard to measure cache timings. We figure out exactly which entry we're timing, and we time a read of it to determine whether it was in the cache or not. And then we need a bunch of heuristics to decide definitively whether we've read the data back out of the side channel. Now, if you want to actually see how this works, don't try to just take pictures of my slides; there's a great URL. All this code is extracted from a project called SafeSide. This is a really great project where we're trying to build reliable tests and examples of Spectre-style vulnerabilities, not just one of them, but all of them. And when I say reliable, I mean ones that we are continuously running across a wide range of hardware to make sure they really do test the underlying primitives. We've had a lot of trouble where people have tried to mitigate Spectre or some other vulnerability, and when we go to test it, we find the mitigation was never applied, or was not applied correctly, had a bug in it, or was partial instead of complete. We even saw this with RIDL, where at first it seemed to be mitigated, but it turned out there were deeper problems, and it took a second iteration to really get it mitigated. So we think having this kind of test suite is really going to help, and we'd love contributions; this is probably one of the first things we have to do to really attack Spectre for cryptographic software. You have to actually understand Spectre, and despite it being two years old, we still don't have a good understanding of it, and it's still very, very scary, when we run our tests, just how few of the mitigations that are purportedly deployed are actually working in practice. Okay, but this is example code. What do actual gadgets inside cryptographic software look like? Well, I have some bad news for you: most of them look like that example code, because they have nothing to do with the cryptographic software itself. You link your cryptographic library with hundreds or even thousands of other libraries, and all of them share the same address space. If any of them indexes an array, and it doesn't even matter what array, with a 64-bit offset the attacker controls, the attacker can read any memory in the address space. So that's the first problem: the gadgets could look like anything, and they could be anywhere in your application. But there are some gadgets unique to cryptographic code that we have to talk about, and I couldn't not have those in slides. The very first one terrifies me. You're computing a signature, right? You have a fast algorithm that uses lookup tables and a slow algorithm that does not. You're naturally worried about side channels, and so you have a very nice check: is this a private key? But that does not work, because Spectre doesn't care about your check. Speculative execution goes right past it, will speculatively execute the entire leaky algorithm, and will indeed produce observable side channels. So these predicates don't work the way we think they do in software.
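A hedged sketch of that first gadget; this is illustrative, not code from any real library.

```c
#include <stdbool.h>

typedef struct { bool is_public; /* ... key material ... */ } sig_key;

/* Fast but leaky: secret-dependent table lookups. */
static void table_based_sign(const sig_key *k)   { (void)k; }
/* Slow but constant-time. */
static void constant_time_sign(const sig_key *k) { (void)k; }

void sign(const sig_key *key) {
    /* Architecturally this check is airtight. Speculatively it is not:
     * under misprediction, a private key flows into the table-based
     * path, and the cache side channel fires anyway. */
    if (key->is_public)
        table_based_sign(key);
    else
        constant_time_sign(key);
}
```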
And they may not be simple predicates. Maybe you have some large switch over different key types, and you can tell I'm not a cryptographer, I know these are not the actual key types you use. Sometimes the misprediction is that it's a public key, so you get steered to some public-key algorithm. But maybe you don't even have that; maybe you just have a single algorithm per key type, each nice and constant-time, with no leaking behavior on its own. But if you get predicted into the wrong algorithm, well, who has analyzed what cache timing attacks exist if I run the RSA signing algorithm on an elliptic curve key? We haven't actually studied that, but I can probably cause it to happen in speculative execution. So these kinds of predicates are very, very scary, and they're intrinsic to cryptographic software, because we know there's a side channel on the other side of that edge, and we know you have the private key right there and you're about to use it. We need to be thinking about that across the cryptographic libraries themselves. There's one other thing that hasn't been talked about as widely as it should have, and that's NetSpectre. How many folks here have read the NetSpectre paper? Only a few hands. You should all read this paper; it's amazing, a fabulous paper, and I'm not going to try to repeat all of it. The key thing is that NetSpectre demonstrates exactly what it would take to conduct a Spectre-based side-channel attack remotely. Most of the time we assume the attacker has some code running on your system, maybe JavaScript running in your web browser, maybe some other kind of untrusted code. But that's not necessary to pull off a side-channel attack. One of the concerns is: what if the side-channel measurement itself can be performed by a speculative execution gadget? If you can make some part of the program hit the exact same side channel that you encode things into, you can often trigger a remote measurement of it. This is especially true if you can trigger a side channel that's extraordinarily easy to measure, and the NetSpectre paper proposes my favorite one, which is to conditionally run some AVX-512 instructions in speculative execution. So your side channel here isn't that you access a cache line or you don't; the side channel is that you branch to a series of instructions that reduce the frequency of the processor. And everything else running on that processor will observe that side channel: every query you send it will get slower. Now I don't need to run my code on your system to observe the side channel; your code, running on your system, will very helpfully observe your side channel for me. Now, network noise makes this hard to read; it's a very low-bandwidth channel. But is it low enough bandwidth that we're not worried about leaking a 200-and-some-bit elliptic curve key? And then we thought about TLS termination. So imagine, purely hypothetically, you run a widely distributed TLS termination service, similar to Cloudflare, although I don't work at Cloudflare. You might have thousands and thousands of machines that all hold the same key in order to terminate TLS connections. Now I don't need tremendous bandwidth, because I can parallelize the leak of each bit of your key across thousands of servers.
It may take me hundreds or even millions of samples to confidently leak a single bit. But if I have access to thousands of your servers, I can leak a lot of bits very, very quickly. As far as we can tell, we can easily overcome any bandwidth limitation imposed by noise just by parallelizing the leaks across more servers. This is a really, really concerning thing. And the best part is that it doesn't look weird. If you're running a distributed TLS termination service like Cloudflare or AWS or any of the others, you're expecting to get distributed denial-of-service attacks all the time; you're expecting random people to send you millions and millions of weirdly malformed packets with no explanation. That's a normal course of events. But some of the time, those packets may actually be leaking cryptographic keys through timing information. So for large-scale TLS termination, this is a very serious issue, and we think it needs a lot more attention. So, there's good news, right? We can mitigate this in source. That's the only good news about Spectre variant 1: there's at least some hope that we can actually write software to mitigate it. So let's look at what it takes, what mitigations we've come up with over two years of staring at this issue. Well, we have LFENCEs. This was recommended by Intel and lots of other companies: you put an LFENCE after the branch, and that will ensure the bounds check completes before any speculative side channel. Unfortunately, if you profile code full of LFENCEs, especially after LFENCE was made to serialize execution as a mitigation on AMD platforms, the performance hit is extremely significant. This is not going to be free at all, and so a lot of people are not satisfied with this mitigation. There are some other approaches we could take, though. One of them: if you can compute some mask that will redundantly enforce your bounds check, and you can do so in a branchless way, then you can use masking to make sure that even if the bounds check is bypassed, you still stay in bounds. This is probably the lowest-cost mitigation we have, but it tends to be very, very hard to deploy. You need to have the bound available in a form that lets you compute a mask, and it has to be enough to merely clamp the index to a particular range, as opposed to needing to fully control it or turn the leak off: masking will still leak data from inside the buffer, just not from outside it. This has primarily been used in the Linux kernel and other operating system kernels, where the attack is always about going out of bounds to cross privilege boundaries, and where they can arrange for all of the buffers in question to be structured so the masking can be done cheaply. It's a great mitigation, it works, but it just doesn't apply very often in practice. So we're trying to come up with something better. We have a lot of work trying to build a programming language extension that lets you tell the compiler it needs to mitigate a particular predicate in your program. But it turns out to be incredibly hard to specify and really hard to implement; we've gone through about three different prototypes over the past couple of years. We have one that almost, kind of, mostly works, but it's really slow.
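Going back to the masking approach for a moment, here is a hedged sketch of the idea, assuming a power-of-two buffer size so the mask is cheap; the Linux kernel's array_index_nospec handles the general case.

```c
#include <stddef.h>
#include <stdint.h>

/* Branchless clamp: even if the bounds check is speculatively bypassed,
 * the AND redundantly keeps the index inside [0, len_pow2). As noted
 * above, this still permits speculative leaks of data inside the buffer,
 * just not outside it. */
uint8_t load_with_mask(const uint8_t *buf, size_t len_pow2, size_t offset) {
    if (offset >= len_pow2)
        return 0;
    return buf[offset & (len_pow2 - 1)];
}
```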
That prototype is not that much faster than the LFENCEs, but we're hoping to get something even better, and this is something you can definitely expect to see work on over the next year or two in programming languages. We really do need the programming language and the compiler's help to do this at all, because we have to connect a branch to an offset that's then used in an array index; it requires a lot of infrastructure to make all of this work. And even if we figure all of that out and have a wonderful way to mitigate a predicate, which branches do we apply it to? All of them? Plus all the virtual functions, all the switches, and all the other dynamic dispatch in your entire program? That's not feasible; it's completely outside what you can realistically do. And you might think it's not necessary. One of the original reasons people didn't stress about Spectre variant 1 is that there weren't many of these gadgets known. But we've actually looked at large-scale software. We've done the static analysis, and we have found real variant 1 gadgets reachable from an IP packet parsing routine. So: absolutely parsing untrusted data off the wire, and you could thread it all the way through to a read of any address in the address space. They do exist, but they are rare, and we don't know how to find them very effectively. It took a whole-program distributed static analysis system to find the ones we have, and we were at the limit of our ability to do static analysis there, and we have larger systems we need to protect. So we came up with an automatic mitigation called speculative load hardening, SLH. This is beautiful: it's automatic, it's compiler-based. You can get clang right now, pass it this flag (-mspeculative-load-hardening), and it will harden your code against variant 1. But it comes with some caveats. It only mitigates the code compiled with the flag. It doesn't support a lot of features: no C++ exceptions, no 32-bit architectures; we've only really tested it on 64-bit, specifically x86-64. And it's incredibly expensive, but it's the only automatic solution we have, which is really important. Now, when I say very expensive, people aren't really sure what I mean, so how expensive is this mitigation? It causes a 40% regression in both latency and QPS for a large, performance-critical C++ service that we have actually deployed it to. So this isn't just a micro-benchmark number: we benchmarked it, and it looked like 40%; we deployed it, and we measured roughly a 40% drop in performance. But this is tremendously better than what we saw using other kinds of comprehensive mitigation techniques. Okay, so what should you actually do right now? Some steps you should take today. You have to update your systems. I skipped over all of the operating system, hypervisor, and hardware fixes that you need to have for anything I'm talking about to matter, because otherwise you're toast. Then you need to go and test that you've actually applied those updates correctly, because it turns out that's not trivial and an easy thing to get wrong. We have a repository of tests; please send us patches to improve it, and please integrate it into whatever your QA flow is. And we need to switch to unconditionally using data-invariant programming for secrets. It's too tempting a target; we're going to get these conditions wrong if we keep having conditions in the software.
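To make "data-invariant programming" concrete, here is a small hedged sketch of the general pattern, not code from any particular library: no secret-dependent branches and no secret-dependent memory addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Branchless select: returns a if choice == 1, b if choice == 0. */
uint32_t ct_select_u32(uint32_t choice, uint32_t a, uint32_t b) {
    uint32_t mask = (uint32_t)0 - choice;    /* 1 -> all ones, 0 -> zero */
    return (a & mask) | (b & ~mask);
}

/* Constant-time equality check: never exits early on a mismatch. */
uint32_t ct_memeq(const uint8_t *x, const uint8_t *y, size_t len) {
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++)
        diff |= (uint8_t)(x[i] ^ y[i]);      /* accumulate all differences */
    return (((uint32_t)diff - 1) >> 8) & 1;  /* diff == 0 -> 1, else 0     */
}
```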
I really want to see us move to using agents in a separate process for long-lived keys. ssh-agent is actually an incredibly significant mitigation against someone trying to steal your private SSH key, and gpg-agent is a mitigation for the same thing. We need to be using these kinds of techniques pervasively in our cryptographic systems whenever we have long-lived keys. This is perhaps most important if you're doing distributed TLS termination: you need to design some kind of agent system to separate your long-lived keys from your session keys, otherwise you have a very serious risk here. You then need to run that agent isolated on a single physical core if you're running on Intel CPUs with untrusted users on your system; otherwise, MDS is going to make it extraordinarily easy to steal your key data, and core isolation is the only mitigation we have for that one. If you can afford it, please harden your agents with SLH. I'm really sad that the widespread operating system distributions have not already done this; we need all of these agents hardened with SLH in order to be resistant to side channels like variant 1. So, really briefly, I also want to mention some things going forward. We need to be designing all of our cryptographic protocols with these exact facilities built in, not just by default, but always. You always need a separation between an ephemeral key and a long-lived key. You always need to move the long-lived key into some kind of separate, isolatable agent process. And then we have to design the protocols so that we only have to handle the untrusted inputs using data-invariant techniques, because those are the things that are most resistant to side channels. In addition to handling your key data with data-invariant code, every untrusted byte going into the separate, isolated agent that transforms a long-lived key into an ephemeral key needs to be processed with data-invariant techniques as well, and that requires a different protocol. Then we need all of the implementations of these protocols to make it super easy, and the default, to deploy them in this structure. And we have to do all of this with the help of higher-level languages, because there are going to be more issues. We've only seen two years of this; we're not done, we're going to see more. Having compilers available is going to be essential to rapid response and to mitigating the damage of these kinds of issues. That's something we found again and again at Google: we were only able to respond rapidly to a number of these issues because we were writing our software in a higher-level language and could deploy the mitigations through the compiler itself. All right, well, thank you. That's my advice. I'm happy to take questions if we have any time; I don't know if we have any time left, though. Not a problem. So, unfortunately, we are well over, so there's going to be a break after this. If folks want to come up and talk to Chandler, he'll be here, and we'll see you all in 20 minutes or so.