My name's Sam, and I'm from the University of Oxford, so I'm coming from a more academic perspective than most people here. Apologies if I tell you something you already know well, or skip over something you don't; feel free to stop me at any time with questions. What I want to talk about is efficient, anonymous, and updatable credentials, a system we call ALLOSAUR, and I'll explain how we got to this tortured acronym somewhere along the way. This is a project I've been working on with Hart Montgomery, here at Hyperledger, and Mike Lodder at Coinbase. Most of this talk is high level: I'll describe the privacy problems you can run into with digital credentials, center in on the question we really focused on, the trade-off between anonymity and efficiency, and then present our solution. None of it touches the math, and in fact it's all presented through references to The Simpsons. Credentials are something we use all the time. Maybe I don't need to tell this crowd, but you go to a lawyer, and your lawyer has a certificate on the wall saying that this lawyer went to law school, passed the bar, and is verified by the law society. These days we're trying to move all of that onto computers, so what happens there? Some examples to keep in mind through this talk, the ones that inspired our thinking, are all the different kinds of credentials you might imagine: certifying your age when you want to buy something age-restricted, a license to do something like driving, fishing, or hunting, or a license to practice various professions.
You could imagine currency as a kind of credential, or prescriptions, parking permits, maybe even carbon credits in the future. But all of these carry massive privacy risks if you're not careful about how you make them digital, risks they generally don't have today. If you use your age to buy things now, it's hard to build a profile of where you're buying them, because someone looks at your ID at the convenience store and forgets it immediately. But if some electronic system needs to record that check, it's much easier for that system to remember what's going on. Then it knows what kinds of things you're buying, which insurance companies could use nefariously, and the location data is bad on its own. For something like a fishing license, we want to know when fish are being harvested, but we don't necessarily need to know who is harvesting them. For professional licenses, verifying that someone is a lawyer feels like less of a privacy risk. But imagine a scenario with different credentials for different kinds of medicine, and a doctor in a particular neighborhood suddenly starts verifying their credential for some specialty of medicine. That tells you something about the neighborhood. Prescriptions and currency have fairly obvious privacy concerns; parking permits reveal your location. Carbon credits are the one where I struggle to see a privacy issue, but if you think on it, maybe you'll find one. So that's the framework to keep in mind. Now, a lot of the security requirements we need from a credential are solved simply by a digital signature. You can have a signed certificate that says "I'm a good lawyer" and send it to the client.
The client naturally wonders about all of these things: is it real? Does it really belong to this lawyer? A digital signature satisfies most of this. We can imagine the law society as the authority here: the lawyer sends their information to the law society, which signs it and returns it. Now when the lawyer presents the certificate to the client, the client has the law society's public key and can verify that this is a real certificate. And you can jam as much information as you want into this for any kind of credential: the lawyer's name, a date of issue, and so on. But the last requirement, revocation, ends up being tricky: how do you know whether this lawyer has done something bad? There are lots of reasons a lawyer might be disbarred, and if you're the client, how do you know that hasn't happened? Your recourse is to go to the law society and say, "I've just got this certificate from this lawyer; are they still a valid lawyer?" And the law society says no, they're not. But at that point there's no reason to use digital credentials at all. The lawyer can just tell you "I am a lawyer," you ask the law society "does this person actually have this license?", the law society says yes, and digital credentials are pointless. This also has its own privacy issues: as soon as you ask, the law society learns that this particular person needed a lawyer, which is a privacy risk. Maybe the client uses Tor or something like it. But from the law society's perspective this is also not a great idea, because they don't want to handle everyone querying all the time about all of these different issues, and their servers go up in smoke. So what these authorities can do instead is essentially update their public key.
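The issue-and-verify flow described above can be sketched in a few lines. This is a toy RSA signature with tiny, deliberately insecure parameters, purely to show the shape of "authority signs, client verifies with the public key"; it is not the scheme from the talk, and the credential string is made up.

```python
import hashlib

# Toy RSA signature: the law society signs a credential and the client
# verifies it with the society's public key. The primes are tiny and
# for illustration only -- NOT secure.
p, q = 10007, 10009
n, phi = p * q, (p - 1) * (q - 1)
e = 65537                      # public exponent
d = pow(e, -1, phi)            # private exponent (law society only)

def H(msg):
    return int.from_bytes(hashlib.sha256(msg.encode()).digest(), "big") % n

def sign(msg):                 # law society: uses the private key d
    return pow(H(msg), d, n)

def verify(msg, sig):          # client: uses only the public key (n, e)
    return pow(sig, e, n) == H(msg)

cred = "Lionel Hutz, admitted to the bar, issued 2023"
sig = sign(cred)
assert verify(cred, sig)                  # genuine certificate checks out
assert not verify(cred + " FORGED", sig)  # any tampering breaks it
```

Note what the sketch cannot do: nothing in it tells the client whether the certificate has since been revoked, which is exactly the gap the rest of the talk addresses.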
At this point I'll start calling it an accumulator. If you're familiar with accumulators, this will feel like a very bizarre way to introduce and define them, but as we worked on this, the functionality got closer and closer to digital signatures, and I think a sensible way to view an accumulator is as a digital signature that you update fairly regularly. So here, as the calendar days progress, the accumulator updates from 1 to 2 to 3 to 4. To clarify something: revocation is handled in a lot of cases today by expiry dates. You issue the certificate and say it expires in 2023. This works in many cases, but say the certificate is valid for a year; then you only have a year-long guarantee, a whole year in which the holder might have misbehaved. If that's too infrequent, you have a security risk: your lawyer could be disbarred and continue practicing law for another year until their certificate expires. That's no good. You can make the interval more frequent, but then your honest users have to refresh their credentials much more often. So there's already a trade-off, even for plain digital signatures. Other methods of revocation, such as sending a list of revoked signatures, put a lot of work on the verifier: the client verifying the lawyer's credentials doesn't want to keep a list of every disbarred lawyer ever on their phone. The accumulator helps with all of this. You update it every time you make a change to who has credentials. But it creates an update problem. In this picture, the lawyer has sent a certificate for the first accumulator, while the client holds the fourth accumulator; that's three accumulators apart, during which this lawyer could have been disbarred.
So the client says, "hey, you need to update your license." The lawyer has to interact with the law society again and ask for an updated license. The law society does this, and now the lawyer can use their credential. (I think that's Flanders on the slide; it should say Marge Simpson.) There's also a privacy risk here: the law society can start to correlate information about who is verifying with these updates. The lawyer example is a bit hard to explain, so I'll switch to a different one: parking permits. There's an obvious reason verification should be anonymous, because if I start tracking whose credentials are being verified where, I can build up a pattern of where you are, and even if the permits are pseudonymous, I can probably figure out who you actually are from your home location. So suppose your parking permit is anonymized. You park somewhere, your car sends just the parking permit, the meter confirms it's valid, and it's completely untraceable; the parking meter has no idea who this is. Great. But suppose the parking meter says, "actually, you need to update; a bunch of people have been driving badly and their parking permits were revoked." So you need to go to the issuer, and let's say the issuer is the police. You tell the police, "I need an updated permit." If you're not careful about this, you have to send some identifying information so the police know which permit it is and can check their database of valid credentials, so they know this is the person to send the new parking permit to. The car can still verify anonymously to the parking meter.
But it's easy to imagine the parking meter colluding with the police in this case, since the police probably have a big say in how the meters are run, and the meter can simply tell them, "someone is updating a permit at this meter, at this place, at this time." Since the police have a record of someone requesting an update at exactly the same time, they can conclude that this license plate almost certainly parked at this place at this time. So even though the credentials are anonymous, the update is not, and the person is de-anonymized. To refresh, think of the privacy-sensitive applications where this is a problem: if you're making age-restricted purchases and have to update your credential at the store, that's a problem; likewise for credit ratings, professional licenses, and prescriptions. All the privacy issues of verification come up again in the update. Now, this problem exists and it has been addressed. One idea is to include enough information that users can update locally. You can't really do this with a basic digital signature; this is where you need accumulators and their mathematical structure. It works like this: there's a wad of data, and users use it to update. The lawyer anonymously contacts the issuer, who sends the new accumulator and some update data. The data isn't even user-specific, so there's no de-anonymization. The lawyer locally updates their credential, sends it off, and everything works great. But suppose the lawyer is really out of date, say 400 changes behind the current accumulator. They ask for the new accumulator, and the law society comes back with a lot of data.
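The local-update idea can be sketched with a toy version of a pairing-based accumulator, replacing group elements with bare exponent arithmetic mod a prime so the structure is visible. Everything here is an illustrative stand-in: the trapdoor `s`, the member values, and the final check. A real scheme never exposes `s` and verifies the witness with a pairing. The point to notice is that the broadcast update data is one small pair per revocation, so it grows linearly with the number of changes.

```python
# Toy accumulator in exponent arithmetic mod a prime Q. The authority
# accumulates members as a product of (y + s) terms; a member's witness
# is the same product with their own term divided out.
Q = 2**61 - 1          # a Mersenne prime, used as the toy field
s = 123456789          # authority's secret trapdoor (never public in reality)

def inv(a):
    return pow(a, -1, Q)

members = [11, 22, 33, 44, 55]          # each user's identity value y
acc = 1
for y in members:
    acc = acc * (y + s) % Q             # accumulate everyone

me = 33
wit = acc * inv(me + s) % Q             # my membership witness
assert wit * (me + s) % Q == acc        # toy verification (pairing in reality)

# Authority revokes 11 and 55. Broadcast data is ONE pair per change,
# and it is not user-specific:
updates = []
for r in (11, 55):
    acc = acc * inv(r + s) % Q
    updates.append((r, acc))            # (revoked value, new accumulator)

# My local update uses only the broadcast pairs: no trapdoor needed,
# and no identifying information sent to anyone.
for r, new_acc in updates:
    wit = (wit - new_acc) * inv(r - me) % Q

assert wit * (me + s) % Q == acc        # still a valid member after 2 changes
```

If I'm 400 changes out of date, `updates` has 400 entries and the loop runs 400 times; that's the communication and computation cost the talk is complaining about.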
That's a problem for communication, and even if the lawyer manages to download all of that data and run the local update, it's computationally expensive. So this is the core problem we looked at, and it's actually provably bad: if you want to broadcast non-user-specific data that users need in order to update, you can prove it has to be proportional to the number of changes since the last update. If you're 400 changes out of date, that's 100 times more data than if you're 4 changes out of date. We prove something a little stronger as well: an anonymous update, more generally, has to be equivalent to private information retrieval, which is generally hard. So our central question was: can we save anonymity? I would say solution number one is: do not use digital credentials. I feel a bit out of place saying this here, but I have to say my piece. There are lots of cases where you don't need to put credentials on the internet. The systems we have now with physical credentials work fine and don't create these privacy issues. Often the privacy problems you'd create are worse for a lot of people than the convenience benefits you'd get. But that said: suppose you have put your credentials on the internet, you have made them digital. Can you do it anonymously? We call our solution ALLOSAUR. I thought this was a fun graphic: I asked DALL·E Mini for a "photorealistic Allosaurus holding her digital license to practice law." So what does the name mean? The main thing we want is digital credentials with revocations; those are at the end of the acronym, and they're what stops us from using a plain digital signature.
We want anonymous credentials, and we want the updates to be oblivious, with low client-side latency and sublinear communication. We want to break that communication lower bound, which we do simply by requiring interaction. How do we imagine doing this? Basically, you have multiple third parties run the accumulator: a fairly standard blockchain-style scenario with some sort of multi-party computation. You've found some Simpsons characters: a businessman who's in it only because he's being paid to run the service, some privacy nerds who want to make sure the system stays private, a scientist with spare server time, organized crime, and non-disorganized crime. Suppose there's a request to change the accumulator to remove Homer Simpson, who is not a lawyer and should never have been given a license to practice law. The parties do some fairly simple multi-party computation and output a new accumulator. That handles how the accumulator can be maintained. How do users do the anonymous update that was our motivating issue? They split their identity among these servers using Shamir's secret sharing. The servers do some local computations; the hard work of the computation is done by the servers, which we imagine have much more computational power than the user's device. They return pieces of data to the user, who reconstructs them into an updated signature, or more generally an updated credential. What are the nice features here? Secret sharing has the property that not everyone needs to be honest or even available. If the criminals collude and try to figure out from their portions of the data who the user is, it's not enough information to break anonymity.
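The identity-splitting step above can be sketched with standard Shamir secret sharing over a prime field. The 3-of-5 threshold, the field, and the identity value are all made up for illustration; the point is that any three responses reconstruct the secret, and any two colluding servers learn nothing.

```python
import secrets

# Toy Shamir secret sharing: the user splits an identity value among
# 5 servers with threshold 3. Field and parameters are illustrative.
Q = 2**61 - 1

def share(secret, n=5, t=3):
    # Random degree-(t-1) polynomial with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(Q) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, i, Q) for i, c in enumerate(coeffs)) % Q)
            for x in range(1, n + 1)]

def reconstruct(points):
    # Lagrange interpolation at x = 0 recovers the constant term.
    total = 0
    for x_i, y_i in points:
        num = den = 1
        for x_j, _ in points:
            if x_j != x_i:
                num = num * (-x_j) % Q
                den = den * (x_i - x_j) % Q
        total = (total + y_i * num * pow(den, -1, Q)) % Q
    return total

identity = 424242
shares = share(identity)
# Suppose the two criminal servers refuse to answer: any 3 honest
# responses are still enough to reconstruct.
assert reconstruct(shares[:3]) == identity
assert reconstruct(shares[2:]) == identity
```

This is why the multi-server setting buys both availability (missing responses are tolerated) and anonymity (a minority of colluders holds only random-looking shares).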
If they decide not to return any data at all and only the honest servers respond, the lawyer can still reconstruct their updated credential. So you get both more availability and more anonymity from the multi-server setting. Something we ran into, which is a theoretical contribution that applies to our work and to any of these anonymous updates, is a subtle anonymity loss from the very concept of doing an update, even if the update itself is anonymous. Here I've colored all the updates blue, because at this point the servers don't know who they're coming from; in our system, everything is anonymous. But look at the middle two: if someone updates from accumulator number two to number four, and someone else later updates from number four to number six, that might be the same person, because we knew the first person held an accumulator only up to number four. So we can imagine a malicious server building pseudonymous profiles of these updates. Maybe they take the first one and call it a second user because it doesn't quite match up, and they keep building: seven to twelve makes a little chain; user three did this update, user one probably did that one. The last one they can't completely de-anonymize, because both user two and user three updated to the twelfth accumulator, so there they're stuck. This highlights something important: if updates are frequent and users don't update in sync, you get a fine-grained profile of when each user's updates happen. So our recommendation is, basically, that scheduled updates are critical: these two different lawyers need to be updating along the same chain of events.
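The profiling attack described above can be sketched as a simple chaining heuristic over the (from-epoch, to-epoch) pairs a server observes. The matching rule here is a deliberately naive stand-in for what a real adversary might do, and the epoch numbers are made up.

```python
# Linking attack sketch: a malicious server sees only anonymous
# (from_epoch, to_epoch) update requests, but can chain them into
# pseudonymous profiles whenever one update starts at the epoch
# where an earlier profile ended.
def build_profiles(updates):
    profiles = []
    for frm, to in sorted(updates):
        for p in profiles:
            if p[-1][1] == frm:        # starts where this profile ended
                p.append((frm, to))
                break
        else:
            profiles.append([(frm, to)])   # no match: assume a new user
    return profiles

seen = [(2, 4), (4, 6), (1, 3), (7, 12), (3, 7)]
profiles = build_profiles(seen)

# (1,3)->(3,7)->(7,12) chains into one pseudonym; (2,4)->(4,6) into another.
assert len(profiles) == 2
assert profiles[0] == [(1, 3), (3, 7), (7, 12)]
assert profiles[1] == [(2, 4), (4, 6)]
```

Synchronized update schedules defeat exactly this heuristic: when every user updates over the same epoch interval, every request looks the same and the chains become ambiguous.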
Now suppose a client comes in while the accumulator is only at number four, and the lawyers are waiting until number ten to update. This doesn't seem much better than certificates with expiry dates, because we've re-created a long gap between updates. But what the lawyers can actually do is something ephemeral: contact the servers, do an anonymous update, get a credential at number four, and use it. That update is out of sync with the schedule of synchronized updates, so it could be de-anonymized; so the lawyer throws it away and just waits for the next scheduled update to do their real one. Because it's ephemeral, no one can correlate it with anything else. Getting a little more technical: if you're familiar with multi-party computation at all, you know it comes with a lot of overheads. What we noticed is that accumulators give almost all of them for free. Because of the accumulator's mathematical structure, if one party sends bad data (the red arrows here indicate that), you can quickly check whether the result is a valid update and reject it, and then the parties can go back over the communications they had, break open the accumulator computation, figure out which party misbehaved, and remove them from the set of servers involved. We get that just from being an accumulator, which is nice. As for the actual technical details: this is derived mainly from Nguyen's 2004 pairing-based accumulator, plus three other works, one of which has the update polynomials we rely on heavily. Basically, we just evaluate polynomials under Shamir's secret sharing, and that's how the update is done.
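The "evaluate polynomials under Shamir's secret sharing" step can be illustrated with a standard trick: if each server evaluates a public polynomial at its share of the user's secret point, the responses lie on the composition of the two polynomials, whose degree is the product of the degrees, so enough local answers let the user interpolate the result with no server-to-server multiplication protocol. The specific polynomial, degrees, and server count below are made up for illustration; they are not the actual ALLOSAUR update polynomials.

```python
import secrets

# A public polynomial p is evaluated at a secret-shared point y.
# Each server works locally on its own share; the user interpolates.
Q = 2**61 - 1

def poly_eval(coeffs, x):
    return sum(c * pow(x, i, Q) for i, c in enumerate(coeffs)) % Q

def interpolate_at_zero(points):
    total = 0
    for x_i, y_i in points:
        num = den = 1
        for x_j, _ in points:
            if x_j != x_i:
                num = num * (-x_j) % Q
                den = den * (x_i - x_j) % Q
        total = (total + y_i * num * pow(den, -1, Q)) % Q
    return total

y = 33                                   # user's secret identity value
t = 2                                    # degree of the sharing polynomial
f = [y] + [secrets.randbelow(Q) for _ in range(t)]
shares = [(i, poly_eval(f, i)) for i in range(1, 8)]    # 7 servers

p = [5, 0, 3]                            # public update polynomial 3x^2 + 5
responses = [(i, poly_eval(p, fi)) for i, fi in shares]  # purely local work

# The composition p(f(x)) has degree 2*2 = 4, so any 5 responses
# reconstruct p(y) without the servers ever learning y.
assert interpolate_at_zero(responses[:5]) == poly_eval(p, y)
```

Since the servers only ever touch public polynomials and their own shares, the heavy computation lands on them while the user's identity stays split, which is the shape of the update the talk describes.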
We use a slightly different security assumption, for which we prove some security notions that also help, and we build on multi-party versions of accumulators. We'll also have an extensive security proof. For the user update, we can save a lot of MPC overhead because it only uses these public update polynomials, so the user can effectively act as a trusted third party. This is still secure because the servers only ever operate on public data during the update: even if the user does something malicious, they learn nothing worse than data that could have been public anyway. The arrangement just saves computation. Why not Merkle trees? Revocation tends to increase the public key size, and it's hard to do revocation in a Merkle tree without sending a list to the user. With the various flavors of sparse Merkle trees, revoking credentials also requires update data proportional to the number of revocations. Nice, efficient updates just don't seem possible there. Digital signatures on their own almost work: you could change the signature with each revocation, and everyone just gets a new signature. The problem is doing this anonymously. You could even have the server, which knows all the valid users, re-issue a signature for everyone and wait for each user to request theirs. But how do they request it anonymously? Essentially all you can do is private information retrieval, and that creates a server workload problem: when an anonymous user says "give me my credential," even if the server can answer anonymously, it requires a lot of server computation, proportional to the total number of users.
If the server has to do that for every user who requests an update, it ends up being potentially trillions of operations per change, which is untenable. So hopefully I've convinced you that anonymity and efficiency are hard to reconcile, and that our protocol provides more efficiency, accomplished mainly through multi-party computation. I was hoping to link to a preprint and a proof of concept, but we have this massive UC-style security proof, over a hundred pages, and it turns out to be a bigger task to finish than I'd originally thought, so "coming soon" is all I can say. Thanks everyone, I'm happy to take questions.

Audience question: do you already have an idea of how efficient the update operations are going to be?

No actual performance numbers yet. But what we end up doing is so similar to what was done before, the same elliptic curve operations and the same finite field operations, that running the numbers suggests about a factor of seven fewer operations for the user if you're updating over, say, a thousand updates, and something like a factor of 16 if you're doing more. We might save even more because of our change to the security proof. As for where this would be useful: somewhere with a high flux of credentials, where credentials are frequently issued and frequently revoked, and where verifiers don't want to keep a list of all the revoked credentials; that's the thing we're trying to avoid. My sense is that with internet certificates, revocations aren't frequent enough for this to be a computational problem.
So we're generally imagining more exotic scenarios than that. Any more questions?

Audience question: when do you think your proof of concept will be available?

Hopefully by early next month. That's a good question; it'll probably end up on either Mike Lodder's GitHub or my GitHub.