Hello there. Can we all start taking our... oh god, it's a bit loud... can you start taking your seats, please? A few announcements to start with. The third talk today has been switched, so if you've got an old version of the program, note that; if you've got the new version of the program, because you're all online, then it's exactly as it says online. If you are a speaker, you haven't left your slides with the AV people, and you've used your own laptop, then if you email them to me I will eventually put them on the IACR website for posterity, if that's what you want. Apart from that, I think we are almost ready, so we will introduce the first speaker. This is a session on MPC, which, as everybody knows, is the most exciting area of crypto at the moment; that's why I'm introducing it, and Marcella is going to tell us just how exciting it is. Thank you.

So yeah, my name is Marcella, and I'm talking about some work I did looking at software tools that implement secure multi-party computation. But, as Nigel said, first I need to convince all of you real-world cryptographers that MPC is a technology you should pay attention to.

Just for a little background: MPC is a tool that allows a group of mutually distrustful parties to compute a function on their joint inputs without revealing anything beyond the output of the computation, and that includes all of the input data. It's basically like having a trusted third party that you send all your data to and that computes something for you, except you don't have to trust anybody.

This is more interesting with an example, so we'll start with the classic one: the Danish sugar beet auction. In Denmark, all the sugar beet farmers sell their wares to a single buyer every year. Each year they submit bids for how much they want to sell for, a market clearing price is computed, and they sell for that much. But when you're selling to the same buyer every year, for many years at a time, you might start to reveal information about your business or your farm that you don't want to be revealing. So in 2008, in collaboration with some researchers, they used MPC to compute the market clearing price while keeping all the bids private.

This is cool, but it's relatively old news at this point, and there's a ton of other exciting, more recent stuff happening with MPC. In Estonia there were concerns about tax fraud, and proposed legislation would have required all companies to report their transactions above a certain threshold, so the state could make sure everyone was paying the appropriate amount of tax. This was vetoed, because it's a huge burden and it's also not very private: it reveals a ton of information about what companies are doing. In the meantime, however, a group of researchers produced a proof-of-concept implementation that takes everyone's transactions privately, computes a risk score for each company, and reports that risk score to the Estonian tax and customs board only if it's above a certain threshold. Then the customs board can audit the specific companies that look like they might be doing something fraudulent, without having to look at everybody's transactions.

The Zcash company wanted to produce new random parameters for their cryptocurrency, but they didn't want all of their users to have to rely on a single entity to produce that randomness correctly.
So they used MPC to collect randomness from over 90 different parties and then produce parameters that verifiably incorporated all of that randomness, which expanded the trust base there. And then in Boston, the Boston Women's Workforce Council was studying the wage gap, and they were able to convince over a hundred companies to secret-share financial data about their employees and their incomes, so that the wage gap could be studied over a huge data set of companies.

And of course, as MPC becomes feasible at larger scales, we have a bunch of other applications. Both government-funded efforts and private companies are working on MPC applications for statistics, for digital payments, for key management, and for lots of other settings. Long story short, there's a lot happening in practical MPC land right now, and we're very excited about it.

However, MPC applications have traditionally required a team of expert cryptographers to build a custom MPC engine tailored to that specific use case. If we want to see more widespread adoption of MPC in practice, we're going to need better general-purpose tools that can be used in a wide variety of settings in a reasonably efficient way.

A little bit of background: MPC was first introduced in the 1980s, but it was assumed to be too inefficient for practical use until 2004, when the first general-purpose framework, Fairplay, came out. It could take any circuit as input and execute MPC on it, and that started a rapid wave of development. Performance improvements, both algorithmic and in the implementations, rapidly advanced the state of the art, and we've seen huge improvements in MPC tools since then.

Now, to give you a better idea of what I'm talking about: in this work we looked at end-to-end, general-purpose frameworks. General-purpose just means the framework can execute any computation you give it. End-to-end refers to the following. Most MPC protocols, in theory, operate on a very limited set of primitives, like addition and multiplication modulo a prime, and if you are a developer trying to express some complex, interesting functionality, you probably need a broader set of primitives than just addition and multiplication. So we looked at frameworks that have a compiler, which takes a high-level function description and translates it down to a representation that the runtime and the protocol can understand. The runtime then takes this compiler output and is executed by all the parties simultaneously: they provide their inputs, execute MPC, and produce an output. That's the kind of system we're looking at for the rest of this talk.

We wanted to ask a bunch of questions about what was currently available. Who are the frameworks designed for: other expert cryptographers, or any developer? What kinds of cryptographic settings do they use? Are they flexible? Can they handle many parties? What kinds of security levels do they offer? And are they suitable, in general, for use in larger-scale applications?

To answer these, we produced a survey paper in 2019. We looked at nine end-to-end frameworks plus two circuit compilers that don't have a runtime phase. We looked at features of the protocol, such as which protocol it executes and how many parties it supports; features of the high-level language, such as what it can express and what kinds of data types it represents; and other details of the implementation.
And then we evaluated them on a set of usability criteria. In doing all this surveying, we really wanted to record what the frameworks themselves could do, not just what the papers said they could do. So we also produced an open-source framework repository. We wrote three sample programs in every framework, and we produced Docker instances that contain complete build environments for every framework we looked at, compile our sample programs, and include instructions for how to run them. The repository also has extra documentation based on our experiences compiling and using the tools, so I encourage you to check it out if you're interested.

Overall, we found that most frameworks are in good shape. They cover a very diverse set of threat models and protocols, so they're suitable for a wide variety of applications. The high-level languages are fairly expressive: our sample programs were simple, but we were able to implement them in almost every framework. And for the most part they're accessible: they're open source, they're compilable, you can use them. However, we did find two major areas of limitation, room for improvement so to speak. One is that there were major engineering limitations. Part of that is just the fact that they were mostly written by academic groups and are therefore subject to the engineering constraints of such a group, but there were real issues here. The other is that we found major barriers to usability, mostly rooted in a near-complete lack of documentation. Oh, and I forgot to mention: building this repository took us about 750 person-hours, so we're hopeful that it will make it easier for other people to use these tools in the future.

I want to start by giving you a better picture of the frameworks we looked at. These are the nine end-to-end frameworks. If you're considering using something like this, a good first question is: what protocol does it implement? Different protocols are suitable for different needs, so that's a good broad sweep when you're trying to choose a framework. We define three protocol families, and I want to explain what they are.

The first is garbled circuit protocols. Garbled circuits were first introduced in the 1980s by Andrew Yao and have been under continuous development since then; academic theorists have produced an incredible variety of garbled-circuit-style protocols for a huge variety of settings. In practice, though, we found that the implementations typically represent functions as Boolean circuits, operate in the semi-honest model, and support two parties, with a couple of exceptions now. Typically, of the two parties, the first one is the garbler: they encrypt the circuit and pass it to the evaluator, who decrypts it and produces the output. These protocols have a constant number of rounds of communication, which is really nice.

The second style is what we call multi-party circuit-based protocols. This encompasses a huge range of protocols, and you can see some of them at the top here, but they all have two things in common. First, they represent the function as a circuit, either Boolean or arithmetic. Second, they represent data as linear secret shares, which means they're suitable for an arbitrary number of parties. The parties work through the circuit gate by gate, converting secret shares of the gate inputs into secret shares of the gate outputs.
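Just to make the secret-sharing idea concrete, here is a minimal sketch of additive sharing modulo a prime. This is an illustration only, written in plain C++ with names of our own choosing; it is not the API of any framework in the survey. The point it shows is the one that matters for the cost discussion: addition gates can be evaluated locally on the shares, reconstructing a value needs every party's share, and multiplication (not shown) is where the parties actually have to communicate.

```cpp
// Minimal sketch of additive secret sharing mod a prime (illustration only,
// not any framework's API). A real protocol also needs multiplication, which
// requires interaction between the parties.
#include <cstdint>
#include <iostream>
#include <random>
#include <vector>

static const uint64_t P = 2147483647ULL;  // a small prime modulus for the sketch

// Split a secret into n random shares that sum to the secret mod P.
std::vector<uint64_t> share(uint64_t secret, int n, std::mt19937_64& rng) {
    std::uniform_int_distribution<uint64_t> dist(0, P - 1);
    std::vector<uint64_t> shares(n);
    uint64_t sum = 0;
    for (int i = 0; i < n - 1; ++i) {
        shares[i] = dist(rng);
        sum = (sum + shares[i]) % P;
    }
    shares[n - 1] = (secret + P - sum) % P;  // last share makes the sum work out
    return shares;
}

// Addition gates are "free": each party adds its shares locally, no communication.
uint64_t add_local(uint64_t share_x, uint64_t share_y) {
    return (share_x + share_y) % P;
}

// Reconstruction is only possible when all parties contribute their shares.
uint64_t reconstruct(const std::vector<uint64_t>& shares) {
    uint64_t sum = 0;
    for (uint64_t s : shares) sum = (sum + s) % P;
    return sum;
}

int main() {
    std::mt19937_64 rng(42);
    auto x = share(20, 3, rng);  // party i holds x[i]
    auto y = share(22, 3, rng);  // party i holds y[i]
    std::vector<uint64_t> z(3);
    for (int i = 0; i < 3; ++i) z[i] = add_local(x[i], y[i]);
    std::cout << reconstruct(z) << "\n";  // prints 42
}
```

That split, local additions versus interactive multiplications, is exactly why the cost accounting below is phrased in terms of multiplication gates.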
Within this model there's a variety of different threat models and protocol types: you can do this in the information-theoretic setting or with cryptographic security, but we group all of these under a single umbrella. The number of rounds and the volume of communication are driven by the number of multiplication gates, because at multiplication gates you actually have to talk to the other parties.

When we first started the survey, we expected these two protocol families to cover the majority of the frameworks we saw, because in theory land that's roughly what everything looks like. Of course, both of these families have drawbacks, and to avoid some of those drawbacks we expected to see something like ABY, which sits in the middle here. It implements multiple protocols, and depending on what your application is, you can switch between the protocols for different operations in your application and get the best of both worlds.

However, what we found in practice was an entirely different set of protocols, which we call hybrid protocols. In this model you deviate from the strict circuit model, where you only have two types of gates. In MPC you can define optimized sub-protocols that don't reduce all the way down to, say, addition and multiplication gates; you can work at a slightly higher level and get something more optimized. We found many frameworks that implement some kind of hybrid protocol: in addition to the baseline primitives, they add these sub-protocols as new gate types in their circuit. Typically this was for common functionality like bitwise operations in an arithmetic model, or matrix operations. These frameworks are characterized by a seamless front-end experience: there's still a one-to-one mapping from operations to protocols under the hood, but when you write your function at a high level you can't tell, and you don't have to explicitly choose which protocol you want to use.

So again, this is the survey that we did. We first submitted it for publication in August of 2018, about 18 months ago, and in the time since then seven new frameworks have come out. The biggest thing that has happened in the past year is that there are a lot more frameworks in the overlapping areas, which I think is really exciting. You can see things like MP-SPDZ and FRESCO, which both offer a single front end with a number of different pluggable back ends, so you can use different protocols there. SCALE-MAMBA also added garbled circuit support to their framework. You have things like HyCC and ABY3, which extend the ABY model of switching between multiple protocols within a single execution, using new techniques like automatic protocol selection or three-party support. And then EzPC and JIFF, in the hybrid model, are both designed with usability in mind, making it easier for non-expert cryptographers to write really efficient implementations. So that's what's been happening in the last year.

The next thing I want to talk about is some of the decisions that framework designers have to make when they're creating a software tool of this size. These are huge software projects.
And every decision you make as you're building up your software tool is going to affect the ultimate framework you end up with: how it's used, how expressive it is. This covers everything from architecture decisions, like how you structure your system and how you represent data, to the circuit model, which has certain quirks that you have to decide how much to expose in the high-level language, to other concerns about language abstraction, like how much you abstract away from the fact that there's cryptography happening under the hood. I want to go through a couple of case studies so you can see some of the differences between the frameworks.

The first thing I want to talk about is data-independent computation. MPC uses a circuit model, so every function you want to implement is described as a circuit. This is a data-independent representation: every branch on private data is flattened, and you have to execute every branch in order to select one of them. This makes sense if you understand how a circuit works, but a non-expert user might not recognize that branching programs are handled differently. For example, accessing an array is typically a constant-time operation in a language like C, but if you're accessing an array at a private index, you have to touch every element of the array so that you don't reveal which one you chose. This kind of performance disparity can affect the overall runtime of your application.

There are a couple of ways you can try to express this, and here we have two camps. Obliv-C is a garbled circuit framework whose front-end language is an extension of C; you can see it's extended with this obliv keyword. Here we have a result, and we're comparing private data A and B: if A is larger, we do a multiplication; if it's not, we just take one of the values. This looks pretty traditional. If you know C, it looks a lot like C. You can tell there's something going on with the if statement, because it has the obliv keyword in it, but it's not necessarily obvious that you're executing both branches here.

EMP-toolkit takes the other approach, with explicit branch selection. In this framework you compute your conditional, so: is A greater than or equal to B? Then you select: if A is bigger you select the multiplication, otherwise you select the other option. This makes it a little more explicit that you have to compute both branches; you have to compute them ahead of time in order to call the select statement. I don't know that one of these is necessarily better than the other. However, I would suggest that data independence is a really valuable paradigm for developers to understand. There are a lot of side-channel researchers here who know that branching on private data is dangerous in a lot of settings, so fostering a greater understanding of data-independent programming paradigms would be a good thing for the community.
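To make the data-independence point concrete outside of any particular framework, here is a plain C++ sketch of the pattern that both styles ultimately compile down to: evaluate both branches, then use a multiplexer on the condition to pick one. The function names are ours, not Obliv-C's or EMP-toolkit's, and in a real framework the values would be garbled or secret-shared rather than plain integers.

```cpp
// Sketch of the "evaluate both branches, then select" pattern that a circuit
// compiler produces for `if (a > b) r = a * b; else r = a;`. Plain C++ for
// illustration; these names are ours, not any framework's API.
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Constant-time multiplexer: returns x if cond == 1, y if cond == 0.
uint64_t mux(uint64_t cond, uint64_t x, uint64_t y) {
    return cond * x + (1 - cond) * y;
}

uint64_t oblivious_branch(uint64_t a, uint64_t b) {
    uint64_t cond = (a > b) ? 1 : 0;  // the comparison becomes a circuit, not a jump
    uint64_t if_true  = a * b;        // both branches are always computed...
    uint64_t if_false = a;
    return mux(cond, if_true, if_false);  // ...and the condition selects one
}

// The same idea explains why reading an array at a private index costs a pass
// over the whole array: every element is touched, and the index selects one.
uint64_t oblivious_read(const uint64_t* arr, size_t len, size_t secret_idx) {
    uint64_t result = 0;
    for (size_t i = 0; i < len; ++i) {
        result = mux(i == secret_idx ? 1 : 0, arr[i], result);
    }
    return result;
}

int main() {
    printf("%llu\n", (unsigned long long)oblivious_branch(6, 7));  // prints 6
    uint64_t arr[4] = {10, 20, 30, 40};
    printf("%llu\n", (unsigned long long)oblivious_read(arr, 4, 2));  // prints 30
}
```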
The next decision I want to talk about is the cryptographic abstraction level: how much control should the user have over the cryptography that's happening under the hood? For this example we're going to look at one of our sample programs, the inner product, which takes the sum of the pairwise products of two vectors.

Frigate is a circuit compiler. It has a custom language, but it looks a lot like C. You can see here that we start our result at zero, then we loop over our private arrays A and B, multiply the elements together, and add them to our sum. This is a perfectly good inner product implementation, and it produces a Boolean circuit. However, if you recall the multi-party, linear-secret-sharing-based protocols, the number of communication rounds is proportional to the multiplicative depth of the circuit, and here all the multiplications are independent of each other. So what we would like, for efficiency, is to do all those multiplications in parallel first and then do all our additions afterwards.

Maybe you don't want to think about that, though. In that case you might be interested in a tool like PICCO. PICCO implements a hybrid protocol, and one of its custom sub-protocols is the inner product. You can see they have a custom infix operator: you just drop it in, trust that the PICCO developers know a lot about cryptography and have written a super efficient implementation, and you don't have to think about it at all. That's really nice.

But maybe you are a cryptographer, maybe you're implementing something more complicated than an inner product, and you want much more control over exactly which gates go into your circuit. In that case you might want to use something like ABY. ABY is implemented as a C++ library. You can see we have our private share type, which gives you very fine-grained control over the gates in your circuit. We have our shares A and B. We first put a multiplication gate down, and this automatically supports that parallelization using SIMD gates, so you get a bunch of parallel multiplications. Then you say: okay, split these arrays apart, add the results up one at a time, and output the result. This gives you a lot more control over what's happening, but control can be dangerous, so you have to really know what you're doing. So this is an example of the breadth of control you can have over the cryptographic abstraction level.
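To make the batching point concrete, here is a stand-in sketch of the difference between issuing the independent multiplications one at a time and batching them into a single interactive step, with the cheap local additions done afterwards. The Backend type and its round counter are our own bookkeeping for illustration; this is not ABY's or PICCO's actual API.

```cpp
// Sketch (ours, not any framework's API) contrasting two ways to structure an
// inner product for a secret-sharing back end. Multiplications need interaction,
// additions are local, so the goal is to batch the independent multiplications
// into a single round instead of issuing them one at a time.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Stand-in for the protocol back end; the round counter is just bookkeeping
// so the example runs and shows the difference.
struct Backend {
    int rounds = 0;
    // One interactive multiplication: costs a round on its own.
    uint64_t mul(uint64_t x, uint64_t y) { rounds++; return x * y; }
    // A batch of independent multiplications: one round for the whole vector.
    std::vector<uint64_t> mul_batch(const std::vector<uint64_t>& xs,
                                    const std::vector<uint64_t>& ys) {
        rounds++;
        std::vector<uint64_t> out(xs.size());
        for (size_t i = 0; i < xs.size(); ++i) out[i] = xs[i] * ys[i];
        return out;
    }
};

int main() {
    std::vector<uint64_t> a = {1, 2, 3, 4}, b = {5, 6, 7, 8};

    // Naive loop, as in the Frigate-style program: one multiplication at a time.
    Backend naive;
    uint64_t sum1 = 0;
    for (size_t i = 0; i < a.size(); ++i) sum1 += naive.mul(a[i], b[i]);

    // Restructured: all products in one batch, then local (free) additions.
    Backend batched;
    uint64_t sum2 = 0;
    for (uint64_t p : batched.mul_batch(a, b)) sum2 += p;

    printf("naive:   result=%llu rounds=%d\n", (unsigned long long)sum1, naive.rounds);
    printf("batched: result=%llu rounds=%d\n", (unsigned long long)sum2, batched.rounds);
    // Both print result=70; the naive version used 4 rounds, the batched one 1.
}
```

Both versions compute the same result; the difference is how many interactive steps the back end has to perform, which is what ABY's SIMD gates and PICCO's built-in inner-product operator are buying you.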
Next, I want to talk about some of the limitations we saw in these tools. These are major software engineering projects, and as I mentioned, they are for the most part subject to the constraints of the academic setting. Still, they were very hard to use in a lot of cases. One example is that they had incredibly complicated build systems; we had to do a ton of work just to get them running. In some cases you have to set up your own custom certificate authority or public key infrastructure so that you can have encrypted channels between your parties. You have to compile specific OpenSSL versions from source, which takes a really long time. They mostly don't have dependency lists, so you compile the tool until it breaks, figure out which package you need to install to get it going again, and then start the compile over. Overall, we estimate that it took about one to two weeks per framework just to compile them and run the existing examples. That is an obscene amount of work. Luckily, no one will ever have to do it again, because it's all in our repository online, but it was really disappointing.

And then, of course, there's a lot of software here. In addition to implementing cryptographic protocols, which is notoriously difficult to do correctly, there are a ton of supporting systems that go into making an MPC framework work. You have to implement distributed communication between multiple parties. You have to interface with other systems; for example, if you write a domain-specific language, you need a way for it to get input from, and return output to, a general-purpose language. These feel like engineering problems, but they're necessary for this kind of tool to work.

The other issue we found is a major lack of documentation. We defined five types of documentation that we looked for, and half the frameworks had no more than three of them. I don't think I need to lecture you about how a lack of documentation makes things really hard to use. Instead, I'd like to give a few illustrative examples of the kinds of frustrations we ran into when trying to use these tools. They focus mostly on language documentation, which is anything that explains how the high-level language works.

CBMC-GC is a circuit compiler. It compiles a subset of ANSI C, and there's a lot of documentation about how the C language works online, so you'd think they wouldn't need to include much in the framework itself. So I wrote this simple program: it takes two private integers, called Alice and Bob, multiplies them together, and returns the result. When I compile this program, I get an error: did you forget to return a value? No, I returned it right there. In fact, CBMC-GC requires every input to the program to be stored in a variable whose name is prefixed with the word INPUT. You fix that and it compiles. It's not a huge deal, but it would have been easier if it were written down.

ObliVM is an end-to-end framework that compiles a Java-like language. Again, we write a very simple main function: it takes two secure integers, multiplies them together, and returns the result. Now we get a parsing error on the first line. It turns out that Alice and Bob are reserved keywords in ObliVM's language, so you can't use them as variable names. Okay, that's fine.

Wysteria is a framework that was developed in conjunction with programming-languages people, and they used a functional paradigm for their high-level language. They recognized that a lot of people are not as familiar with the functional paradigm as they might be with an imperative one, so there's an extensive tutorial that explains the different features of the language, how the parser works, and how things fit together. I am an average developer, so I copied the code from the tutorial, pasted it into a file, and tried to run it. There was a parsing error on the first line. I fixed that and got another one. I've highlighted here everything I had to add to make the tutorial code run. In this case, the issue is that the language docs explain the ideal version of the language, but in practice the parser had some engineering limitations that weren't accurately reflected there, and it made it much harder to write interesting code in Wysteria.

And then EMP-toolkit is a framework that I've actually used for a lot of my own personal projects, but on average I counted about one comment per 600 lines of code, and there's no additional supporting documentation.
And that's not enough for a framework of this size. That said, there are lots of frameworks that do have really good documentation, and I'd just like to thank them here: these are big software projects, and it's really useful to have excellent documentation. I do have two recommendations for maintainers of frameworks like this. The first is that having multiple types of documentation drastically increases usability. This could be something like one document that explains the architecture of your framework, plus a commented sample file that describes features of your language and how to use it. The second is that online resources are a very sustainable way to maintain documentation, something like a Google group or a GitHub issue tracker. It produces a living FAQ, and it saves the maintainer from having to repeatedly answer the same questions over private email, because the answers are all on the internet.

I also think that, in general, these usability issues are not fundamental to the field. The IARPA HECTOR program is funding the next generation of MPC frameworks, and it has specific usability criteria included in the program. A lot of recent frameworks claim to focus on usability, specifically aiming at developers who don't have a lot of cryptographic expertise. I haven't necessarily used all of these frameworks and added them to the repo yet, but I'm excited that newer frameworks are explicitly trying to move in this direction.

To conclude, here are a couple of future directions that I see for MPC frameworks. We have definitely seen continued support for multiple settings, and extended support for multiple protocols within a single framework. I think this is great, and it's making these tools more flexible and suitable for a wider variety of applications. I also think it would be cool to see frameworks extended to multiple threat models; mostly they live within a single type of threat model per framework, so it would be nice to see wider scope there. It would also be great to see better integration with work from other disciplines. For example, hardware people have been generating super-optimized circuits for much longer than MPC people have, but we haven't really taken advantage of those tools. TinyGarble does try to use heavy-duty circuit synthesis tooling, but we weren't actually able to compile any custom programs in TinyGarble. Programming-languages people have also been working on compilers for much longer than we have, so it would be good to forge stronger bonds with them and get better, more formal guarantees about how these compilers work.

And just a note about the repository: I'm continuing to maintain it. As I mentioned, we have a lot of new frameworks coming out, and I do accept pull requests if anyone maintains a framework and would like to add it to the repository. I hope it will be useful for other researchers and academics, and also maybe for people in industry who want to use MPC as part of their projects and want it to be as streamlined an experience as possible. So thank you, and I can take questions now.

Do we have any questions? Okay, so: what was the best one? It depends on your use case, of course. In general I would recommend using a framework that is actively maintained, and we do note online the ones that we know are being maintained, because having someone available to answer questions when you have them is really important.

Thanks a lot for this talk.
So you talked a lot about how there's a real usability problem across multi-party computation frameworks, and in a lot of academic projects across cryptography this tends to be a serious problem; with formal verification, for example, a lot of the tools also suffer from a serious lack of documentation and usability and are hard to compile. It turns out it's really hard to incentivize academics, because they're mostly interested in getting new results: you publish a new technique, a new result. So my question to you is: how can you incentivize academics to focus more on usability? How can we make this a more attractive target for people who are interested in publishing?

That's a good question. For one thing, in addition to publishing papers you want citations on them, so calling attention to usability issues helps. I'm not really here to name and shame, but I am here to raise awareness of this issue that I found, and just mentioning that it's a problem will, I think, encourage people to be better about it. Also, academic MPC frameworks are on the cutting edge of MPC frameworks in general; there are some private companies doing this, but most of the general-purpose tools have come out of academia. So maybe you can bill it as increasing your citation count, and make it clearer that increasing usability increases adoption of MPC and produces more MPC-related academic problems for us to work on in general.

Thanks a lot for this talk; it seems like it was a lot of work to look through all these frameworks. Did you get a sense of whether any of these frameworks were a good fit in terms of code quality, security-wise? If I had some sensitive data, should I actually use any of this code to run an actual MPC?

That's a good question. Many of them you should not use for real sensitive data. You should at least do the baseline checks, like: does this framework have tests? Some of them are at a higher level of code quality than others, but if you have real sensitive data I would recommend going to one of the private companies that have paid engineers working on this. Sharemind in Estonia, for example, does a lot of work contracting out their framework, and there are more private companies offering these kinds of services.

Okay, a quick question that may be a little cynical: isn't the compiler a trusted third party?

Arguably, yes. Okay.

This is a very interesting talk. One thing is that this isn't unique to academic crypto projects; I think we all know which one I'm referring to. The other thing is: could modularization help? If you're interested in, say, coming up with a better circuit compiler or a better front-end language, it seems like a lot of these projects then have to worry about back-end things and protocol choices, and likewise, people who really care about optimizing garbled circuits all of a sudden have to worry about the front end. Is there some way we could separate those two, and so get teams to work together and not have to reinvent the wheel constantly?

Yeah, so we had originally considered recommending, for example, a single circuit representation that everybody could use.
There's already a lot of disparity in the intermediate representations and in the architectures, though, which makes that harder. Of course, the way I talk about it makes it sound like there's a front end, then a circuit, then a back end, but the reality of the architectures is a little messier than that. That being said, I do think a little more standardization would be great. For example, we weren't able to integrate any of the circuit compilers we used with any of the end-to-end frameworks, which was a little disappointing, because the circuit compilers have tried to do more optimization. FRESCO, I know, has been trying to present itself as a sort of API at multiple levels, and they have multiple back ends that you can use with a circuit compiler. So it would be interesting to see whether more collaboration between frameworks can encourage this kind of modular optimization.

Sorry, it's not a question; I just want to disagree that the compiler is a trusted third party. Anybody can review the compiler, it's transparent. You could have a third-party certification that this compiler works, and it's signed, and then as long as everyone runs that compiler, and everyone can verify they're running it, you don't have to trust anybody. It's my fault, sorry. That's good.

I had a question, but instead I will answer Yehuda; I've changed my question, so I will also comment. When a protocol is deployed between two entities, my experience is that even if an entity has the compiler and the code from, let's say, the other entity, the tendency is to program it yourself, put it on your own platform, and check that platform from the hardware up to the compiler, and to rely, end to end, on your data being secure as long as your own contribution, from hardware to software to communication to everything else, is secure, without relying on the other party. That's the meaning of multi-party computation. Thank you.

I think my question relates to that one, actually; it came up yesterday with the e-voting system. How do you increase trust in voting systems? One way is that the government publishes the code, but of course there's no guarantee that the code they publish is what's actually running on their systems, and I wondered if MPC has the same issue as well.

Yeah, so if you have not collaborated with the other party, or if you don't trust the other party to be running the same code, then you could have an issue. There are several frameworks that operate in the malicious model and are secure against badly behaving parties, but yes, I think that issue does exist in general. Okay, thank you.

One last question. You mentioned the challenges of setting up a PKI for these projects. Do they use Let's Encrypt, or do they use OpenSSL to create their own specific PKI, which is really challenging?

More OpenSSL, yes.

I have a couple of new drafts on how to do that. Okay. I'll share them with you later. Thank you.

Okay, let's thank Marcella again. Cheers.