All right, we'll get started. The first talk of the session is going to be given by Felix from the Technical University of Munich.

Thanks. Okay, hello everyone. This survey ended up in my Twitter feed quite recently, and it nicely describes the problem I want to address in this talk. It says: "I'm a programmer looking for a solution on Stack Overflow to paste into my project." As you can see, it received a lot of love from the community, lots of likes and lots of retweets. And indeed, Stack Overflow is the most popular question-and-answer website for programmers. It relies on community moderation to bubble up the best answers and weed out bad advice for any programming question. Most of the time, answers come as a code snippet, and that makes it incredibly easy to just copy that code straight into your software and think no more of it. Apparently, as the survey indicates, this is common behavior and part of most developers' workflows nowadays. While this speaks for the high usability and utility of Stack Overflow, it unfortunately comes with a major risk for application security, and that's when it comes to security-related questions and usability issues around cryptographic APIs. This Stack Overflow question, for example, shows one of the biggest issues on Android: how can you accept a certificate during the TLS handshake that is not part of the default Android trust store? This question showed up quite a lot, and we observed that millions of developers have looked it up, so it seems to be a super important use case for Android developers. But the crypto API just didn't support it, and the popular answers on Stack Overflow for this question were unsafe workarounds where the crypto API was simply overridden.
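The null-verifier pattern is easier to see in code. The answers in question are Android/Java and aren't reproduced in this transcript, so here is a rough analogue of the same mistake using Python's standard `ssl` module; the variable names are mine:

```python
import ssl

# Secure default: both the certificate chain AND the hostname are verified.
secure_ctx = ssl.create_default_context()
# secure_ctx.verify_mode == ssl.CERT_REQUIRED
# secure_ctx.check_hostname == True

# The "null verifier" equivalent: verification is simply switched off,
# so ANY certificate is accepted and the TLS handshake becomes
# vulnerable to man-in-the-middle attacks.
insecure_ctx = ssl.create_default_context()
insecure_ctx.check_hostname = False       # stop matching the hostname
insecure_ctx.verify_mode = ssl.CERT_NONE  # stop verifying the chain
```

Like the Stack Overflow workarounds, the insecure version "solves" the certificate error, which is exactly why it looks attractive.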
They contain this null verifier here, or some variation of it, that renders the TLS handshake vulnerable to man-in-the-middle attacks: certificate verification is simply turned off. However, it technically solved the initial problem, as it now accepts any certificate, and developers seemed to be super happy with it. They got rid of their pesky certificate error, and that's probably why they happily upvoted those answers until they became the most popular and accepted answers on Stack Overflow. In our Oakland paper in 2017, we showed that these kinds of code snippets, insecure due to crypto misuse, were indeed reused in almost 200,000 Android applications available on Google Play. Those included high-profile apps with an install base of over five billion users, and apps from security-sensitive categories like business, finance, health, and social media. Another paper from one of our co-authors demonstrated how to attack apps and steal credentials, credit card numbers, and other private data based on these insecure code snippets from Stack Overflow. In our ICSE paper last year, we showed that Stack Overflow's content indicators, such as the community-given score and the view counts of given answers, all point in the wrong direction security-wise and therefore inadvertently promote crypto misuse. So usability issues with crypto APIs lead to vulnerable code, which additionally gets promoted and distributed by an Alexa top-50 website that almost all developers use to get help. To tackle this huge problem, different forms of security advice have been improved, tested, and compared with Stack Overflow: for instance books and official documentation, static code analysis tools, and simplified cryptographic APIs that were specifically designed with usability in mind. But even though all of these approaches helped in improving code security, developers really struggled with getting running code out of them. They were less productive under
given time constraints than developers who were allowed to use Stack Overflow. A very surprising and disappointing example were the simplified cryptographic APIs, as they performed the worst in terms of productivity. Some developers even had to look up the source code of the API to figure out what it actually does, and that's the complete opposite of what you want to achieve with an interface. They were oversimplified and therefore only supported a very small range of use cases. The most important thing for us to learn from these studies was that whenever developers encountered a usability issue with one of these approaches, they turned to the web and went code shopping on Stack Overflow once again. They went back to their default behavior. That goes in line with a famous quote from Richard Thaler, the founder of nudge theory, which is a concept from behavioral science and economics. It says: first, never underestimate the power of inertia. Again, as we've seen, whenever developers encounter usability issues with crypto APIs, they ask Stack Overflow for help. The quote continues with: second, that power can be harnessed. So don't even try to change the default behavior, as it's too powerful; rather, try to harness it in a way that improves the outcome. And that inspired our main idea: let's try to harness code shopping on Stack Overflow to help developers get cryptography right. Okay, but how to do that? One of our most important findings here was that on Stack Overflow, similar and secure code examples are available for almost all of the insecure code snippets. For any insecure code example, there's a pretty high chance of finding an alternative code example that does practically the same thing, but in a secure way. So in the end we see getting cryptography right as a decision-making problem, and that's where nudge theory comes into play. The basic idea is to nudge people towards better decisions without
restricting their options or requiring them to change their incentives. Instead, you change the choice architecture to steer people in a particular direction. In the example on the right, people have two options for getting upstairs. Using the stairs is the better option in terms of health, but people prefer to use the escalator. With a new choice architecture, where the stairs now look like a piano keyboard that makes sounds when you walk upstairs, people tend to favor that option. And that's the key aspect of nudge theory: it does not try to restrict options or force a change of incentives; it redesigns the choice architecture in a way that the identified behavior leads to better outcomes. So our goal was to design a new choice architecture on Stack Overflow that nudges people towards reusing code examples that provide secure and strong cryptography. And it must not interfere with the usability and utility of Stack Overflow, such that developers can keep their high productivity level using the website and are not driven away from it. But to be able to do that, we first had to find these better alternatives on Stack Overflow. We had to solve three technical problems: we needed to predict the similarity of crypto API usage patterns, their use cases, and of course their security. We combined supervised and unsupervised deep learning to learn these things directly from code available on Stack Overflow. In the first step, we learned how to predict the similarity of crypto API usage patterns. In doing so, we had to deal with a problem
that's very specific to Stack Overflow, and where deep learning helped. Code examples on Stack Overflow are oftentimes incomplete and erroneous programs. That means their representation, the code graphs used to determine similarity, may be unsound, so you may end up with different code graphs for the same pattern, for instance if one comes from a complete program and the other one does not. Deep learning, however, doesn't really mind: it learns its own representation, optimized for the problem it tries to solve. The network tries its best to determine the features that allow it to predict similarity, even when inputs are unsound. So as a first step, we learned a new representation for crypto API usage patterns by embedding their code graphs into a vector space using struc2vec. These embeddings are learned such that similar patterns are closer together and dissimilar patterns are further away from each other in the embedding space. In this way we can simply use a distance function to determine whether patterns are similar or not. This is the architecture of our embedding network. It's a Siamese architecture that uses two networks to generate embeddings for either two similar or two dissimilar patterns. During training, we calculate the distance between the two embeddings generated by the network, and if they are too close or too far away from each other, we backpropagate the loss to update and improve the network in generating new embeddings. In the second step, we wanted to learn how to predict the use case of a pattern, for instance initializing a cipher or verifying a certificate. Since we had already trained a model for pattern similarity, we considered that very beneficial knowledge for predicting use cases, and we transferred it from the similarity domain into the use case domain by applying transfer learning.
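The Siamese training step just described, pulling similar pairs together and pushing dissimilar pairs apart, is typically implemented as a contrastive loss. A minimal sketch (my own simplification, not the authors' exact loss or embedding network):

```python
import math

def contrastive_loss(emb_a, emb_b, similar, margin=1.0):
    """Contrastive loss for a Siamese embedding network: similar pairs
    are pulled together, dissimilar pairs are pushed at least `margin`
    apart. Both embeddings come from the same (twin) network."""
    # Euclidean distance between the two embeddings
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))
    if similar:
        return d ** 2                     # penalize any distance
    return max(0.0, margin - d) ** 2      # penalize being too close

# A similar pair that is far apart produces a large loss to backpropagate;
# a dissimilar pair beyond the margin produces none.
loss_sim = contrastive_loss([0.0, 0.0], [3.0, 4.0], similar=True)
loss_dis = contrastive_loss([0.0, 0.0], [3.0, 4.0], similar=False)
```

The gradient of this loss is what updates both twins of the Siamese network.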
And in the last, most important, step we trained the security model, which predicts whether your cipher initialization or certificate verification is secure or not. Here we basically did the same thing again: we applied transfer learning to reuse the similarity information encoded in the embeddings to train a new model for predicting security. What we did was add another hidden layer, highlighted in red, on top of the embedding network, highlighted in blue. While the blue layer already encodes the similarity information, the red layer learns the use case or the security information of patterns, and based on that, the classification layer on the right can predict which use case it is and whether it's insecure or not. We applied different techniques to train the classification network. With one technique, called transfer learning, we input the pattern graph into the fixed, pre-trained embedding network and only update the weights of the use case or security layer based on the classification loss. With another technique, called warm starting, we also update the weights of the pre-trained embedding network based on the classification loss; this way we update the pattern embedding as well, in a way that helps decide whether a pattern is insecure or not. Okay, so since code similarity does not necessarily need to be learned from code that applies crypto, we were able to compile an arbitrarily large data set, for instance, theoretically, all public Java repositories on GitHub. However, the data set with the crypto code snippets we obtained from Stack Overflow was relatively small, and warm starting and transfer learning helped us tackle this challenge: transferring knowledge obtained from large data sets helped us learn from small data sets more effectively. Here on the left side you can now see the results of the similarity model. It shows all the crypto API patterns we extracted from Stack Overflow, highlighted by
use case, so each point relates to a pattern. First, for several use cases you can already see that the similarity model creates dense clusters, for instance the cipher cluster in blue or the TLS cluster in orange. But it creates sparse clusters for some of the other use cases, for instance key generation or IVs. As you can see on the right, the use case model is able to correct this: it moves patterns closer together that belong to the same use case but are not necessarily very similar. Here in the middle of the slide you can see the results of the similarity model again. The distance still represents similarity, but the color now indicates security: red is insecure and blue is secure. This cipher cluster down here nicely indicates that our main idea of nudging people away from insecure code towards secure code is actually technically feasible. The cipher cluster has a security boundary, which means that patterns close to this boundary provide very useful alternatives: they do the same thing, but one is secure and the other one is not. This shows a cherry-picked example of it. On the left side you see a warning for an insecure pattern as we show it on Stack Overflow. At the bottom you can see the list of recommendations, which is ordered by similarity and use case. When you click on the first link, you end up on the secure Stack Overflow post shown on the right, and as you can see, it is basically the same code; it only differs in the statement that rendered the whole code snippet insecure before. However, developers might ignore basically everything I just showed and copy the insecure code anyway. Whenever we detect an insecure copy attempt on Stack Overflow, we trigger a reminder nudge that shows the warning and recommendations again, in order to make the user pay attention. All right, so now we had everything together.
We wanted to test our system design in a developer study. We had two treatments, the nudge group and the control group, and both had to solve two programming tasks: symmetric encryption and certificate verification. We had two metrics: functional correctness, which allowed us to measure the productivity of developers, and of course security, which told us whether a solution was secure or not. Once again, functional correctness was very important for us, as our nudges must not interfere with the great user experience of Stack Overflow: developers should be able to happily continue copying and pasting stuff and stay as productive as they have been with the original Stack Overflow. And yes, our nudges did not have a significant effect on functional correctness. Both treatments, nudge and control, achieved a very high level of functionally correct solutions within the given time constraints. That means it doesn't matter whether you were nudged or not: Stack Overflow remains very effective and efficient for solving programming tasks. If we now get secure solutions on top, we achieved exactly what nudge theory had promised us. And indeed, the nudge treatment achieved significantly more secure solutions than the control. Interestingly, being a professional or having security knowledge didn't have any effect on security; solely the nudges made a difference here. To wrap up the talk, I'd like to show one of the most surprising results of our studies, which was that our approach tackled null verifiers quite well. As a short reminder, we found that 91% of apps containing code from Stack Overflow included null verifiers,
so certificate verification was basically turned off. Based on those code examples from Stack Overflow, one of our co-authors was able to attack them. Only 0.2% of apps that reused code from Stack Overflow got certificate verification right. And in another study, where participants were only allowed to use simplified cryptographic APIs, none of them got it right. However, nudge participants achieved 77% secure solutions, while the control group was again quite far behind, with 67% insecure solutions. That's cool, because this was the problem with the highest risk for application security, and it was also the use case on Stack Overflow with the fewest secure code examples: we found over a thousand examples of null verifiers and only 50 to 60 examples that followed security best practices. However, if you implement the right choice architecture on Stack Overflow, those few examples seem to be enough already. All right, so what's next? We recently applied for the academic partnership program with Stack Overflow, which would allow us to further test and improve our approach in a larger field study, with the real Stack Overflow and real Stack Overflow users. Of course, we'd be happy to test our approach with any company or institution, so feel free to contact us. We'd love to see Stack Overflow consider some of our ideas. Since almost all developers use this website, we should make sure that they stay on a safe path. We believe this could have a huge positive effect on how cryptography is used in the real world. All right, thank you very much, and I'm now happy to take questions.

Hi, do you have any plans to try integrating this into text editors?

Currently not, but sure, why not.

Given that we have some time, could you go into a little more detail about how these nudges actually appear on the Stack Overflow page?

Yeah, sure. Okay, so this is one of the nudges.
Here we have the security warning. The text and the icons are basically inspired by the text and icons used in Chrome for security warnings. We show this warning, and we also show annotations below the statement that causes the snippet to be insecure. Below the warning we have the recommendations: a list that displays different Stack Overflow posts, ordered by the similarity of their code to the code snippet you see above in the warning. That's the first metric we use to order it, and the second one is the use case: if the similarity is not enough, we at least show something that implements the same use case. You can then just click on one of those, and we show positive security indicators, as shown on the right, which indicate that we didn't find anything that causes a problem. And we had another nudge, called the default nudge, where we basically reordered the search results on the web page: if you search for something, we reorder the results based on security, so that you get the secure posts first.

Hi. How do you make sure that the results of your user study are not biased? I mean, you're working with developers who know how Stack Overflow looks and how it operates, so if you introduce something new, why aren't they just following this new feature?

We didn't do any priming; we tried not to mention security or anything like that. And we also tested for systematic differences based on the demographics, and we didn't find any.

All right. Okay, so if I'm looking at this example, it seems very simple to find: you're just looking for an allow-all hostname verifier. Is this model able to capture more subtle bugs? How does it compare to, say, a trivial approach that just looks for certain indicators and flags them?

Yeah, sure.
This is a very simple example, but we are also able to detect bugs based on, for instance, initializing a key or initializing an IV, or more sophisticated vulnerabilities. This is just an example, and it shows a very simple case where you basically have to find this one Java field.

Would this technique also be useful for finding other security, but non-cryptographic, problems, such as buffer overflows?

Probably, yes. The whole thing is based on code graphs, program dependency graphs, and we feed these graphs into the neural network. There are other approaches that use the same representation of code to find buffer overflows and so on. So, yes.

Next, Shubho from Facebook is going to talk about CrypTen.

Thank you for giving me the opportunity to present on behalf of the CrypTen team. CrypTen is a very young framework. It's essentially a machine learning framework based on secure MPC right now, but hopefully other techniques in the near future. We open-sourced it last October, so being given this opportunity to present means a lot to us. I should also say that I'm not a cryptographer. This is only my second time at Real World Crypto, and that might be reflected in some of the design decisions we have taken in the framework itself. A lot of the talk is going to be about how and why we designed CrypTen the way it is. Every design has a set of trade-offs. CrypTen does too, but hopefully a different set of trade-offs than what we have seen in frameworks and libraries in this space, especially in the space of secure computing. I'm going to use "secure computing" as a very broad term.
I don't know if a formal definition exists, but for me it is a computing technique where you're computing on data that is encrypted in some way. To give a 40,000-foot view of what these design decisions are: there is a piece of code on the right-hand side, which may be foreign to a lot of people here, but if you show it to somebody in the machine learning community, they're going to say: oh, it looks just like PyTorch. PyTorch is a leading machine learning framework, along with TensorFlow, which also looks fairly similar. And that, in some sense, captures what we are trying to do with CrypTen. Our primary goal is to expose the machine learning community to various secure computing techniques and the trade-offs that come with them. There are, in my view, two broad aspects to this. The first aspect is the choice of models in the machine learning community. For example, take computer vision: the models that are prevalent are deep residual networks. They're actually not very convenient to work with in secure computing, mainly because they have these non-linearities that are hard to approximate. They're also deep, in the sense that they have a lot of multiplies in sequence, which is problematic. Making these trade-offs explicit to the community would hopefully help the community think in different ways and build models that are much friendlier, maybe shallower models. Another concrete example is number encoding. Currently in machine learning we use float32.
There's also a special 16-bit float type called bfloat16. Neither of these is very convenient to work with in cryptography, or in secure computing, which usually likes to work with integers. So this is another thing we want to expose to the community: there is space to think about different number encodings when you're training models or doing inference with them. The other aspect is to expose the community to a new feature, a feature that allows them to encrypt data and then compute on that data. The hope is that when you expose this feature, the community will think about more applications where the data is sensitive and cannot be trained on in the clear. I should step back at this point and say that CrypTen is very much a research framework. Nobody should take this framework and start training on data that actually needs to be secure; we are not at that point yet, but this is our first step. And the hope is that when these applications come to be, people from this room, or from this community, can help build in the necessary security for the data at hand. So we look at this framework very much as a conversation starter between these two communities, because, let's face it, machine learning isn't going away, and privacy and security are becoming more and more important. With that in mind, let's look at the main design goals of CrypTen. The first major design goal is that we wanted to present a machine-learning-centric interface, an interface based around tensors and computation graphs, which I'm going to get into in more detail later on; that is how machine learning is done these days with neural networks. The second one is explainable performance. By that I mean having a design that is very modular, because modularity helps in figuring out performance: it's easier to work on performance problems in a modular setup than when things are one monolithic
So this is something we had in mind because if things are not fast, nobody's going to use it The third one is a debug ability krypton's a very young framework and the field is very new So we wanted a mode where Users who use this can figure out what has gone wrong So right now we only have a secure mpc setup for example So we have a mode where you can have multiple parties on one computer for example, which helps you debug When models are not training And I know that debug ability and security are kind of at odds with each other with security You are trying to hide stuff while the debug ability you are trying to reveal information And I'm not really sure what the right design call here is And the last one is interoperability and by that I mean You know people have been doing machine learning for a while So there are model formats models that are people have trained or They have trained or have models specified in other frameworks We need to be able to import these models and continue training or do inference So we have a compatibility layer, which is actually open source from Facebook or onyx Which is what we use to load models. So not only can we load models from pytorch We can load models for other Frameworks like tensor products, for example, which is from Google All of these design goals leads to one Trade-off, which is a threat model. So we are in the in the current version of grip 10. We are in the Honest but curious mode and I should also say that this is not something that is set in stone This is kind of the first setting on the dial And there's obviously gonna be trade-offs between things on the right and the things on the left So we want to pursue other Address other threat models as well, but we want to be very deliberate about what these trade-offs are What do you gain from getting a better threat model and what do you lose from on the usability? 
side? We want to make these trade-offs very, very explicit. Before I go into the details of CrypTen, I wanted to give a brief overview of what machine learning frameworks look like, and what the ingredients are of what we have seen in the last five years or so. So, what makes an ML framework? The first component of an ML framework is an object called a tensor, which is really a fancy name for a multi-dimensional matrix, usually of six dimensions or fewer. Anybody who has worked with MATLAB will find this very familiar. In these frameworks, tensors are first-class objects. The next thing you need is something that will do something with these tensors, so we have different operators, which are basically functions that take tensors as input and produce tensors as output. These are chained together in a directed acyclic graph, which is called a computation graph: operators chained together, taking some tensors as input and producing outputs that go to other operators. And this computation graph is a special structure: it usually has one sink node, and that sink node is used to propagate what is called the gradient. In machine learning we are trying to optimize a function, so we do a forward pass and then a backward pass through this computation graph for the gradients. I'm going to show you a very, very simple computation graph, because a picture is worth a thousand words and it'll help put a diagram behind what I've talked about. I'm going to start with these square boxes, which are tensors, that go into an operator, which in this case is a multiply, an element-wise multiply of two tensors, of two matrices. That produces another tensor. So this is what a forward graph is. And to calculate the gradient, we essentially invert the graph, we flip it around, which I'm going to show as the graph on the right-hand side.
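In miniature, the forward multiply node and its backward counterpart look like this (a scalar toy with hypothetical helper names, not any real framework's API):

```python
class Tensor:
    """Minimal stand-in for a framework tensor (scalar-valued for
    brevity): it holds a value plus an accumulated gradient, playing
    the role of the accumulate-gradient node in the backward graph."""
    def __init__(self, value):
        self.value = value
        self.grad = 0.0

def mul_forward(a, b):
    # forward-graph node: element-wise multiply
    return Tensor(a.value * b.value)

def mul_backward(a, b, upstream):
    # backward-graph node for the multiply: route the incoming gradient
    # to each input (d(a*b)/da = b, d(a*b)/db = a) and accumulate it
    # into their .grad buffers
    a.grad += upstream * b.value
    b.grad += upstream * a.value

a, b = Tensor(2.0), Tensor(3.0)
out = mul_forward(a, b)   # forward pass
mul_backward(a, b, 1.0)   # backward pass, starting from the sink node
```

Real frameworks build and traverse these two graphs automatically over full tensors, but the flow of values forward and gradients backward is the same.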
For every operator on the left-hand side, you get a backward operator on the right-hand side; so for the multiply, we have a corresponding backward-multiply operator, and the black arrows are the gradients flowing back. Then we have two special accumulate-gradient operators, and what they do is accumulate gradients into the tensors the inputs came from. Every machine learning framework in existence has something like this underneath it; obviously, the graphs get fairly complicated. So let's see which of these components exist in CrypTen itself. We have tried to be as close to one-to-one as possible. We have an object called a CrypTensor, an encrypted tensor, which you can think of as an abstract base class, for people coming from the C++ world: it promises some functionality that these tensors will have, but at this point it is agnostic to which computing technique you're going to use to ensure security. Right now, as I said, we use secure multi-party computation, so there is an MPC tensor that sits underneath it. Going forward, our goal is to have other tensors at this level, so maybe a tensor that is backed by homomorphic encryption, for example. Obviously, not every tensor will implement every operation; that might differ between the techniques we use. Underneath the MPC tensor we have two kinds of sharing: an arithmetic shared tensor, and an XOR, or binary, shared tensor, and we can go back and forth between the two. What I've said so far is only about interfaces and APIs; there has to be a tensor that actually stores the data, and that is done by this thing called a LongTensor. A LongTensor is basically a tensor with an int64 type, and this is where we cross over from CrypTen land into PyTorch land. PyTorch, as I said, is a leading machine learning framework in the
neural-network deep learning space. And this decision we took very, very deliberately. A couple of things happen when you have this hard link to an existing machine learning framework. One is that the interface filters up: whatever functionality and API the LongTensor has filters up all the way to the CrypTensor. You may or may not choose to implement all of these APIs, but it does filter up, and that gives a very natural interface to the people in the machine learning community. The other thing that happens is that performance gets linked: when things in PyTorch get fast, CrypTen gets fast. That's also a very deliberate choice. And something else happens too: for example, some LongTensor functionality really didn't exist in PyTorch before CrypTen showed up, so CrypTen also influences PyTorch's design; there's a nice give-and-take going on between these two frameworks. The third one, which may be a little non-obvious, is the communication library. Machine learning is a very distributed operation; currently we can train models on, you know, thousands of processors, so there's a communication library that handles this. It turns out that a lot of the communication patterns we see in multi-party computation map very well to the communication libraries we use for distributed machine learning.
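To make that overlap concrete, here is a single-process simulation of the two collective patterns shared by distributed training and additive secret sharing (function names are mine; real implementations run across processes via a communication backend):

```python
def all_reduce(party_values):
    """All-reduce: every party contributes a value and every party ends
    up with the sum -- the same shape as opening a secret to all parties
    in additive sharing, where everyone exchanges shares and everyone
    reconstructs the plaintext."""
    total = sum(party_values)
    return [total] * len(party_values)

def reduce_to_root(party_values, root=0):
    """Reduce: like all-reduce, but only the root party learns the sum
    (opening the secret to one party)."""
    return {root: sum(party_values)}

# Three parties holding additive shares of the secret 42:
shares = [10, 30, 2]
opened = all_reduce(shares)   # every party now holds 42
```

In a real deployment each list element lives in a different process, and the sum is computed by the communication library rather than locally.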
So this was a very lucky find, in some sense. What this design choice also gives us is a decoupling of where protocol-specific optimizations need to happen from where non-protocol-specific optimizations need to happen. Anything protocol-specific can happen in the CrypTen layer, above the dotted line, and anything that has nothing to do with the protocol, for example making the communication libraries fast or making some math operations fast, can happen entirely in the PyTorch layer. This makes life a lot easier, because then we can make minimal code changes to CrypTen or PyTorch depending on what we are doing. Now that we have seen tensors, let's look at the operations that machine learning training needs. This is obviously a more restricted set of operations than what you would use in a general-purpose program, but there are quite a few challenges. I'll go from the simplest to the hardest, simplest from the point of view of secure MPC. The first of the two simplest ones is matrix multiply, a dense matrix multiply, which is essentially the component behind the fully connected layer; a fully connected layer is a matrix multiply followed by adding a vector called a bias. And this is easy to do because it's all additions and multiplies: additions come for free in additive sharing, and multiplies come with Beaver triples. Convolution, the way it's done in machine learning, is usually implemented as a matrix multiply, because the spatial extent of the convolution filters is very, very small. So if you can do matrix multiply fast, you can do convolutions fast as well. And the Beaver triple formula not only holds for scalars, it holds for tensors as well, so you can do it with tensors.
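A minimal sketch of the Beaver-triple multiplication just mentioned, over additive shares of scalars with an in-process dealer (CrypTen's real protocol, with fixed-point encoding and distributed parties, is more involved):

```python
import random

Q = 2**61 - 1  # modulus for the additive sharing (illustrative choice)

def share(x, n=3):
    """Split x into n additive shares that sum to x modulo Q."""
    parts = [random.randrange(Q) for _ in range(n - 1)]
    parts.append((x - sum(parts)) % Q)
    return parts

def reveal(parts):
    return sum(parts) % Q

def beaver_mul(x_sh, y_sh):
    """Multiply two additively shared values using a Beaver triple
    (a, b, c = a*b), itself handed out in shared form by a dealer."""
    n = len(x_sh)
    a, b = random.randrange(Q), random.randrange(Q)
    a_sh, b_sh, c_sh = share(a, n), share(b, n), share(a * b % Q, n)
    # The parties open the masked differences; a and b are uniformly
    # random masks, so eps and delta reveal nothing about x and y.
    eps = reveal([(xi - ai) % Q for xi, ai in zip(x_sh, a_sh)])
    delta = reveal([(yi - bi) % Q for yi, bi in zip(y_sh, b_sh)])
    # x*y = c + eps*b + delta*a + eps*delta, computed share-wise:
    z_sh = [(ci + eps * bi + delta * ai) % Q
            for ai, bi, ci in zip(a_sh, b_sh, c_sh)]
    z_sh[0] = (z_sh[0] + eps * delta) % Q  # public term, added once
    return z_sh

product = reveal(beaver_mul(share(6), share(7)))  # reconstructs 42
```

The same identity applies element-wise when the shares are tensors, which is why matrix multiply and convolution reduce to it.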
So logs: these are all done through various series approximations. For logs we do Householder iterations; for exponentials we do a variant of repeated squaring; we do division using Newton-Raphson; and then we do power and square root essentially using the exponential for power and the log for square root. Finally, we have operations that are very hard to approximate polynomially. We have ReLU, which is a weird-sounding function, but all it does is this: if a value is negative it sets it to zero, and if a value is positive it just lets the value through, so you can think of it as an if-condition, basically. People have tried doing polynomial approximations of this, but it never actually works. So the last line, ReLU, max, and argmax, are done using circuits, and this is where we have to go from arithmetic sharing to binary sharing and back. Max is usually used in a layer called the max-pooling layer, which essentially slides a filter and finds the maximum value within the filter, and argmax is also used in the same layer.

Now that we have seen both the tensors and the operators, we need something above CrypTen to make things work, and what we need is essentially some metadata and a way to accumulate gradients. So we have a separate tensor, called an autograd CrypTensor, which is used in the backward graph. Right now these are separate tensors, but it's quite likely we are going to fuse them into one tensor we can use for both the forward and the backward graph. And we need one more object, called a module. A module is essentially a convenience object: in the computation graph you saw these operators as nodes, and it turns out that it's good to have some state along with these operators, and it's good to have a consistent interface for each operator.
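Returning to the approximations above for a moment: the Newton-Raphson division is MPC-friendly because each iteration uses only additions and multiplications. A minimal plain-Python sketch follows; the fixed iteration count mirrors what comes up later in the Q&A (a data-dependent convergence test would leak information), and the initial guess here is an illustrative assumption, not the framework's actual formula.

```python
def reciprocal_nr(x, iters=10, y0=0.1):
    """Approximate 1/x with Newton-Raphson using a fixed number of steps.
    Only + and * appear in the loop, so every step maps onto secure
    addition and Beaver-triple multiplication. Converges only when
    0 < y0 < 2/x, which is why the valid input range matters."""
    y = y0
    for _ in range(iters):
        y = y * (2 - x * y)  # quadratic convergence toward 1/x
    return y

def divide(a, b):
    """Division as multiplication by an approximate reciprocal."""
    return a * reciprocal_nr(b)

assert abs(reciprocal_nr(4.0) - 0.25) < 1e-9
assert abs(divide(1.0, 8.0) - 0.125) < 1e-9
```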
So there's a forward function and a backward function. What the module does is take all these operators and add this kind of standard API, so that the graph can be traversed very easily. A module can also contain other modules, so you can take a subgraph and express it as a module; there are some layers of a neural network that are better expressed as subgraphs, and that's where the module comes in handy as well. What this module structure allows us to do is essentially use ONNX. It gives us compatibility with models that have been written using PyTorch tensors or any other framework, and then we can read them in through ONNX as a sequence of modules, essentially. What that enables is this: if you have pre-trained models that were trained unencrypted, but you want to do inference on encrypted data, you can use those pre-trained models; you don't have to retrain anything. And if you have a model specified in a PyTorch-like fashion, you can use ONNX to read the untrained model and train it from scratch using CrypTen as well.

So we've now seen all of the major components CrypTen has. I'm going to spend a little bit of time on communication in the next slide. This term might not mean anything to most people here: all-reduce is a communication pattern where you have a bunch of parties or peers who each have a value, and they need to sum all their values and broadcast the result back to everybody. This is essentially open-to-all, in MPC speak. Everything shown on the left is something that already exists for distributed machine learning, which needs these communication operators as well. The next one is reduce, which is the same thing except you don't do the broadcast back: you send data to one party, that party sums up all the data, and you reduce to that party; that's open-to-one. You can also do a
broadcast: one party can send different values to different parties, so you can think of it as communicating from a trusted dealer to multiple parties. These communication patterns already exist for doing distributed machine learning. I also realize this is constrained: it's fine when you have n-out-of-n shares, n parties and n shares, but it won't work for any kind of threshold scheme. It turns out, though, that another form of distributed machine learning, called model parallelism, needs more flexible communication libraries, and the communication patterns used for model parallelism can also be used for the more flexible communication needed in threshold sharing, for example. So those are coming as well. One thing I should also say is that these communication patterns have been around in computer science for a long time. They actually come from a very old communication library called MPI, the Message Passing Interface; I don't know if people here have used it. It's used a whole lot in scientific computing, because there you have a bunch of compute going on in parallel and then you need to broadcast results every once in a while. We found it quite interesting that, from that perspective, it looks very MPC-ish.

So I wanted to give some examples of code. I showed one example at the very beginning, but I wanted to show how similar code in CrypTen looks to code in PyTorch.
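In code, the open-to-all and open-to-one patterns just described reduce to the following toy single-process simulation (a real implementation, MPI or a distributed ML backend, moves these values over the network):

```python
def all_reduce(party_values):
    """All-reduce, i.e. open-to-all in MPC speak: sum every party's value
    and hand the total back to every party."""
    total = sum(party_values)
    return [total] * len(party_values)

def reduce_to(party_values, root):
    """Reduce, i.e. open-to-one: only the root party learns the sum."""
    return {root: sum(party_values)}

# Three parties holding additive shares of the secret value 42.
shares = [50, -20, 12]
assert all_reduce(shares) == [42, 42, 42]   # everyone learns the opened value
assert reduce_to(shares, 0) == {0: 42}      # only party 0 learns it
```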
So I'm going to start with some code on the left, in PyTorch. These are very small examples. This is something that takes two tensors and adds them, and the code on the right is CrypTen doing the same thing. There is not a lot of difference: you import a new library, everything has a "crypten" in front of it, in some sense, and you do an init. This makes it very intuitive for somebody who has used not just PyTorch; even TensorFlow looks very similar to this. If you want to do gradients: here's some code on the left and right that computes gradients. On the left we have a tensor, and there's a cross-entropy, which is what's called a loss function. On the right we have the same variant of the code in CrypTen; it looks very similar.

We have also put machinery in place to run CrypTen completely in a browser. A Jupyter notebook is a very common tool in this space, so we can load up CrypTen in a Jupyter notebook to get started very easily, and this has proved very beneficial for people who are starting to play around with this. In fact, our source code ships examples in Jupyter as well, so people can get a taste.

I also wanted to show you what loading a real model and a real data set looks like. This is a very small inference example with ImageNet, which is a very popular data set in computer vision. On the right I'm going to show you code. What I'm doing here is importing some libraries and initializing CrypTen; then I'm initializing an image transform, which preps the image; I load the data set, which is loaded from an ImageNet folder, a data set of about 1.7 million images or so; and then I load a pre-trained model, a ResNet-18, which is a full-scale model, not a toy model. Then we encrypt the model, we encrypt the image, we get an encrypted output, and you can reveal the encrypted
output, and you're going to get the same result as you would have gotten doing everything in plaintext. So it doesn't look all that different from what you would do if you weren't encrypting things.

So this is what we have now. Where do we go from here? And by "we" I mean not just people at Facebook; we would love to have participation from people here and from the machine learning community as well, because this is a long road. The first thing on our minds is improving performance. We work, as I said, in an int64 space, and there are very few optimized libraries for int64, so we are writing more optimized versions of int64 matrix multiply and int64 convolutions. Facebook has a library called FBGEMM for matrix multiply, where you write vectorized code, usually using AVX-512, for faster matrix multiplies. We aren't done yet; we have done some initial implementation, and we have more to do here. On the hardware side, my wish list would be support for wider data types (128-bit would be fantastic), and wider SIMD vector lengths would also be fantastic in this space. The next thing we are working on is the trusted third party. Our current trusted third party is the trusted dealer, which I think is fine as a first cut, but we want to explore other options for generating Beaver triples. One idea may be using something like Intel SGX or other such schemes in this space. I sometimes wonder why we don't have a service for generating Beaver triples; that would make our lives a whole lot easier. Those are some of the near-term things we are working on. Going forward, longer term would be support for other secure computation techniques, as was alluded to: should we have homomorphic-encryption-based tensors? Should we have some sort of enclave-based tensors?
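A side note on the int64 representation mentioned above (it also comes up in the Q&A later): real-valued tensors are carried as fixed-point numbers inside a 64-bit integer ring. Here is a minimal sketch, assuming 16 fractional bits; the actual scale is an implementation choice, and in a real MPC setting the truncation after a multiply is itself a small protocol rather than a local operation.

```python
SCALE_BITS = 16              # fractional bits: an assumed choice for illustration
SCALE = 1 << SCALE_BITS
RING = 1 << 64               # values live in a 64-bit integer ring

def encode(x):
    """Map a real number into the fixed-point int64 ring."""
    return round(x * SCALE) % RING

def to_signed(n):
    """Interpret the top half of the ring as negative numbers."""
    return n - RING if n >= RING // 2 else n

def decode(n):
    return to_signed(n) / SCALE

def mul(a, b):
    """Fixed-point multiply: the raw product carries 2*SCALE_BITS fractional
    bits and must be rescaled back down (done locally here for simplicity)."""
    return ((to_signed(a) * to_signed(b)) // SCALE) % RING

x, y = encode(-2.5), encode(3.0)
assert abs(decode((x + y) % RING) - 0.5) < 1e-3    # addition is plain ring addition
assert abs(decode(mul(x, y)) - (-7.5)) < 1e-3
```

Addition works directly in the ring; only multiplication needs the extra rescale step.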
We don't have the right answer, and we don't know what people would like to use, so in some sense we are looking for feedback as well. There are also other research directions. For example, privacy and security are not the same thing, in many ways: you can do secure computation, but at some point you have to open the result to actually take an action. Maybe your model and data are encrypted, but at the end of the day you want to know what the classifier told you, and that might leak information about the model and the data. So how can we quantify how much information we are leaking? This goes toward marrying in things like differential privacy when we open values from the secure computation domain, and how much noise to add is one research direction we want to pursue.

One thing we want to keep in mind, though, is that no matter what we do as a community in this space, we should work on models and data sets that are actually practical. There's a lot of research in this space that uses, say, MNIST, which is a very small data set, or CIFAR-10, which is also a very small data set. These are useful as proofs of concept, but it's very hard to take something that works only on MNIST or CIFAR and extrapolate it to something that is really practical today. So with that in mind, I want to set forth a challenge of sorts for everybody here, and also for the machine learning community, but first a little bit of history. Machine learning has had this challenge called the ILSVRC challenge; I think it ran from 2010, and I don't think it runs anymore. So what did this challenge do?
What it set out to do was this: first, it created a large data set of a million images, where each image came with a tag of what object was in the image, and the idea was to train a model on the images and then classify a test set with very high accuracy. This is solved, in some sense. We can do it with very high accuracy, and not only that, we can train a model on a 1.7-million-image data set in minutes; I think the fastest record is two minutes 43 seconds or so, with some degradation in accuracy, but within 15 minutes we can train very, very good models. This has completely changed machine learning as we know it over the last five years; the popularity of neural networks now comes from neural networks being shown to be the best way of doing this.

In the same vein, I have a challenge to pose: we want to train on a million encrypted images, say from ImageNet, and classify with high accuracy. We can cut ourselves some slack; obviously we are working in an encrypted domain, and we are not going to get everything we want. So maybe we aim for accuracy within a relative 20% of what we can do now in cleartext, and instead of being done in a minute, say we give ourselves a week. I don't think anybody has done it, to my knowledge, but if we can do it, I think it will be a step-function change for the community, the same step-function change as when neural networks first did the ILSVRC challenge, I believe in 2012, with (I forget the number) a massive increase in accuracy. It'll be that kind of a change.

With that, I want to introduce you to the CrypTen team at Facebook. We are a very small team: alphabetically, we are Awni, Brian, Laurens, Mark, Shobha, myself, Vinnie, and Shane. We are very open to collaboration; we are a research group, and everything we do is open source, so we'd love for people here to be interested in what we do and to collaborate with us.
Thank you.

So, in the examples that you showed, I didn't understand: can you actually specify a multi-party machine learning situation? Like, if you want to train a model using data from multiple parties, do you declare the parties?

Yeah. What I didn't show you is that the tensor constructor you saw has a source argument; I'm basically showing one party here. So you can have one party hold the data and another party hold the model, and you specify the sources. MPI has this notion of a rank, so you can say: if rank equals zero, the first party, then you have the data; at rank equals one you have the model. And we can do multi-party computation with any number of parties. We have this in the examples, but for convenience I didn't show it here.

Hi, thanks for a nice talk. When you spoke about the performance coupling between PyTorch at the lower layer and CrypTen at the upper layer, you spoke about it very positively, but it can also have a negative side. I assume the developers of PyTorch are interested in optimizing PyTorch for the regular case, and that sometimes might make it slower for MPC; for example, if they greatly reduce the number of operations but at the cost of increasing the number of multiplications, that would be faster in a regular execution sense, but in MPC it would be slower. Do you have any thoughts on that?

Yeah, that's a good point, and that tension will exist. But I know the PyTorch team very well, so it's easy for us to influence
let's say, those decisions. For example, being able to do convolutions in the int64 space: if you had gone to the PyTorch team and said, "I want to do convolutions in int64," they'd have said you're crazy. But here we are. So yes, we are very cognizant of this, and we are trying to push to have more of these features in PyTorch. The community around PyTorch is huge, so the motivation there is that if we can make these changes in PyTorch core, then everybody benefits in some sense. But yes, absolutely, that tension will exist.

Thanks. You mentioned that you have to do some transformations, potentially, for the max type of operator. Do you implement specializations for a particular number of parties, such as those in, say, a three-party secure setting?

Okay, so it was an explicit design goal not to have party-specific optimizations, and I don't know how far we can go with that; at some point we may have to have party-specific optimizations, but we wanted to allow any number of parties. That was actually a problem for us: I think in the first implementation we were using two parties, and then we went to three and everything broke.

Hello, Sahar Mazloom from George Mason University. Thank you for a great talk; I really enjoyed it. My question is about the framework you just described, which is another framework for doing machine learning in secure computation. As we saw yesterday, there are a couple of other frameworks out there that provide similar functionality. Do you have any idea how your functionality compares to the others? Because everything comes down to how you approximate those functions, for example the log you just mentioned, or max pooling, or some other functionality. So do you have a benchmark in mind for how you compare with the others?
How do the approximations differ in terms of the performance and accuracy that you just touched on, and other features?

Yeah, so max is actually not approximated; max is done using a circuit, so it's exact. We are doing this comparison now for our own sake. The approximations we do are valid in the number ranges we see in training. I should also say that the frameworks discussed yesterday are much more general: you write code in that specific framework, with particular annotations. Obviously we are not doing that; everything we do is very specific to machine learning, and that is a decision we took very consciously. Our goal was to avoid compilation as much as possible, because what we have seen is that languages take a long time to get traction in a community. Just going from Python 2 to Python 3 took ten years, and that's just one language. So we wanted to stick to an interface that is familiar to the community and start from there. But yes, we are starting to look at comparing how good our approximations are and how fast they are. Actually, our slowest operation is the division operator, which is kind of weird: the Newton-Raphson. The other thing with Newton-Raphson is that in a secure space you cannot say when you converged, because that reveals information. So we run a fixed number of Newton-Raphson steps, and that works for some number ranges, but we have to see what we can do.

Thank you so much. So when you use int64 for deep learning, what about your dynamic range? I mean, is there a decimal point floating over to the left somewhere? Because otherwise it's fixed-point, right?

We just represent it as an int64, essentially.

So log is a lookup in a table of 64 numbers?

Log is not a lookup; we actually do a series approximation for log.

So you expand log as a series, right?
Right, but if you're taking a log and rounding it down to an integer in int64, there are very few outputs: the biggest log you can have is 64.

Yeah.

And the smallest log you can have is zero. So why do you need a Newton-Raphson iteration for this?

Because it's basically a fixed-point representation in an int64, so there's a decimal point that is fixed.

Oh, over on the left. Yeah, okay. Thanks.

Let's thank our speaker again. Our next speaker is from IBM Research.

Thank you. Good afternoon, everyone. This talk is "Towards a Homomorphic Machine Learning Big Data Pipeline for the Financial Services Sector." Everybody's talking about machine learning today, and I'm glad a lot of people spoke before me, so I can cut a lot of the parts I was going to say. This talk is about a collaboration we did with one of the banks, Banco Bradesco in Brazil, in the second quarter of last year. I would like to thank my co-authors on this work, my collaborators Shai and Victor, and also the reviewers of this conference, because since I couldn't reveal the name of the institution until today, it was a very dry abstract I had to submit; hopefully I'll be able to present a lot more today.

Just to put things in context: not everybody has heard of Banco Bradesco, but it's the second-largest private bank in Brazil and in Latin America. In terms of brand, it's the most valuable brand in the country, and this is important if you consider data leaks, data exfiltration, and all the sorts of things that can happen to damage your brand; so security is paramount for them. The number of clients, individual current-account holders, is 72 million. They do 70,000 tasks; this is not only the transactions, but the tasks involved in a transaction that commit on a database
and everything else that happens on the back end, per second. You might ask why they are looking at homomorphic encryption to protect the data. It's because they embrace advanced technology very early: they want to be ready for when the technology is available. So the challenge they came to us with is sharing data among the different business units. It's an interesting problem, particularly in regulated industries like financial services, because there are not only privacy laws but antitrust rules and a lot of other things that don't allow people to see data from different departments of your own company all together; they could be breaking some regulations somehow. And this year the Brazilian equivalent of GDPR starts to be enforced, so they wanted to be prepared for how they're going to move this forward. That led, at the end of 2017 and in discussions through 2018, to them starting to look at what sort of technologies could be applied to keep the data encrypted all the time. They approached us because of homomorphic encryption. The other important challenge is: how do we do this in a hybrid-cloud environment? And there is a lot in there. So when you do this sort of work, who are the people you're going to get together?
Right, so just to give an example of the kind of breadth we had to cover: we had our sponsors on their side, which were their R&D team and the CTO. But then you need the systems-infrastructure people involved, because you need to understand the impact this new technology is going to have on everything else they have there. You need the data-governance people, because you need to show that what you are doing is secure to some extent. Then the security people: well, homomorphic encryption and some of the advanced crypto we talk about here is not mainstream yet, and these people don't necessarily understand what it is, so there was a whole education process to show them what security can be achieved with it. And the important people, the data analysts, because they are the ones who are going to consume whatever we did. And obviously our team, with very frequent technical meetings and exchanges along the way.

Just to put things in context: homomorphic encryption allows us to process data without giving access to it. Technically this is achieved by computing on encrypted data without ever decrypting it, which means the data is never in the clear in the registers or anywhere else on the machine. It addresses the problem I mentioned before: how can I share data when data can only be shared on a need-to-know basis and there are regulations and so on? And it's important to consider what threat model we are addressing for this scenario.
It's the honest-but-curious model, which basically means the entity performing your computation is a legitimate entity to perform that computation, but it wants to learn from what you're doing. And FHE is based on lattice cryptography, and is therefore quantum-resistant to the best of our knowledge today.

So let's have a look at the problem. Banks and financial institutions use what people call traditional machine learning, which is basically regression-based machine learning, for a variety of things. You might ask: why not the fancy neural-network stuff? Because they are regulated industries, you need to be able to explain easily why a given prediction was made in a given way. If you have many hidden layers and you can't say why a decision was taken, that can be tricky. So banks use traditional machine learning a lot for certain tasks, like marketing, loan approvals, and so on. The data sets we used comprise real financial data over a sliding window of 24 months of measurements that they make about every one of us. Basically, the bank measures me with 546 individual explanatory features, which are a mix of quantitative, categorical, and binary features. The other important thing is the amplitude of the values. Take just one thing, say your current balance: it might be minus a few hundred dollars or plus a few million dollars. So when you try to do machine learning on this type of data, things get tricky because of how you handle the precision you want.

So this is what they do, and with the group we put together we had to figure out the use case and what we were going to do. We looked at it: okay, let's put a fully homomorphic encryption layer over the data store and see if we can do predictions, if we can do machine learning, with that. We took the marketing scenario. The marketing scenario they work with is one where they predict whether someone is
going to need a loan within the next three months. This is an important task for them, because they can upsell loans. But look at the second bullet: it's a rare event; in that data set it's around 1%, which basically means that if I did no machine learning at all and just said "no," I would be right 99% of the time. But the golden nugget is in that 1%: if you can find that 1% in your transactions, that's where you make money. That's the importance of this. And the data is very sparse.

So what were the success criteria for doing prediction homomorphically? The first one: if I have an existing model and existing data, can I encrypt the model and the data and run the predictions with the same accuracy as predictions done without encryption? This is an important aspect, because you don't go to a bank and say, "You know those ten years of modeling you have? Throw it all away, because I'm going to start over now." You have to be able to work with what they have. The second is to perform a task in machine learning that is very important and quite often overlooked: variable selection. Remember, I said 546 explanatory features, but the models will have tens of features, because although we have a lot of features, many of them are highly correlated, so you have to get rid of those and find the ones that are most relevant for the condition you are trying to predict. The question is: can we do this variable selection homomorphically, with the same accuracy as without encryption? Those were the two main success criteria, and obviously with some acceptable overhead, because, as we saw in the last talk, things can take seconds or a week.

So how do they do it today? And this is an important aspect: everything is done on premise, because this data is private. It's confidential.
It's sensitive, so they don't put this information in the cloud. And to prevent exfiltration of data, the environment where the data analysts work is a secure environment: you can't take your cell phone and that sort of thing in there, and you can't take your laptop in and come back out with the data either. The data and the machines have to stay there, which is very costly for organizations. For some organizations, because of their regulations, the data even has to be physically separated; you can't even use multi-tenant systems. So you can see that the cost of infrastructure, if you have to do this in-house, is very high.

So we came along and said: okay, let's do this in the cloud. We own the premises, which are secure; we take our transactions and encrypt them; and we send them, encrypted, to somewhere in the cloud, where we can run predictions (if I've already got a model) or run some machine learning to derive new models. I get encrypted predictions, so the cloud cannot see anything; it's an honest-but-curious environment. I bring back the results, and then I can decrypt. That was the premise, and we did it. Our paper shows all the math: how we organized the data, how we encoded the data, and how we tried to optimize everything for SIMD-like computation. But what I want to show you here is more of the results we got.

Our experimental platform is a mainframe, and you might ask: why a mainframe? Why can't you do this elsewhere? Because most of the transaction data of financial institutions is on the mainframe. They are also currently evaluating how to use the mainframe as an integral part of a hybrid-cloud strategy, together with the elastic environments you can consume in the cloud. Our library, HElib, runs both in the cloud and on the mainframe, and it's open source.
So these were some of the characteristics that were appealing to them in coming to us, along with the mainframe requirement.

Results: what does this look like? Prediction: we took an existing model, a 16-variable model; we took the data, encrypted it, ran it through an encrypted logistic-regression-based prediction model, and the accuracy was pretty good. We are using an approximate-number homomorphic encryption scheme here, and the accuracy was very good. That was part number one. Once we had proved we could do predictions with the same accuracy, the next step was: can we do variable selection? Can we do the training; can we retrain the model with new data, but now encrypted? We did this too. What we're showing here is the log loss of the variable selection, based on how many steps of training we do, versus the sigmoid approximation we used. As was mentioned before, you cannot check how good you are and you cannot stop your computation early; you just have to run. We did that, and we showed that for five and sixty steps, with a sigmoid approximation of third or seventh degree, the result was pretty much the same as the yellow curve, which is doing it in the clear, without encryption.

The next question you're going to ask me is: how long does it take? This is the computation overhead, depending on the security level. For 256-bit security it was 50 times, and you might say, oh dear, it takes 50 times longer to compute now. But this is pretty good when you're talking about homomorphic encryption; it used to be a few hundred times. The memory overhead for prediction wasn't bad either: about 20 times for 256-bit security. So once we have all this, how do we put it together?
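Those sigmoid approximations are what make encrypted logistic regression workable: under homomorphic encryption you can only add and multiply, so the sigmoid applied to the score w·x + b is replaced by a low-degree polynomial. Here is a plain-Python sketch; the Taylor coefficients are for illustration only, since the paper's degree-3 and degree-7 polynomials are fit over a wider input range.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_poly(x, degree=7):
    """Polynomial stand-in for the sigmoid using only + and *, so it can
    be evaluated homomorphically. Coefficients are the Taylor expansion
    at 0; they are only accurate near 0, which is why the valid input
    range matters in practice."""
    coeffs = [0.5, 0.25, 0.0, -1 / 48, 0.0, 1 / 480, 0.0, -17 / 80640]
    return sum(c * x ** i for i, c in enumerate(coeffs[:degree + 1]))

def predict(weights, bias, features, degree=7):
    """Logistic-regression score: dot product plus bias, pushed through
    the polynomial sigmoid. Every operation here is HE-friendly."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return sigmoid_poly(z, degree)

# Near zero the polynomial tracks the true sigmoid closely.
assert abs(sigmoid_poly(0.5) - sigmoid(0.5)) < 1e-4
assert abs(sigmoid_poly(0.5, degree=3) - sigmoid(0.5)) < 1e-2
p = predict([0.2, -0.1], 0.05, [1.0, 2.0])   # score z = 0.05
assert abs(p - sigmoid(0.05)) < 1e-6
```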
Remember this chart? What's wrong with it is that this is the more research-academic way of looking at the problem: I have the machine learning part, and I have encryption, and so on, but I'm missing this part, which is how I deploy. How do I generate my keys? How do I store my keys? How do I manage my keys? How do I make sure the whole system works, with the coordination that it requires? Which basically means I still need the secure environments, because on that side I have data in the clear being encrypted for deployment into an unsecured environment; I have to retrieve my keys from my key store, and remember, these are homomorphic keys, which are very large compared to everything else we have been using so far. And when I decrypt, I need my secret key, so again I need the trusted environment; otherwise I can leak my secret key and then everything goes away. So this is the environment we have been working on recently, and how we integrate everything into a framework that can actually be consumed, whether you do machine learning, searches, or some of the other things.

Okay, and I have one minute for questions.

Do you see homomorphic encryption as a way of reducing consumers' exposure to, like, data leaks and stuff?

Well, data leaks are still going to happen, but if they happen in an encrypted form with strong encryption, then there is no damage from the data being leaked. A lot of data has already been leaked just by going over the Internet, captured in an encrypted form, with people trying to decrypt it. So if one day we get a computer somewhere that could make those captures vulnerable, then with lattice-based encryption it's less of a concern.

I'd be curious to know, based on your experience and all the work that you've done, how far away do you think we are from
you know, fully homomorphic encryption being at a point where it is feasible for most of us to use it in a commercial setting, or at least a lot more than what we are able to do now, which is not much, because of the performance cost associated with it?

Well, actually, it's very use-case dependent, and we tend to say that right now we are at the inflection point where the performance is adequate for certain use cases. Most of what I showed you runs out of the batch system; the predictions are not sub-second predictions, they are an overnight task. So if it takes an hour or ten hours to run, but with security, and I can outsource it to the cloud instead of having to do everything in-house, that makes a lot of sense.

Thank you. Great, let's thank the speaker again. It's the break now.