Hello everyone, welcome and thanks for coming down to listen to me. This talk is an introduction to private machine learning. Just so I get a sense of the audience, how many of you work with data in general, machine learning or not? Okay. And how many of you work in cryptography or privacy or security? Okay, cool. That's good, because most of the talk is about the privacy and the cryptography rather than the data analysis, so I'm mostly assuming that you already know how machine learning works in general.

My name is Shankar; you can follow me on Twitter at mail shanks. I currently work for a company called Manulife, where I lead the data and AI research efforts for the company's research labs. Before that I worked in data and machine learning at NUS and at a bunch of other startups as well.

The reason I'm talking about this in the first place is that, as we speak, there is a quiet revolution underway in AI and machine learning in general, and the revolution is happening because of the marriage of two fields: cryptography and AI. By combining concepts from both, it's now becoming possible to build much more decentralized, encrypted, privacy-preserving machine learning systems. The point of this talk is to get you started, as an engineer, in building these kinds of systems. I'm going to cover the landscape of options available if you want to build a privacy-preserving machine learning system, compare the tradeoffs involved between those options, and dive a little deeper into the last one, because it's the most mature and practical and you can actually start building systems with it today. Concretely, I'll be talking about differential privacy, federated learning, homomorphic encryption, and secure multi-party computation. All of these are subfields of (or closely tied to) cryptography; I'll briefly touch on what you can do with each, and then go deeper into the theory and code for secure multi-party computation.

So why does any of this actually matter? Why should you take the trouble to learn all of this, stuff all of it into your head? It's not easy to learn, so why should you care? You should care because we as a society currently struggle to balance, on the one hand, the benefits of building data-driven systems and, on the other, the risks and violations of privacy that naturally occur as a consequence of the data collection those systems require. I don't have to recite the headlines, but one thing that personally boggled my mind: every single Yahoo account was hacked. There's nothing left to hack; every single Yahoo email account, all of them. And that's just one of many large-scale data breaches. Beyond corporate data breaches, the debate around Cambridge Analytica means we have to start asking the question: have we entered the age of mass psychological manipulation? This is Alexander Nix, the CEO of Cambridge Analytica.
He's up on a stage here bragging to a crowd of marketers about how he can tailor the actual same message, worded in two different ways, depending on your so-called psychographic profile. Now, some of these claims are a bit tall, but they're not empty. So we have plenty of reason to start thinking about building machine learning systems that respect user privacy.

The other thing is that over the last five years there's been a lot of talk about transforming your company into a data-driven company, and a lot of people have said that data is maybe the biggest asset you can have. But to me, data is also like nuclear waste, in the sense that if it leaks, it's completely out of your control. Once it leaks, you have no control over it; you don't know where it's going to go. The way we respond to that today is with regulation. And here's the picture: the users give up their data, and AI Inc. trains models on it, maybe to serve you better cat pictures. That's kind of where we are. The point is that collecting data is an intrinsic feature of building machine learning systems, and the more data you collect, the bigger the honeypot you create for hackers.

That's where cryptography and machine learning come in. The point is to be able to analyze sensitive data and still maintain privacy. You can encrypt data, but if you encrypt it, you can't analyze it. So the real challenge is: how do you encrypt data, or transform it in some other way, while still allowing analysis on it?

Let's get into the details now. The first option, if you want to build any kind of privacy-preserving ML or data analysis in general, is what's called differential privacy. Differential privacy is a subfield, I would say, of cryptography and data analysis, and it concerns itself with the following question. Let's say I want to study the incidence of diabetes in this audience. So I run a survey, I collect a database, and I come to you and ask you to participate: you tell me whether you have diabetes or not. Now, if I have the results of the survey before I ask for your response, and I compare them with the results after you responded, then I know whether or not you have diabetes, because my diabetes count will increase by one if you do and won't if you don't. So I can infer whether you have diabetes. Differential privacy deals with building these kinds of databases and queries so that you can still collect all of this data, but in such a way that I don't know for sure whether or not you actually have diabetes. In a more formal setting, imagine two data sets (the squares here) that differ in only one entry, one record. Differential privacy is all about building query systems that run the query on the database as you'd normally expect, but where the result is not exact. It's not an exact count; it's the count plus some probabilistic noise added on top.
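To make that idea concrete, here's a minimal sketch of a noisy counting query. This is a toy, not a production differential privacy mechanism; the Laplace distribution and the choice of epsilon here are illustrative assumptions on my part:

```python
import numpy as np

def private_count(db, predicate, epsilon=0.5):
    """Counting query with Laplace noise, a toy DP mechanism.
    A counting query changes by at most 1 when a single record is
    added or removed (sensitivity 1), so noise with scale 1/epsilon
    makes neighbouring databases give similar-looking answers."""
    true_count = sum(1 for row in db if predicate(row))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Two databases that differ in exactly one record (you joined the survey).
db1 = [{"diabetes": True}, {"diabetes": False}, {"diabetes": True}]
db2 = db1 + [{"diabetes": True}]

print(private_count(db1, lambda r: r["diabetes"]))
print(private_count(db2, lambda r: r["diabetes"]))
```

Run it a few times: the answers from the two neighbouring databases overlap so much that you can't tell which database produced a given answer.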
You want to control the noise you're adding, and you want to control it in such a way that for these two data sets, which differ in only a single row, the outcome of the query looks more or less the same. That's what the blue and the yellow lines show: the probability distribution of the result, which you want to be nearly identical, because the two data sets are the same except for one row. If you can do that, then from an adversarial perspective, if I want to figure out whether or not you have diabetes, I can't, because no matter what query I run, I'm going to get more or less the same result on both versions of the data set, so I can't compare them. Differential privacy is all about how many queries you can run and how much noise you have to add; you can trade off between the noise you add and the number of queries you allow, and based on that you can actually compute a probability of a privacy violation.

Based on this description, differential privacy sounds like a fairly promising solution to all our problems. But it's not the solution to all problems; it's a solution to some problems. The biggest issue with differential privacy is that you end up creating a trust boundary. You have all of the raw data on one side and the analyst, the data miner, on the other. The analyst issues her query, it gets transformed by a differential privacy layer, and then it's run on the actual database. Everything on the raw-data side of that trust boundary is still insecure, still sensitive; if you lose that, you're in trouble. That's a big issue with differential privacy. The other issue is that you start losing accuracy: the stronger you make your privacy guarantees, the less accurate the results become.

The next option is what's called federated learning, which works in the following way. You have all of these users, and they use some service from AI Inc. Rather than have all of them upload their data to AI Inc.'s central database, you let the users download the model (or parts of it) and train it locally, maybe on their phones. They upload only the gradients back to the company, the company aggregates all of those gradients and builds its own model, and then it can make money with it. Among the four options I'm going to go over in this talk, federated learning is famous mostly because Google does it; other than Google, not too many people do it, and there aren't many libraries and tools out in the open for actually building federated learning systems.
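To make the federated flow concrete, here's a minimal sketch of a single round, assuming a toy linear model and plain gradient averaging. This is just an illustration of the data flow; real systems like Google's add secure aggregation, client sampling, and much more:

```python
import numpy as np

def local_update(weights, local_data):
    """Runs on a user's device: compute a gradient on the local data
    and return only the gradient. The raw data never leaves the phone.
    (Toy linear model with squared loss.)"""
    x, y = local_data                          # x: (n, d), y: (n,)
    return 2 * x.T @ (x @ weights - y) / len(y)

def federated_round(weights, all_local_data, lr=0.01):
    """Runs at AI Inc.: average the users' gradients, take one step."""
    grads = [local_update(weights, data) for data in all_local_data]
    return weights - lr * np.mean(grads, axis=0)

# Three "phones", each holding its own private batch of data.
rng = np.random.default_rng(0)
users = [(rng.normal(size=(8, 3)), rng.normal(size=8)) for _ in range(3)]
w = np.zeros(3)
for _ in range(100):
    w = federated_round(w, users)
```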
The third option is what's called homomorphic encryption. Homomorphic encryption is a special kind of encryption scheme, or rather a family of schemes, and they all share a special property: arithmetic performed on the encrypted data corresponds to the same arithmetic operations performed on the plaintext. So let's say you have the numbers three and five, and you have a homomorphic encryptor. You pass three and five through it, and they come out looking like garbage; you won't be able to decipher any bit pattern out of the output. The encrypted data is what we normally call ciphertext, so you have cipher A and cipher B. Homomorphic encryption schemes guarantee the following: if you take cipher A, multiply it by two, and then decrypt it, you get six, because three times two is six. Similarly, if you add the two ciphers, cipher A and cipher B, and decrypt, you get eight, because three plus five is eight. So you can do arithmetic on encrypted data.

That's kind of amazing if you think about it, because the whole point of encryption is to transform plaintext into something that mostly looks like noise. A necessary property of encryption is that any encrypted piece of data should be statistically indistinguishable from noise: spread out over the whole range, with no particularly high concentration at any one value. Being able to do arithmetic on that, and have it carry over to the plaintext, is quite mind-boggling.

But there are some issues with homomorphic encryption as well. Firstly, if you're going to do homomorphic encryption in a distributed fashion, the problem of key management comes up: who manages the keys? For the homomorphic encryptor to work, obviously someone has to hold the keys. If the company distributes the keys, the company can decide to cheat at any given time, and your privacy then comes down to a promise between the user and the company that the company won't ever peek at your data. So key management is a huge issue. The other issue is that homomorphic encryption is currently still computationally very expensive, and there is again a loss of accuracy. So in a practical sense it's hard to build full-blown analytical systems using just homomorphic encryption.

Here are some references. There's a homomorphic encryption library called HElib, which actually implements a lot of these things, and there's a library called python-paillier, which implements a limited (additive) form of homomorphic encryption.
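Here's roughly what that three-and-five example looks like with the python-paillier package (`phe`) mentioned above. Paillier is only additively homomorphic, but ciphertext addition plus multiplication by a public scalar is exactly what the example needs:

```python
# pip install phe  (the python-paillier library)
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

cipher_a = public_key.encrypt(3)  # looks like garbage without the key
cipher_b = public_key.encrypt(5)

# Arithmetic on ciphertexts carries over to the plaintexts:
print(private_key.decrypt(cipher_a * 2))         # 6
print(private_key.decrypt(cipher_a + cipher_b))  # 8
```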
Okay, so, big picture: where are we right now? Differential privacy gives you strong privacy guarantees, mathematical guarantees, but suffers from centralization risk. Federated learning doesn't have the centralization risk, but there are very few practical open-source implementations, and essentially only Google does it. Homomorphic encryption is very, very promising, but computationally very expensive. And the last one is secure MPC, where MPC stands for multi-party computation. That's the one we'll delve into in a little more depth.

Secure multi-party computation deals with the following question. Say there are parties p1, p2, and so on up to p6, or pn in general, and each one of them holds a piece of data: x1, x2, up to xn. Given a function f, any arbitrary function, which might operate on x1 to xn or on any subset of them, can you compute f without revealing any of x1 to xn? That's a little bit abstract, so let's go through a more concrete example. Say there are six people in a room, and they want to compute the sum of their salaries, but of course without revealing their individual salaries. The naive way to do this is to have a central trusted party: each person tells it their salary, and the central party adds them up and announces the sum. Obviously this is not ideal on many fronts. Hopefully there's a better way, and there is: secure multi-party computation.

For this particular concrete instance of the problem, summing n inputs without revealing any of them, let me give you an intuition for how you can do it. Pick any party; we arbitrarily pick the first one, p1. p1 chooses a random integer r, adds his salary to r, and passes the result on to p2. p2 now has a number, but doesn't know p1's salary, because it's been masked by a random number. p2 takes the message he got from p1, adds his own salary, and passes it along, and so on: each person in the ring takes the message from the previous person, adds their own salary to it, and passes it down. At the end, when p1 gets the message back, all he does is take the final message, m6, and subtract r; p1 is the only person who knows the random number r. And now you have the sum of the salaries without any person learning any other person's salary. That's the flavor of the kinds of things you can do in secure multi-party computation.
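Here's a toy, single-process simulation of that ring protocol. In reality each addition happens on a different party's machine; I'm also assuming arithmetic modulo a large prime, so that the running total each party sees is itself uniformly random:

```python
import random

Q = 2**61 - 1  # large prime; work mod Q so partial sums leak nothing

def ring_sum(salaries):
    """Party 1 masks their salary with a random r; each party adds
    their own salary to the running message; party 1 removes r at
    the end. No one sees anyone else's salary in the clear."""
    r = random.randrange(Q)
    message = (r + salaries[0]) % Q
    for s in salaries[1:]:
        message = (message + s) % Q  # each party sees only a masked total
    return (message - r) % Q         # only party 1 knows r

print(ring_sum([5000, 6200, 4800, 7000, 5500, 6100]))  # 34600
```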
Now, this is probably the simplest possible protocol you can design to answer this question, and it has a number of weaknesses and assumptions. The most important assumption is that people do not gossip. If, say, p3 and p1 get together, they can subtract the messages that were sent between p1 and p2 and between p2 and p3, and find out a salary that isn't theirs. So the question is: can you design protocols that are resistant to collusion and to gossip? And the answer is, yes, it depends. It depends on the kinds of assumptions you make about how people collude and how people gossip. The big takeaway is this: even against the most malicious gossip and collusion imaginable, as long as the number of dishonest parties t stays below half of n, that is, as long as you have an honest-majority population, you can design multi-party computation protocols that are resistant to that kind of collusion and gossip. That's the big message from the field of multi-party computation. It's also the most practical of all the options, and there are a number of libraries; you can refer to some of them here.

So let's dive a little into how exactly you can analyze data with multi-party computation. Before we get into all the intricate details, there's a building block that is very, very important to building MPC systems, and that's what's called secret sharing. Secret sharing means you take a secret piece of information and split it up among multiple parties such that no party can figure out what the actual secret was; each party only has access to their share. I denote that by s in angle brackets: when I write ⟨s⟩, it means s is secret shared. A simple way to do that is to compute s as a sum s = s1 + s2 + ... + sn and distribute s1 through sn to the n parties. None of them knows the actual secret unless all of them get together and add up their shares. That is what's called additive secret sharing, and it's perhaps the simplest of the secret sharing protocols. There are others, like Shamir's secret sharing, where you split your secret into n parts such that you need at least t of them (with t less than n, obviously) to come together in order to recover the secret. But additive sharing is a much simpler scheme, where everybody has to come together to reconstruct the secret.

Concretely, this is what I'm talking about. To share a secret, which is an integer, you pick a random number between 0 and q. The reason this q shows up is that, generally speaking, in cryptography you do all operations in what's called a finite field, and the finite field of size q just means the set of all numbers from 0 to q minus 1; that's just how you do things in cryptography. So to share the secret, you generate a random number between 0 and q: that's your share 0. Share 1 is the secret minus share 0, and you return share 0 and share 1. If you want to reconstruct, you take the two shares and add them up, mod q, because you do everything mod q. That's additive secret sharing, and it's the building block for doing data analysis in a secure multi-party setting; I'll show you how. Now that you have these share and reconstruct methods, you can construct a class, let's call it PrivateElement, create instances of it, and do operations on them. You initialize a PrivateElement either by giving it the two shares directly, or by taking an input value and splitting it up with the share method defined above. There's of course a reconstruct method as well, which just adds up the shares using the reconstruct from before. So now that you have PrivateElement, what's left is being able to define operations and run them on these private elements.
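Here's roughly what that looks like in Python. This is a two-party sketch reconstructed from the description above; the exact slide code may differ in details:

```python
import random

Q = 2**61 - 1  # a large prime; all arithmetic happens in the field mod Q

def share(secret):
    """Split an integer into two additive shares. Each share on its
    own is a uniformly random field element and reveals nothing."""
    share0 = random.randrange(Q)
    share1 = (secret - share0) % Q
    return share0, share1

def reconstruct(share0, share1):
    """Only someone holding *both* shares can recover the secret."""
    return (share0 + share1) % Q

class PrivateElement:
    """A value that exists only as shares, one per party."""
    def __init__(self, value=None, shares=None):
        if shares is not None:
            self.share0, self.share1 = shares
        else:
            self.share0, self.share1 = share(value)

    def reconstruct(self):
        return reconstruct(self.share0, self.share1)

x = PrivateElement(42)
print(x.share0)         # looks like random garbage
print(x.reconstruct())  # 42
```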
If you're feeling a bit lost, let me bring the mental picture together: how do you go from MPC, multi-party computation, to machine learning? The big idea is that if you can decompose machine learning algorithms into what are called arithmetic circuits, and you can evaluate those arithmetic circuits in a secure way, then you can do data analysis in a secure way. An arithmetic circuit is just a graph of arithmetic operations: the nodes are either operations, like addition and multiplication, or pieces of data, like x1 and x2. And those pieces of data are the shares we split earlier. So the idea is to take each piece of data, run it through the share method to get share 0 and share 1, push those shares through the circuit, and evaluate the circuit on them. If you can evaluate that circuit, and you can decompose a machine learning algorithm down to such a circuit, then you can do multi-party computation without revealing x1, x2, or xn to anyone else.

To evaluate circuits, you basically have to be able to add and multiply secrets that are split among n parties. Let's look at addition first. Like I said earlier, a in angle brackets means a secret a that has been split among n parties. So we have two secrets, a and b, and both have been split among parties p1 to pn: you take a, split it into n parts distributed across p1 to pn, and you do the same with b. For a mental picture: party p1 holds a1 and b1, party p2 holds a2 and b2, and so on. Just to remind you, we're using additive secret sharing, which means a1 + a2 + ... + an = a, and similarly b1 + b2 + ... + bn = b. So the question is: given secret-shared ⟨a⟩ and ⟨b⟩, how do you compute a + b in a multi-party setting? It's actually really simple. Each party does a local computation: they add the two local pieces of data they hold, and then they broadcast the result. So p1 adds a1 and b1 to get c1, p2 computes a2 + b2 to get c2, and so on. Parties p1 to pn now hold c1 to cn, they broadcast them, and if you add up c1 to cn you get a + b, because each ci is ai + bi, so the sum of all the ci is the sum of all the ai plus the sum of all the bi. So addition in multi-party computation is a really simple operation: there's no computation between parties, just one local addition and then a broadcast. A really simple protocol. (This is the addition step of what's called the SPDZ protocol in multi-party computation, often pronounced "speeds".) Just to make sure I have everyone in the room with me, in code you do something like this.
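A minimal two-party version, continuing the PrivateElement sketch from before (my reconstruction, not necessarily the exact slide code):

```python
def add(x, y):
    """Addition is purely local: each party adds the two shares it
    already holds; no communication about the secrets is needed."""
    return PrivateElement(shares=((x.share0 + y.share0) % Q,
                                  (x.share1 + y.share1) % Q))

a = PrivateElement(3)
b = PrivateElement(5)
print(add(a, b).reconstruct())  # 8
```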
So add(x, y) sets share 0 to x's share 0 plus y's share 0, mod q, and share 1 to x's share 1 plus y's share 1, mod q. Here we assume there are only two parties; you can of course generalize this to n parties. Then you return a PrivateElement initialized with those shares 0 and 1, and it contains the result of adding x and y. Any questions so far? (Question.) Yes, this can be any arbitrary piece of data, however you wish to represent it; it's a really granular building block. (Question.) It does not. I'll come to that.

(Question about division.) In the SPDZ protocol, the "speeds" protocol, which is the one I'm describing, and bear in mind that multi-party computation has a number of different protocols, we're just going through one of them, you can't really divide, so you have to be clever. What you have to do is figure out how to decompose your machine learning system such that the division operations are not privacy-critical: all of the division operations work on public information, and you keep only the multiplicative parts private. So in the end, building a holistic system actually requires you to modify the machine learning algorithms as well, because you can't do division here; you have to bring the two sides together. That's not covered in this talk, but to give you a flavor of the kinds of things you have to do (this goes to your question as well): often, the division is something that you keep public. For example, if you're building a convolutional neural network, rather than a max pool you can do an average pool. An average pool is additions followed by a division, and you keep the division public, because you can't divide privately in the SPDZ protocol. Therefore people will know the size of your average-pool filters; that will be public information. But you can still compute the average pooling in a way where nobody learns the inputs or the outputs. That's the kind of thinking you have to apply in order to transform your ML algorithms to be able to use this.

(Question about where the parties run.) That depends on your security model. If you're a company and you want to use MPC for learning from your users' data on their phones, then the shares are naturally stored in multiple locations. But you could also have a different model where the parties are just different VMs on the same cloud system, such that compromising one VM doesn't compromise your whole system. That requires a more holistic kind of thinking. And yes, it doesn't matter where the parties actually sit; conceptually you do this, and physically they can be anywhere.

Okay, so addition is simple; that leaves us with multiplication. Multiplication asks: if you have secret-shared x and y, how do you compute x times y? And this is actually the most intricate part of my whole talk.
You can't just do local multiplication the way you did local addition: if you think about it, if you multiply the individual shares and add them up, the result will not be the product of the two whole numbers. So this is the setting: you have parties p1 to pn, you have x and y secret-shared among them, and you want to compute x·y. In order to do this, the protocol has a special requirement: it requires a triple of values a, b, and c such that a times b equals c. Any random numbers a, b, and c that fulfill the property a·b = c will do. You need such a triple, and you need it secret-shared among the parties. So you end up with this: parties p1 to pn each hold x1 to xn and y1 to yn, and they also hold shares of the triple, so p1 has a1, b1, c1, p2 has a2, b2, c2, and so on.

The multiplication protocol then goes as follows. The first step: each party computes the quantity xi minus ai. So p1 computes x1 minus a1, and everyone else does the exact same thing; it's a local subtraction, a very simple step. Once they've done that, they broadcast the result: x1 minus a1 is broadcast, x2 minus a2 is broadcast, and so on. You take all of those broadcast values and add them up, and you get the quantity x minus a, the full number. For the time being I'll call it k1. This k1 is a public number, because everyone broadcast their tiny piece of it, so everybody knows k1. That's step one. Step two is very similar: each party computes yi minus bi locally, again a local subtraction, and broadcasts the result. Adding up the broadcasts gives you another public number, which we'll call k2. k2 is y minus b, because the yi sum to y and the bi sum to b. That's the second step.

The third step (the first two are in the upper half of the slide for reference, that's where k1 and k2 come from) is the one at the bottom. Each party pi computes zi = ci + k1·bi + k2·ai. Each of them has ci, which comes from the triple; if you're wondering where, remember that when we started off, each party held xi, yi, ai, bi, ci, and the whole numbers a and b multiply to c. So each party takes their ci and adds two other quantities: k1, the public number from the first step, times bi, plus k2, the public number from the second step, times ai. That gives each party another local quantity, zi. So now p1 to pn hold z1 to zn. That's the third step. And now you pick any arbitrary party.
I just arbitrarily pick the first party, and what he does is take his z1 and add k1 times k2, the product of the two public numbers. So just bear with me, we're almost done; this does get somewhere. What you do next is broadcast: everybody broadcasts their zi, which all look the same except for the first party's, who added the public product k1·k2 to his z1. Now, if you go home and do the algebra, the sum of the zi turns out to be exactly x times y. And in this entire process, you can verify for yourself that no private information was ever exchanged. Everyone held their own pieces of x, y, a, b, and c; all of the computation was local, usually just a subtraction and an addition on local pieces of data; and the only things broadcast were the masked quantities and the zi. Once you broadcast the zi, their sum gives you the product.

For some people a code snippet might be a better way to understand this. You can add a mul method to the PrivateElement class we were writing. mul(x, y) first generates a triple, and here's a caveat: you now have a dependency on this gen_mul_triple method. What gen_mul_triple needs to do is generate a random number a, a random number b, and their product a·b, all secret-shared, and it has to do so in a cryptographically secure way. Also, you can't reuse triples: each multiplication consumes one triple, so you pre-generate these triples in advance. Then you compute alpha and beta, which are our k1 and k2: you reconstruct x minus a and y minus b. Finally you combine c, alpha times b, beta times a, and alpha times beta; add them up, and you get the product of x and y.
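A minimal two-party sketch, continuing the PrivateElement code from before. Note that gen_mul_triple here is a trusted-dealer stand-in, an assumption for illustration only; real SPDZ produces triples with a separate preprocessing protocol:

```python
def sub(x, y):
    """Subtraction, like addition, is purely local."""
    return PrivateElement(shares=((x.share0 - y.share0) % Q,
                                  (x.share1 - y.share1) % Q))

def gen_mul_triple():
    """Trusted-dealer stand-in: a Beaver triple a*b = c, secret shared.
    One fresh triple is consumed per multiplication."""
    a, b = random.randrange(Q), random.randrange(Q)
    return PrivateElement(a), PrivateElement(b), PrivateElement(a * b % Q)

def mul(x, y):
    a, b, c = gen_mul_triple()
    alpha = sub(x, a).reconstruct()  # k1 = x - a, safe to make public
    beta = sub(y, b).reconstruct()   # k2 = y - b, safe to make public
    # Each party locally computes c_i + k1*b_i + k2*a_i;
    # one party (here, party 0) also adds the public product k1*k2.
    share0 = (c.share0 + alpha * b.share0 + beta * a.share0
              + alpha * beta) % Q
    share1 = (c.share1 + alpha * b.share1 + beta * a.share1) % Q
    return PrivateElement(shares=(share0, share1))

print(mul(PrivateElement(3), PrivateElement(5)).reconstruct())  # 15
```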
So we were here: now that you can add and multiply, you can evaluate arithmetic circuits. And you've all spotted the problem: you can't do division. That is where the machine learning aspect of crypto-ML comes in: you have to modify your ML algorithms so that they can work, and stay private, even though the division operations are not private. Here's a reference implementation of the SPDZ protocol, and like I said earlier, there are multiple ways to do multi-party computation; this is not the only protocol. I've listed other protocols here, and depending on your particular ML algorithm, it might be harder or easier to use one of the others. So that gets us past the most technical part of my talk.

I just want to highlight a project called OpenMined. OpenMined is an open source project, and it's very promising. It's not there yet, but the promise is that it might be the first practical implementation of a lot of the concepts I went over. It's not a company; there's no ICO or any shady business; it's just a Linux-style open source project. The name is a word play on open source, actually: OpenMined, as in you mine data in an open way. Look them up. Where OpenMined is right now, they have essentially two big components, called Syft and Grid. Syft is their portable deep learning framework, which compiles deep learning graphs onto a variety of devices beyond GPUs, and Grid is a decentralized compute network that helps with all of the splitting, sending information across multiple nodes, and managing all of that. What's really interesting about OpenMined is that they take all of these concepts and basically add money to them: you can promise payments for training. All of this is very fluid as I speak, but roughly, OpenMined adds crypto-economic incentives for training and privacy. If you have spare compute power, you take a share of data, train models on your particular share, and you get paid; the payments, the management of the models, and the encryption are all encoded into a smart contract. The idea is that people who want models trained upload money to the contract, and other players in the ecosystem, the ones with spare compute, execute the contract and train on some of the data. It's not there yet, but that's the vision for OpenMined, and they're building a lot of the component pieces as I speak. If you're interested in this kind of stuff, take a look.

And my call to action, really: if you're a data scientist, learn some cryptography, and then come talk to me if you're interested in some of this. So that's all I have for you, and I'm happy to take questions if there are any.

(Question about the field parameters.) The parameter q has to be a large prime number; the other values, like a and b, can be any random numbers. Yes, q has to be a large number, but otherwise it can be any such prime.

(Question about floating point.) Excellent question. As you probably noticed, you do all of these operations on integers. So if you want to actually train models, and your models have floats, you have to pass them through an encoding function. Essentially, you convert your float to an int by multiplying it by a large number, such as a million or ten million, depending on your bit space, and you have a corresponding decode function. So you encode by multiplying by, say, 10 to the 6, and you decode by dividing by 10 to the 6, and that's how you go between ints and floats.
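A tiny sketch of that fixed-point encoding, reusing the field Q from before; the 10^6 scale is just the example scale from the answer. (One caveat I'm glossing over: after multiplying two encoded values, the result carries scale 10^12 and has to be rescaled.)

```python
SCALE = 10**6  # assumed precision: six decimal places

def encode(x):
    """Map a float into the integer field so it can be secret shared."""
    return int(round(x * SCALE)) % Q

def decode(n):
    """Invert the encoding; field elements above Q/2 are negatives."""
    n = n if n <= Q // 2 else n - Q
    return n / SCALE

print(decode(encode(3.141592)))   # 3.141592
print(decode(encode(-1.5)))       # -1.5
```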
(Question about the number of multiplications in a neural network.) Say you have an incoming batch of images, maybe 256 by 256 by 3, and you have a convolutional layer with maybe 32 filters. The number of multiplications is 32 times the number of convolution positions on each image times the number of images in the batch: that many multiplications. If you do it in a naive way, that's going to be intractable. But it turns out you can collapse some of those multiplications: not all of them have to be private. Only a tiny subset needs to be private, and you can still preserve privacy. Those are the kinds of tradeoffs.

(Question.) If somebody stole this database of encrypted salaries, and they know the distribution of salaries in the real world, can they not effectively infer the input value for each individual user and assign it back? — You mean in the setting where we compute the sum? You don't encrypt your salaries there; there's no encryption involved in that protocol at all. (The question is restated for homomorphic encryption.) For homomorphic encryption, the answer is no. The test for whether you have a valid encryption scheme is precisely that these kinds of analyses fail. Even if there is only a single bit of difference between two data sets, the resulting ciphertexts should be completely indistinguishable from each other, and from the encryption of any arbitrary data set.

(So the requirement is that the data needs to be sufficiently varied? Say there are ten people with a salary of 100 and one person with a salary of 300; that seems pretty easy to guess.) For homomorphic encryption, no. The encrypted version of that data set will be completely indistinguishable from another hypothetical data set where, say, everybody has a salary of $10,000, or everybody has a salary of $0, or, more to the point, from a data set exactly identical to the first one except that the last person had a salary of $101 instead of $300.

(An audience member objects that this requires the ciphertext space to be much larger than the plaintext space, as with a stream cipher; if the ciphertext space is the same size as the plaintext space, you should be able to tell the difference. In particular: wouldn't two people with the same salary produce the same ciphertext?) Oh yeah, great question. That actually depends on the exact homomorphic encryption scheme. In some of them you add a bit of randomness, and that's how you hide these kinds of attacks.

Any other questions? (Question.) For federated learning, where you send the model out to individual local computation sites, and then you collect some kind of data that goes into the final model at the mother ship, in some sense, right? Yeah.
In that setting, the model has to stay the same, and that becomes a big problem, because a lot of the time you're evolving the model, and then you have to run the new model through all the data again. Which means each of the local participants has to keep all their data all the time. — Not necessarily. If, let's say, you trained on a subset, you already have the gradients for that subset; you actually only need gradients for the new data, and that's fine. Looks like no more questions. Thank you very much.