So maybe I'll start with a quick poll: how many of you know what a SaaS application is? Stupid question, but OK, quite a few. That's great. And how many of you have heard of the Split Neural Network? OK, three. That's good to know. The reason I'm asking is that people are very familiar with CNNs and RNNs; in five years' time, SNN is going to be one of those acronyms that gets added to that list. That's what's going to happen, and my goal today is to introduce you to it and why it's important. But before that, let me ask: what do you think is the next big challenge in the ML world? Just a few answers. Access to data? OK, that's a good one. Why do you think so? We've always had access to data, so what's changing? That's valid: even when the data exists, getting access to it is another matter. If customer data is sitting there and you can't access it, that's the problem, and it has become even worse. Any other answers? Privacy? Yes. Computation? To some extent, yes. Model size? Maybe; that's a good one. But it's not just GDPR; things were restrictive even before that. Take healthcare in the US, for example: everything is protected by HIPAA compliance, which means you don't really have access to any patient data at any point in time. Now, just to motivate where I'm going: imagine you want to build a machine learning algorithm, deep learning or otherwise, to predict cancer. Can you do it without access to any of the radiology data? Probably the answer is no. So if someone says they're doing deep learning on healthcare data, most likely it was done in what I call a petri dish: they had access to some sample data, they built something, and that's about it. It cannot go into production; you cannot scale it; you cannot really do anything with it. So what's going on? Over the next five years you will see more and more papers similar to what I'm going to talk about today, and the applications are varied. Some of you mentioned SaaS applications, and that's what I'm focusing on, because my company, Workday, is a big SaaS provider, one of the leaders in the cloud space. I don't know how many of you have heard of Workday, but we're pretty big; almost 80% of the Fortune 500 are our customers. So the next big challenge, to me, is how you handle privacy in machine learning. You want the data you have to be accessible to the machine learning algorithm, whether it's deep learning or anything else, while the raw data stays where it sits. It never comes out, yet it's still useful for building a model. That's going to be the biggest challenge area, and a lot of people are working on it. Privacy and ML don't naturally go together; it's very, very difficult to do both at once. So my agenda for the day: talk about machine learning in general and how you handle privacy in the machine learning world, because we need access to the data.
Then I'll go through some of the standard techniques that have been around for a long time and their drawbacks, look at what's happening in healthcare, and compare it with SaaS applications, which is my world. Then I'll introduce something pretty recent. In the earlier Microsoft talk, the speaker mentioned a paper published in 2015 that was implemented in 2018. This one is even more recent: the paper was published in 2018, and so far it has only been tested on some healthcare data. But I believe it holds a lot of promise, which is one of the reasons I wanted to evangelize it, and it can be applied in many different ways. Then we'll go through some small use cases and the lessons we learned, and go from there. Sounds good?

OK. When you talk about privacy in machine learning, the setup is: you have data, and you have to do something to that data so that if a hacker or an adversary breaks into wherever it's hosted, they cannot make sense of it. That's the real goal of making data private: whoever gets hold of the data should not be able to extract anything meaningful from it. There's a multitude of techniques for this. Let's start with a standard one that's been around for a long time and is very well known: anonymization. You take the data and blank out the PII, the personally identifiable information. You can mask it, or hash it, which is a very standard technique. There are various ways of anonymizing data. But the problem with anonymization, as most of you have probably heard, is that it's easily broken. How many of you know about the Netflix $1 million prize? Probably everyone, right? Netflix thought, OK, we're going to put this data out and it's completely anonymous. They changed the user account names and perturbed the ratings a bit, but they kept the movie names. The released dataset had about 100 million ratings from roughly 480K users on around 18K movies, all publicly available. Then came a paper, in 2008, by two researchers at the University of Texas at Austin. They took the IMDb data, which I'm sure everybody is familiar with, joined it with the Netflix data, and were able to figure out who many of those 480K users were. In other words, no anonymity anymore; everything was exposed. So we think anonymization is very powerful, but most of the time it doesn't work. It's a hammer for privacy problems: very easy to implement and quite scalable.
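Just to make concrete what I mean by hashing and masking, here's a rough sketch; the record fields and the salt are made up for illustration:

```python
# A rough sketch of hash-and-mask anonymization. The record fields and
# the salt are hypothetical, just to show the two standard moves.
import hashlib

SALT = "some-secret-salt"  # in practice this would live outside the code

def pseudonymize(user_id: str) -> str:
    """Replace a PII identifier with a salted hash."""
    return hashlib.sha256((SALT + user_id).encode()).hexdigest()[:16]

def mask_email(email: str) -> str:
    """Mask the local part of an email, keep the domain."""
    local, _, domain = email.partition("@")
    return "*" * len(local) + "@" + domain

record = {"user": "alice42", "email": "alice@example.com", "movie": "Heat", "rating": 4}
anonymized = {
    "user": pseudonymize(record["user"]),
    "email": mask_email(record["email"]),
    "movie": record["movie"],   # quasi-identifiers like this are what broke Netflix
    "rating": record["rating"],
}
print(anonymized)
```

Notice the movie and rating survive untouched; that's exactly the kind of quasi-identifier the Netflix attack joined against IMDb.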
But the problem is that de-anonymizing is also very easy; the attacker just needs to put in some effort. There's an even bigger example you should look at if you're really interested in this. In Massachusetts, supposedly anonymized hospital records for state employees were released, with assurances that they were safe. Researchers re-identified the governor's own records by joining the data with public records, and they recovered his entire medical history. You can imagine how bad that is: they know everything about you. So anonymization is not the best way to do privacy protection.

The next one, which is an interesting field in its own right, is called differential privacy, and it falls under the umbrella of obfuscating the data. Think of it in simple terms. You have data; the adversary, the hacker, gets at it, and if he can make sense of it, that's bad. So what you do is add some noise to the data before exposing it. Then when someone queries it, they get information mixed with noise, not the real values. You know the noise distribution you used, so you're the only one who can de-noise it, and the adversary doesn't even know whether you've obfuscated the data or not. That's differential privacy. Each topic I'm touching here could fill two or three days of talks on its own; these are very deep fields, but I just want to introduce the idea. Differential privacy is pretty big: it's all about adding noise, and there's a whole body of research on what kind of noise to add. Laplace noise is very common. In this example, the bad guy just wants to know how many people have a rating of "bad". If you return the true count of three, there's a problem: in many datasets, when a count is that low, it's very easy to figure out who those individuals are. But imagine you add some noise: the first time he queries he gets 2.915, the next time 1.882, and the next time 1. He's completely thrown off. The first answer is close, but he's never given the exact number. One drawback you can already see: over repeated queries, the answers will converge toward the actual count. The other drawback is the balance: you want to add noise to make things harder for the adversary, but all the legitimate users also need to query this data, so you have to balance the utility of what you're offering against the privacy. You add random noise to make it harder for the hacker, while keeping the scheme simple to implement and understand.
So you're caught in between. And the one problem you see with differential privacy, the biggest bottleneck: it's very scalable and very easy to implement, that's the whole appeal, but if the hacker queries the system enough times, the pattern comes out. He can average away the noise you added, and with more and more queries he can recover the true data. That's a big drawback, but it's a very well-known technique.

The third big one, after anonymization and differential privacy: how many of you have heard of homomorphic encryption? Everybody knows encryption, so, yeah. The breakthrough came from a Stanford PhD thesis, Craig Gentry's, whose whole thesis was on this: you have encrypted data, and your computations actually happen on the encrypted data, which is amazing. Normally, you encrypt data, and when you want to work on it, you decrypt it first. That's the natural notion: it's encrypted and safe in storage, and then, OK, let me decrypt and work on it. But in the world we live in now, no place is safe for data. You can be hacked anywhere, and the moment something is decrypted, all your encryption goes for a toss. Now imagine you can work on the data while it's still encrypted: at no point in time is the data ever decrypted, and you can still do a lot of operations on it. The problem is that the field is still in its infancy, I would say, although the literature is already huge; this topic alone could run for three days. Today you can do only simple math, like addition and multiplication: you encrypt two, encrypt three, and you can add them while they stay encrypted. More complex operations are being added, which is great, but you can already see that if you want to do full machine learning this way, it's not scalable. You cannot realistically combine homomorphic encryption and machine learning at scale yet, but there's a lot of research going on, and it's a very exciting field to get into. In fact, for all three techniques I've talked about, figuring out how to apply machine learning on top of them is itself a huge area of research. Pretty exciting, right?
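To give you a feel for the "add two and three while encrypted" example, here's a rough sketch assuming the python-paillier package (`phe`); Paillier is additively homomorphic, so you can add ciphertexts and scale them by plaintext numbers, but not multiply two ciphertexts:

```python
# A rough sketch of partially homomorphic arithmetic, assuming the
# python-paillier package (pip install phe).
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

enc_two = public_key.encrypt(2)
enc_three = public_key.encrypt(3)

enc_sum = enc_two + enc_three  # addition happens on encrypted values
enc_scaled = enc_two * 4       # multiplication by a plaintext scalar

print(private_key.decrypt(enc_sum))     # 5, and the server never saw 2 or 3
print(private_key.decrypt(enc_scaled))  # 8
```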
Now that I've motivated different ways to do privacy, let's get to the very big recent thing from Google: federated deep learning. I think there were a few talks at this conference that touched on it too, for example doing anomaly detection on edge devices with federated deep learning. It's a fantastic thing. And note that the two frameworks I'll discuss, federated deep learning and the split neural network, are overall frameworks into which you can fold anonymization, differential privacy, and encryption; adding those techniques makes them even more powerful. What federated deep learning solves is one of the original problems I started with: I need to do my machine learning, but I don't have access to the raw data. The raw data lives on your cell phones and iPads and cannot leave the device. I keep referencing deep learning because federated deep learning is built around it, though I'll explain at the end how logistic regression and other models can also be brought into this space. Here's how it works: training starts somewhere central; you have a neural net, and you deploy this model across all the cell phones. Each individual phone then trains the neural net locally, on its own data, while the device is idle. The resulting parameters, the gradients, are sent back, and the main server retrains, or rather builds a new model out of all the inputs it gets. This new model is sent back out, and the phones take it and start retraining. If you look at this, the computation story is great: you can distribute the deep learning across an enormous number of phones. But you can already see the network bandwidth problem. Imagine a billion devices all having to send their parameters back to one place; there's a huge clog waiting to happen. So federated deep learning is a fantastic technique, but it has scalability issues around network bandwidth, and I'll show you some examples of why it's not ideally suited to certain situations.
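The aggregation step is easy to picture in code. Here's a rough sketch of one federated-averaging round in plain NumPy; the local update is a stand-in for real on-device training, simulated here because the point is the aggregation:

```python
# A rough sketch of one round of federated averaging. The local update
# is simulated; in reality each client runs real training steps on its
# own private data, which never leaves the device.
import numpy as np

def local_update(global_weights, client_data):
    # Stand-in for on-device training: only updated weights are returned.
    return [w + 0.01 * np.random.randn(*w.shape) for w in global_weights]

def federated_round(global_weights, all_client_data):
    updates = [local_update(global_weights, data) for data in all_client_data]
    # The server acts purely as an aggregator: the new global model is
    # the average of the client models.
    return [np.mean(versions, axis=0) for versions in zip(*updates)]

weights = [np.zeros((20, 64)), np.zeros((64, 2))]
clients = [np.random.randn(100, 20) for _ in range(5)]  # five clients' private data
weights = federated_round(weights, clients)
```

You can already see the clog: every round, every device ships a full copy of the model weights back to a single server.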
Now to the actual topic of this talk: the split neural network. It's work done at MIT by Praneeth Vepakomma, a grad student there, and Professor Ramesh Raskar. The concept is very similar to federated deep learning; it is distributed deep learning, but done in such a way that some of the bottlenecks and scaling issues of federated deep learning are solved in a very novel way. They just do it in a slightly different way, and I'll walk through the differences.

To summarize before I get into the split neural network world: today you have differential privacy, where you obfuscate with noise; that's one of the main hammers. Homomorphic encryption gives you simple operations but isn't scalable, as I said. In federated deep learning, the training happens on the clients and everything merges at the server, which becomes a huge bottleneck. But in the split neural network, which I believe will go pretty viral in a few years, the training actually happens in the server as well as in the client, continuously. Everything flows through what is called a cut layer, and activations and gradients are exchanged across it over time. Right now the research focuses on deep learning, and extending it to other models like logistic regression is work in progress. Deep learning sounds fancy and great, but deep nets are pretty hard to implement and pretty hard to understand, with plenty of pros and cons, so it doesn't apply to every use case. Logistic regression could solve something like 90% of the world's problems; I'm just saying that from my experience. And XGBoost is probably the one algorithm you need to learn to do most of your work; it basically trumps everything except some deep learning techniques. So support for those models is coming, but it's not there yet.

Some comparisons on why the split neural network beats federated deep learning: as I already told you, the bandwidth difference is huge. You can see it in the computation in TFLOPS, and you can see it in the convergence on the validation accuracy curve: federated sits down here, while the split neural network converges much, much faster. That makes sense, because you're continuously training in both places, so the wait time doesn't really happen. In federated learning, by design, the server sits idle, not doing anything on its own, until the clients come back and say, hey, I've finished something. And when you're training a deep neural network, that takes a lot of time, so a lot of wait time gets added, which is what makes federated deep learning less great here. Of course, the split neural network was inspired by federated deep learning; that's true. But the point is it keeps getting better, and a few years from now you'll see even better things coming out of it.

Now let me get to what this whole thing has to do with me, and why I'm inspired by it. SaaS applications, as you all know: software as a service, a pretty big space. Maybe I'll go to the next slide. Data is the oxygen for machine learning; you want to have it. And yet there are privacy concerns around access to raw data: you need raw data, but you don't have it. SaaS applications produce a lot of data, so it's not a question of data availability; it's data accessibility, and that's a huge problem. Privacy is going to hit like a tsunami. It's already happening; I can see it. Every company has to start doing things privately.
And there's this AI world saying, we're going to change everything, and there's privacy pulling in the reverse direction; it's basically a tug of war. You're going to be caught in this bind: you need to do more and more to improve privacy, and at the same time you have to do machine learning, whose whole premise is "I need access to raw data". These are working against each other, so we need techniques that make them converge. Right now they're diverging. Imagine someone comes and tells you, hey, I want you to build this machine learning thing, and I will not give you access to any data. What do you do? You can work on synthetic data for a while, but at some point you need access to the real data to test your models. That's what's going to happen: you build all these glorified machine learning models, and suddenly GDPR and its relatives come and say the current architecture cannot work because you cannot have access to raw data. It's coming. Five years from now, I'd bet that at this very conference there will be more topics around privacy and machine learning than anything else, because it's going to be that big.

Obviously, we still want to build machine learning. The paper I'm talking about, on the Split Neural Network, actually deals with healthcare data. In the US there's HIPAA compliance, which basically says you can never have access to patients' data. If you want to build machine learning for cancer, you will not have access to it. And my thought process was that I was trying to solve a very similar problem in a SaaS application. Because of HIPAA compliance, you cannot have access to the raw data, and you still want to train; the Split Neural Network solves that problem, because the model effectively gets the benefit of the raw data while the raw data never leaves the patients' radiology centers. I have a similar problem: I have a lot of customers, and each customer's data sits in its own tenant space, in its own silo. I don't have access to it, and I cannot export it. Traditional machine learning says: get me all the data in one place, I'll do the training, get the models, and put them in production. That's all very well, but if you can't even train on the data because you have no access to it outside the tenant, you're stuck. So if I cannot bring the data to me, let me go to where the data is. That's exactly what the Split Neural Network does: it goes into the space where the data lives. Does that make sense? That's the beauty of it. So healthcare, which is what the paper covered, and SaaS applications have very, very similar problems: we don't have access to the data. And there's something worth noting about why this is an even better fit for SaaS applications.
Think about federated deep learning: one of the problems, as I said, is that I might have 10,000 cell phones today that I'm training on, but I'd love to go from 10,000 to a million, and from a million to a billion. Look at the last row of this comparison: with 500 clients, federated deep learning is not really optimal; the bandwidth cost is still up in the gigabytes. As the number of clients grows, it doesn't scale well. But look at what happens with split learning: it's just the opposite. With more and more clients, training efficiency improves drastically. It's almost unbelievable that it scales like that. So imagine you're a SaaS company with 1,000 customers today. What do you want? You want to go from 1,000 to 2,000, to 20,000, to 2 million. Everyone wants more clients, and the Split Neural Network handles that beautifully, whereas federated deep learning hits a bottleneck as the number of devices goes up. When you add more customers in the SaaS world, split learning actually does a better job. I would love that kind of property in my machine learning.

Now let me get into how it works. I've tried to simplify as much as possible; there's a lot of math involved, and my goal is just to motivate you on what it is so you can explore further. What really happens is that the raw data is never transferred between the customer and the server. That part is similar to federated deep learning, and it's true of any distributed deep learning; the real motivation behind distributed deep learning is "go where the data is". Even setting privacy aside, it has other benefits: you already have a huge infrastructure for storing the data, and copying it all out again just for machine learning is redundant and very expensive. So let the data stay where it is; I'll go and train there. That's another advantage, beyond privacy.

Then there is a layer called the cut layer. Every client takes its raw data and runs it through the first few layers of the network, up to the cut layer, and the activations from the cut layer are sent to the server, which completes the rest of the training. This is not what happens in federated deep learning; this is the key difference, and everything else stays roughly the same. In federated deep learning, the server pretty much acts as an aggregator: it collects everything, combines it, and says, OK, new model, go. It doesn't do training as such. In the split setup, only part of the forward pass is completed on the client; the remaining layers actually run on the server, which continues from the cut layer.
That's what makes it very attractive: the server is continuously doing some work. This diagram shows one server and one client, which is the classic scenario; I'll show another with multiple clients. The forward propagation starts at the client and continues on the server. Then the server does back propagation down to the cut layer and sends the gradients at the cut back to the client, because the client needs those gradients to continue updating its own layers. That completes one forward and one backward pass, and the client starts all over again; repeated over the data, this makes up the epochs. The gradients are sent back to the customers, but none of the raw data ever leaves the customer. It stays right there, and that's one of the biggest reasons to do this. Training converges as the epochs continue, and the final model ends up held largely within the server. With more clients, the same concept applies: every customer's computation stops at the cut layer, and activations and gradients are exchanged there. It's a straightforward extension. One analogy: think of it as a relay race between the clients; the baton is handed from one to the next, and training continues. The big picture is that it's a continuous stream: the server and the clients are all continuously working on it. It's not like somebody's waiting around; both sides are equally involved. I'm just giving the high-level overview; of course, there's a lot more that can be done.
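To make the cut layer concrete, here's a rough sketch of one split training step in PyTorch. The layer sizes are made up, and both halves live in one process here; in a real deployment the cut-layer activations and gradients would travel over the network:

```python
# A rough sketch of one split-learning training step. The detach() /
# requires_grad_() pair simulates the network hop at the cut layer:
# the server only ever sees cut-layer activations, never raw inputs.
import torch
import torch.nn as nn

client_net = nn.Sequential(nn.Linear(20, 64), nn.ReLU())                    # customer side
server_net = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))  # server side

client_opt = torch.optim.SGD(client_net.parameters(), lr=0.1)
server_opt = torch.optim.SGD(server_net.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def train_step(x, y):
    client_opt.zero_grad()
    server_opt.zero_grad()

    # Client: forward up to the cut layer; only these activations leave.
    cut_activations = client_net(x)
    to_server = cut_activations.detach().requires_grad_()

    # Server: completes the forward pass, computes loss, backprops to the cut.
    loss = loss_fn(server_net(to_server), y)
    loss.backward()
    server_opt.step()

    # Server sends the gradient at the cut back; client finishes backprop.
    cut_activations.backward(to_server.grad)
    client_opt.step()
    return loss.item()

x, y = torch.randn(8, 20), torch.randint(0, 2, (8,))
print(train_step(x, y))
```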
So how does this apply to a SaaS application? It's very similar: you have multiple clients, your customers. One additional twist: typically in a SaaS application you don't have the customers' raw data, that's blocked, but you do log a lot of data about each customer yourself, like performance data, that you do have access to. I want to use that too. I'm very greedy; almost every data scientist is greedy. Any data I have, I want to use. So: I don't have access to raw data, OK, the split neural network solves that; and I also have this logged data, and I want to use that too. You can combine the two, where the server takes in the server-side data directly, no client involved, classic machine learning, and merges it with the split neural network training. It's a way of using whatever data is available to you; it's not either-or.

One of the things we thought about initially: now that the client still holds raw data, can I add even more protection on top of it? I'm always scared that a hacker gets in somewhere and figures out what the raw data is. The term here is leakage. In all of these distributed deep learning setups, leakage means that someone can still reconstruct the raw data from what's transmitted. A lot of leakage is really bad: it defeats the whole reason you started, because the original problem was that I don't want anyone to get at the raw data, and you don't want to introduce a process that leaks it right back out. Anonymization was the initial thought, but anonymization is weak; and then the MIT team released another paper, which is fantastic, on how to solve this whole leakage problem in a better way.

Let me explain it in layman's terms. Say you're training a classifier. As most of the experts here know, there's a loss function called categorical cross entropy which is widely used in most deep nets, and that's the task loss here. Now look at the setup: you have raw data, and then multiple layers extracting features on top of it, and at some point the server takes over. The motivation is this: consider the raw data and the activations of the layer right above it; I'm simplifying, but that's the idea. What do you want? No correlation between the raw data and that layer's outputs; you want the two to look very different. Because if there is strong statistical similarity between them, then someone who gets access to the intermediate layer, even without access to the raw data itself, can figure out what the raw data is. And you always have to assume the hacker will get in; that's the assumption cybersecurity starts from. So the goal is to push the similarity between the raw data and the subsequent layers as far down as possible. The measure used for that similarity is called distance correlation, which, unlike Pearson correlation, is not limited to linear relationships.
I won't go into all the math, but the idea is to use this distance correlation, which captures non-linear dependence, as a second objective. So now you have two loss functions to optimize jointly: the categorical cross entropy, which you minimize for accuracy, and the distance correlation between the raw data and the cut-layer activations, which you drive down as far as possible so the two end up very far apart. Once you include that term, your data leakage drops quite a bit. The reason I'm bringing this up is to show how fast this field is evolving: the leakage paper came out around March of 2019, just a few months old. If anyone is interested, it's a great area of research. It was done at MIT, and of course they've patented it, but it's still a great motivating starting point for what you might want to build. And it's not just for SaaS applications; this opens up a whole new world of doing ML with privacy. In their results, as the number of epochs grows, the data leakage goes down and the distance correlation goes down, which says it's doing its job: making the raw data dissimilar to the subsequent layers, so even if a hack happens, it's mitigated. It's a very nice property to have, and this whole leakage topic is pretty wild on its own.
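Here's a rough sketch of what that combined objective can look like in PyTorch. The distance correlation below is the standard sample estimate; the `alpha` trade-off weight is a hypothetical knob, not a value from the paper:

```python
# A rough sketch of a distance-correlation leakage penalty combined
# with the usual cross-entropy task loss, in the spirit of the MIT
# leakage-reduction paper. alpha is a hypothetical trade-off weight.
import torch

def pairwise_dist(x):
    # Euclidean distance matrix between rows of x.
    return torch.cdist(x, x, p=2)

def double_center(d):
    # Subtract row and column means, add back the grand mean.
    return d - d.mean(dim=0, keepdim=True) - d.mean(dim=1, keepdim=True) + d.mean()

def distance_correlation(x, y, eps=1e-9):
    # Sample (squared) distance correlation between two batches.
    a = double_center(pairwise_dist(x))
    b = double_center(pairwise_dist(y))
    dcov_xy = (a * b).mean()
    dcov_xx = (a * a).mean()
    dcov_yy = (b * b).mean()
    return dcov_xy / (torch.sqrt(dcov_xx * dcov_yy) + eps)

def combined_loss(logits, labels, raw_x, cut_activations, alpha=0.1):
    task = torch.nn.functional.cross_entropy(logits, labels)
    leak = distance_correlation(raw_x.flatten(1), cut_activations.flatten(1))
    return task + alpha * leak
```

Driving the second term toward zero makes the cut-layer activations statistically dissimilar from the raw inputs, while the first term keeps the classifier accurate.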
One more good thing about the split neural network, which probably answers a question some of you have: can I apply it to different scenarios? In the scenario I talked about, if you have labels you do supervised learning; if you don't, unsupervised. But the labels could sit on the server, or they could sit on the client. The paper talks about a multitude of configurations. The idea of a cut layer stays, but where you put the cut changes; what I described was the vanilla configuration, and you can mix and match to suit where the data lives. That's important, because sometimes the clients hold all the labels: say the cancer diagnosis sits inside the patient's radiology report, within the client; it never leaves. That's one configuration. In other cases the server holds all the labels, completely separate from the specific clients. Very different scenarios exist, and the beauty of this is that it gives you a lot of options, different flavors; it doesn't restrict you to doing the split neural network only one way. It depends on what you need and where the data sits, and you cannot dictate that in advance, because each customer's data could be arranged differently; you need the flexibility to change configurations. The paper covers a lot of this, so go take a look.

That's pretty much what I had, but the whole point is that I want every one of you practicing machine learning to start worrying about privacy, because it's coming, with all the restrictions on data. It's an interesting arc in the world of machine learning. Initially we said we don't have data. Then came the big data explosion, and we had all the data in the world collected. Then we said, OK, now we can scale the computation; we have Hadoop and the rest. And that led to better methodologies. The original thought process was: bring all the data to one place and do machine learning there, because first there was no data and then suddenly there was a lot of it. Nobody talked about privacy in machine learning back then; honestly, no one cared. We just kept going. And now things have flipped: keep the data distributed, let the data stay where it is, and I'll come to it. We've come full circle. We started with "get all the data in one place"; now we're saying "keep it there, I'll come and do what I need". With the number of data breaches we've had and the billions of dollars lost, and given that everything can be breached no matter how secure, the first question the legal people ask is: how safe is the data? You can do machine learning, it's all exciting, you want to do artificial intelligence and change the world, but they won't give you access to the data. It's coming. So my talk is really just to motivate you on the current situation so you can get exposed to it. That's pretty much what I had. Thank you. Open to any questions.

Federated deep learning, pretty much, is what it's compared against. The whole motivation for the split neural network started with federated deep learning; they figured out its drawbacks. When federated deep learning was introduced, people were like, wow, this goes totally against how deep learning is done. Everything about machine learning says: give me my data and let me get going, right? This was different. It started the field, but at the same time, as I showed in some of the examples and comparisons, split learning has some nicer properties than federated deep learning. And there was another talk on federated deep learning here; of course it works in many situations. It depends on your situation.
The classic example I gave: if you have 1,000 customers today, you really wish to have 10,000. But I don't want my machine learning framework to fail to scale with that, to actually get worse. That's a big problem. Especially in enterprise machine learning, the biggest thing is scaling from a small set of customers to more, and if you've designed around something that doesn't scale with more customers, you're in trouble. So yes, it has some nicer properties than federated deep learning.

That's a good question. I really wish we'd made enough progress that I could show a demo, but the use case is anomaly detection. There's fraudulent behavior within our application, and because we don't have access to customer data, we don't really know what's going on on the other side. Let me put it a different way. Assume your account is compromised: somebody is inside the application doing something with your account. How do I know? I can't really know, because someone goes in and changes some configuration of your system. If I had access to the raw data, it would tell me: the system was in this state, someone changed the state, and I, the real user, didn't make that change at all. All that information pertaining to the particular user whose account has been hacked would be a lot of help in generating features: is there a change in state, and so on. If the customer isn't giving you access to the data, you never know it. But if I know it, my machine learning model can act on it and stop it while it's happening: the attacker gets into the account and tries to make a change, and my model says, compared to last time, something has changed, so I will not allow this change at all. You can actually block that guy. But for that, you sometimes need access to the actual raw data, and without it, you may never develop that feature at all.

"So in this way of doing deep learning, when you're training the model, is it specific to that particular customer, or is it the combined knowledge of all the customer data that then gets applied to one customer?" Yes, it's across the board. But here's why we need it this way: if you don't have access to the raw data, the bottleneck is that there are certain features you can never develop without that specific customer's data, and those are needed when you're actually making inferences. The raw data is never pushed out while the model is being trained; it gets transformed, but the information in it is available to the model, picked up from the customer data. And right now, training is the biggest problem, because that's where you lack access to the data. Once the model is trained, inference is not a problem at all, because at the point of inference you do have access to that customer's data. You see the difference?
Once your model is trained, you can deploy it, and the trained model running in production has access to all the data it needs, because that's not a problem: it sits within the customer's space, and the data doesn't get out. It's similar to the federated deep learning example in the other talk: the trained model, once deployed, does have access to the data; only during training does it not. And that's the delta. If you had access to the raw data during the training cycle, deep learning for a SaaS application would pose no problem at all; training is the hard part, and that's exactly what split learning solves. The inference part is not a problem: you ship the model as part of the application, and the application has access to the customer data within the customer's space; your application wouldn't even run without access to the customer's data. So think of training and inference separately, and inference can happen normally. Yeah. OK. Thanks a lot. Thank you, guys.