Welcome, everyone, to our talk at the Open AI + Data Forum. I'm Shivay, and with me is Rishit, and today the topic of our presentation is federated machine learning with the help of Kubernetes. A very quick introduction about ourselves: I'm Shivay, a developer advocate at Meilisearch and also a contributor at Layer5, which has a few projects, like Meshery, that come under the CNCF landscape. Over to you, Rishit. Hello, I'm Rishit. I'm a high school student and an incoming student at the University of Toronto. I'm pretty excited about machine learning, and most of the time you'll find me creating open source machine learning projects or contributing to projects including, but not limited to, Kubernetes, TensorFlow, and more. Well, over the past decade or two, machine learning has become central to a number of different applications across a huge variety of domains, and it's hard to find a domain where machine learning or data science is not being used. Fields ranging from healthcare to autonomous vehicles have been transformed with the help of machine learning techniques, and the importance of machine learning in these real-world applications has brought us to a new field as well: MLOps. Today, machine learning is not just a matter of writing a model and running inference; machine learning is also being used in production. That raises the question of how we can leverage DevOps practices to make machine learning production-ready and scalable at the same time, and of how we can run machine learning not just in larger applications but also on very small, resource-limited devices. That's what we are going to explore in today's presentation. Over to you.
So, interestingly, this might be a good time to point out that a lot of data is created on the edge. Take your standard smartphone, for example: if you think about it, a lot of the data that you might want to train your machine learning model with is actually created on the edge, and you would want to be able to leverage all of that data. Traditionally, machine learning systems have worked like this: you have an edge device, which has all of this data, and there is a central model on the server. The model runs on the server and sends any predictions to your mobile device (the mobile device, in our case, is representative of any edge device). The mobile device, with the data created on the edge, would send feedback to the server, and the server would retrain the model. This is a very standard approach that has been used for a long time, and it has worked for a lot of applications as well. You just scale it out: you get multiple edge devices, multiple streams of feedback, and the server retrains on all of that data. But with the centralized approach also come some questions. How good is it in terms of latency, given that each prediction requires a network call, probably an HTTP call? What about privacy, since user data is sent directly to the server as feedback so that the server can retrain on it? And it is also highly power-consuming. All of these shortcomings of the traditional approach are what led to something we call machine learning on the edge. Yeah, and that's where our primary goal becomes balancing accuracy while staying aware of the resource constraints that come with machine learning on the edge.
Because of the limited resources and compute power, the goal is to optimize accuracy against runtime resource consumption. Now, typically you might approach this problem in a couple of different ways. The first approach you could take is to take existing larger models and compress them down in size, so that they are better suited to running on the edge. Another technique that comes to mind is essentially a bottom-up approach, where we build these machine learning models from scratch, with architectures specially designed for resource-constrained environments that are much better suited to a smaller computational footprint. And then there is the approach of splitting the work: computing some tasks on the edge itself while sending the more resource-intensive tasks to a cloud data center. So you combine the two: you do part of the computation directly on the edge device while sending the rest of the data to a cloud server, and then you manage the results that come back from the cloud and interpret them on the edge device itself. This lets us use machine learning on the edge in a number of different real-world scenarios, and there are existing applications utilizing it already, including voice devices such as Amazon Echo or Google Home, and a number of different educational and healthcare applications, such as predictive sensors.
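As a concrete taste of the first approach mentioned here, compressing an existing model, below is a minimal sketch of post-training 8-bit weight quantization in NumPy. This is purely illustrative (real toolchains such as TensorFlow Lite do far more); the function names and numbers are made up for the sketch:

```python
import numpy as np

def quantize_int8(weights):
    """Map float32 weights to int8 plus one scale factor (symmetric quantization)."""
    scale = np.abs(weights).max() / 127.0  # largest magnitude maps to +/-127
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for inference."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(784, 10)).astype(np.float32)  # toy weight matrix
q, s = quantize_int8(w)

# int8 storage is 4x smaller than float32, at the cost of a small
# reconstruction error (at most half a quantization step per weight).
err = np.abs(dequantize(q, s) - w).max()
print(q.nbytes, w.nbytes, err < s)  # → 7840 31360 True
```

The trade-off is exactly the one described above: a 4x smaller model, with a bounded loss of precision.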
And in the future, we might see machine learning on the edge used for things such as monitoring a patient's heart rate and glucose levels, and in cameras and motion sensors as well. Of course, the main idea is to process most of this data locally on the edge, which also provides safer inference. This paradigm is about inference on the edge, and that solves one part of the problem, but today, as you might have guessed from the title, we'll mainly be talking about federated learning. We went from centralized training to inference on the edge, but federated learning takes a step forward from there: not just inference on the edge, but posing the question, can we do the training on device? Because the inference-on-the-edge scenario only solves part of the problem; a lot of data is created on the edge, and you need to train a model with it. So, how do you do training on device, given that training is one of the more computationally intensive tasks? There are a couple of problems you face if you just take all of your training code, which was on the server earlier, and put it on your mobile device (again, the mobile device is representative of any edge device for the purposes of this talk). Just putting the algorithm that was used to train the model on the server onto a mobile device might not work, for a couple of reasons. First and foremost, there is often too little data: a single edge device will often create too little data to train a proper machine learning model. There is also the problem that other devices are not contributing. You have probably deployed your machine learning model to multiple edge devices, but none of those other devices are contributing to the improvement of the model; each of them would be training its own model in this setting.
So, the answer to all of these questions, and to the question of whether we can do the training on device, is federated learning. In comes federated learning. Here is a very high-level overview, and then we'll of course move to demos to see this in practice. In federated learning, what we want is for multiple clients, or edge devices, to collaborate and learn a combined model, because you want the other clients to have an impact on the model your edge devices are running. This would be coordinated by a central server. But then comes the question: how do you preserve privacy? Because you also don't want the raw data to be shared. That is something we need to take a look at: how we can avoid sharing raw data but still do federated learning, doing the training on device while also benefiting from the model being trained on data from other devices or clients. The way this works is: you have an initial model on the server, and you also have that initial model on your edge device. The edge device trains the initial model with the little bit of data it has. So there you have the locally trained model, trained with the small amount of data collected on the edge device. This is a very simple task, so you can do it on the edge device quite easily. For now, let's just say the locally trained model is being shared; what actually happens is that the updates from the initial model to the locally trained model are shared. But notice that the raw data is not exchanged at all. The locally trained model comes to the server, and this would be happening on multiple edge devices: all of them send in their locally trained models, with no exchange of data. You then apply some aggregation function to create a collaborative, combined model using the learnings from all of these.
And if this seems like too simple an idea: well, you have to do it multiple times to learn from the data collected by all of these devices. The combined model now becomes the initial model, and you repeat the process. It is pretty interesting how federated learning, in a moment, tackled all the problems we had with training on device while maintaining privacy, efficiency, and all of that. For the motivated reader, you could take a look at this paper from 2018 about how federated learning is used for Google Keyboard; it is a pretty interesting paper on how they use federated learning in Gboard at scale. What we'll be talking about today, in the demos you'll see, is TFF, that is, TensorFlow Federated, which is an open source framework for machine learning and other computations on decentralized data. So we'll be taking a look at TensorFlow Federated, and though we've talked about machine learning all this time, you are not limited to machine learning. The idea of federated learning, of applying computations to distributed data without sharing the data with a server, without getting the data out of an edge device, is not limited to machine learning, and there are a ton of things you can apply it to. For one, analytics is a booming field where federated algorithms are very commonly applied, and it's not machine learning. So you can definitely apply the ideas we talk about and show in the demos to computations that are not machine learning. We'll be taking a look at TensorFlow Federated, as well as how you can use Kubernetes to simulate multiple devices, in the demos. So, now we come to the interesting part, the demos, and let's get on to that.
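The loop just described (train locally, share only the model weights, aggregate on the server, repeat) can be sketched in a few lines of plain NumPy. This is an illustrative toy with a linear model and synthetic data, not the TensorFlow Federated implementation; every name and number here is made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
true_w = np.array([2.0, -1.0])  # the pattern hidden in every client's data

# Each client holds a private dataset of a different size (never shared).
clients = []
for n in (20, 50, 80):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.1 * rng.normal(size=n)
    clients.append((X, y))

def local_train(w, X, y, lr=0.1, steps=5):
    """Client-side step: a few gradient-descent steps on local data only."""
    for _ in range(steps):
        w = w - lr * 2 * X.T @ (X @ w - y) / len(y)
    return w

w_global = np.zeros(2)  # the server's initial model
for round_num in range(10):
    # Each client starts from the current global model and trains locally;
    # only the resulting weights travel back, never the raw (X, y) data.
    local_ws = [local_train(w_global, X, y) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    # Server-side aggregation: average weighted by client dataset size.
    w_global = np.average(local_ws, axis=0, weights=sizes)

print(np.round(w_global, 2))  # converges close to [2., -1.]
```

The combined model recovers the shared pattern even though no client's data ever leaves that client, which is exactly the point of the diagram described above.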
So, now we'll be looking at a federated learning demo, and we'll create a federated machine learning algorithm using TensorFlow Federated. This will not be a deep dive; we're essentially just using the Federated Learning APIs to give you an intro to federated learning. If you are excited about federated learning, or want to add your own computations, you could definitely build on top of what we show in this demo. But due to time constraints, we'll show a minimalistic, but complete, demo of training a federated machine learning algorithm. So, let's start by getting the input data. In this example, we'll be using the MNIST dataset, and there is an EMNIST dataset which is built for federated learning; you'll see why. Something we also want to see is how federated learning datasets might be different. This is just the famous MNIST dataset, but in a federated learning environment. The MNIST dataset is essentially a set of handwritten digits, and your goal is to create a machine learning algorithm to classify them. In our case, we now have a lot of different clients, each of which has its own locally collected dataset, because, as we talked about, a lot of data is actually created on the edge. So let's take a look at the datasets we have. If you look at these graphs, the different colors of bins actually represent different digits, because it's a handwritten-digit dataset. Not all clients have the same number of examples, and this is very much expected, because not all of your clients or edge devices would have the same kind of data. Here is another example.
So, this is the mean of all the images in a client's dataset, and just as an example, if you compare the two in the dataset that client one has with the two in the dataset that client two has, they are very different. Those are just two different styles of how people write twos, and this is because the data is collected locally: as users, everyone has a different style of writing a two. So the federated learning datasets for all of these clients are actually pretty heterogeneous. They are different in style, different in the distribution of the data, and they don't even all have the same number of examples. In a sense, this also shows what we talked about: why there is a need for collaborative learning, and why other edge devices should also contribute to the machine learning model. The datasets on different devices really are different. Next, we want to pre-process the data, and these are essentially very simple steps. What we are doing is creating two sets of tensors: an x tensor, which contains the pixel values of the image, and a y tensor, which contains the labels. These are 28 by 28 images, and we are simply converting each 28 by 28 image into a 1 by 784 image, which, as you know, can be done very easily. We are also grouping them into batches of datasets. That is all we're doing in this pre-processing step. And here, this is actually one batch of the dataset we have from a particular client. We have the x values: each of these is a single image, a 1 by 784 image, and so on. And these are the y values, which just represent what number is in the image. So, very simple up to now.
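The pre-processing step described here, flattening each 28 by 28 image into a 1 by 784 vector and grouping examples into batches, can be sketched like this. The random "client data" below just stands in for one client's local EMNIST examples; batch size and sizes are arbitrary choices for the sketch:

```python
import numpy as np

def preprocess(images, labels, batch_size=20):
    """Flatten 28x28 images into 1x784 vectors and group them into batches."""
    x = images.reshape(len(images), 784).astype(np.float32) / 255.0  # scale pixels to [0, 1]
    y = labels.astype(np.int64)
    # Split into (x, y) batches, dropping the final partial batch for simplicity.
    n_batches = len(x) // batch_size
    return [(x[i * batch_size:(i + 1) * batch_size],
             y[i * batch_size:(i + 1) * batch_size])
            for i in range(n_batches)]

# Fake data standing in for one client's locally collected examples.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(95, 28, 28))  # 95 grayscale digits
labels = rng.integers(0, 10, size=95)             # digit labels 0-9

batches = preprocess(images, labels)
print(len(batches), batches[0][0].shape, batches[0][1].shape)
# → 4 (20, 784) (20,)
```

Each batch is a pair of an x tensor of pixel values and a y tensor of labels, matching the batch shown on screen.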
And we'll just pre-process our data in this way. What we do next: if you remember, we earlier specified that we want 10 clients, so what we have is essentially 10 different datasets. Another thing I just want to point out is that in an ideal scenario, you would probably have many devices, maybe a thousand, and then you would choose 10 or so devices out of them. In our case, we just have 10 datasets, so we are assuming the selection process is already done. Ideally, you would want the selection process to happen on, let's say, the mobile device (again, this is representative of edge devices). For example, when a mobile device is charging and connected to high-speed Wi-Fi, that might be the best time to train your model and send the model updates to the server, since that would not be a big constraint on the mobile at the time: it is already charging, and you also have a decent connection. Ideally, you want to select your devices this way, but in this example, we assume the devices are already selected and we now have the datasets. The dataset-to-model part would be done on the client itself, and the client would then just send the updates. So we'll get on with that, assuming there are only 10 clients, all of which have their own dataset, which we just created. Now we can actually define a simple Keras model. Of course, you can modify this as well, and one of the things you can also do is use this idea not just for federated machine learning but for all kinds of computations on distributed, decentralized data; you can implement those with TensorFlow Federated too. You don't necessarily need a Keras model; you can customize all the computations that go in as well.
For the purpose of this example, we'll simply use a Keras model, a very simple neural network, and we'll wrap it up as a tff.learning model. This method creates a TensorFlow Federated model: it takes in a Keras model and builds a TensorFlow Federated model out of it. You can think of it as a wrapper for our simple Keras model. Of course, this can be any model you want; here it's just a simple neural network. So, the other important part is the federated averaging algorithm. If you think about it, we actually have two optimizers here. As you might have guessed, one part, getting the updates from the trained model, happens on the client side: on the client, you train the model, get the updates, and send those updates to the server. The server then has to create a new model based on the updates it has received from all the devices and apply them to the initial model it had on the server. It needs to apply these updates to its initial model, and that happens on the server, which is why we have two different optimizers here. TensorFlow Federated puts all of this into a single iterative process. And as a simple example of what lives on the server: the server essentially has the global model (the weights for the global model), the distributor, which is required for the client-server interaction, the aggregator, and then the finalizer. We already talked about these components earlier, and now they come properly into the picture if you remember the diagram. So, that is what we are doing here. And we can run a single round of it using .next; that runs one round.
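The two-optimizer idea can be sketched as follows: each client sends back a weight delta, and the server applies the weighted average of those deltas with its own optimizer and learning rate (with a server learning rate of 1.0 this reduces to plain federated averaging). This is a toy illustration of the idea, not TFF's internals; all names and numbers are made up:

```python
import numpy as np

def server_update(global_w, client_deltas, weights, server_lr=1.0):
    """Server-side optimizer step: apply the weighted-average client delta.

    With server_lr=1.0 this is plain federated averaging; other values let
    the server damp or amplify the aggregated update.
    """
    avg_delta = np.average(client_deltas, axis=0, weights=weights)
    return global_w + server_lr * avg_delta

global_w = np.array([0.0, 0.0])
# Each delta = (locally trained weights - global weights), one per client,
# produced by the *client* optimizer during local training.
deltas = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
sizes = [30, 10]  # client dataset sizes used as aggregation weights

new_w = server_update(global_w, deltas, sizes)
print(new_w)  # → [0.75 0.25]
```

The client optimizer shapes the deltas during local training; the server optimizer decides how the aggregated delta moves the global model, which is why the iterative process takes two of them.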
So, how does a round go? If you remember the very first image where we saw federated learning: a round takes the initial model, gets the updates from all the different devices, and updates the initial model on the server. We probably need to do this multiple times, but that is how a single round is defined here. So, we actually ran a single round, and, as I said earlier, you probably want to do this multiple times, right? That is why we'll just put it in a for loop, as simple as that, and run it for multiple rounds. If you look at the metrics here, the losses are actually decreasing, and the accuracy of the model, its ability to classify images, is actually increasing. This is just for 10 rounds; you could run it for a lot more rounds and it would, of course, improve by a lot more. Let's run this again for 10 more rounds. Oh, I just ran it for a single round, sorry. So, let's run it for 10 more rounds, and you can see the accuracy is actually increasing and the loss values are actually decreasing. The model is improving over time, and this is happening across 10 different clients. None of them are actually sharing the data they collected on device, and we are still able to create a better model, taking the learnings from all of the data collected on device. So, this is a great introductory example of federated learning, and next we'll see how we can run this simulation in a Kubernetes cluster, which also makes it easy to simulate and run federated learning algorithms. We'll use pretty similar code to last time; it's essentially the same neural network, and almost everything is the same.
So, you might see that this is pretty similar code to the last demo. We'll use similar code, but this time around we'll simulate the same model, the same training, in a Kubernetes cluster. So, let's try to do that. The first thing I'll do: I already have a Kubernetes cluster (you could create this anywhere you want), and there it is, three nodes already up. First, I'll create a deployment for the TFF workers, and this will use the image that TensorFlow Federated provides, so you can do remote execution very easily. Essentially, you remotely execute the federated learning algorithms that we write here: every time we run a round, all of that happens on the remote workers instead of on our own system, where we didn't even have real clients, just different datasets. This time around, we'll have a proper simulation running in a Kubernetes cluster. So, we'll create this deployment to start with, and it has been created. What I'll do now is also create a load balancer, and we'll use it to run our federated learning algorithms. This shows me that my load balancer's external IP is still pending. Okay, now we have it. So, let's get the IP; we'll actually use gRPC to run our code, to run the federated learning algorithm just as we did earlier. There it is, I have my IP address, and I'll do the same example, creating 10 clients, but this time it will be much more like a proper simulation, with all of this happening in a Kubernetes cluster. So, let's run this, and now we'll run the evaluate function. The evaluate function essentially runs the rounds; let's say we run it for 20 rounds. So, we'll now run the same algorithm we had earlier, for 20 rounds.
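The two Kubernetes objects created in this demo, a Deployment of TFF remote-executor workers and a LoadBalancer Service in front of them, would look roughly like the manifest below. This is a sketch only: the container image, port, and replica count are assumptions, so check the TensorFlow Federated remote-executor documentation for the current values.

```yaml
# Sketch: TFF worker Deployment plus a LoadBalancer Service (names, image,
# and ports below are assumptions for illustration).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tff-workers
spec:
  replicas: 3
  selector:
    matchLabels:
      app: tff-workers
  template:
    metadata:
      labels:
        app: tff-workers
    spec:
      containers:
        - name: tff-worker
          # Assumed image: the remote-executor service image published by TFF.
          image: gcr.io/tensorflow-federated/remote-executor-service:latest
          ports:
            - containerPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: tff-workers
spec:
  type: LoadBalancer   # provides the external IP we connect to over gRPC
  selector:
    app: tff-workers
  ports:
    - port: 80
      targetPort: 8000
```

Once the Service reports an external IP, that is the address the notebook's gRPC channel points at.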
So, let's execute this again and then call the evaluate function, and this time, as I said, it is running as a proper simulation. The evaluate function, for 20 rounds, was taking quite a bit of time, so I just sped it up, and there you have it: a simulation of the same code, the same model, for 20 rounds on Kubernetes. So, that was the second demo. All right. Thank you so much for attending our talk. In case you have any questions, do feel free to reach out to us on our Twitter handles, and thanks for tuning in. Of course, we are now open to questions, and we'd love to see you next year in person at Open Source Summit Latin America. Thank you very much.