Hi, my name is Bhargava and I'm a co-founder of a startup, Binais Labs, based in Bangalore. We do cybersecurity for industrial IoT using machine learning tools, and I'm excited to share some of the work we do in this space.

The agenda has four parts. First, we'll talk about how machine learning typically happens today and the challenges it has. We use that to motivate and introduce federated learning. We then give an overview of the tools available for federated learning, and finally the open challenges and the way forward.

The way most typical machine learning applications work is that data gets generated on a number of devices: laptops, phones, data centers, a myriad of devices. The data is uploaded to a centralized server, typically in the cloud, where the models are built and then deployed, usually as a cloud API. This is a very standard paradigm, and TensorFlow, PyTorch, scikit-learn, and a whole bunch of other tools help you do this.

But increasingly we are seeing more and more edge devices being deployed. Think of self-driving cars, or of industrial setups, which is where I do a lot of my work: manufacturing plants, oil and natural gas, the energy sector, all with lots of sensors and connected devices. This introduces two major challenges. One is that data transfer is not always feasible or possible. If you work in some remote place without internet connectivity, there is a challenge. For self-driving cars, transferring the data to a centralized server and getting an inference back to the device is quite expensive, and you probably don't have the time to wait for the result. The second major issue is data privacy.
For example, if there's a camera in a shop or in your meeting room, you wouldn't trust it if it tracked who is there and who is talking. If it sends that data out to the cloud, there are huge data privacy issues. And there is a whole wave of compliance requirements coming up around privacy: the California Consumer Privacy Act, GDPR, and so on. We feel preserving privacy is extremely important, and it's not widely enough talked about in the machine learning space.

We specifically work on cybersecurity for IoT devices. Typical use cases are denial of service, insider threats (a device sending data to an external agent), and IP theft. For each of these we need to detect whether the attack is happening or not. The typical models are anomaly detection models on time series data. We have a number of sensors capturing a wide variety of information, deployed in a network architecture, so there is a network traffic pattern that gets generated. The goal is to find whether an anomaly exists in these time series, across the various devices in a particular network, indicating a cyber attack.

Translating this into machine learning: you have many devices and the network configuration. Packet-level information, the hosts involved, and various network-level parameters get fed as features to your machine learning models, and anomaly detection models are built on top. A typical architecture is that all these sensors are connected to gateways.
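The pipeline above (network-level features going into a time-series anomaly detector) can be sketched with a minimal rolling z-score detector. This is only an illustrative stand-in for the models the talk mentions; the synthetic traffic, window size, and threshold are all made-up values:

```python
import numpy as np

def zscore_anomalies(series, window=20, threshold=3.0):
    """Flag points that deviate from the rolling mean by more than
    `threshold` rolling standard deviations."""
    series = np.asarray(series, dtype=float)
    flags = np.zeros(len(series), dtype=bool)
    for i in range(window, len(series)):
        hist = series[i - window:i]
        mu, sigma = hist.mean(), hist.std()
        if sigma > 0 and abs(series[i] - mu) > threshold * sigma:
            flags[i] = True
    return flags

# Synthetic "bytes per second" from one device: steady traffic with
# one sudden burst, the shape a denial-of-service spike might have.
rng = np.random.default_rng(0)
traffic = 100 + rng.normal(0, 5, size=120)
traffic[60] = 600.0  # the burst

flags = zscore_anomalies(traffic)
```

In practice each device would feed several such features (bytes in/out, packet counts, upload/download ratio) into a multivariate model, but the detect-deviation-from-normal idea is the same.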
Think of these gateways as Raspberry Pis or similar devices that host a set of sensors and use Bluetooth to send the information onward. Currently, most applications send the data to the cloud at some frequency; the models are built on the cloud and inference happens on the cloud. This doesn't always work efficiently, for the reasons I mentioned. An additional reason is that many of these devices are mobile and run on coin batteries, so device lifetime is also something to be very cognizant about. In this setup, doing everything on the cloud doesn't work all the time. So how can federated learning help, and what is federated learning?

Federated learning helps us in three major ways: decentralized learning, preserving privacy, and secure computation on top of this framework.

Decentralized training works much like any distributed machine learning. A model is initialized on all of the devices: there is a pre-trained model, and you send it out, so on day zero every device has the same pre-trained model. Then each device trains the model on the data generated locally. Think of individual agents, each training its own machine learning model, updating the model it received in the previous step of the federated setup, learning only from the data that particular device generated. After training, the model weights are sent to the centralized server.
That server could be in the cloud or on-premises, but it is a centralized server where a secure aggregation happens. Once aggregation is done, the server sends the updated model back to the devices. A simple way to combine models from various devices into one is to just average the weights of the various models and call that the new version: simple averaging, a simple ensemble, which then gets sent back to the devices.

Now, this is what happens in distributed machine learning anyway. So what is the problem in terms of privacy and secure computation? There are two major challenges. The first is called an inference attack. If you send only the model updates from a particular device, and an attacker has access to the model before and after the update, it can be fairly easy to reverse engineer what data came from that device.

This is addressed by differential privacy, done in three major ways. First, the impact of any single device on the published model is clipped: you don't share the model, you share only the gradients, and the gradients are clipped before sending, so in one training iteration there is only so much a particular device can influence the model. Second, and more importantly, you add Gaussian noise, so the model is not trained on the actual data itself but on the data plus random noise.
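The broadcast, train-locally, average loop described above can be sketched end to end. This is a toy federated-averaging round with a linear model and synthetic data; the model, learning rate, and round counts are illustrative assumptions, not details from the talk:

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One device trains (least-squares gradient descent) on its own data only."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_average(client_weights):
    """Server step: combine device models by simple weight averaging."""
    return np.mean(client_weights, axis=0)

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])   # the relationship hidden in every device's data
global_w = np.zeros(2)           # the "pre-trained" model pushed out on day zero

for _round in range(20):         # each round: broadcast -> local train -> aggregate
    updates = []
    for _device in range(3):     # three devices, each with private local data
        X = rng.normal(size=(32, 2))
        y = X @ true_w
        updates.append(local_update(global_w, X, y))
    global_w = federated_average(updates)
```

The server only ever sees the three weight vectors per round, never the raw `(X, y)` data, yet the averaged model converges toward the true weights.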
Even if it gets reverse engineered, it won't reveal the actual data that was generated. Third, instead of publishing the final gradient when you finish training, you publish only the average of gradients taken over multiple training steps. For example, if you train for 50 epochs, you might take the gradients from the 5th, the 35th, and the 50th epoch and publish their average. These three steps together make it very hard to reverse engineer the underlying data. That is the core concept of differential privacy.

The second problem is in fact quite a bit harder: model poisoning. The underlying assumption in federated learning is that all devices are equally trustworthy. The centralized server doesn't know what the underlying data is; it only has access to the model updates, so it applies them. It is possible that a rogue device sitting in the network sends poisoned model updates that create a backdoor in the model: think of it as introducing a new class, or heavily influencing one particular class. This is a very hard problem to solve. There are a few approaches, but they are mostly at the research stage, with not many open source tools available. Two directions in which it is being tackled: one is adaptive learning, where you learn adaptively per device; the other is to use something like a generative adversarial network. But again, this is research focused, and we haven't seen good solutions working out for us as of now.
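The first two differential-privacy ingredients above (clipping each device's influence, then adding Gaussian noise calibrated to the clip bound) can be sketched in a few lines. The clip norm and noise multiplier are illustrative values; real systems such as DP-SGD apply this at every training step:

```python
import numpy as np

def dp_average_gradient(per_device_grads, clip_norm=1.0, noise_mult=1.1, rng=None):
    """Clip each device's gradient to `clip_norm`, sum, add Gaussian noise
    scaled to the clip bound, then average."""
    rng = rng or np.random.default_rng(0)
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_device_grads]
    total = np.sum(clipped, axis=0)
    total = total + rng.normal(0.0, noise_mult * clip_norm, size=total.shape)
    return total / len(per_device_grads)

# One device reports a huge gradient; clipping bounds its influence.
grads = [np.array([10.0, 0.0]), np.array([0.3, 0.4])]

# With noise disabled you can see the clipping alone:
# [10, 0] is scaled down to [1, 0]; [0.3, 0.4] has norm 0.5 <= 1, so it is untouched,
# and the published average is [0.65, 0.2] regardless of how large the outlier was.
no_noise = dp_average_gradient(grads, noise_mult=0.0)
```

Because no single device can move the aggregate by more than `clip_norm`, and the added noise masks small differences, observing the published gradient reveals very little about any one device's data.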
Sometimes, to compensate for this, you might be better off sending the data to the server and having the centralized server handle it, but you don't want to send the actual data: you encrypt it first. There are ways to send only encrypted data, train the model on the encrypted data, and update it on the encrypted data. One common way to do this is homomorphic encryption, which is quite popular for this purpose.

Which applications is this useful for? Federated learning comes in two kinds. One is single-party federated learning, which covers most of the applications I work on: there is only one company, one agent, responsible for data governance. For example, I might work for one automotive client where the security officer is responsible for all the decisions, so there is one centralized owner of data governance. Most applications today are in this space. The more exciting direction is multi-party federated learning, where you collaborate with different organizations. If you want to work with multiple organizations and share your models across them, multi-party federated learning is the way to go.

There are two kinds: horizontal federated learning and vertical federated learning. Consider two organizations, A and B, in the same space, for example two banks. They do the same banking operations, say credit cards, but serve different customer bases.
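To make the encrypted-computation idea above concrete, here is a toy additive secret-sharing sketch. This illustrates secure aggregation, a close cousin of the homomorphic-encryption approach the talk names; production systems use real schemes (Paillier, CKKS, etc.), and every number here is made up:

```python
import random

MOD = 2**32  # all arithmetic is modular, so individual shares leak nothing

def share(value, n_parties):
    """Split `value` into n random shares that sum to it mod MOD.
    Any subset of fewer than n shares looks uniformly random."""
    shares = [random.randrange(MOD) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares

# Three devices secret-share one model-update value each.
updates = [7, 12, 5]
all_shares = [share(u, 3) for u in updates]

# Each aggregator receives one share per device and publishes only the
# sum of its column; no aggregator ever sees a raw update.
column_sums = [sum(col) % MOD for col in zip(*all_shares)]

# Combining the published column sums recovers exactly the total update.
total = sum(column_sums) % MOD
print(total)  # 24 = 7 + 12 + 5
```

The server ends up with the aggregate it needs for model averaging while each device's individual contribution stays hidden, which is the same guarantee the encrypted-training setup is after.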
One way they might want to collaborate is to share features amongst themselves. Horizontal federated learning lets organizations like these, with the same feature space but different customer bases, train a shared model: the sharing across organizations is predominantly feature-driven. It could also happen that two companies work on different kinds of problems but have the same customer base, say a bank and an automotive company. In that case you might want to share customer information, but in a secure way that doesn't let either side reverse engineer what data came from the other company. That is vertical federated learning.

Okay, so we've talked about what federated learning is; now, tools to do it. Two major frameworks exist right now. One is TensorFlow: TensorFlow Federated does the decentralized training, and TensorFlow Privacy helps you do differential privacy. It's pretty straightforward: if you write TensorFlow or Keras code, you just swap in a differentially private optimizer. It is literally about one line of change in your code to get this working. The other framework is PySyft, created by OpenMined and based on PyTorch, which out of the box does encryption, differential privacy, and decentralized learning.

If you want to deploy: PyTorch just announced PyTorch Mobile earlier this week; I haven't tested it yet. Most of what we do is based on TensorFlow Lite and on Arm's tool called uTensor, a lightweight inference engine for IoT devices. TensorFlow Lite also works really well on IoT devices. Irrespective of the framework you use, the typical practice is to do three things. First, when the trained model is huge, you prune it.
That is, you find a more efficient, compressed representation of your network. The second thing is quantization: instead of 32-bit floating point, you might represent weights with, say, 8 bits or fewer, and you might learn shared weights; most of these libraries make this very easy now, and shared representations help reduce your model size. Third is inference: you take your model, get the graph representation of the network, and export it as C++ code. TensorFlow Lite does this quite easily, and so does uTensor. You then copy that C++ code onto your edge device, where it becomes a binary that runs inference on, say, microcontrollers.

All of this comes with its own set of challenges. Three main ones. First, the data is not independent and identically distributed, which is a major challenge for any statistics you want to do around it. Second, with centralized data, convergence happens faster because you have access to all the data at once; done piecewise across many devices, convergence takes a lot of time, sometimes weeks, before the initialized model becomes a really good model in production. Third, while the community has made significant progress on edge deployment, it is still in its infancy: the inference options on low-power devices are limited, most complex models don't work, and you need to find very simple models. Deployment remains a big challenge.

To summarize: federated learning gives you good model accuracy with very low latency, because everything runs on the device. It has very low power consumption and reduces your network load.
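The pruning and quantization steps above can be sketched in plain numpy: magnitude pruning followed by symmetric 8-bit quantization. The sparsity level and the exact quantization scheme are illustrative assumptions; toolchains like TensorFlow Lite implement these for you:

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights (here, 90% of them)."""
    k = int(len(weights) * sparsity)
    thresh = np.sort(np.abs(weights))[k]
    return np.where(np.abs(weights) < thresh, 0.0, weights)

def quantize_int8(weights):
    """Symmetric quantization: one float scale factor, int8 weights."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# A stand-in for one layer's trained float32 weights.
w = np.random.default_rng(2).normal(size=1000).astype(np.float32)

pruned = prune_by_magnitude(w)        # most weights become exactly zero...
q, scale = quantize_int8(pruned)      # ...and the rest shrink to 1 byte each
w_hat = dequantize(q, scale)          # approximate reconstruction on-device

print(q.nbytes, "bytes vs", w.nbytes) # 4x smaller even before exploiting sparsity
```

The quantization error per weight is bounded by half the scale factor, which is why 8 bits is usually enough for inference even though training used 32-bit floats.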
And, the part we are most excited about, it is privacy preserving. As we discussed, there are different types of federated learning, it can be used across organizations, and with pre-trained models it can be used immediately. We have one repository where we are doing this work in the open, and there is a whole set of repositories from OpenMined, PySyft, and TensorFlow Federated with examples on how to get started. We are building a small community to do more privacy-preserving machine learning, so if you're interested, that's my Twitter handle; please reach out, we'd be excited to talk to you. Thanks for having me here. Thank you.

[Host] Thanks so much, Bhargava. We have time for one question.

[Audience] That was a nice talk. Since you said you have a lot of devices integrated, what data preprocessing techniques do you follow? Each device will have different formats of data, and what data do you consider for training your model?

[Speaker] Most of the applications I work on are on network traffic, so in some sense we know the domain and what kind of traffic to expect: packet-level information, bytes in and bytes out, upload/download ratio, which firmwares are present. The network-level information is something we have a good handle on. What information a laptop sends via a browser, or what a given device transmits, is fairly well understood, and that makes it easier to know what kind of preprocessing needs to happen. Some of it happens on the sensor edge devices, sometimes we do it on the gateway, and very occasionally we do it on the cloud. That's the typical processing we do.

[Host] Okay, I'm going to allow one more free question. Who wants to ask one more? Going once, twice? Gone. I didn't miss anyone, right? Okay, thanks everyone.