Hello, everybody. My name is Henry Zhang. I'm from VMware China, based in Beijing. I'm happy to be here to deliver this talk about federated learning and unlocking the value of distributed data for enterprises. I was supposed to co-present with another speaker, Layne Peng, but unfortunately he didn't get a visa to enter Canada, so I will be the only presenter today. A little background about us: I work in the OCTO (Office of the CTO) of VMware R&D, where I'm currently a director leading a few efforts, mostly open source projects. I'm a TSC board member of the FATE open source project, and also the creator and a maintainer of the open source project Harbor, a CNCF graduated project. I wrote a few books, including an authoritative guide on Harbor and a technical guide on blockchain. Layne Peng is a staff engineer, also in VMware OCTO; he's a TSC board member of the open source project OpenFL and a maintainer of FATE. So we have done a lot of work on federated learning and open source. Today we'll go through some of the key points, share some thinking on recent privacy-protection and data-protection topics, and cover the work we did on large language models using federated learning. First things first: what is federated learning? I'll give a quick intro to the concept. The idea actually came from Google back in 2017, when they were trying to improve the input method on mobile devices — predicting the next word a user will type. With federated learning, each device performs some computation on its own user's data and does local, distributed training; the results are then aggregated into a globally optimized model, which is sent back to every device. Eventually, all devices share an optimized model that can predict the next word in the input method.
That was the original idea, coming from mobile devices, and later it has been applied in other areas. The more formal description, from Wikipedia, is roughly this: a central server sends an initial model out to the different clients; each client trains locally on its own data; after a certain number of rounds, the updates are aggregated at the central server into a combined, optimized model; and that model is sent back to each client, after which steps 1 through 4 repeat. After convergence, we end up with a globally optimized model. Let me use this animation to illustrate the idea. Federated learning can be thought of as a paradigm shift toward moving the compute to the data. In the animation you can see a sheep. If we want to raise the sheep, we need to feed it grass from different grasslands, so the grass has to be moved out of each grassland and brought to one place. Think of the sheep as the model we want to train, the grass as the data we use to train the model, and the grasslands as the organizations that own the data. Each organization has to bring its data — the grass — to the model and feed it, and gradually the sheep grows until we get something useful: the trained model. In this paradigm the model does not move; the data moves, because we are moving the data to the compute. And obviously, the grass has to leave the grassland.
If the grass is data, that means the data has to leave the boundary of the organization — and you can immediately see the concern about leaking data and private information. So people decided this is not a good way to train, and turned it around: move the compute to the data. In the animation on the right-hand side, the sheep travels to the different grasslands to eat the grass, and the grass never leaves the grassland — it always stays inside. The data stays inside the organization, removing the risk of it leaking out, while the sheep, the model, can still grow; it just has to move between locations to eat. So the model moves to the data, and the data does not move. That is the idea of federated learning: reversing the traditional paradigm of moving data to compute. There are two reasons behind federated learning. One is to preserve the privacy and confidentiality of the data. The other is to reduce the communication cost of training: moving all the data to a central place is hard because of the large volume to be transmitted, so the data stays local for computation. Those are the two main motivations. As I mentioned, federated learning was originally designed for devices distributed everywhere, so that each could contribute local user data to train a global model.
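To make the four-step loop above concrete, here is a tiny self-contained simulation in Python. Everything in it — the one-parameter model y = w·x, the client datasets, the learning rate — is invented for illustration; this is a sketch of the idea, not code from any federated learning framework.

```python
# Minimal sketch of one federated-learning setup (hypothetical data/model).

def local_train(w, data, lr=0.01, epochs=5):
    """Step 2: each client fits y = w * x on its own private data."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x   # d/dw of the squared error
            w -= lr * grad
    return w

def aggregate(weights):
    """Step 3: the server averages the client models (unweighted here)."""
    return sum(weights) / len(weights)

# Three clients, each holding private samples of the same relation y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5)],
    [(3.0, 9.0), (2.5, 7.5)],
]

w_global = 0.0                           # step 1: server broadcasts initial model
for _round in range(20):                 # repeat steps 1-4 until convergence
    local = [local_train(w_global, d) for d in clients]
    w_global = aggregate(local)          # step 4: broadcast happens next round

print(round(w_global, 2))                # -> 3.0 (the globally optimal weight)
```

No client ever reveals its raw samples; only the trained parameter `w` travels to the server, which is the whole point of the paradigm.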
Later on, this technique was adopted by enterprises — across data centers and multiple clouds — to do federated learning over data in different clouds, locations, and places. So federated learning can also be used in an enterprise setup. Suppose an enterprise has data in multiple locations: in multiple clouds, in different geographic regions, or, for a multinational company, in different countries. Data-protection laws and regulations often prevent data from being sent out of a country, so the computation has to happen locally within each country. After local training, the parameters or gradients of the model — not the data — can be aggregated in a central place to form a global model. That's one way we think an enterprise can use federated learning. The other way is for multiple enterprises to form a federation and use it to train a model together: organizations collaborating voluntarily across locations and devices. In this setup, federated learning has a much higher security requirement, so what is exchanged — not the raw data, but the parameters or gradients of the model — is usually encrypted. There are generally three categories of federated learning today. The first is horizontal federated learning, where the parties' data shares the same feature space but has different samples. For example, two banks may have different users but the same features.
By features, I mean the dimensions of each user's data, such as credit score, history, age, occupation, income, and so on. From a traditional database point of view, this data is combined row by row — that's horizontal federated learning: all the data can be thought of as stacked like rows of a table. In contrast, there is vertical federated learning, where the parties share the same samples but different feature spaces. For example, a bank and another company — say, an energy or e-commerce company — may share the same users, but the dimensions of each user's data differ between the companies: the bank may have credit scores, while the energy company has usage data, service purchase data, and so on for the same person. By combining — not pooling — the data, we get a higher-dimensional profile of each user, and from that we can train a more accurate model for our applications. In relational-database terms, this can be thought of as joining the data vertically, by column — hence the name vertical federated learning. The last category is federated transfer learning, where one of the datasets may not have labels, so we need transfer learning techniques to create labels before the federated training can proceed. Those are the three types of federated learning in use.
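The row-versus-column intuition can be shown with a toy example. All records here are invented, and no real alignment or private-set-intersection protocol is shown — only the data shapes that distinguish the two settings.

```python
# Horizontal FL: two banks share the SAME features, DIFFERENT users (rows).
bank_a = {"alice": {"credit": 700, "income": 50},
          "bob":   {"credit": 620, "income": 40}}
bank_b = {"carol": {"credit": 650, "income": 45}}
horizontal_rows = {**bank_a, **bank_b}     # 3 users x the same 2 features

# Vertical FL: a bank and a shop share the SAME users, DIFFERENT features,
# so their columns are joined on the common user ID.
bank = {"alice": {"credit": 700}, "bob": {"credit": 620}}
shop = {"alice": {"purchases": 12}, "dave": {"purchases": 3}}
common = bank.keys() & shop.keys()
vertical_join = {u: {**bank[u], **shop[u]} for u in common}

print(sorted(common))                      # -> ['alice']: only overlapping users
```

In a real vertical federation the overlap would be computed with a privacy-preserving intersection protocol, and the joined features would never sit in one place in plaintext like this.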
Most likely, people start with horizontal federated learning, using data with the same dimensions or structure, while vertical federated learning is used to combine complementary datasets from different verticals. Federated learning requires a lot of technologies, so normally you would start from an existing federated learning framework — mostly open source projects. In the LF AI & Data Foundation we currently have the open source projects FATE, Substra, and OpenFL, all related to federated learning, each with different functions and user scenarios. Our team actually participates in two of them, FATE and OpenFL, so I'm going to introduce how they can help us do federated learning. The first is FATE, F-A-T-E: Federated AI Technology Enabler. It is currently hosted under the Linux Foundation's LF AI & Data Foundation. It has a very large community: 4,000-plus engineers and developers, 4,900-plus GitHub stars, about 200K lines of code, and so on. It was the first open source federated learning community in the world, and it provides an industry-grade platform for privacy computing and federated learning for developers and contributors. Here's the development path of the project. It was first open-sourced in February 2019, a bit more than four years ago, by WeBank in China, and was donated to the Linux Foundation in June 2019. Since then there have been more than 40 releases in this open source project, most recently 1.10, plus 2.0 for interoperability between different federated learning systems. And just last month we released a feature called FATE-LLM, for large language models.
FATE-LLM helps people fine-tune large language models over federated data. I'll talk about it a little, because many of you have great interest in the LLM area recently. Here is some data from the Linux Foundation. You can see the contributor trend: we've grown the contributor count by more than 218% in the last three years, and commits have grown about 254% over the same period. So there is a lot of adoption, and many contributors and participants in this project. Here's the framework of the FATE project. At the bottom you can see TensorFlow and PyTorch as the supported deep learning frameworks, plus the compute engines, Eggroll and Spark, as the distributed computing frameworks. If you want to perform local training, you can leverage all of this open source technology — TensorFlow, PyTorch, Spark, and so on — to build up your local training capabilities. Above that is the multi-party federated communication layer, which performs the federated communication tasks, with six kinds of built-in secure protocols: Paillier partially homomorphic encryption, secret sharing, MPC, oblivious transfer, secure aggregation, and so on. That is the foundation layer of security protocols implemented in FATE. On top of that there are about 30-plus built-in algorithms for different federated learning scenarios — vertical FL, horizontal FL, and federated deep learning. You can use them out of the box: linear regression, decision trees, CNNs, and other deep learning models, directly, out of the box.
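The Paillier scheme mentioned above is additively homomorphic: multiplying two ciphertexts yields a ciphertext of the sum, which is what lets an aggregator combine parties' encrypted parameters without ever decrypting them. Here is a textbook sketch with tiny hard-coded primes — for illustration only; real deployments use 2048-bit keys and audited libraries, not this code.

```python
# Textbook Paillier (simplified g = n + 1 variant) with toy primes.
import math
import random

p, q = 61, 53                      # toy primes -- NEVER use sizes like this
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)               # modular inverse; valid since g = n + 1

def encrypt(m):
    """Enc(m) = (1+n)^m * r^n mod n^2, with random r coprime to n."""
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    """Dec(c) = L(c^lam mod n^2) * mu mod n, where L(x) = (x-1)//n."""
    return ((pow(c, lam, n2) - 1) // n) * mu % n

a, b = 12, 30                      # e.g. two parties' (integer-encoded) updates
c = (encrypt(a) * encrypt(b)) % n2 # aggregator multiplies the ciphertexts...
print(decrypt(c))                  # ...which decrypts to the SUM: 42
```

This additive property is exactly what secure aggregation needs: the server sums updates under encryption and only the final aggregate is ever decrypted — at the cost of the heavy modular arithmetic you see here, which is where the slowdown discussed later in the Q&A comes from.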
On top of that, there are different repos on GitHub that combine these underlying layers into useful tools for users. FATE-Flow is a pipeline management framework that does the DAG parsing, lets you define the training pipeline, and then executes the federated learning jobs. KubeFATE is a cloud-native operations tool for running a federated learning platform on Kubernetes, across clouds and multi-cloud setups. FATE-Board provides visualization of the models: during training you can monitor progress and see the results and logs. FedLCM provides federated learning lifecycle management for multi-cloud setups — federation-level management rather than single-site management. And FATE-Serving helps people serve their federated models, with online inference, model management, model monitoring, and so on. Those are the basic key features of the FATE framework. Most recently we've been building a feature called FATE-LLM: federated large language models. We've planned a few modules — a communication-efficiency hub, a FATE-LLM model hub, and a FATE-LLM protection and evaluation hub. The blue boxes represent modules that are already done; the gray ones are on the roadmap and coming soon. We're going to release a version this month, in May, to handle larger models. Right now we only support BERT and GPT-2, which are not very large, but on the roadmap are GPT-J, LLaMA, ChatGLM-130B, and so on — the large models we aim to provide for users.
This latest LLM feature addresses the needs of industry users who want to train large language models using the local data of different parties. The idea is that we take a pre-trained model — for example BERT, GPT-2, GLM, LLaMA, whatever — as the starting point at each party, and then perform fine-tuning at each party: distributed fine-tuning of the LLM using each organization's local data. Nobody shares the data; they only use it locally for fine-tuning. Eventually we get a federated, fine-tuned large language model that can be used across the different locations. Here is the adapter approach to fine-tuning. Adapters are one way to fine-tune a large language model — or really any pre-trained or foundation model. We adopted an approach called AutoFedNLP. On the left-hand side of the diagram you can see the usual pre-train/fine-tune pattern, familiar to all the AI people here: first you have a pre-trained model used as a foundation model; once you have it, you fine-tune it for different downstream tasks; then you serve the downstream task in your application. The problem is that conventional fine-tuning usually trains the whole model, which takes a lot of resources — often too much for a typical organization. So there's an alternative, parameter-efficient fine-tuning, which is what adapters are: very lightweight layers inserted into the original network, called adapters. You tune only the adapters to achieve the effect of full fine-tuning. That reduces the number of trainable parameters, as well as the time and resources required — a big improvement.
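A quick back-of-envelope calculation shows why tuning only adapters cuts the trainable-parameter count so sharply, landing in the low single-digit percent range — consistent with the roughly 1–10% figure from the FATE community. The layer sizes below are illustrative BERT-base-like numbers, not measurements.

```python
# Rough parameter counts: full fine-tuning vs. adapter-only tuning.
# All sizes are illustrative assumptions, not numbers from FATE.

d, L, r = 768, 12, 64            # hidden size, transformer layers, adapter bottleneck
full = L * 12 * d * d            # full FT: ~4d^2 (attention) + 8d^2 (FFN) per layer
adapters = L * 2 * (2 * d * r)   # adapters: 2 per layer, down- plus up-projection

print(f"trainable fraction: {adapters / full:.1%}")   # -> trainable fraction: 2.8%
```

Shrinking the update from ~85M to ~2.4M parameters also shrinks what each party must upload every aggregation round, so the saving compounds with the communication costs discussed later.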
According to data from our FATE community, the trainable parameters are reduced to roughly 1% to 10%, and the fine-tuning time drops to about 20%. So the adapter approach yields large savings in federated training. Here's the high-level architecture of FATE-LLM. At the bottom is the hardware layer — CPU, GPU, FPGA, whatever — with Kubernetes on top for managing the cluster and hardware resources. Then come the distributed runtimes, Eggroll and Spark, running on the underlying hardware. On top of that, our open source project KubeFATE manages a single instance of the federated learning cluster. The two yellow boxes are the recent FATE-LLM components, which sit on top of the existing KubeFATE stack and handle fine-tuning of large language models in a federated way. The purple one is the exchange: OSX, the open site exchange server, coordinates the different parties during training. And FedLCM is the management tool for the different parties in the federation. That's how FATE-LLM works. We just released 1.0; 2.0 is coming this month, and maybe the next release in August. Having talked about federated learning in FATE, let's move to the other project we work on, OpenFL. OpenFL is mostly for horizontal federated learning. FATE, as I just described, supports vertical federated learning well, while for the horizontal case we think OpenFL does a better job — in a way, it's simpler and more lightweight.
In OpenFL, you can see two collaborator nodes here, with a node called the aggregator in the middle as the coordinator. The collaborator is the client in the federation: it runs local training, has access to the local data, and performs the tasks it is given. The aggregator can be thought of as a parameter server: it sends model parameters to the collaborators, along with the instructions for the federated training. There are different personas in OpenFL. The federated experiment manager sends out the instructions — it submits a task through the Python API component, which triggers the training. The director manager sets up the infrastructure for the director, the coordinator that monitors the whole training task. And the collaborator manager manages an individual collaborator for the training tasks. So the director is the coordinator, and there is a collaborator for each party participating in the federated learning. There are many use cases of OpenFL; it is most active in the healthcare industry. For example, federated learning has been used to detect tumor boundaries in radiology images — work done by Intel and the University of Pennsylvania's medical school. So those are the two open source federated learning projects under the LF AI & Data Foundation that I wanted to cover. In the broader ecosystem, we think there are several different roles, or parties, involved in federated learning.
The first is the data source, the data provider. Suppose an organization has data it wants to offer to others for training — perhaps as a commercial service; it acts as a data provider to others. Then there's the data consumer: a company that needs data to train models for its applications. And there are technology providers that help these organizations connect with each other and run the federated training, the multi-party computation, and so on — all under a common set of governance rules. That's how the ecosystem interacts. Most recently, we've seen a real problem in the privacy-computing area: interoperability between different federated learning frameworks. Different organizations may have different federated learning or privacy-computing solutions, but those solutions cannot talk to each other. When a user wants to work with a different data provider, they face the problem of how to connect to the other side's solution. In this diagram, framework A and framework B cannot talk to each other because they use different underlying technologies. So the FATE community, together with other federated learning communities, is proposing interoperability standards for heterogeneous frameworks: for framework A to work with framework B, both need to comply with common interoperability standards — management APIs, control APIs, a secure ML protocol, and data-plane management — so that they can talk to each other.
We are proposing architecture changes and common APIs in FATE 2.0 aimed at interoperability between different architectures and frameworks; this is underway for FATE 2.0. Lastly, I want to mention a few of the projects we develop or contribute to that help manage federated learning. The first is KubeFATE — we are among its contributors. It takes a cloud-native approach to managing federated learning instances, in the cloud or on-prem. It currently supports Docker Compose as well as Kubernetes in any cloud environment, public or private. KubeFATE can operate a single party's instance for three different frameworks: FATE and OpenFL, which I mentioned, plus a commercial federated learning solution from a vendor. Underneath, KubeFATE leverages Helm charts to deploy on Kubernetes — and since Kubernetes is basically everywhere now, in public and private clouds, Helm-based management lets us deploy FATE or OpenFL onto any cloud instance that supports Kubernetes. In addition to managing a single party of a federated learning setup, we also have the federated learning lifecycle manager, FedLCM, for managing whole federations — that is, multiple parties. The idea is a single view of all the parties participating in the federation, helping them manage their FATE or OpenFL instances so they can work together as a federation. FedLCM is an open source project currently under FATE. The last topic is how to manage federated learning from the operator's point of view.
Basically, if we have multiple sites or data sources in different places — on-prem or in different clouds — they can all be managed by the federated lifecycle manager, FedLCM, which I just mentioned. The red dotted lines represent the control flow down to the underlying platform, KubeFATE, at each site; the user can choose FATE, OpenFL, or whatever frameworks KubeFATE supports, and the data scientists then use that framework to perform the federated training together. FedLCM can also manage the central aggregator, via the same dotted line. The idea is to use the federated learning lifecycle manager to perform the relatively complicated management tasks across different clouds, on-premise or public, so that any enterprise with data in different locations can manage it efficiently and securely. That's mostly it for my talk. Let me see if there are any questions I can help answer here. Thank you.

Yeah, thank you. Can you talk a bit more about the difference in performance between federated and regular centralized learning, and how you deal with it? Homomorphic encryption is notorious for being slow.

You mean the actual runtime performance, not the accuracy, right? In our experiments we do pay an overhead for the encryption — as you said, homomorphic encryption is slow. We measured it at roughly 10 to 50 times slower, depending on the situation and the algorithm. That's the penalty you pay for gaining privacy preservation of the data — I'd say in that magnitude.

And from an application perspective — I noticed the slide was from around 2019.
Can you give us an example of an application — multiple parties actually using federated learning to train models?

I didn't list one on the slides, but for example, a bank in China uses it for risk management: assessing a user's risk level to decide whether to lend them money. To do that, they use their own banking data together with social media data — from Tencent, for example, which has a large amount of user data — collectively, to form a model. Using datasets from the different organizations improves the accuracy of their predictions by a few percent.

In that case there are two organizations — do you have an example with maybe 20 different organizations?

The OpenFL example I showed here is the one: you can see all these organizations — I'd say 30 to 50, roughly that scale — hospitals around the globe. They're not sharing the data; they're collectively training the model. So there are many participants in that setup. Thank you.

Thank you — amazing presentation. Are you familiar with Opaque, the secure enclave-based data sharing from Berkeley?

I've heard of it, but I'm not quite aware of the details.

It's being offered as a high-performance replacement for what would otherwise be handled by homomorphic encryption, because all of the data packing and unpacking is done inside a secure hardware enclave. This is the first really meaningful use case I've seen for that technology, so I thought you might have heard of it.

Okay — you're saying Opaque is a replacement or enhancement for homomorphic encryption, right?

Yeah, a replacement.
Yes — at one point I talked to one of the people there at UC Berkeley, from the lab behind it, and I heard about the Opaque project, but I didn't get into the details. From what you say, I think it's very interesting to take a look at. Thanks for the suggestion.

Could you go into more detail about the aggregation — the server creating the best model out of the client model results?

Okay, let me pull up that slide. This one is better — the steps one, two, three, four here. First, we train the model locally: each of the parties — participants one, two, three — has its own local data, and uses it to train some algorithm locally, for example logistic regression or a decision tree. At a certain point, they each produce the parameters of the model — gradients, coefficients, and so on — not the data itself, just the model. Second, those parameters are sent in encrypted form, so nobody sees anyone else's parameters. Then, using homomorphic encryption or something similar, the parameters are aggregated into global weights, which are sent back to the different parties.

I see — and what are some strategies for getting the globally best weights?

The point is that you can take advantage of other people's data: if you train only locally, you see only your own data. That's why we have federated learning. Okay, thanks.
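The aggregation step described in this answer is typically FedAvg-style weighted averaging, where each party's parameters are weighted by its share of the total training samples. Here is a plaintext sketch with invented numbers — in FATE the same sum would run over encrypted values, as discussed above.

```python
# Sketch of FedAvg weighted aggregation (plaintext, illustrative numbers).

def fedavg(client_updates):
    """client_updates: list of (num_samples, parameter_vector) pairs."""
    total = sum(n for n, _ in client_updates)
    dim = len(client_updates[0][1])
    out = [0.0] * dim
    for n, params in client_updates:
        for i, v in enumerate(params):
            out[i] += (n / total) * v    # weight by the party's data share
    return out

updates = [
    (100, [1.0, 2.0]),   # party holding 100 samples
    (300, [2.0, 4.0]),   # party holding 300 samples dominates the average
]
print(fedavg(updates))   # -> [1.75, 3.5]
```

Weighting by sample count keeps a small party from dragging the global model as far as a data-rich party, which is one of the simpler "strategies for globally best weights" asked about here.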
Is there any distance limitation or requirement between the participants and the central server?

If the parties are not on the same local network, they usually go over a public network — the internet, a VPN, a dedicated line, whatever — and that affects the eventual performance: the amount of data divided by your bandwidth is the time you spend on communication.

So you don't synchronize instantaneously — you might do it only once in a while?

Right. The idea in federated learning is that we train for several local epochs — say 10 — and then send the update over once, not after every step. That's one of the techniques federated learning uses to save bandwidth: you don't need to send everything every time. That's where the saving comes from — and as you can see, the communication is expensive.

Do you have any kind of persistence built in? What if the transmission gets interrupted, so the model on the cloud doesn't have all the updates from the other models — do you roll back, like a database?

Yes, we need some techniques for consistency, as you said, in case of a broken link: either wait, or just abandon that round's update. Right now we have synchronous updates: if everything works out, good; if not, we'll probably abandon that round, something like that. Thank you.

Okay, time's up. I think that's all for this session. Thank you for your participation.