Hello everyone, and thank you for attending my session. The title is "Building an Edge AI Stack with AI as a Service in a Cloud Native Way." My name is Indy, and I am from Futurewei. Very nice to meet you all.

First, let's talk about the problem and the challenges: why do we need edge cloud computing and edge AI? With 5G and the IoT boom, edge cloud computing is becoming more and more urgent. Why do we need it? Because IoT and remote sensors generate a lot of data at the remote site, and we want to compute on that data efficiently. Near-data computation is the requirement: we want to process data at the edge instead of transferring it all back to the data center.

Beyond that, there are three other main problems. The first is latency. Even though the internet keeps getting faster, you cannot exceed the speed of light. If the data center is far from where the data is generated, it may take 100 or 200 milliseconds just to transfer the data, yet some users ask, "Can we finish the processing within 20 or 40 milliseconds?" If you cannot guarantee that the data center is very close, you cannot meet that requirement; you have to put some computation near where the data is generated. That is why we need edge cloud computing.

Second, because such a large amount of data is generated, bandwidth is precious. You don't want to waste the user's bandwidth uploading raw data, because some of it is redundant and some of it has not been cleaned up. We want to save bandwidth for other uses instead of spending it all on transporting this data.

The third problem, not the last but a very important one, is data privacy. Not everyone wants to upload their data to the cloud for computation. For individuals the data may contain PII, and industries and factories may not want to upload all their raw data to a data center because it could expose confidential business information, for example through data mining. They want to clean up the data first, and then take advantage of the cloud's large computation resources once the data has been cleaned and the sensitive information hidden.

The last one is the newer case of AI offloading. We have more and more powerful devices, especially mobile phones, but a mobile phone has only a limited battery. AI inference, never mind training, can be very compute intensive, which means your phone may not keep up with the algorithm. More importantly, this computation takes a lot of energy and drains the battery quickly, which people don't like; they want their mobile device to last a long time. That is why we want to offload some of this AI inference to the edge: we don't need to transfer everything all the way to the data center, so we get a fast, low-latency response, we save energy, and we can take advantage of more complex models for inference.

Now we know why we need edge computing and edge AI. The major challenges we need to solve are, first, network reliability. Unlike inside a data center, the edge node and the cloud are connected through the internet, which is not reliable, and the latency is not consistent: it can be fast or slow. The network bandwidth is also limited.
As I said, the user only purchases a limited amount of bandwidth, so it is not guaranteed to be wide enough. Also, most of the time edge devices have relatively constrained resources at the edge node. The edge node could be as small as an IoT gateway in someone's home, or it could be a large server, but compared to the cloud it is still constrained: the hardware may be a previous or even older generation with limited computation capability.

Edge autonomy is also very important. As we said, the network is not reliable; when the connection is lost or temporarily down, the edge needs to keep running autonomously without communicating with the cloud.

The last challenge is highly distributed and heterogeneous device management. With 5G and IoT, edge nodes and devices are geo-distributed across the whole system, and they are heterogeneous: different vendors, different operating systems, different hardware architectures. These are the challenges we are facing.

When we tried to solve these problems, we started with our core open source project, called KubeEdge. KubeEdge is currently a CNCF (Cloud Native Computing Foundation) incubation project; it was promoted to incubation this September. KubeEdge is built on top of Kubernetes, so it takes advantage of everything Kubernetes provides for application orchestration, deployment, lifecycle management, and so on, and it adds the fundamental infrastructure support for networking, application deployment, and data synchronization between the cloud and the edge. Developers can focus on the business logic that solves their problem and not worry about these issues, because the platform takes care of them.

First, KubeEdge provides seamless cloud-edge communication, which covers not only data but also metadata. Second, it provides edge autonomy: during a temporary internet connection issue, the edge can keep running autonomously without a connection to the cloud, and when the connection is restored the metadata is re-synchronized to make sure the edge is running in the desired state, as we expect. The third feature is a low resource footprint. As mentioned, the edge node may have very constrained resources: low memory, low bandwidth, and low compute capability. KubeEdge works on high-end computation resources as well as low-end ones; it can be deployed to an edge node with as little as 128 megabytes of memory, and we recommend at least 256 megabytes. We also provide simplified device communication through device twins (device shadows), so from the cloud you can easily manage IoT devices without extra work.

Now let's go over the KubeEdge architecture. It has three parts, showing the cloud, edge, and device collaboration. The top part is the cloud, and in its center is Kubernetes: Kubernetes is required in order to deploy KubeEdge. With Kubernetes we use the standard kubectl as the command line, and we support most of the kubectl commands. The core cloud-side component of KubeEdge is called CloudCore.
CloudCore includes the edge controller, the device controller, the sync controller, and the cloud hub. The edge controller, as the name says, manages the edge nodes: it handles how an edge node joins the cluster and delegates commands from the cloud down to the edge. (The edge node is drawn with a dotted line in the cloud part of the diagram, but the actual edge node is at the bottom left, drawn in detail.) The device controller uses a CRD, the device CRD, to define devices, so through the device controller you can control or view the status of the devices attached to an edge node; a short sketch of reading these device objects from the cloud follows at the end of this part. The sync controller handles synchronization, especially when a node first joins the cluster or after a network issue once the connection is restored; it synchronizes the data and metadata between the cloud and the edge to make sure the edge node runs in the desired state. The last part is the cloud hub, which is basically the set of connections between the cloud and the edge. As we know, the edge is usually deployed behind a firewall, either a corporate firewall or even your home firewall, and behind NAT, which means there is no public IP for your edge node, so it is impossible for the cloud to reach the edge directly. Instead, when the edge node joins the cluster, we set up a WebSocket connection. It is duplex, which means we can push commands from the cloud to the edge. On the edge side, EdgeCore has an edge hub, the counterpart of the cloud hub, for these connections. The edge currently supports the containerd, CRI-O, and Docker container runtimes, and we support CSI and CNI interfaces for the storage and networking of containers running on the edge side. You can also see Mosquitto there, the MQTT broker. For devices we support different IoT protocols, including Modbus, Bluetooth, and OPC UA, which are popular industrial IoT protocols. With MQTT and these IoT protocols, the system can control and manage the devices.

Now let me introduce another open source project. Based on KubeEdge, we think it is very useful to demonstrate AI capabilities, so in the LF Edge Akraino community we set up the KubeEdge Edge Service blueprint project. This project focuses on a device-edge-cloud collaboration framework built around KubeEdge. The verticals this blueprint focuses on could be IoT or MEC scenarios. The key component of this project is KubeEdge, which, as we said, is a CNCF open source project. The first phase of this project focuses on building the edge AI stack, and the use case is ML inference offloading to the edge servers: if you have a mobile device and want to do some inference, you can offload it to the edge server. I am going to give a demo of this in the following slides. This blueprint family is an end-to-end open source solution, and it leverages various infrastructures: it supports x86, ARM, and RISC-V. The blueprint is infrastructure neutral; we want to support all kinds of heterogeneous infrastructure.
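Going back to the device controller and the device CRD mentioned above, here is a minimal sketch of how the devices attached to edge nodes could be listed from the cloud side with the Kubernetes Python client. The API group, version, and status field names shown here are assumptions for illustration and may differ between KubeEdge releases; check the device CRD shipped with your release.

```python
# Minimal sketch: list KubeEdge device custom resources from the cloud.
# Group/version and field names are assumptions; verify against your KubeEdge release.
from kubernetes import client, config

config.load_kube_config()          # use the cluster's kubeconfig, just as kubectl does
api = client.CustomObjectsApi()

devices = api.list_namespaced_custom_object(
    group="devices.kubeedge.io",   # assumed API group of the device CRD
    version="v1alpha2",            # assumed version; may differ in your release
    namespace="default",
    plural="devices",
)

for dev in devices.get("items", []):
    # Each device object carries its twin: desired vs. reported state of the physical device.
    name = dev["metadata"]["name"]
    twins = dev.get("status", {}).get("twins", [])
    print(name, twins)
```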
Here is the offloading function block diagram. The central column is the edge, and the horizontal layer in the middle is KubeEdge. In the cloud we deploy Kubernetes, and at the edge we deploy KubeEdge, so KubeEdge covers the cloud, the edge, and the devices.

The first use case we are going to demonstrate is called emotion recognition. Basically, we run the training service in the cloud and train a new model; using KubeEdge, we deploy the model and the application to the edge, which then hosts the service for the devices. The device only does the image preparation, or preprocessing, and then offloads the inference to the edge; I will demo this later. This means that if our device has limited resources or wants to save energy, it can offload the inference to the edge. This is the typical offloading approach: the inference is offloaded from the device to the edge, and all the training happens in the cloud. The collaboration framework is essential for this ML offloading: KubeEdge provides the underlying software platform, including application deployment and model deployment, and in the future we are also going to support dataset deployment and updates.

Here is the emotion recognition use case again, with the abstract block diagram expanded to show some details of the device-edge-cloud collaboration. The cloud runs the training and provides the model. The edge runs the inference services, provides the emotion recognition service, and exposes the offloading APIs: it accepts and deploys the application and serves the inference model to do the inference. The device does the image preprocessing, including resizing, converting to RGB, and turning the image into a pixel array, and then uploads this pixel array to the edge for inference. The edge runs the inference algorithm and replies with the result to the device; a rough sketch of this device-side flow appears a little further below.

Now let me show a roughly four-minute demo, so you can see what this ML inference offloading looks like and how KubeEdge works. In this demo, the cloud part is deployed to an AWS EC2 instance. The edge part is a physical server running behind a corporate firewall. For the devices, because of the pandemic and working from home, we use a VPN to connect to the server; in real life the devices would probably be on the same subnet, behind the same firewall, as the edge server, so there would be no VPN involved. In the two terminal windows, the top one is the cloud and the bottom one is the edge. First, you can see we have pre-installed Kubernetes in the cloud, so there is only one node, the master node, on the AWS public cloud. Now we are going to have our edge node, behind our corporate firewall, join this Kubernetes cluster whose control plane is running in the public cloud. On the cloud server, we only need to open one port; it is configurable, and the default is 10000. On the edge node, we use the keadm tool, which is similar to kubeadm, to join the edge node to the Kubernetes cluster running in the cloud. You can see it is already done; it is very fast. Let's verify that the server has joined the cluster: there are two nodes now, one is the master running in the cloud, and the edge server is the one running behind our corporate firewall. On the edge node, you can see nothing is running yet. Next we are going to deploy our offloading service application from the cloud to the edge; it is built on the TensorFlow framework.
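As described above, the device side only prepares the image and offloads the pixel array to the edge. Before continuing with the demo, here is a minimal sketch of that client-side flow, assuming a hypothetical HTTP endpoint on the edge server; the URL, port, input size, and normalization are illustrative assumptions, not details taken from the demo (the demo's actual client is an Android app).

```python
# Minimal sketch of the device-side flow: resize, convert to RGB, turn the image into a
# pixel array, and POST it to the inference service on the edge server.
# The endpoint, input size, and normalization are assumptions for illustration only.
import numpy as np
import requests
from PIL import Image

EDGE_URL = "http://192.168.1.20:8080/v1/emotion"   # hypothetical edge inference endpoint

def offload_inference(image_path: str) -> dict:
    img = Image.open(image_path).convert("RGB").resize((64, 64))   # preprocessing on the device
    pixels = np.asarray(img, dtype=np.float32) / 255.0             # image -> normalized pixel array
    resp = requests.post(EDGE_URL, json={"pixels": pixels.tolist()}, timeout=5)
    return resp.json()    # e.g. {"emotion": "angry", "confidence": 0.99977}

print(offload_inference("portrait.jpg"))
```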
You can see we do this deployment with kubectl apply, and from the cloud you can see the pod is running on the edge node; it has been deployed successfully. Let's go to the edge to verify: yes, the Docker container is running on the edge node. Let's tail its log to show what happens when an inference request comes in. On the left we have an Android emulator running to demo what happens. First we open the photo album to pick a picture to upload. You can see the service responds quickly: we preprocess a portrait picture, upload it, and the answer comes back. Let's do another one; this one is even a blurry picture. We convert it to a pixel array, upload it, and send a request to the service, and a number comes back. For this one we got the result back too: the emotion recognized is "angry" with a confidence of 0.99977, so the model we trained is pretty confident that this is an angry face.

So in this demo I showed how we set up a KubeEdge plus Kubernetes cluster. I assumed the Kubernetes cluster was already deployed, and then we had the edge node join it. Even with the control plane running in the public cloud and the edge node behind a firewall, there were no issues, and with the duplex WebSocket set up, we can push commands from the cloud to the edge node. Then we deployed a TensorFlow-based application to do the emotion recognition, and we used an emulator to emulate a mobile device running in the same network as the edge server to do the AI offloading. The mobile app does the preprocessing of the image, converts it to a pixel array, and sends this pixel array to the inference service running on the edge server to get the result back. That is the summary of the demo.

Let's come back to the talk. This demo was relatively simple; it only showed inference offloading. What about more complicated cases? There we have more challenges. First, the edge nodes are geo-distributed, and so are their datasets; can we take advantage of that and overcome the issues it creates? Also, the samples are not uniformly distributed: some nodes have more data, some have less. Because of this non-uniformly distributed data, the performance of a single global AI model degrades at the edge, and the resources at the edge are constrained. And when you try to run something like federated learning with few-shot samples, it is hard to converge. So we are building an edge AI framework based on KubeEdge to see if we can help solve these problems.

Here is our design. The purpose is to have an edge-cloud collaborative machine learning framework based on KubeEdge, with built-in collaborative training and joint inference algorithms that help developers develop new algorithms. We work with existing AI frameworks, for example TensorFlow or PyTorch; we are not inventing a new AI framework. It has three built-in features: joint inference, incremental learning, and collaborative training, which is federated learning. Our target users are, first, domain-specific AI developers, who can easily build and publish edge-cloud collaborative AI service functions.
We are also targeting application developers, so they can use the edge-cloud collaborative AI capabilities without any learning curve. The central part, as you can see, is the edge-cloud collaborative machine learning framework we build on top of KubeEdge. It supports heterogeneous hardware, whether x86 or ARM servers, and on top of this framework you can easily build computer vision, speech, and NLP applications.

In the architecture you can see we divide it into cloud and edge. On the cloud side we run the Kubernetes platform, with worker applications that support TensorFlow or PyTorch; our SDK libraries are there, and you can train the model in the cloud. We also have the GC, a global coordinator, to coordinate all these services. On the edge we run KubeEdge; the edge also has workers that communicate with the workers running in the cloud, but instead of training they typically run inference, and when you do incremental training or collaborative training you run training at the edge too. On top of KubeEdge you can deploy a local resource manager for job monitoring and management, and for peer management as well. That is our architecture.

Now let's elaborate on the three features. The first is collaborative joint inference. As we showed in the ML offloading demo earlier, we run the inference at the edge, but the edge may have only constrained resources and cannot run a very complicated model. In this case we do collaborative joint inference: at the edge, especially on a low-resource edge node, we run a shallow model. It can probably handle 70 to 80 percent of the scenarios, where we get a very confident result back. However, if the result comes back with only, say, 40 or 50 percent confidence, we cannot return that result to the user, so we offload this one to another layer, the cloud. In the cloud we run a deeper model, which requires more computation resources; it computes the inference, gets a better result, and sends it back to the edge, which passes it through to the user. All these models are trained by the AI developers: during training you generate a deep model and a shallow model. The deep model requires more compute resources, so it is suitable for running in the cloud; the shallow model requires only a fraction of the resources, so it is suitable to run at the edge, especially on low-resource edge nodes.

The second feature is incremental learning. In this case the AI application developers use the AI library to integrate the collaborative incremental learning function. When the sample detection algorithm at the edge identifies a sample with low inference confidence, then, similar to collaborative inference, we upload that sample to the cloud for the labeling service. In the cloud a labeling service is running: manually, or periodically with AI assistance, the samples are labeled, and the system automatically performs incremental training. Based on the current model, it trains and generates a new, better model, then pushes this model back to the edge, so the edge has a better model. If you then get another hard or difficult example for which you cannot achieve a high confidence level, you upload it to the cloud for further labeling and generate an even better model. That is called incremental training: you don't need the whole dataset at the very beginning; you incrementally train your model so it gets better and better.
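To make the two behaviors just described more concrete, here is a minimal sketch, not the framework's actual SDK: the edge answers locally when the shallow model is confident enough, otherwise it offloads the request to the deeper model in the cloud and keeps the input as a hard example for later labeling and incremental training. The threshold, model file, and cloud endpoint are illustrative assumptions.

```python
# Minimal sketch of edge-side joint inference with hard-example collection.
# Threshold, model file, and cloud endpoint are assumptions for illustration only.
import numpy as np
import requests
import tensorflow as tf

CONFIDENCE_GOAL = 0.8
CLOUD_INFER_URL = "http://cloud.example.com/v1/infer"        # hypothetical deep-model service
hard_examples = []                                            # would later be sent for labeling

edge_model = tf.keras.models.load_model("shallow_model.h5")   # small model deployed to the edge

def joint_infer(pixels: np.ndarray) -> dict:
    probs = edge_model.predict(pixels[np.newaxis, ...], verbose=0)[0]
    if probs.max() >= CONFIDENCE_GOAL:
        # Common case: the shallow model is confident enough, reply directly from the edge.
        return {"label": int(probs.argmax()), "confidence": float(probs.max()), "source": "edge"}
    # Hard example: remember it for incremental training, and ask the cloud's deep model.
    hard_examples.append(pixels)
    resp = requests.post(CLOUD_INFER_URL, json={"pixels": pixels.tolist()}, timeout=5)
    return {**resp.json(), "source": "cloud"}
```

In the described workflow, the collected hard examples would be uploaded to the cloud labeling service rather than kept in memory, and the retrained model would be pushed back down to replace the shallow model at the edge.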
The third feature is federated learning, which is especially for data privacy. The raw data is never transmitted out of the edge; it stays local, and the model is produced by knowledge aggregation. On each edge node there is a local dataset, possibly sensitive data that you never transfer to the cloud. You do your training locally on the edge node, and after training you upload only the model to the cloud. The cloud then performs the cross-edge model aggregation; our library provides the aggregation algorithm that runs in the cloud. After that, the result is sent back, so the edge is refreshed with a better model. With this, you first save the time and bandwidth of transferring raw data to the cloud, and, very importantly for data privacy, you never transfer data out of your edge; you keep full control of your data.

Now let me show how easy this is. As mentioned, we are not trying to invent a new AI framework; we stay compatible with existing AI frameworks, for example TensorFlow. The current example is joint inference based on TensorFlow. You can see that most of the code is very similar to what an AI developer would normally write with TensorFlow for inference. The only difference is that we include our library and its transfer-to-cloud logic: if the confidence level does not reach the goal, the request is automatically transferred to the cloud for the second layer of the joint inference. The developer does not need to change the rest of the code; they only need to use our SDK to configure and generate this edge-cloud joint inference.

Federated learning is a similar story: most of the code stays the same, you don't need to change your existing code, and you don't need to learn a new framework. If you are familiar with TensorFlow or PyTorch, there is no learning curve. To use the library, you only need to import it and then use the training loss function, the optimizer, and the collaborative train function from the library, and you can achieve federated learning very easily.
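The talk does not show the federated learning code itself, so the following is only a minimal sketch of the idea under stated assumptions: each edge trains locally on its own data, only the weights are uploaded, and the cloud averages them (FedAvg-style) before pushing the refreshed model back. The tiny model, the random data, and the plain weight averaging are illustrative; the real library's aggregation algorithm and API may differ.

```python
# Minimal FedAvg-style sketch of "knowledge aggregation": raw data stays on each edge,
# only model weights travel to the cloud, where they are averaged and sent back down.
# The model, data, and plain averaging are illustrative assumptions, not the library's API.
import numpy as np
import tensorflow as tf

def build_model():
    return tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(2, activation="softmax"),
    ])

def local_train(model, x, y, epochs=1):
    # Runs on an edge node: the raw (possibly sensitive) data never leaves the node.
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    model.fit(x, y, epochs=epochs, verbose=0)
    return model.get_weights()

def aggregate(weight_sets):
    # Runs in the cloud: average the corresponding layers of the uploaded weight sets.
    return [np.mean(layers, axis=0) for layers in zip(*weight_sets)]

# Cloud side: refresh the global model from two edges' uploads, then push it back to the edges.
global_model = build_model()
edge_weights = [
    local_train(build_model(), np.random.rand(32, 8), np.random.randint(0, 2, 32)),
    local_train(build_model(), np.random.rand(32, 8), np.random.randint(0, 2, 32)),
]
global_model.set_weights(aggregate(edge_weights))
```

In the actual library, as described above, you would keep your existing TensorFlow or PyTorch training code and swap in the provided loss function, optimizer, and collaborative train call instead of writing the aggregation yourself.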
Let me conclude my presentation. In this talk I mainly talked about how we build the edge AI framework, and this framework is based on KubeEdge. First, KubeEdge is a CNCF incubation project. Here I list the project website; the code repository is public on GitHub, the main Slack channels are listed, and all the community meetings are recorded and uploaded to YouTube, so you can watch any meeting you are interested in. The meeting notes are also kept in a Google Doc that is public to everybody. Here is the link to the meeting calendar: the community meeting is held weekly, in the evening, alternating between two time slots, one that suits people in North America and one that is friendlier for people in Europe. You can see the meeting calendar with the links, and here is the Zoom meeting ID.

The other project I mentioned is the LF Edge Akraino KubeEdge Edge Service blueprint. Its project website is also public, and it has everything, including the documentation; the edge AI framework diagrams I mentioned and all the service samples are hosted on its wiki page, so you can go there and take a look at the documents. We also have a weekly meeting, held at 19:00 Pacific Standard Time; here is the Zoom link, and there is also a Slack channel, so if you have anything to discuss you can jump into the Slack channel and ask questions.

Both projects are Linux Foundation open source projects, one under the CNCF and one under LF Edge, and everyone is welcome to join and chime in. Thank you. This concludes my talk; I will leave about eight minutes for questions. Thank you.