Okay, I think we can get started. Hello everyone, and welcome to this tutorial session, Open Cloud Infrastructure at Scale for the Next Generation. This tutorial is sponsored by Futurewei Technologies. My name is Yinxiong, and I'm from the Cloud Lab at Futurewei. In this tutorial we will introduce and discuss one of the projects we are working on at the Cloud Lab, called Centaurus. The vision of Centaurus is to be the open, next-generation cloud infrastructure at scale, which is where the tutorial title comes from. The project is open source, and we are in the process of donating it to the Linux Foundation. Hopefully, through this tutorial session, you will learn something, get familiar with the Centaurus project, become interested in joining the community, and help realize the vision of Centaurus.

With that, let's start the session. This is the agenda for the next three hours. I will give an overview of the Centaurus project. After that, Fu and Deepak from the Cloud Lab, our co-workers, will introduce Mizar, which is the Centaurus sub-project for the networking solution. After that, Xiaoning and Hongwei will discuss the project called Arktos. Arktos is the Centaurus sub-project for compute, and they will focus on how it integrates with Mizar, the networking solution, to make Arktos a scaled cloud platform with XDP-based networking. After that, Dean will introduce and discuss the KubeEdge project. KubeEdge is actually not part of Centaurus; it is an official CNCF project. But we also work a lot on KubeEdge, so we want to introduce it and discuss some of the changes, upgrades, and development happening on the KubeEdge side. Finally, Andy Nye from our open source team, who is an open source expert and also a CNCF board member, will talk about the current status of Centaurus and the community, investment, and partnership, and hopefully you can join the community and help realize the vision of Centaurus. Since this is a tutorial, we want the session to be interactive, so if you have any questions, please raise your hand and we will stop and try to answer them.

With that, let me start the overview of Centaurus: what it is and why we are doing this project. There are a couple of challenges, or problems, that the project tries to solve. The first one is compute and network management scalability. As more and more enterprise customers move their workloads into the cloud, cloud operators constantly face the challenge of managing very large clusters of compute nodes. Currently, most open source solutions manage a couple of thousand compute nodes. With this project, we try to solve this problem by scaling out to 50,000 compute nodes, and also scaling to millions of network endpoints for provisioning and management. That is one of the challenges the Centaurus project addresses. The second problem, or challenge, is unified infrastructure. In the open source community we have many platforms and services to manage different types of resources, such as VMs, containers, and serverless.
One of the problems we keep hearing from many customers is that they want a unified platform to manage all of those resources, so they do not have to operate two or three different platforms, which reduces management cost. So one of the problems we try to solve is to develop a platform with the same API for deploying and managing VMs, containers, serverless, and even bare metal, all on the same platform. At the same time, we want the same networking solution for provisioning and routing the VMs, containers, and all the other workloads deployed on that platform. That is the second challenge.

The third challenge the Centaurus project tries to solve is a high-performance cloud networking solution. For many cloud operators, the network continues to be the bottleneck for large-scale clouds. Customers always want faster provisioning of their VMs and workloads, and we found that a significant part, at least 50%, of the provisioning latency is on the network side, and a significant portion of cloud incidents are network-related. So producing a high-performance network solution is quite a challenge, and this project tries to address it by working on different ways to solve the network scalability and performance problems.

Another goal of the Centaurus project, last but not least, is a distributed cloud platform. We see many platforms addressing either data center nodes or edge nodes. From a platform perspective, we want to design one platform that supports both data center compute nodes and edge compute nodes. We want to support modern networking and multi-tenancy at the edge, and address the problems that many operators face around network security, network policies, and network latency for compute nodes and applications running at the edge. What we mean by a distributed cloud platform is that the platform manages not only the cloud data center nodes but also the compute nodes at the edge. The difference is that on the edge side, as you know, network latency and security have issues, and there are a lot of problems we try to solve with this project. Also, from a resource perspective, and especially from an operator's perspective, when a customer requests an application to be deployed on the cloud, where do we actually place the workload, where do we start the VMs? We call this scheduling, and it is not necessarily in the central data centers: we want to deploy the user's application on the nodes closest to the customer, which could be edge nodes or data center nodes. That is another challenge Centaurus tries to solve.

So this is the overview of the Centaurus project. Again, it is an open source project, and we are in the process of donating it to the Linux Foundation. We call it an open distributed cloud infrastructure.
In fact, we rebuilt it with a cloud-native approach, and the project consists of two sub-projects. The Arktos project is the cloud compute project for Centaurus. It basically manages clusters at scale and provides unified orchestration APIs for VMs, containers, serverless, as well as bare metal, as I mentioned earlier. The second project, Mizar, is the networking project for Centaurus; we will discuss it in more detail in the next section. It is a virtual network solution at scale for the cloud, with a data plane and a management plane. In the data plane, we use XDP technology for routing and forwarding packets at high speed.

This is the overall Centaurus architecture: it includes the Centaurus cloud compute, which is Arktos, and the networking solution, which is Mizar, but we also have a Centaurus UI and a global scheduler on the Centaurus side. I am not going through every detail of this architecture, but I do want to mention a few architectural highlights. The first is that the compute and the network are independent of each other. Even though these are two projects under one umbrella, they are independent: you can use the Mizar network solution standalone and have it work with other open source software like OpenStack or Kubernetes. In fact, the first thing we did is make the Mizar networking solution work natively with Kubernetes, and it also works with OpenStack. Arktos is likewise an independent compute project: it can work with any network solution that is CNI compatible, so it works with Flannel, Calico, and the other network solutions on the open source side. So the two projects are independent of each other; however, under the Centaurus umbrella, we integrate the two projects into one platform.

The second highlight is that Arktos actually evolved from the Kubernetes architecture, but we made major core changes to turn it into a more scalable, cloud-native infrastructure; we will discuss more on that in later slides. I mentioned earlier that Mizar is based on kernel XDP programs and the Geneve protocol. This is a new solution we came up with to help solve the network throughput and concurrency problems in packet-forwarding performance. Again, we will have more slides in this tutorial to discuss Mizar.

Also, the Centaurus architecture natively supports multi-tenant compute and network. From the beginning, this was designed as infrastructure for building a public cloud or a private cloud, so multi-tenancy is built in, it is native. Some of the open source software out there is designed more for private clouds and may not have multi-tenancy features, but this project is intended to be a platform for building public or private clouds with multi-tenancy in mind.

The next architectural highlight is that this architecture is compatible with CRI, the Container Runtime Interface, CSI, the Container Storage Interface, and CNI, the Container Network Interface, and we use those interfaces for both containers and VMs.
So VMs are also compatible with those interfaces, which are the de facto interfaces in the cloud-native world today. Now a bit more on the project architecture. Again, as I mentioned, the architecture is derived from Kubernetes, but we made key redesigns and changes, or what we call extensions, to the platform to make it a real multi-tenant, cloud-native infrastructure that supports containers, VMs, and other types of workload resources.

The first major change we made is for scalability. Every component in the architecture can be partitioned, including the API server and the etcd storage: you can partition etcd and you can partition the API server to make them scalable and deployable at the regional level. That also includes the controllers in the architecture itself. In the Kubernetes architecture there are many controllers, or operators, that manage deployments and application workloads; there we also made a major change so we can partition the workload to increase the architecture's throughput and scalability. That is the first change we made to the Kubernetes platform.

The second is the multi-tenancy model, which we call virtualized Kubernetes. Kubernetes was designed from the beginning with no multi-tenancy support, and I know the community is still working on that. We came up with an innovative approach to solve the multi-tenancy problem for Kubernetes: physically there is one cluster, but each tenant feels like the cluster belongs to them. That is what we call virtualized Kubernetes, and that is one of the major changes we made to the Kubernetes architecture.

The third major change, or key design, is the unified runtime. Again, as I mentioned earlier, one of the challenges in the industry is that enterprise customers really want one platform to manage all the resources they have, whether they are virtual machines, containers, or other types of worker nodes or resources. So one of the design changes we made is to unify the runtime management for VMs and containers: we have the same agent running on the host that manages the lifecycle of both VMs and containers, and we are in the process of adding serverless lifecycle management for those applications and runtimes.

The fourth change we made is the introduction of network objects and isolation. Today in the Kubernetes platform, you can deploy different network solutions as long as they are CNI compatible. We introduce network objects so that you can create VPCs and subnets within the Kubernetes platform, or in this case within the Arktos platform, and then deploy VMs or containers within your network or your VPC, which naturally isolates them from each other. That is part of the multi-tenancy model, achieved by introducing new network objects into the Kubernetes platform. Each tenant can create as many VPCs as they want, create as many subnets as they want, and then deploy containers, applications, and virtual machines within that VPC or that subnet, and those will not communicate with resources belonging to other tenants or other VPCs.
So those are the major changes, the extensions, we made to the Kubernetes platform. The final one, but not least, is edge support. We follow the cloud-edge platform design pattern: the extended Arktos can manage compute nodes not only in the data centers but also at the edge, so we manage compute nodes both in the data center and at the edge in the same way, with the same API and the same experience, even though the nodes could be in a cloud data center or at the edge. We support that natively from the beginning of the architecture design. So that is a bit of background on the Arktos project, and again, our co-workers will discuss much more about Arktos shortly.

The Mizar project, which is the networking project for Centaurus, includes a data plane and a management plane. The data plane is built on the XDP/eBPF technology and the Geneve protocol, as I mentioned earlier. For those who do not know, XDP stands for eXpress Data Path; it is a Linux kernel technology that allows you to run your program within the kernel to process network packets and network traffic. We use XDP/eBPF to build Mizar's data plane, and the preliminary results show that routing in the data plane achieves the very low latency goals that we have.

The Mizar management plane is built on the Kubernetes CRD and operator framework, so you can manage the Mizar data plane from the Kubernetes infrastructure itself. We built VPC CRD objects, subnet CRD objects, endpoint CRD objects, all the network objects, created through Kubernetes using CRDs and the operator framework. That way you can manage the data plane through Kubernetes itself. Sometimes we call it a cloud-native SDN, but it is a slim management plane that mainly focuses on managing the Mizar data plane.

One of the reasons we think Mizar can scale to provisioning a large amount of networking, including VPC networks and endpoints, is that we naturally partition the cloud network by tenant VPCs and tenant subnets. The routing and the scalability are all partitioned by VPC and subnet, so we can scale out the network architecture and provision millions of network endpoints at the same time for large-scale clusters. And since we use XDP, we have shown very low-latency forwarding of network traffic between VMs, between containers, or between containers and VMs.

I think that is the overview of the Centaurus project. We do have a booth here: if you are interested, please visit our booth, called Futurewei, which has a lot of information, including the white paper for Centaurus and some of the videos we have for Mizar and Arktos. You are welcome at our virtual booth. That is the end of my overview of the Centaurus project. We probably have a couple of minutes open for questions, if you have any, and then we will go on to the next session. Any questions? Okay, if not, I will stop sharing, and we will continue to the next section, where Fu and Deepak will cover Mizar, a novel approach to network virtualization. Deepak, over to you. Thanks. Thanks. Yeah. Thank you, everybody.
Hi, this is Deepak Vij from the Futurewei Cloud Lab. Just a quick recap: we are going to have three presentations coming up, and each presentation is going to be around one hour long. The intent is to have 45 to 50 minutes or so for the presentation itself, and then at the end we will have about 10 minutes of Q&A. What we are doing is using a format whereby the presentations are prerecorded; this has proven to be pretty productive in other online conferences as well. Essentially, as we are playing the presentation, attendees can ask questions in the chat window while the presentation is going on, so we can have more interactive Q&A and, at the same time, control the overall time. So I think that's pretty much it, and we can start off with the very first presentation on Mizar cloud networking. Magni, you want to start? Yeah. Go ahead. Audio is not there. Audio. Details for Mizar. You want to start from the beginning.

Hello, everybody. This is Deepak Vij from the Futurewei Cloud Lab. I, along with my colleague Fu Tran, will cover details of Mizar, the network virtualization layer for our next-generation Centaurus cloud platform. Just as a very high-level description: Mizar is a programmable data plane for multi-tenant network services at scale. To set the background, why do we need another cloud networking solution? We built the Centaurus cloud platform to meet the needs of very large enterprises, which may typically include Fortune 100 companies who deploy their infrastructure at very massive scale. Large-scale cloud platforms such as Centaurus need to be able to scale up in order to support an enterprise's entire global footprint. In order to accommodate all that, we set design targets such as a scenario where you have 100,000 virtual machine endpoints per network, in combination with many of these networks. So you can see the scale, and the cloud environment is very dynamic: all these lightweight containers and serverless functions come and go in a fraction of a second, and you may have thousands of these endpoints that need to be provisioned and managed. It is not the old, static environment where you maybe spin up VMs a few times, maybe 100 times a day at most. As opposed to that, we have a very dynamic environment, so we wanted to support rapid provisioning of cloud resources very quickly and efficiently. We set our performance goals to achieve high throughput, low latency, and consistent network performance in a multi-tenant environment. That's the key thing: multi-tenancy, with the VPC built from the ground up as a first-class citizen of our platform. Now, from a scalability perspective, just going back to the VPC, this is similar to what public cloud providers such as AWS, Google, and Microsoft provide as well: a VPC-based multi-tenant environment.
From a scalability goals perspective, our networking solution should be capable of provisioning VM or container endpoints concurrently at thousands per minute, capable of routing and managing communication among millions of network endpoints, capable of elastic scaling of network services, and able to create an extensible cloud network of pluggable network functions; we will get into a lot of detail about that, and we have a very interesting discussion on it. And last but not least, a unified network for various workload types: one common network interface for all workload types, such as containers and VMs, so that no matter what the workload is, you get the same networking interface when you provision or manage an endpoint. So that was the background.

With that background and those requirements, we came to the conclusion that the current flow programming model, typically employed by OVS-like networking solutions, may not be the right model for scaling up to such a large environment. We'll get into the details of what that means. So what is the problem? Why do we think flow rules are not going to meet the needs of such an environment? To set the background: in a typical networking solution such as OVS, a control plane programs flow tables on each of the virtual switches on each host, using the OpenFlow protocol. Now suppose a customer wants to add a new virtual endpoint to a specific network, and let's assume that network has 100 endpoints spread across multiple hosts. When you are adding the new endpoint, the control plane will program flows on all the other hosts that have VMs in this particular network, so that those VMs can reach the newly added endpoint, a VM in this case. You have to go and add flow rules to the flow tables of the virtual switches on all those hosts. Additionally, the control plane also programs flows on the virtual switch of the host where the new endpoint is being added, so that the new VM can reach the rest of the network. So you can see the problem: it is about the number of flows per host and the number of hosts that must be programmed when you add a new endpoint. The total programming overhead becomes really massive: it becomes the product of the number of endpoints per host multiplied by the number of hosts in the overall system. It is very obvious that such an architecture does not meet the scaling goal of rapid provisioning at such a massive scale. And then at runtime, because there are so many flow rules to deal with, packet processing makes CPU and memory utilization a big bottleneck. That is what we meant when we said the current flow-rule programming model does not measure up to such a large-scale environment. So to address these scaling challenges, we built Mizar, our cloud networking solution, to route traffic for virtual networks and to provision endpoints in such an environment.
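A rough way to write down the overhead being described here (the symbols are just for illustration, not from the talk):

$$\text{total flow-programming overhead} \;\propto\; E_{\text{host}} \times N_{\text{hosts}}$$

where $E_{\text{host}}$ is the number of endpoints per host and $N_{\text{hosts}}$ is the number of hosts spanned by the network. Adding endpoints or hosts therefore grows the flow-programming work multiplicatively rather than linearly, which is the product mentioned above.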
Now, there is a lot of state-of-the-art work going on in this area. Currently the OVN folks, on the control plane side, are doing a pretty good job of optimizing by moving away from the old Neutron control plane towards the new OVN controller, which is based on optimized databases rather than the RabbitMQ model employed by the Neutron control plane. They have done a great job: adding a single port has gone from tens of seconds to close to one second. That is a good achievement, but it is still not enough; it still does not scale up to the kind of environment we are talking about. The reason is that the underlying problem remains: it is still built on top of OVS, which uses flow rules to manage the overall networking environment. The OVS flow-rule programming model is the main challenge. There has also been a lot of work by Google on Andromeda, where they try to minimize the flow rules in their system. What they do is send all their packets to a component called the Hoverboard, then periodically monitor all the flows in the system at the Hoverboard and find specific flows to offload to a direct, fast path with flow rules. So on a periodic basis, they selectively offload some of the flows away from the Hoverboard and use a direct path, with their hardware optimizations, so that two endpoints can talk to each other directly. You can see that all of these efforts, OVN and Andromeda, pretty much boil down to a single idea: finding ways to reduce the number of flow rules pushed down to the data path.

So we stopped for a second and asked ourselves a question: what would it take to build a next-generation overlay network that doesn't use any flows at all? What if we program the data plane just like you would build a regular distributed system application? Do away with all of these flow rules altogether and build the data plane as a regular distributed system application. So we took that approach and went with the distributed-system-application model. We are working in a very specialized use case, cloud networking, where everything is deterministic, so there is no need for MAC learning or flooding the tunnel, those age-old networking designs.

Now, XDP. I just want to briefly touch on what XDP is, because this is what we use: instead of using flow rules, we built the entire cloud networking layer using XDP, eXpress Data Path. In 2018 a group of kernel engineers published a paper about XDP, although this work has been going on for a while; I think it first came out around 2016. The goal is to run custom eBPF programs within the device driver of the NIC. These eBPF programs are really small programs, on the order of 4K instructions, written in a general-purpose language such as C, though you can actually write them in Rust as well. And these programs are verifiable, so the kernel can load them without worrying about any possibility of a crash.
The kernel verifies the program up front: no unbounded loops, no invalid pointer accesses, no dynamic memory allocation, and so on. All these programs are verifiable. Once the program is loaded into the kernel, when a packet comes in from the NIC, you can run fairly high-level logic with typical data structures such as arrays and hash tables, and then take an appropriate action on the packet. Those actions are typically: pass it up to the networking subsystem within the kernel, transmit it back out the NIC, redirect it to another interface, or drop it. That is a pretty high-level description of what XDP is.

And that is what we ended up doing: as I mentioned, we built our entire networking solution based on XDP. Our approach was to remove everything: we got rid of Open vSwitch, Linux bridges, and iptables, and we replaced them with an XDP program on the main interface, which we call the transit XDP program. Then we have another XDP program on the veth pair that connects to a container or VM, which we call the transit agent XDP program. These two programs share eBPF maps, which are programmable from user space by the transit daemon. You can see it here: the transit daemon, the transit XDP program, and the transit agent XDP program. You may have multiple network interfaces on a host, and for each one of those interfaces we have a corresponding transit XDP program, which corresponds to a droplet; for each main network interface you have a droplet. That is the overall architecture. From a management plane perspective, if we want to push configuration, we just make an RPC call to the user-space program, and that user-space program, the transit daemon, populates the regular maps, the hash tables, inside the kernel.

In a cloud, when you are building network functions, the functionality can pretty much be reduced to three or four constructs: you either encapsulate or decapsulate a packet, modify the outer packet header and forward it, modify the inner packet header and forward it, or drop an unwanted packet. These are the typical constructs you employ when building services such as NAT or load balancing, and we are able to perform all of this and build the network services using these small XDP programs. So that is the high-level architecture we have.

Drilling down more into it, from a packet-processing standpoint, we have these transit XDP programs running on the RX queue of the NIC, the main interface, on every host, and we have the transit agent running on the RX queue of the veth pair. Whenever an ingress packet comes in, we can redirect it to the TX queue of the veth, bypassing the entire kernel in the root namespace, straight to the container. For the egress packet, we take the packet from the container and redirect it to the TX path on the main interface, again bypassing the root namespace. So that is the high-level architecture we have for Mizar.
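As a rough, self-contained sketch of the kind of XDP program being described here — this is not the actual Mizar transit XDP source; the map name and layout are invented for the example:

```c
// Illustrative transit-style XDP program (sketch, not the real Mizar code).
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

struct endpoint_val {
    __u32 host_ip;              /* physical host that owns this endpoint */
    unsigned char host_mac[6];  /* its MAC, for rewriting the outer header */
};

/* endpoint IP -> hosting node; populated from user space by a daemon */
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65536);
    __type(key, __u32);                 /* endpoint (inner) IPv4 address */
    __type(value, struct endpoint_val);
} endpoints SEC(".maps");

SEC("xdp")
int transit_xdp(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_DROP;                /* malformed frame */
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;                /* let the kernel stack handle non-IP */

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_DROP;

    struct endpoint_val *ep = bpf_map_lookup_elem(&endpoints, &ip->daddr);
    if (!ep)
        return XDP_PASS;                /* unknown destination: punt upward */

    /* A real program would encapsulate or rewrite headers here, then either */
    /* transmit back out this NIC or redirect to a veth / another ifindex.   */
    return XDP_TX;
}

char _license[] SEC("license") = "GPL";
```

The point of the sketch is simply that the "tiny server" on the NIC is ordinary C: a hash-table lookup followed by one of the four actions (pass, transmit, redirect, drop), with the table filled in from user space.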
Now, you can see what we have done so far: we look at the network interfaces on all the hosts in the data center and treat them as programmable tiny servers. You have these tiny servers running on every network interface of every host in the data center, so we can load programs there, we can use regular data structures, and we have a mapping table that maps each endpoint to the host where it is hosted. You have that mapping on all the hosts in the data center, and we load the exact same programs on all the hosts. That is basically the overarching architecture.

Now, from a management plane perspective, we assign a label to one of the hosts and call it a bouncer. A bouncer is just an abstract object in the management plane. When we are creating a new endpoint, remember, as I mentioned initially, with flow rules you have to go across all the hosts in that network and update the flow rules on their virtual switches. As opposed to that, when we create an endpoint, a container for example, we make two RPC calls. The first RPC call goes to the host on which we are going to provision the container and basically tells the XDP program: your bouncer is on host B. You can see it here: we are adding this new endpoint, 10.0.0.1, and we tell that endpoint: your bouncer is on host B. That is the only thing we do. Then we make another RPC call to the bouncer host that tells it: 10.0.0.1 is hosted at host A. We just need these two RPC calls to provision the container, and that's it, that's pretty much it. Now let's say we add another container, another endpoint, and we do exactly the same thing: we tell the bouncer on host B that we have another endpoint, 10.0.0.2, which is hosted at host C, and that's it, we're all done. And on host C, we tell that endpoint 10.0.0.2: your bouncer is on host B. These RPCs are standard gRPC calls. They typically take, for example, 20 milliseconds for the RPC call to update the bouncer, and maybe more or less 300 milliseconds or so to provision the container endpoint.

Now, from a packet-processing perspective, what happens when the endpoint on host A wants to communicate with endpoint 10.0.0.2? The first step is that the XDP program takes the packet, encapsulates it in a Geneve packet, and transmits it out to the bouncer, because that program doesn't really know anything about the network: it doesn't know where the endpoint 10.0.0.2 is hosted. It has a packet and sends it blindly to the bouncer, so it is sent to the bouncer on host B. Host B is where we actually have the mapping that says 10.0.0.1 is on host A and 10.0.0.2 is on host C. So on the bouncer, when the packet arrives, it rewrites the outer packet to host C and sends the packet out to host C, and we're done: the packet originated from host A, went to the bouncer, and from the bouncer was sent out to host C.
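Here is one plausible sketch of the bouncer's "rewrite the outer header and bounce the packet onward" step. Offsets, struct layout, and checksum handling are simplified, and the map name is invented; the real transit XDP code parses variable-length Geneve options, fixes MACs and checksums, and so on:

```c
// Illustrative bouncer-side XDP logic (sketch, not the real Mizar code).
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1 << 20);
    __type(key, __u32);     /* inner (endpoint) IPv4 address */
    __type(value, __u32);   /* IPv4 address of the host that owns it */
} endpoint_to_host SEC(".maps");

SEC("xdp")
int bouncer_xdp(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    struct iphdr *outer_ip = (void *)(eth + 1);
    if ((void *)(outer_ip + 1) > data_end)
        return XDP_DROP;

    /* Skip outer UDP + a fixed-size (option-less) 8-byte Geneve header and the
       inner Ethernet header to reach the inner IP header. A real implementation
       parses the variable-length Geneve options instead of assuming this. */
    struct udphdr *udp = (void *)(outer_ip + 1);
    struct iphdr *inner_ip = (void *)(udp + 1) + 8 + sizeof(struct ethhdr);
    if ((void *)(inner_ip + 1) > data_end)
        return XDP_PASS;

    __u32 *host = bpf_map_lookup_elem(&endpoint_to_host, &inner_ip->daddr);
    if (!host)
        return XDP_DROP;           /* the bouncer is supposed to know this endpoint */

    outer_ip->daddr = *host;       /* point the outer packet at the owning host */
    /* (outer IP checksum and destination MAC would be fixed up here)           */
    return XDP_TX;                 /* bounce it straight back out the same NIC  */
}

char _license[] SEC("license") = "GPL";
```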
However, you can see that there is a problem there: by doing all this, we have added an extra hop in between. Every packet that goes from the source endpoint to the destination endpoint has to go through a bouncer, and that is problematic. When you are dealing with line-rate packet-processing performance, this is a major problem. So we decided to solve it: we modified our protocol so that we can bypass this hop. We changed the XDP program and added ARP handling to it. Changing the XDP program is pretty straightforward: we just write the C code and load it dynamically, on the fly, without any packet interruption at all. We also modified the transit agent XDP program's behavior a little bit, so that it adds a bit more information into a Geneve option; we'll get into that detail.

So now what happens is this: the ARP request comes from host A, from the endpoint 10.0.0.1, asking who has 10.0.0.2, where is 10.0.0.2? The ARP request goes to the bouncer, but the bouncer has an ARP responder capability built in. What it actually does is reply back to host A, saying: I know where 10.0.0.2 is, and it adds a Geneve option saying that 10.0.0.2 is actually at host C. When the reply is received back at host A, host A adds this to its local endpoint map, recording that 10.0.0.2 is at host C. So now the ARP processing is over, and host A knows how to reach 10.0.0.2, because it knows that endpoint is on host C. When the first packet of the flow comes out from 10.0.0.1, we set just one single bit in the Geneve packet saying: this is a direct message. When that packet is received on host C, it knows that the packet came directly from the host that is hosting 10.0.0.1, so host C populates its own endpoint map with 10.0.0.1 at host A. Now we have a direct connection, from the very first packet of the flow, between the two containers, and from then on all the remaining packets go directly from host A to host C without going via a bouncer. We have gotten rid of the extra hop in the middle.

So what we have done so far is that we provision an endpoint in constant time, with a constant number of RPCs, the two RPCs I mentioned, and all the packet processing is on the fast path, meaning the hosts talk to each other directly after the initial ARP request and ARP-responder reply. Fast path also means that we never take a packet and send it out to an OpenFlow controller: everything happens at the device driver of the NIC, and the containers in turn receive their packets directly, from the very first packet onward. There is also no need for periodic flow monitoring the way the Andromeda folks do it; this is the direct path all the time. So this is great.
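As a small sketch of the "learn the direct path" step just described — the struct names and the option layout are invented for illustration; the real code parses the actual Geneve option carried in the bouncer's reply:

```c
// Illustrative direct-path learning helper (sketch, not the real Mizar code).
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct direct_path_hint {
    __u32 endpoint_ip;   /* inner/virtual address we just learned about */
    __u32 host_ip;       /* physical host that actually owns it         */
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65536);
    __type(key, __u32);
    __type(value, __u32);
} local_endpoints SEC(".maps");

/* Called from the transit XDP program after it has parsed a Geneve option
   (from the bouncer's ARP-responder reply, or from a packet carrying the
   "direct" bit) into `hint`. */
static __always_inline void learn_direct_path(const struct direct_path_hint *hint)
{
    /* From now on, traffic to hint->endpoint_ip is sent straight to
       hint->host_ip instead of via the bouncer. */
    bpf_map_update_elem(&local_endpoints, &hint->endpoint_ip,
                        &hint->host_ip, BPF_ANY);
}
```

The same update runs on both sides of the flow, which is why, after the first packet, both hosts forward directly to each other.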
So this is the very base-level functionality we built, which you could typically build using OVS as well, but this is much more optimal and much more efficient. And we went further: we extended this to add a lot more functionality. If you think about it, what we have done so far is replace the functionality of Open vSwitch, Linux bridges, and iptables with a simple, small XDP program. Then we thought, let's modify the architecture of the main XDP program a little bit: instead of having one single XDP program, we have many of them. We attach one XDP program to the NIC and call it the primary XDP program, and then we attach multiple other XDP programs; based on certain matching conditions in the primary XDP program, for example matching an IP address for an endpoint, we can call another XDP program. By doing that, you can build a chain, a chain of network functions. The rest of the session is going to be covered by Fu Tran, and with that I'm going to hand it over to Fu. Take it over, Fu. Thanks.

Thanks, Deepak. Hi everyone, this is Fu Tran, and I'm a developer on the Mizar project. Today I will be going over the management plane, some numbers, and a quick demo for Mizar. The Mizar management plane is built with Kubernetes. Using CRDs, we are able to extend the Kubernetes API with our own custom objects. Some of these objects are generic to any networking solution, while others are specific to Mizar. Using the Kubernetes operator framework, we are able to extend Kubernetes with domain-specific controllers for our CRDs. Three fundamental components make up the Mizar management plane: CRDs, workflows, and operators. Our objects are defined through the Kubernetes CRD API under the mizar.com resource group. The operators then expose interfaces for us to act on these custom objects. Finally, the lifecycle of these objects is handled by workflows. Each of these workflows is triggered by state changes in the respective objects. For example, the droplet object, which represents a physical interface on a node, will trigger the management plane's delete workflow if its corresponding physical interface were to be removed.

The Mizar management plane defines six CRDs: VPC, divider, network, bouncer, endpoint, and droplet. The VPC object carries information about the VPC, such as its CIDR range, its ID, and its list of dividers. The divider object has information about the divider's parent VPC and its host information, such as the IP and MAC of the host. The network object has information about the network's VPC, its own CIDR range, and its list of bouncers. The bouncer object carries information about the bouncer's parent network and its host IP and MAC. The endpoint object has information about the endpoint's type, its parent network, and the endpoint's IP and MAC. Finally, the droplet object has information about its current host interface's IP and MAC. In addition to our custom objects, we also handle the built-in default Kubernetes objects. Currently, we handle Kubernetes pods, nodes, and services. Pods and services map directly to Mizar simple and scaled endpoints, respectively, and nodes and their corresponding interfaces correspond to Mizar droplets. For example, if a pod or service is created, the Mizar management plane will trigger a workflow to create endpoints.
Similarly, if a node is added to the cluster, a droplet object will be created for each interface on that node. Now we will go over some preliminary performance numbers for Mizar. In our setup, we saw that Mizar achieves near line-rate packets per second, at around 600k pps. With direct path, Mizar minimizes the round-trip time and is faster than OVS. Even with the first packet going through an extra hop at the bouncer, the pps processed by the endpoints remains close to line rate. Mizar has minimal memory overhead, both at idle and during performance tests. With 100 endpoints per host, the memory overhead on the bouncer remains at the baseline level; however, the endpoint host's memory utilization increases as more endpoints are provisioned. This is an improvement we can make by having all endpoints on the host share the same transit agent. Compared to OVS, Mizar has significantly less CPU overhead on both the bouncer and the endpoint host. Currently, TCP performance on Mizar caps out at around 4 gigabits per second. This can be improved by running the XDP programs in driver mode, which requires NIC support. Furthermore, XDP currently does not support hardware checksumming and TSO, but there is ongoing work to support this.

Alright, let's roll the demo. For this demo, we will use Docker containers to simulate physical hosts. In this first part, we demonstrate intra- and inter-network connectivity, network isolation, and the effects of removing and adding bouncers and dividers. Here, we are sending updates to the daemon running on each host via RPC to provision VPCs, networks, dividers, bouncers, and endpoints. In this next part, we show the effects of adding an additional bouncer to reduce network congestion. Here, we demonstrate how Mizar direct path works within the same network. Next, we demonstrate Mizar direct path across subnets. Finally, this part demonstrates Mizar on Kubernetes, with the scaled endpoint being used as a replacement for the traditional implementation of Kubernetes services. So that's it for today's presentation on Mizar. Thank you everyone for tuning in. Please visit github.com/futurewei-cloud/mizar to learn more and try it out for yourself. Also, please look forward to the next presentation about Arktos and its usage of Mizar as a networking provider.

Hi everybody. Hello. Yeah, we're back. I know there were a bunch of questions in the chat window, and we've been answering those. Before we transition to the next presentation, we set aside some time, so if you folks have any questions, Fu and I can answer them; we can also take questions about Mizar networking at the end of the tutorial. In the meanwhile, we'll hand it over to Xiaoning and Hongwei for their presentation. Go ahead, Xiaoning.

Thank you, Deepak. Hi, everyone. My name is Xiaoning. In the opening remarks, we mentioned that there are two projects, Arktos and Mizar, in Centaurus. Deepak and Fu have talked about Mizar. In our presentation, Hongwei and I will talk about Arktos and how it works with Mizar. We have a recorded video, and we'll play that; in the meantime, Hongwei and I will be online to address any questions, so feel free to post questions if you have any. Hi, everyone. Thank you for coming to this talk. My name is Xiaoning. In this talk, my colleague Hongwei and I will talk about Arktos and Mizar.
In the previous talk, Deepak and Fu shared a lot of information about Mizar, including its architecture and how Mizar works. In this talk, we continue to discuss Mizar, but from a different perspective. Mizar is an independent network service. This means it can support different computing clusters, like Kubernetes, OpenStack Nova, or others. In this talk, we discuss Arktos, our own computing cluster, what its network model looks like, and how we use Mizar to implement that network model.

This is our agenda. Our talk mainly includes three parts. In the first part, we give background information about the Arktos and Mizar projects, so you know what Arktos is and its relationship to the Mizar project. We also give a brief introduction to the three major features in Arktos. As you will see, Arktos is derived from Kubernetes. Starting from the second part, we talk about the Arktos network model. You will see how the Arktos network model differs from the flat model in Kubernetes, and also what kinds of elements we have in the Arktos network model. The Arktos network model can be implemented by different network providers. In the third part, we focus on how we use Mizar to implement the Arktos network model, including its CRD-based control plane and XDP-based data plane. I will cover the first part, then we will talk about the network model and the control plane in the second part, and Hongwei will continue with the data plane after that.

Okay, let's get started with the first part, the project introduction. The first question you might have is: what is Arktos? Arktos is one of the two major projects under the Centaurus umbrella. Arktos is for large-scale cloud compute orchestration, while Mizar is for cloud virtual networking. Arktos and Mizar sit side by side and work together to build a large-scale cloud infrastructure. If you are familiar with OpenStack, you can think of Arktos as playing the role of Nova in OpenStack, while Mizar plays the role of Neutron. That will probably help you understand the relationship between Arktos and Mizar.

Why did we want to start Arktos? The Arktos vision is a large-scale, open-source cloud infrastructure for next-generation workloads. When we talk about next-generation infrastructure or next-generation workloads, the way we think of it is this: if we look at it from an application perspective, these applications are not only VMs, they are also containers and functions, and a lot of them will be AI-driven applications. And if we look at the infrastructure, it is not only about cloud data centers; it is also about edge sites and 5G at the edge. Combining these two perspectives, Arktos wants to build a next-generation cloud infrastructure with built-in optimizations for these new workload types that leverages this new infrastructure better. Because these workloads are not only VMs, and not all of them will be containers, Arktos decided to start with Kubernetes. It is derived from the Kubernetes codebase, so we get the mature container support already available in Kubernetes, but on top of that, we made lots of core, fundamental design changes. We have already implemented three major features: unified container/VM orchestration, cloud-scale scalability, and multi-tenancy. We are also planning some other features like cloud-edge scheduling, cloud-edge secure communication, and AI optimizations.
But for today's talk, I will give an introduction to these three implemented features in the next few slides. Before we jump into the details, I want to first give a high-level overview of the relationship between Arktos and Mizar. This diagram shows how Arktos and Mizar work together in a cloud data center. You can see that all the hosts in a cloud data center are managed by Arktos and Mizar at the same time. On each host, we have a compute agent which reports to the Arktos control plane, and a networking agent which reports to the Mizar control plane. These components work together to schedule workloads on different hosts and make sure those workloads can communicate with each other. What I want to say here is that, while this is the recommended combination, both Arktos and Mizar are independent services; they can work with different computing clusters and networking services. For example, the Arktos network model is based on a plugin style. For Mizar, we have a plugin called the network controller for Mizar; it runs in Arktos and interacts with the Mizar control plane to complete the integration between Arktos and Mizar. If we want to use another network service, like OpenStack Neutron or AWS, we can also do that; we just replace this plugin, this controller. Arktos itself is neutral to different network providers. On the networking side, Mizar itself is also an independent network service. It provides its public APIs and can work with different computing clusters, regardless of whether it is Arktos or Kubernetes or something else. Both of them are independent cloud services: they can work together, but they can also work with other networking or computing services.

This is the architecture of Arktos. As we mentioned earlier, Arktos is derived from Kubernetes. If you are familiar with Kubernetes, you will see some similar components here: for example, we have the API server, we have data stores, we have a scheduler, and we have different controllers. The key design changes we made are, first, for scalability. For the data stores, we support multiple etcd clusters: for a very large cluster, we cannot save all the data into one etcd cluster, so we support multiple etcd clusters working together. Also, for the API server: in the Kubernetes design, one API server holds all the caches, and this will not work if the cached data is too large, so we implemented partitioning for API servers. For the Kubernetes controllers, the original design is an active-standby architecture: at any time, only one controller instance is working. We implemented an active-active architecture: all the controller instances work at the same time to balance the workload and provide high availability. That is scalability. We also implemented built-in multi-tenancy. In Kubernetes, there is no multi-tenancy; it only provides some level of isolation using the namespace mechanism, but that is not strong enough for a multi-tenant cloud infrastructure. We implemented a strong multi-tenancy model: each tenant is not impacted by other tenants; in fact, they don't see each other at all. The third part is that we implemented unified container and VM orchestration. The original Kubernetes only supports containers, but we extended the pod definition to VMs, so both VMs and containers can be scheduled and handled in a similar way, and you can use one resource pool to support all these workloads. I will cover these details in the next few slides.
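As a minimal sketch of the hash-based workload partitioning that lets active-active controller instances split the work (illustrative only; the hash function and the notion of a "workload key" here are assumptions, not the actual Arktos implementation):

```c
// Every controller instance runs the same mapping, so they all agree on which
// instance owns which workload without a central dispatcher.
#include <stdint.h>
#include <stdio.h>

/* FNV-1a: any stable hash of the workload's key (e.g. tenant/namespace/name) works. */
static uint32_t hash_key(const char *key)
{
    uint32_t h = 2166136261u;
    for (; *key; key++) {
        h ^= (uint8_t)*key;
        h *= 16777619u;
    }
    return h;
}

static uint32_t owner_instance(const char *workload_key, uint32_t num_instances)
{
    return hash_key(workload_key) % num_instances;
}

int main(void)
{
    const char *pods[] = { "tenantA/ns1/web-0", "tenantA/ns1/web-1", "tenantB/db-0" };
    for (int i = 0; i < 3; i++)
        printf("%s -> controller %u of 4\n", pods[i],
               (unsigned)owner_instance(pods[i], 4));
    return 0;
}
```

Because every instance computes the same answer, the instances stay active simultaneously, each reconciling only its own slice of the workloads.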
In addition to these three features, we are also working on some new features, like cross cloud-edge scheduling and cloud-edge secure communication, which we will cover in another talk. In the next three slides, I'm going to cover these three major features in a little more detail.

The first feature I want to talk about is native VM support. Arktos is derived from Kubernetes, but Kubernetes only supports containers. We know that in reality there are lots of VM applications in addition to container applications. Of course, you can use two different systems to orchestrate containers and VM applications, but having two different systems leads to many problems. For example, you need to maintain and evolve two different systems, and these systems have different resource pools; it's not easy to move machines from one resource pool to another. That's why we started this feature, native VM support: we added VM support into Arktos, so with one system we can support both container applications and VM applications. In the Kubernetes community, there are also some efforts to support VMs, like KubeVirt, but they take an add-on approach. That means they introduce a new object, called VM, for VM applications, and on top of that new VM object they have a VM replica set, a VM stateful set, and so on. What we do here is native VM support: we didn't introduce new objects; instead, we extended the pod object to contain VM applications. On the left side, I have a screenshot: you can see that within a pod, we can have a VM or we can have containers, and the experience is very similar for customers; they can use one pod object to contain VMs or containers. The good thing about this approach is that, because we are extending the pod object, all the other components built on top of pods work the same way and work with VM applications as well. So we have unified scheduling and unified controllers and agents; we don't need a separate scheduler or controllers for VMs.

With this native VM support, the value for customers is, first, that they can run their legacy VM applications and their container applications together on a single cluster; they don't need separate clusters for different applications. Second, in the container world there are now some very popular, very useful workload patterns, like ReplicaSet, StatefulSet, and so on; now they can apply these orchestration patterns to VM applications as well, not only to container applications. It's kind of 1 plus 1 greater than 2. The value for cloud providers is, first, that they can have one single resource pool, and by resource pool I mean the machine pool: this machine pool is shared by both container applications and VM applications, so they don't need to allocate separate, smaller resource pools, and the resource utilization is much better. Second, they only have to maintain and operate one single software stack; they don't have to maintain and evolve two different stacks. That's a big advantage of this native VM support.

The next feature I want to talk about is multi-tenancy. In Kubernetes, there is no multi-tenancy. It provides namespaces as a basic mechanism to isolate workloads, but that's not enough for a true multi-tenant environment. For example, multiple tenants need to make sure there are no naming conflicts for namespaces, and there are also non-namespaced objects which are shared by all the tenants. So in Arktos, we implemented a built-in, strong multi-tenancy model.
We introduced a new isolation concept called the tenant space. All the API resources are put in different spaces; each tenant has a space, and by default, a tenant cannot access other tenants' spaces. So first, they are strongly isolated. And second, because all the resource objects are confined to one space, tenants don't need to worry about naming conflicts or special access controls; they simply have their own copy of the original resource hierarchy. Here, we show an example of how a tenant has its own space, T1, and they can put any resources under this space. In the backend storage, we also put this space into the storage key, so the data is stored separately. But this changes the API format and resource path, so we introduced another feature called the short path. Basically, customers can still use their old API and URL to access their resources. Inside the API server, we have a module which dynamically translates this short path to the internal full path, and we do this based on the access credential associated with the request, so we know which tenant space the request should go to, even if the customer didn't specify the tenant space. With this approach, we implemented a transparent, strongly isolated multi-tenancy model. Now, different tenants can share a physical cluster, but each of them feels like they are using the cluster exclusively: they are not impacted by other tenants, and they are not even aware of the existence of other tenants.

The last feature I want to talk about is scalability. Kubernetes only supports a few thousand hosts, but this is not enough for a large-scale cloud platform, especially a public cloud. In Arktos, we made a few core design changes, and our scalability goal is to support 300,000 hosts with a single control plane. In order to achieve this goal, we basically make sure all the components in the cluster can scale out. First, for the etcd data store, we support multiple etcd clusters, and we changed the resource version generation mechanism to ensure these different clusters can work together, just like one large virtual cluster. For the API server, because each API server holds a copy of the cached data, we introduced partitioning into the API server, so API servers can be split into different groups and scale out; they are not limited by the capacity of a single API server machine. For the controllers and schedulers, we also introduced workload partitioning: based on the hash key of the workloads, they are distributed to different controller instances, and these controller instances work in an active-active way, so they balance the workload and are also highly available. With these three major changes, we make sure all the components in the system can scale out; there is no single-system bottleneck.

I think this is pretty much what I have for the first part. To do a quick recap, I introduced what Arktos is, its relationship with Mizar, and the key features in Arktos. Next, I will hand over to Hongwei. Hongwei will introduce the new network model in Arktos and how we use Mizar to implement this network model.

Thanks, Xiaoning. Arktos is developed on top of Kubernetes, and Kubernetes has a very simple network model, a flat one. Look at this picture: all pods connect to each other by their IP addresses. The IP address is assigned from a single shared IP pool, which was specified when the cluster was bootstrapped. The IP address assigned to each pod must be unique; it cannot be duplicated.
The last feature I want to talk about is scalability. Kubernetes only supports a few thousand hosts, but that is not enough for a large-scale cloud platform, especially a public cloud. In Arktos we made a few core design changes, and our scalability goal is to support 300,000 hosts with a single control plane. To achieve this, we basically make sure every component in the cluster can scale out. First, for the etcd data store, we support multiple etcd clusters, and we changed the resource version generation mechanism so these different clusters can work together like one virtual large cluster. For the API server, because each API server holds a copy of the cached data, we introduced partitioning: API servers can be organized into different groups and scale out, so they are not limited by the capacity of a single API server machine. For the controllers and schedulers, we also introduced workload partitioning: based on a hash key of the workloads, they are distributed across different controller instances, and these instances run in an active-active manner, so they balance the workload among themselves and are also highly available. With these three major changes, every component in the system can scale out and there is no single bottleneck. That's pretty much what I have for the first part. To do a quick recap, I introduced what Arktos is, its relationship with Mizar, and the key features in Arktos. Next, I will hand over to Hongwei. Hongwei will introduce the new network model in Arktos and how we use Mizar to implement it.

Thanks, Xiaolin. Arktos is developed on top of Kubernetes, and Kubernetes has a very simple network model: a flat one. Looking at this picture, all pods connect to each other by their IP addresses. Each IP address is assigned from a single shared IP pool, which is specified when the cluster is bootstrapped, and the IP address assigned to each pod must be unique; it cannot be duplicated. This single IP range conflicts with Arktos multi-tenancy, because each tenant does not want to deal with potential IP conflicts with other tenants; they simply don't want to care about other tenants. Looking at the picture again, there is another issue: any pod is able to connect to every other pod in the cluster. The cluster is fully connected, and this is the default behavior, which may not be desirable in situations where security is required. To overcome this limitation, Kubernetes introduced network policy to provide some network security. However, the security provided this way is limited: it can regulate the good guys, but it does not stop a bad guy from doing malicious things. Let me explain a little. First, abuse or even misuse of labels can break the intended policy, since policies select pods by their labels. Second, it does not prevent a packet sniffer placed somewhere along the traffic path from extracting sensitive data. That is not really secure. Besides that, many network policy implementations rely on the Linux netfilter feature. Fine-grained network policy may sound good, but in reality it requires a massive number of iptables rules, which in turn brings a non-negligible performance hit and increases network latency. In addition, in this flat network model Kubernetes has only a single shared DNS. A single shared DNS is not what Arktos multi-tenancy wants either, because each tenant only wants to see its own name records and does not want to see name records from any other tenant. Since the Kubernetes flat network model cannot meet the requirements of Arktos multi-tenancy, it is necessary to introduce a brand new network model: an isolated one. The primary goal of this isolated model is to provide strong isolation across tenant spaces, and across network boundaries inside the same tenant space. Strong isolation means resources allocated in one scope are unrelated to resources allocated in another scope. Taking the picture here, the resources allocated in tenant A have nothing to do with resources allocated in tenant B. Looking inside tenant A, there are two networks, and resources allocated in one network have nothing to do with those in the other network; they are totally isolated. Within one network, it is still possible to apply network policy to provide one more layer of network security if you want; the policy-based security is supplementary to the network-based strong isolation, and they can live together. So we have a brand new isolated network model. The core concept of this isolated model is the network. The network is introduced by Arktos, and it is a new custom-defined resource type; the slide shows an example of an object of that type. A network object defined this way represents an isolation boundary for network resources, and the slide lists some resources that are network-specific. For example, pods in one network cannot access, and cannot be accessed by, pods from any other network. In this way it provides very strong isolation with regard to network resources. We have described the Arktos isolated model and its core concept, the network object. Then how does Arktos implement such an isolated model? The answer is that Arktos by itself cannot achieve this model alone; it needs to work together with a so-called network solution provider to fulfill this goal.
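Before going into how that cooperation works, here is a sketch of what such a network object might look like, along the lines of the example on the slide. The API group, the type field, and the CIDR field are assumptions for illustration; check the Arktos repo for the actual CRD schema.

```sh
# Sketch only -- the apiVersion, "type: mizar", and the CIDR field are
# assumptions, not the exact Arktos Network CRD. Each tenant gets at least one
# such object, and it is the isolation boundary for pods, services, and DNS.
kubectl apply -f - <<'EOF'
apiVersion: arktos.futurewei.com/v1     # assumed API group
kind: Network
metadata:
  name: net-1
  tenant: tenant-a                      # assumed: networks are tenant-scoped
spec:
  type: mizar                           # assumed: which provider implements it
  vpcID: vpc-tenant-a                   # assumed
  range:
    cidr: 10.20.0.0/16                  # assumed: network-specific IP range
EOF
```

Pods created inside net-1 would then get IPs from that network's range and can only reach endpoints in the same network, which is the strong isolation described above.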
Looking at this picture, the API server, which is the central data store of the Arktos system, keeps the critical data needed to run the cluster, including pods, services, and of course network objects. Besides the other built-in controllers, Arktos adds two controllers specifically for multi-tenancy and network isolation: the tenant controller and the network controller. Whenever a new tenant is created in the system, the tenant controller kicks in to ensure that a network object is created for that tenant, so each tenant gets its own network. Whenever a new network object is created in the system, the network controller kicks in to create the essential network infrastructure, such as DNS. Besides the built-in Arktos components, the external network provider deploys some components of its own that run as part of the Arktos master; these running components are collectively called the network provider controller suite. Typically they include various controllers for nodes, networks, services, pods, ingress, and so on. So the Arktos built-in components, such as the API server and built-in controllers, together with the components contributed by the network solution provider, implement this isolated model. We have talked about Arktos and the external network provider in general, and how they work together at a very high level. Now let's look specifically at Arktos working with Mizar. Mizar is the recommended network solution provider for Arktos, and it was introduced in the previous talk. First, let's take a look at the node provisioning process. A new node joins the cluster, and Arktos registers the node in the API server. The Mizar node agent is installed by a DaemonSet that is already running on the worker node. Meanwhile, the Mizar node controller detects the new node and notifies the Mizar system, requesting it to prepare the node. How the node preparation goes is entirely a Mizar implementation detail. After the node has been prepared inside the Mizar system and the node agent has been properly installed on the worker node, the node can be marked as ready, and Arktos will schedule pods to run on it from then on. Next, let's take another high-level view of what happens when a new network, a new pod, or a new service is created. Look at this picture: we have a tenant A that already has two existing networks, both with services and with resources specific to them. The first thing we do is create a network. When the network is created, its DNS is deployed and running, with an IP address assigned from that network's specific range. The next thing is to create a new pod in the network. The pod gets an IP address from the network-specific range as well, and it is able to connect to other IPs in the same network; for example, this pod can reach the DNS by its IP. The next thing we do is create a service. The service gets a virtual IP assigned, and the Mizar network provider finishes the plumbing to map this virtual IP to the real IP of the pod as a backend IP. What if we now add one more pod to the same service? The new pod gets its IP assigned as expected, and the Mizar network provider picks up the event and updates the backend IP pool for that service. From then on it balances the load between these two pods instead of only the previous one.
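As a rough illustration of that flow, the objects involved might be created as shown below. How a pod or service is attached to a network (a label here) is purely an assumption for illustration; the actual Arktos field may differ.

```sh
# Sketch only -- attaching a pod and a service to a network via the
# "arktos.futurewei.com/network" label is an assumption, not the exact API.
# The network net-1 from the earlier sketch already exists, and its
# per-network DNS is already running.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: web-0
  labels:
    app: web
    arktos.futurewei.com/network: net-1   # assumed network attachment
spec:
  containers:
  - name: web
    image: nginx
---
apiVersion: v1
kind: Service
metadata:
  name: web
  labels:
    arktos.futurewei.com/network: net-1   # assumed network attachment
spec:
  selector:
    app: web
  ports:
  - port: 80
EOF
# Adding a second pod with the same "app: web" label simply grows the
# service's backend pool, as described above.
```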
The point is that Arktos, working together with the Mizar network provider, is able to present a strongly isolated network model. That concludes my simplified view of the Arktos and Mizar control plane. I will hand it over to Xiaolin.

Thank you, Hongwei. Hongwei just talked about the Arktos network model and how we leverage Mizar to implement it; that was mostly about the control plane and the CRD objects. In this last part, I will talk about what happens on the worker nodes: which components we have on a host, what happens when a pod is launched, and what happens when a pod communicates with another pod or with a service. Let me first start with the components we have on a host. This diagram shows all of them. On the host we have user space and kernel space. In user space we have two agents, the Arktos agent and the Mizar agent; each talks to its own control plane and listens for instructions from it. We also have a CNI plugin for Mizar. This CNI plugin is invoked by the Arktos agent, but it interacts with the Mizar agent to finish all the wiring work when a pod is launched. When the host is initialized, the Mizar agent attaches an XDP program to the main network interface to process any incoming packets. At that point no pods have been scheduled on the host yet, so there are no pod network namespaces. So what happens when a pod is created? First, the Mizar network controller, the Mizar plugin running with the Arktos control plane, watches pod creation events. When it notices that a new pod has arrived, it calls the Mizar control plane to create a corresponding endpoint for that pod, which basically means assigning it an IP. The Mizar control plane then notifies the Mizar agent on this host, telling it that a new endpoint has been created on this host and that it needs to initialize the virtual network interface for this endpoint. What the agent creates is a veth pair, and it attaches the XDP program, the transit agent, to one end of the pair. Initially this veth pair is created in the host network namespace, because at that time the pod hasn't arrived on the host yet, so the agent doesn't know which network namespace will be assigned to the pod. After that, the Arktos agent gets the pod definition from the Arktos control plane, creates the network namespace for the pod, and calls the Mizar CNI plugin to initialize the virtual network interface inside that namespace. What happens here is different depending on whether it is a VM pod or a container pod. If it's a container pod, the work is simple: the CNI plugin moves the other end of the veth pair into the container network namespace, and that end acts as the vNIC of the container. Any packet arriving on that end goes out the other end and is captured by our transit agent XDP program. For a VM pod it is a little different. The CNI plugin also moves the other end of the veth pair into the network namespace, but because QEMU cannot use that veth end as a vNIC directly, we have to do a little more wiring: we add a bridge, create a tap device attached to the bridge, and pass the file descriptor of the tap device to the QEMU process as its vNIC device. We are working on optimizing this part to get better performance.
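To make the wiring concrete, the steps the agents perform are roughly equivalent to the manual commands below. This is only a sketch of the plumbing, not the actual Mizar code, and the XDP object file names are placeholders.

```sh
# Rough manual equivalent of the wiring described above (not Mizar's code).
# 1) Host bring-up: attach the main XDP program to the host's primary NIC.
ip link set dev eth0 xdp obj transit_xdp.o sec xdp          # object name is a placeholder

# 2) New endpoint: create a veth pair in the host namespace and attach the
#    per-endpoint XDP program (the "transit agent") to the host-side end.
ip link add veth-host type veth peer name veth-pod
ip link set dev veth-host xdp obj transit_agent.o sec xdp   # placeholder object

# 3) Container pod: move the peer into the pod's network namespace, where it
#    acts as the pod's vNIC.
ip netns add pod-ns
ip link set veth-pod netns pod-ns
ip netns exec pod-ns ip addr add 10.20.0.5/16 dev veth-pod  # IP from the network's range
ip netns exec pod-ns ip link set veth-pod up

# 4) VM pod: QEMU can't use the veth end directly, so add a bridge and a tap
#    device inside the namespace, and hand the tap's fd to QEMU as its vNIC.
ip netns exec pod-ns ip link add br0 type bridge
ip netns exec pod-ns ip link set veth-pod master br0
ip netns exec pod-ns ip tuntap add dev tap0 mode tap
ip netns exec pod-ns ip link set tap0 master br0
ip netns exec pod-ns ip link set br0 up
ip netns exec pod-ns ip link set tap0 up
```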
In both cases, once the wiring work is done, the CNI plugin returns to the Arktos agent, basically reporting that the virtual network device has been set up in the pod's network namespace, so the agent can continue setting up everything else and start the pod. That is what happens when a pod is launched on a host, and how these components work together to set up the virtual network device for the pod. Now let's see what happens when a pod tries to talk to another pod with Mizar's XDP-based networking. This diagram shows the components on two different hosts; we have two pods, pod1 on host1 and pod2 on host2, each with its own virtual IP. When pod1 talks to pod2, the user-space application in the pod calls a socket API to write some buffered data. That data is encapsulated by the network stack inside the pod's network namespace, because each network namespace has its own network stack. It eventually becomes an L2 packet and arrives at the pod's end of the veth pair. Because it's a veth pair, the packet comes out on the host side and is captured by the XDP program, the transit agent. The transit agent encapsulates this packet into an overlay packet; in Mizar's case, it is based on the Geneve protocol. This overlay packet goes out through the main interface toward one of the bouncers. The source host maintains a list of bouncers for each subnet, so based on pod2's IP subnet it knows which bouncers are available, and it picks one based on the hash value of the 5-tuple. Let's say in this case it chooses the bouncer at 10.0.0.3, so the outgoing packet has an outer destination of 10.0.0.3. When this packet arrives at the bouncer, the bouncer consults a map it maintains of which host has which endpoint. It knows that the target endpoint, pod2's endpoint, is actually hosted on host2, so it rewrites the outer destination to 10.0.0.2 (the slide has a typo here; it should read 10.0.0.2). The packet then goes to host2, where the main XDP program on eth0 decapsulates it back into the inner packet and sends it to the corresponding veth pair, and it travels all the way up to pod2. This is how the communication happens with the Mizar XDP programs. One thing I need to emphasize: after the first packet, the XDP program on the source host receives a packet telling it that endpoint 2 is actually hosted on host2, so it caches that entry. The next time a packet is sent to this endpoint, it goes directly to host2 without passing through the bouncer every time, which saves latency.
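If you want to watch this overlay traffic for yourself on a host, something like the following works, assuming Mizar uses Geneve's standard UDP port (6081) and that eth0 is the main interface; both are assumptions on my part.

```sh
# Watch the Geneve-encapsulated overlay packets leaving the source host.
# Assumes the standard Geneve UDP port 6081 and eth0 as the main interface.
tcpdump -ni eth0 'udp port 6081'

# On the first few packets the outer destination is a bouncer (e.g. 10.0.0.3);
# once the source host has cached the endpoint-to-host mapping, the outer
# destination switches to the destination host itself (e.g. 10.0.0.2).
```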
The last part is about pod-to-service connectivity. In Kubernetes, and in Arktos, a service is like a load balancer: it provides a stable service IP in front of a dynamic pool of backend pods, so you don't have to use the dynamic pod IPs directly; you can use the stable service IP instead. What happens with Arktos and Mizar is similar to pod-to-pod connectivity. The pod application just sends packets to the service IP, which in this example is 192.1.2, and the service IP is carried in the overlay packet as the inner destination. At the bouncer, the bouncer maintains a hash table of which backend pods belong to each service IP, so when it sees a service IP it dynamically rewrites it to one of the backend pod IPs. So for this overlay packet, the bouncer changes both the outer destination and the inner destination, and the packet is then sent to the backend host. Just as with pod-to-pod communication, it is decapsulated there and delivered to the target pod. The direct path also works here: after the first packet goes through this process, a cache entry is created on the source host, so subsequent packets are sent to the backend pod directly instead of going through the bouncer again and again.

This is the end of our talk. We covered what the Arktos project is, its relationship with Mizar, and its key features, and we spent a lot of time on the Arktos network model: how it differs from the Kubernetes flat network model and how we leverage Mizar to implement it, including the CRD-based control plane and the XDP-based data plane. If you have any questions, feel free to bring them up. We also included some links here pointing to our project repo, design docs, roadmap, and milestones; feel free to check them out. That's all, thank you.

Okay, that's our presentation. Before we move on to the next presentation, are there any questions about this talk? If you have questions, you can put them in the Q&A at any time, and feel free to visit our GitHub repo and post an issue, or raise it in our Slack channel. Without further ado, I will hand over to Ding for the next presentation.

Hi, everyone. This is Ding, from the Futurewei Cloud Lab. In this session, I'm going to talk about KubeEdge. The presentation will cover the background of KubeEdge, then the architecture, then some community-building material to show how we open-sourced this project and how healthily it is growing, and then a tutorial where I will show you how easy it is to build and deploy KubeEdge on top of Kubernetes. The last part will be a Q&A session. So, KubeEdge. KubeEdge is a CNCF project. We donated it to the CNCF at the very beginning of 2019; in March 2019 KubeEdge entered the CNCF sandbox, and this September we graduated from the sandbox and entered the CNCF incubation phase. The next step is to graduate from incubation and become a fully graduated CNCF project. KubeEdge is built on top of Kubernetes. We use KubeEdge to provide the fundamental infrastructure to support networking, application deployment, lifecycle management, and metadata synchronization between the cloud and the edge. The motivation behind this is that Kubernetes is a strong and popular platform for managing resources inside the data center. However, with hybrid cloud there are also a lot of nodes on the far side: for example, a small server running on an enterprise network, or even a very small IoT gateway running inside your home, which you want to control from the cloud. That was not really possible before. Developing an application for such an IoT gateway used to have a really high bar; you would probably need an embedded engineer to build an embedded application for it. The same goes for OTA updates and application deployment and management: you either needed someone to physically show up, or you had to build an OTA system yourself, which is not easy to maintain. So we wanted to leverage Kubernetes' flexible application deployment and management for the edge as well.
That is why we built KubeEdge in the community: to solve this cloud-edge communication problem and to provide application deployment and management at the edge. There are several major challenges to solve. First, network reliability and bandwidth limitations. Kubernetes was designed for the data center, where the master node and worker nodes have a fast and reliable network connection: the latency is very low and you can assume the bandwidth is almost unlimited. If the worker node is on the edge side, however, it is connected through an ISP over the public internet, so the connection is not reliable; it may go down and come back up. The bandwidth is also limited: at home you may have a 100-megabit or even a 20-megabit internet connection, and even a small enterprise or office may not have much bandwidth. If you transfer all your data up to the cloud, it may drain your bandwidth, so you don't want to do that. There are also resource constraints on the edge. If you're running applications on a home IoT gateway, it may have as little as 256 megabytes of memory, or even 128 megabytes if it's a very old model, and the CPU in such a gateway is not powerful either. The other challenge is highly distributed and heterogeneous device management, because with IoT booming alongside edge computing there may be a lot of devices connected to an IoT gateway or edge node. These are highly distributed, both from a network point of view and geographically, and the devices are heterogeneous: many different kinds of devices can connect to a node. That's a hard problem we need to solve. In order to meet these challenges, KubeEdge provides the following capabilities. First, seamless cloud-edge communication, for both data and metadata; we make it transparent to the end user and transfer data and metadata between cloud and edge in the background. Second, edge autonomy. As I said, the network connection between the edge and the cloud may not be reliable, which means the internet connection may go down on the edge side. If that happens in plain Kubernetes, the management plane would think the node is dead, evict all applications from it, and mark it offline. In the edge case we cannot do that; we need special treatment. The edge node has to keep managing the application lifecycle on the node even while it is disconnected from the management plane, and after the connection is restored, we need to sync metadata from the cloud to the worker node to ensure all applications and resources are in the desired state that the management plane requires. Third, low resource consumption. The edge components require only a small footprint, so KubeEdge can run on very resource-constrained IoT gateways, with 256 megabytes of memory or even as little as 128 megabytes if we compromise a little on the system's capabilities. The last one is simplified device communication. When a device connects to your edge node or IoT gateway, we have the device twin and device shadow, so you can view all the device status from your portal in the cloud.
You can also control these devices: you issue your desired state from the cloud to the edge node, and the edge node eventually drives the device connected to it. That is very useful for IoT and industrial use cases; we'll go over some details in the following slides. Next, the KubeEdge architecture. In this slide I'll give you an overview of the architecture. As I mentioned, KubeEdge addresses the cloud-edge and device connection problems, and it is built on top of Kubernetes, which you can see in the center of the diagram. On the cloud side, KubeEdge has a CloudCore component, which includes a few new controllers: an edge controller, which manages the edge nodes; a device controller, which manages the devices connected to the edge nodes; and a sync controller, which syncs data and metadata between the cloud and the edge. CloudHub is the endpoint that accepts connections from the edge side. On the edge side, the main component KubeEdge has is EdgeCore, which is derived from the kubelet and controls the container lifecycle on the edge. On the edge we support the mainstream CRI runtimes, including containerd, Docker, and CRI-O, and we also have CNI support for networking and CSI support for storage. For devices, there is a Mosquitto component, so we support the MQTT protocol; devices can use Modbus, Bluetooth, or the industrial OPC UA protocol to connect to the edge node. It follows a pub-sub model, so from the cloud side you can see the status of the devices connected to the edge and control them across the cloud-edge link. The connection between edge and cloud is a WebSocket connection by default; besides WebSocket, you can also choose the QUIC protocol. We designed it this way because in most cases the edge runs behind an enterprise firewall or your home firewall; your edge node probably doesn't have a public IP, so you cannot connect to it directly from the control plane or from CloudCore on the cloud side. What we do instead is initiate the connection from edge to cloud and keep a long-lived WebSocket connection; over it you can issue control commands from the cloud to the edge, and the edge sends data back to the cloud. Let's drill down a little on the cloud side. As I mentioned, we have the edge controller, the device controller and device API, the sync controller, CloudHub, a CSI driver, and an admission webhook, and we still have the standard Kubernetes master. The device controller, edge controller, and sync controller are CRD controllers; they list-watch the Kubernetes API server, and they use CloudHub to reach the edge nodes, by default over WebSocket, optionally over QUIC. We also have the CSI driver and admission webhook, for storage and for API control. The edge controller manages nodes, pods, and config maps; you can think of it as a proxy doing shadow management between Kubernetes and the edge. The device controller does the device modeling and the shadow management for all the devices on the edge side.
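For a feel of what that device modeling looks like, here is a much-simplified sketch of a DeviceModel/Device pair. The field names are abbreviated from the KubeEdge device CRDs and may not match your release exactly, and the temperature sensor itself is just an invented example.

```sh
# Simplified sketch of KubeEdge's device CRDs -- field names are abbreviated
# from the real schema and may differ by release; the sensor is hypothetical.
kubectl apply -f - <<'EOF'
apiVersion: devices.kubeedge.io/v1alpha2
kind: DeviceModel
metadata:
  name: temperature-sensor-model
spec:
  properties:
  - name: temperature
    type:
      int:
        accessMode: ReadOnly
---
apiVersion: devices.kubeedge.io/v1alpha2
kind: Device
metadata:
  name: living-room-sensor
spec:
  deviceModelRef:
    name: temperature-sensor-model
  nodeSelector:                      # bind the device to a specific edge node
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values: ["edge-node-1"]
EOF
```

Once applied, the device controller on the cloud side and DeviceTwin on the edge side keep the reported state of the device visible in the cloud, which is the device twin/shadow behavior described above.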
The sync controller, as I said before, detects inconsistent data between the cloud and the edge, which typically results from a connection loss and restore. The CSI driver handles storage provisioning, and the admission webhook does API validation and best-practice enforcement. Now to the edge side. We already said that on the cloud we use CloudHub as the entry point for all connections from the edge; the counterpart on the edge is EdgeHub. EdgeHub handles all the messaging: it sends messages back to the cloud and receives messages from the cloud, by default over WebSocket. We also have MetaManager, the metadata manager, which handles node-level metadata persistence backed by a local edge store, a local database. DeviceTwin syncs device status from the device to the edge and eventually back to the cloud. Edged is derived from the kubelet; we call it a lite kubelet. It does the pod management, so it can create and delete pods and control the pod lifecycle even when the connection between cloud and edge is broken. The last one is EventBus, which is basically an MQTT client supporting the pub-sub model; we use it to collect status from devices. As I mentioned, KubeEdge is now a formal CNCF incubation project, approved in September 2020, this month. We have continuous momentum in this project: we grew from 30 contributors to more than 300 contributors, the GitHub stars are over 2,800, and we have almost 800 forks. We have attracted developers from more than 25 organizations, and we have maintainers from five different companies. We also collaborate with other open source communities. For example, we are actively involved in the Kubernetes IoT Edge Working Group, and with the LF Edge Akraino project we have two blueprint projects using KubeEdge: one is an IoT blueprint family used for a lightweight gateway project, and the other is the KubeEdge Edge Service blueprint, which focuses on AI frameworks and offloading machine learning to the edge. Because this is a successful and powerful open source project, we have more than 20 adopters. In the IoT and hardware category we have ARM, Samsung, and others, and a number of carriers such as China Mobile and China Unicom are using the project. We also have IT services and cloud providers; a few cloud providers are already using KubeEdge to build their edge cloud computing products. On the academic side, a few universities with networking, cloud, or IoT edge labs participate in the project. Now let's go over a user adoption example. This one is a highway toll system, and I call it typical because it reflects the challenges I mentioned in the previous slides. The first is that it is highly distributed: the devices sit at highway toll gates and toll booths, so they are geographically distributed. They also may not have a reliable internet connection, because these sites are at the far edge and may not have a strong mobile signal.
They may not have a direct internet connection; they go over a mobile network, which is not reliable, and if the signal is not stable the connection may go up and down. The connection bandwidth is not high either, so they have limited bandwidth. It is also a highly heterogeneous system: some toll gates have small x86 servers, some use Arm servers, and the devices connected to the edge nodes are highly heterogeneous as well, coming from different vendors, so we need to handle all of them. The system is also huge: more than 50,000 edge nodes are managed in it. The system is Kubernetes plus KubeEdge, with more than half a million containers in total, and it collects more than 300 million data records per day. With this system it's easy for application developers to deploy a new application from the cloud to the edge, and edge data is transferred back to the cloud; that includes taking a picture of the license plate and doing the transaction to charge the toll. The system delivered a big performance boost over the old one: the time to process a car dropped from 15 seconds to two seconds, and another processing step dropped from 29 seconds to three seconds. KubeEdge does quarterly releases; we do this so we can follow the Kubernetes release cadence. Whenever Kubernetes releases, we take about a month to incorporate the new API changes and then cut a KubeEdge release, so we also release quarterly. You can see that since we entered the CNCF sandbox, a lot of new features have been delivered to developers; the quarterly releases happen in February, May, and August, and the next one will be in November. In the May release we already had containerd support, we verified the CSI integration, and we added log collection, so we not only do application deployment from cloud to edge but can also collect logs from pods on the edge back to the cloud. In the most recent release we made a big enhancement to device management, because we now have an IoT SIG in our community, and also an MEC SIG focusing on mobile edge cloud computing; I will talk about that in detail later. Then in the August release, 1.4, we added metrics collection: we connect the data to the Kubernetes native metrics server, so you not only see node status but also have performance data for the edge, which you can connect to your portal or to Prometheus for monitoring. We also added edge certificate rotation to enhance security. In August we support Kubernetes 1.18, and in the next release we are going to support 1.19, so we follow upstream Kubernetes closely. Now let's see how easy it is to build the KubeEdge project and how easy it is to use. KubeEdge is written in Go, so it's easy to build for both Arm and x86. If you want to build for an Arm device, you just need to change your Go target architecture to Arm and build the edge components.
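A minimal cross-compile looks roughly like this, assuming a standard Go toolchain; the package paths below are placeholders that may differ by release, so check the repo's Makefile for the supported targets.

```sh
# Cross-compile the edge component for an Arm gateway -- a sketch assuming a
# plain Go toolchain; the package paths are placeholders and may differ by release.
git clone https://github.com/kubeedge/kubeedge.git && cd kubeedge
GOOS=linux GOARCH=arm64 go build -o _output/edgecore ./edge/cmd/edgecore

# CloudCore normally runs on x86, so a plain build is enough there:
go build -o _output/cloudcore ./cloud/cmd/cloudcore
```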
It's easy to build the EdgeCore component for the edge side that way. For the cloud side, I assume you are using x86, but it's just as easy to build the CloudCore part for either Arm or x86. For now, users only really need the Arm build for the EdgeCore component, but if anyone with Arm hardware wants to try running CloudCore on Arm, we can test that too. We have our CI and E2E tests on Travis CI, so even the Arm build of the cloud side is already tested; it's just that current users haven't run the cloud side of KubeEdge on Arm devices yet. Now, deployment. For deployment, just as Kubernetes has kubeadm, we built a new tool called keadm, and we use it to deploy KubeEdge. By default, before the 1.3 release you only needed port 10000 open for your CloudCore; as I mentioned, CloudCore is the endpoint the edge connects to on the cloud side, so you only need port 10000 open. Then you run keadm init with the advertise address, basically the externally facing IP of CloudCore, so that it can accept connections from the edge side. From 1.3 we also ask you to open another port, 10002 by default, which is for log collection; we need a separate port to collect logs from the edge side. If you don't want to collect logs or metrics data from the edge, you only need port 10000 open and don't need 10002. For the edge side, a KubeEdge edge node is the equivalent of a Kubernetes worker node, just sitting on the remote far side. There, you first run keadm gettoken to get your token, then you run keadm join, where you specify the CloudCore IP and port and also your token, so that CloudCore can authenticate the connection from the edge. Then the worker node joins the Kubernetes cluster. For developers and advanced users, we also support not using the tool but building and deploying locally. If you want to set up the cloud side manually, there are first a few CRDs you want to deploy: the device CRD, the device model CRD, and the object sync YAML. The last part is CloudCore: you run cloudcore --minconfig to generate the cloudcore YAML, then run cloudcore using that YAML file as its config file. For the edge worker side, you first run edgecore --minconfig to output your edgecore YAML, then you need to get your token by running a kubectl command against the cloud cluster to read the secret, and then you update your edgecore YAML with that token. The last step is to start the edgecore process using that YAML. One limitation of this path is that you need root privileges to run it.
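Putting the keadm flow together, it looks roughly like this; the flags shown are from the 1.3-era keadm and may vary slightly by release, and the IP address is only an example.

```sh
# Cloud side (the Kubernetes master) -- the advertise address is the
# externally reachable IP of CloudCore; 192.0.2.10 is only an example.
keadm init --advertise-address=192.0.2.10

# Still on the cloud side, fetch the join token for the edge node:
keadm gettoken

# Edge side (the IoT gateway / edge server) -- point at CloudCore's port 10000
# and pass the token so the connection can be authenticated. Flag names may
# vary slightly between keadm releases.
keadm join --cloudcore-ipport=192.0.2.10:10000 --token=<token-from-gettoken>
```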
Due to time limitations, let me show a quick demo. This demo shows what I just presented: how to deploy KubeEdge, and then, once KubeEdge is deployed, how to deploy an application. You can see there are two windows: the upper one is the edge node, and the lower one is the cloud node. Let's look at the cluster first. Currently the cluster only has the master, basically the cloud node; there are no other nodes and no workloads deployed on this cluster. Now, as I said, we use keadm to start the CloudCore part; let's check whether CloudCore has started. Yes, we check the log and see that CloudCore is running. Now let's go to the edge side. First, let's get the token for EdgeCore to use. Now let's start the EdgeCore component using the token we just generated. It's really fast; let's check the log to make sure EdgeCore is running. Then let's go back to the cloud side. If we run kubectl get node, you can see that besides the master node there is another worker node joined, labeled with the edge role, and it's running the KubeEdge 1.3.1 release. So now the cluster has two nodes: the master node in the cloud and the edge node at the edge. Because KubeEdge is fully compatible with the Kubernetes API, you can use kubectl to deploy applications from the cloud to the edge, so the next step is to show you how to use kubectl for application deployment and management. We'll do a quick deployment of a simple nginx app; this is the deployment YAML. We just kubectl apply that deployment, and you can see a new pod is created and it's running on the edge node; you can see its IP belongs to the edge node. Now let's go to the edge node: there is a new Docker container, and the pod is running on the edge side. You can then access this pod remotely. So to summarize the demo: we showed how to deploy the KubeEdge cloud part, CloudCore, on the cloud, and the KubeEdge edge part, EdgeCore, on the edge; how the edge node joins the cluster as a worker node of a Kubernetes cluster with the special edge role label; and how we can use kubectl to deploy an application from the cloud to the edge. You can imagine that if you have a Kubernetes cluster running on a public cloud, and you ship a lot of IoT gateways to end users, you can control them from the cloud side: you don't need to physically go there to do updates or application deployment, and you don't need to worry about OTA updates or application updates, because that's all handled by the KubeEdge plus Kubernetes platform.
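For reference, the kind of deployment used in a demo like this looks roughly as follows. The edge-role label used in the nodeSelector is an assumption about how the edge node is labeled, so adjust it to whatever your cluster actually reports.

```sh
# Sketch of a deployment pinned to the edge node. The node label below is an
# assumption -- check `kubectl get nodes --show-labels` for the real one.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-edge
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-edge
  template:
    metadata:
      labels:
        app: nginx-edge
    spec:
      nodeSelector:
        node-role.kubernetes.io/edge: ""   # assumed edge-node label
      containers:
      - name: nginx
        image: nginx
EOF

kubectl get pods -o wide    # the pod should be scheduled onto the edge node
```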
The KubeEdge community is a mature community. Our project website is at kubeedge.io, and the code repository is on GitHub; it has already been donated to the CNCF, so the owner is the Cloud Native Computing Foundation. There is also a Slack channel and a mailing list, and we're happy to welcome people to join them. We have a community meeting every week, on Tuesday in Pacific time, which is Wednesday in Asia time; we host it at two alternating times, one more friendly to North America and the other more friendly to Europe and Asia, and we have a link that makes it easy to convert to your time zone. The meeting is open to everybody. Following open source practice and the CNCF requirements, we record every meeting and publish it on our YouTube channel, so if you miss a meeting you can always watch the recording there. Thank you, that's all from me. Before I hand over to the next presenter, Annie, are there any questions?

Okay, let's hand over to Annie then. If you have any questions, you're always welcome to ask me, and tomorrow I'm also going to hold a Q&A on the Slack channel at 1 pm, I think Eastern time, so you can always ping me there at that time. Thank you. Annie, it's all yours.

Okay, thank you, Ding, and thank you everyone for dialing in today. Today we spent the majority of the time talking about the Centaurus project, and I'd just like to give you a little bit of context on why we came up with this project. As we approach the 5G and AI era, we understand that the workloads we face today are very different from the workloads we were used to. The new workloads require scalability, and they require connecting to the edge, which in turn connects to billions of IoT devices. So we saw the need for this new open source cloud infrastructure project, and that's why we created Centaurus. Centaurus addresses the scalability needs of the new workloads, as well as the distributed nature of the cloud-edge architecture, while keeping unified resource management and orchestration of various resource types such as VMs, containers, serverless, and possibly future resource types. The current open source projects we see today mostly address either VM-type workloads or container-type workloads, separately. We think that moving forward, having a unified, seamless way of managing and orchestrating these various workloads will be very important. That gives you a little background on why we came up with the Centaurus project, and the use cases can be very interesting. For example, telcos are surely thinking about building something like a 5G cloud that would help them support the 5G services they offer their clients. Financial services may want to come up with an AI cloud that can offer smart financial services to their partners and customers. For medical research, maybe they want research clouds that can help medical research professionals from all over the world tackle a problem together, for example COVID-19, or some other disease in the future where we would like to mobilize the entire global medical research community; having a large-scale cloud can definitely help support that. So that's why we came up with Centaurus. Today you have heard quite a bit about Mizar and Arktos: Mizar is the networking piece, Arktos is the compute piece, and both projects have been open source for quite a while. We have been socializing these projects with the open source community at various KubeCon events in 2019 and 2020. This year we sponsored KubeCon EU as well as a community event in China, and there is a KubeCon US coming up in November where we also have a sponsorship, so you can meet us there as well. For those of you who have missed our presentations in the past, don't worry; you can always go to our Centaurus website, where you can find the past event recordings, blogs, and papers. And as you can see, a mature cloud platform is not just networking and compute: we also need storage, security, identity, monitoring, usability, and so on.
So we would like to invite you, the open source community, to come join us and help us build Centaurus, build out all the feature functionality and requirements, and really make this platform a viable one that addresses the future AI and 5G workloads. We run this as an open source project, so we do everything according to open source practice. In other words, all the designs are open on our website and on GitHub, we have open communication, we run open community meetings over Zoom, and we have Slack channels and email groups; everything we do is open. So we would like to invite you to join. Currently we have about six member companies, and there are a couple more we are talking to who don't want us to use their logos yet until they get legal approval. So we are getting momentum in terms of ecosystem building, but again, these are very early days for Centaurus. We would like to invite you to come join us: go to our website, go to GitHub, and join our mailing list. We would definitely love to have you. Thank you. For the rest of the time, maybe we can open it up for questions. You can either send your question via chat or just speak up right now. If there are no more questions, shall we wrap up today's workshop? All right, thank you so much for staying with us for the last three hours; we really appreciate your time, and again, we would love to talk to you about Centaurus. Thank you. Bye bye.

Thank you, Annie. Thank you, everyone. Again, thank you for attending the tutorial. I believe this tutorial was recorded and it will be made public. We will also put all the slides and videos on the Centaurus website, so you can access them from there. For more questions, you can post them on GitHub, on the website, or in the Slack channel. Again, thank you for attending the tutorial. Bye bye, have a good day. Bye bye.