I can get started. Can you guys hear me? OK. Good morning, everyone. Welcome to this Centaurus tutorial. My name is Moni, and I'm a project manager from Futurewei Technologies. Today I'm here with two of my colleagues and one of our partners from Click2Cloud: Dr. Ying Xiong, Dr. Peng Du, and Rupo. Together we'll be delivering this Centaurus tutorial for you. Centaurus is a cloud-native infrastructure project for large-scale distributed cloud.

Here's the agenda for today. I'll first start off the tutorial with a high-level overview of Centaurus. Then Dr. Ying Xiong will dive deep into the Arktos architecture; Arktos is our cloud compute infrastructure in Centaurus. Following that, Dr. Peng Du will present the Mizar and Centaurus edge deep dives. And to wrap up, Rupo will share some community updates: the events we have run, our plans for the next steps, and how you can get involved and become part of the community.

Now, without further ado, I'll start off by giving a high-level overview of our Centaurus project. So what is Centaurus? Centaurus is an open-source platform for building a unified and scalable distributed cloud infrastructure. Project Centaurus joined the Linux Foundation in December 2020, which was a great milestone that I'd like to share with you all. Centaurus unifies the orchestration, network provisioning, and management of cloud compute and networking resources at regional scale. It's also an umbrella project hosting four sub-projects: Arktos, our cloud compute sub-project; Mizar, our cloud networking sub-project; Fornax, our Centaurus edge sub-project; and Alnair, our AI platform sub-project.

Now, as cloud becomes the norm for running and developing current and future applications, cloud infrastructure has to continuously evolve to meet the new challenges and requirements posed by next-generation applications such as AI, 5G, cloud gaming, federated learning, AR/VR, and many more. These new types of applications are driving the traditional cloud infrastructure toward a more distributed cloud infrastructure. So here I would like to share some of the challenges we see in today's cloud environment.

The first challenge is managing distributed cloud at large scale. When we move from a more centralized cloud infrastructure to distributed cloud, we need to manage it at a very large scale: not just a few super-large data centers, but also smaller edge clusters at remote locations. Our cloud infrastructure needs to be redesigned and optimized to manage constrained resources running at the edge, closer to customers and customer data. Similarly, we also need to be able to provision and manage cloud virtual networks, such as VPCs, subnets, routing rules, and security rules, across data centers and many edge locations, which is very challenging when edge locations have different network quality and bandwidth. In addition, cloud infrastructure needs to support more and more types of resources and workloads, and it would be really helpful for cloud providers to have a unified platform, a unified infrastructure, to manage multiple types of computing resources such as VMs, containers, serverless functions, and servers.
It has proven very challenging to have a unified stack to manage all of these computing resources together. The fourth challenge is that with distributed cloud, computing nodes are everywhere: in a large regional data center or at a small remote location. Deciding where to run customer applications to achieve high throughput, high performance, and low latency, while still optimizing resource utilization, is important. So a distributed cloud infrastructure needs a global view of all resources, and we need to be very smart about where we allocate these resources to run multi-tenant applications and meet customers' needs. And lastly, as AI workloads become the dominant workloads for cloud and edge computing, we also need to optimize the same infrastructure to manage and run AI and machine learning applications across edge nodes and cloud data centers, and resource scheduling needs to be smart about allocating GPU resources efficiently.

Now, with all these challenges in mind, I'd like to give a high-level description of each of the sub-projects I briefly mentioned earlier.

The first project is Arktos. Arktos is an open-source project designed for large-scale cloud compute infrastructure. It's derived from Kubernetes with core design changes, and it aims to be an open-source solution to the key challenges of large-scale cloud. Some of the key features I'd like to highlight: first, large scalability. With Arktos we can now support large-scale infrastructure cluster management of 50K nodes per cluster, and our eventual goal is to support 300K nodes in one single regional control plane. The next feature is unified orchestration for VMs and containers. In Arktos, a pod can contain either multiple containers or one VM, and they are scheduled the same way in the same resource pool. This enables organizations to use a single converged stack to manage all cloud hosts in the future, and we're also working to support other compute workloads, such as serverless or bare-metal servers. The third feature is hard, built-in multi-tenancy: different organizations can share the same physical cluster without trust among them. It's based on the virtual cluster idea, and all isolation is transparent to tenants; each tenant feels like it has its own dedicated cluster. In the following session, Dr. Ying Xiong will talk more about the Arktos architecture.

Next, I'd like to talk about Mizar, our networking sub-project under Centaurus. Current flow-based programming solutions are no longer suitable for high-scale, multi-tenant networking environments, so we started building Mizar from the ground up on top of XDP (eXpress Data Path). Mizar's main building block is an XDP program that runs on each host. Mizar, by definition, is a large-scale and high-performance cloud network for running virtual machines, containers, and other compute workloads. Unlike traditional networking solutions, Mizar relies on the natural partitioning of a cloud network to scale, and it simplifies the programming of the data plane through flexible in-network processing, which is different from the flow-based programming model.
On the left here we see Mizar's high-level architecture, which includes the management plane and the data plane. Mizar's data plane is built on XDP and eBPF technologies and the Geneve network protocol. It provides a high-performance, extensible packet-processing pipeline and functions that help achieve Mizar's functional, scale, and performance goals. Mizar's management plane, on the other hand, is built on top of Kubernetes using CRDs and the operator framework, and it programs the data plane by translating typical networking APIs and resources into Mizar-specific configuration. Some of the key advantages of Mizar are that it can support the management and provisioning of large-scale networking endpoints in one cluster without compromising network performance, and that it delivers high network throughput and low latency. Mizar has an extensible data plane, meaning a unified data plane for VMs, containers, and potentially other compute workloads in the future, and it also provides multi-tenant isolation of traffic and address spaces. You will hear more about the Mizar architecture from Dr. Peng Du shortly.

Next up, I'd like to introduce our edge sub-project. Centaurus Edge, which is also called Fornax, provides a framework that allows user applications to run reliably in the edge environment, following two design principles: robustness and flexible topology. I'm excited to introduce some of the key features from our first release, which launched just last month. The first key feature is that both computing nodes and clusters can now run on the edge. Centaurus Edge also has a hierarchical topology, where edge clusters can be structured in a multi-layered, tree-like topology, providing the best mapping to end-user scenarios. Centaurus Edge also supports multiple flavors of cluster on the edge, such as Kubernetes, lightweight Kubernetes, and Arktos, the sub-project I just talked about. And lastly, Centaurus Edge provides multi-tenant edge cluster networking and supports concepts like VPC and subnet, and it allows applications deployed at different edge locations to communicate with each other through edge-to-edge communication. I just want to emphasize that edge computing is not simply running things in a local cluster or a local data center, but rather a way for the different components of an application to run in their best-suited environments in a collective effort. More details on the edge will also be shared shortly.

And lastly, I'd like to introduce our AI platform sub-project, which is called Alnair. AI workloads thrive on the cloud and are seeing rapid adoption in many sectors, such as healthcare, retail, automotive, and more. According to Mordor Intelligence, the cloud AI market was valued at USD 5.2 billion in 2020 and is expected to reach USD 13.1 billion by 2026, registering a compound annual growth rate of 20.3%. We all understand that AI workloads have their special characteristics: for example, long-running jobs, which require fault tolerance; distributed training and gradient exchange, which require co-scheduling and communication; and bursty serving requests, which require low latency and load balancing. With those characteristics in mind, we started the Alnair project, which is positioned as a self-learning elastic platform for AI workloads.
In order for us to achieve this self-learning elastic platform, optimization is needed to improve resource utilization, training and serving efficiency, monitoring, logging, and the intelligence of the platform. On the left of this slide is our high-level Alnair architecture, which includes four components. The first is the unified elastic framework, an easy-to-use framework that unifies normal distributed training and elastic training. The second is a multifunctional profiler, which does multi-level resource utilization monitoring. The third is an AI-oriented scheduler, a learning-based scheduling strategy that leverages real-time cluster status for dynamic GPU workload scheduling. And lastly, fine-grained GPU sharing, which supports fractional GPU resource allocation in order to improve resource utilization. A few key highlights I'd like to emphasize: profiling and monitoring provide insight into how efficiently the cluster and jobs run, and they are the data foundation for building a self-learning system. Elastic training and dynamic scheduling, with a global view of resource utilization in the cluster, are essential for running AI training jobs efficiently. And GPU sharing is critical for improving resource utilization in AI serving scenarios. With the collective effort of these four components, Alnair is a platform designed to improve AI workload efficiency while being equipped with self-learning capability to continuously optimize the decisions it makes. Unfortunately, we don't have a deep dive on the Alnair architecture today, but you are welcome to visit our GitHub page to learn more about Alnair's architecture, the work we're doing, our next steps, and how you can be part of the community. And that concludes the high-level introduction to Centaurus. Next, I will hand over to Dr. Ying Xiong, who will give a deep dive on Arktos, our cloud compute infrastructure in Centaurus.

Thank you, Moni. Hello, everyone. Again, my name is Ying Xiong, and I'm with Futurewei Technologies. I want to thank you again for coming to this tutorial. We have prepared lots of slides and a lot of content to cover within an hour, so we're going to try our best to finish on time. I'm going to jump right into the topic and give a deep dive of the Arktos project. As Moni mentioned, the vision for Arktos is to become a large-scale cloud-native infrastructure that manages a cluster of computing nodes and provisions resources on demand for applications and workloads. Now, talking about cloud-native orchestration platforms, I know you immediately think of Kubernetes, which has been the most successful and most popular open-source project in recent years. So what is Arktos? Arktos is based on Kubernetes, as Moni mentioned, but with fundamental changes we've made to achieve the goals we want for Arktos: to become the next-generation cloud-native infrastructure for managing AI, 5G, and other next-generation applications. Here are the four major changes we made to Kubernetes for the Arktos project. First, we scaled out the Kubernetes architecture to support large-scale clusters. When I say large scale, I mean more than 30,000 nodes in the cluster, and as Moni mentioned, we have a goal of supporting 100,000 computing nodes per cluster.
I will show you this design later in the slides. We also made fundamental changes to almost the entire Kubernetes code base to provide multi-tenancy and network isolation, making Kubernetes a true cloud-native infrastructure that manages multiple tenants. Sometimes we call it virtualized Kubernetes: multiple customers, or tenants, can share the same Kubernetes cluster without impacting each other. The third change we introduced in Arktos is to unify the API, objects, and runtime for both VMs and containers, and we intend to support more resource types in the future, such as bare metal or other new types of containers and virtualization. The fourth major change is to extend the architecture to support and manage clusters far away from the cloud center, on the edge side, and Peng Du will show how we do that and how we manage worker nodes deployed to edge clusters.

So let's start with the scale-out architecture. I think most of you, if not all, know this very basic, high-level Kubernetes architecture. The control plane has the API server and etcd as the data store for all Kubernetes objects. Then you have a bunch of controllers, such as the Service, Pod, DaemonSet, ReplicaSet, and Endpoint controllers, which all watch the API server looking for something to do. Then you have the scheduler, which runs the scheduling algorithm to decide which pods, which containers, which applications run on which node. Finally, you have the kubelet, which runs on each and every worker node and actually starts and stops your containers and applications and manages their lifecycle.

In order for this architecture to support a larger scale, we split it into two parts. The first part, on the top, we call the tenant partition, or TP. It receives requests from customers, or tenants. It has its own API server and its own etcd, but that etcd only stores tenant-related information and objects. It runs most of the controllers, such as your Deployment, ReplicaSet, Job, DaemonSet, and StatefulSet controllers, and that information is stored in the etcd of the tenant partition. You'll see two additional controllers shown in orange in the diagram; those are the two controllers we added to the architecture to support multi-tenancy and network isolation, and I will talk about them later in the slides. The second part, on the bottom, is what we call the resource partition, or RP. It also has its own API server and its own etcd, but it only runs the node and service account controllers, very small controllers. It manages the actual compute nodes on behalf of the whole cluster, and it's transparent to the customers. In order for this architecture to work, the scheduler running on the TP, the tenant partition, has to talk to the API server on the RP, the resource partition, to get the list of nodes for scheduling. Similarly, the kubelets managed by the RP have to talk to the API servers on the tenant partition in order to update the pod status of your applications, because your application and pod information also lives on the tenant partition on the top. A few more design points for this scale-out architecture.
You can actually run multiple TPs, tenant partitions, and multiple RPs, resource partitions, and they scale independently. So it's perfectly fine to run three TPs and two RPs in this architecture, and it just works. Each tenant belongs to one TP, and a tenant partition can of course manage multiple tenants, so it's a one-to-many relationship. Now, if you're running multiple TPs, you will have multiple schedulers running at the same time, because each partition has its own scheduler, and those schedulers run concurrently, so you may have conflicts. Fortunately, the Kubernetes scheduler has its own conflict resolution built in. And of course, if you run multiple RPs, each scheduler has to talk to all the RPs in order to get all the compute nodes for scheduling, so if multiple schedulers are running, they all have the same information about the nodes. Logically, this is one cluster, not multiple clusters, even though we split the architecture into two.

So how good is this scale-out architecture? We used the community version of Kubemark, a benchmark and performance test tool, to test it. We first tried a one-TP, one-RP scale-out architecture, that is, one tenant partition and one resource partition. As you see, we were able to support 30,000 nodes in the cluster, compared with the community version 1.21 that we also tested, which supports 15K nodes only. As you see from this performance test result, if we mainly focus on pod startup latency, they are very comparable: the 99th-percentile pod startup latency is about 6.5 seconds, slightly higher than the 5.6 seconds for the standard Kubernetes version 1.21, but still close. Also, with this scale-out architecture we were able to improve the system throughput: the QPS setting went from 20 to 120, which is six times the throughput. So with this scale-out architecture, we were able to double the cluster size to 30,000 nodes with six times the throughput. This is very encouraging and shows that the scale-out architecture works for large-scale clusters.

I mentioned earlier that you can run multiple TPs and multiple RPs. This is an example with two TPs running, using an API gateway to distribute requests to the two tenant partitions, and two RPs, with each resource partition managing 25,000 nodes, so in this sample architecture the two RPs give 50,000 computing nodes in the cluster. With this sample scale-out architecture, again using the same benchmark tool, Kubemark, to test the two-TP, two-RP setup, we were able to reach 50,000 computing nodes in the cluster, compared again with Kubernetes version 1.21 at 15K nodes. And if you look at the test result, the pod startup latency is actually better compared with Kubernetes 1.21: the 99th-percentile pod startup latency is actually better. Also, with 50,000 compute nodes in the cluster, we were able to deploy and run 1.5 million containers in this test, which is very promising. So that's the first change we made: scaling out the Kubernetes architecture to support larger clusters.
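To make that scale-out layout a little more concrete, here is a purely illustrative sketch of the two-TP, two-RP example just described. This is not an actual Arktos configuration file; the structure and field names are invented only to visualize the partitioning.

```yaml
# Illustrative sketch only; Arktos does not necessarily describe
# partitions with a file like this.
cluster: arktos-demo
tenantPartitions:            # each TP has its own API server, etcd, controllers, and scheduler
  - name: tp-1
    apiServer: https://tp1.example.internal:6443
  - name: tp-2
    apiServer: https://tp2.example.internal:6443
resourcePartitions:          # each RP manages a share of the physical nodes
  - name: rp-1
    apiServer: https://rp1.example.internal:6443
    nodes: 25000
  - name: rp-2
    apiServer: https://rp2.example.internal:6443
    nodes: 25000
# Logically this is still one cluster of 50,000 nodes: every TP scheduler
# watches all RPs for nodes, and the kubelets on RP-managed nodes report
# pod status back to the owning TP.
```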
Now, the second change we made in Kubernetes for the Arktos project is to provide multi-tenancy and network isolation. Arktos introduces a concept called space, and we arrange all the Kubernetes objects into spaces; objects in different spaces are isolated from each other and don't interact. There's a special built-in space we call the system space, where all the system-level resources, such as clusters and nodes, resources that have nothing to do with tenants or customers, are stored and isolated from tenants. Then we introduce the tenant object and the tenant controller, which I mentioned earlier in the diagram, to represent a customer. Every customer using Arktos gets a tenant created for them, and each tenant has a dedicated space called the tenant space. All the objects created by the tenant, the Deployments, ReplicaSets, Jobs, and DaemonSets, are stored in that tenant space and isolated from other spaces. We also added a field called tenant to the metadata of all Kubernetes objects. That field provides a very nice way to identify which space an object belongs to, so we can easily find objects, and it is also used for authentication and authorization: we use this field to decide whether you are allowed to access a resource for that tenant. So it's part of the access control that isolates resources between tenants.

If you really think about it, you can view each tenant space as a virtual cluster, and the tenant admin can manage this virtual cluster just like they manage a real Kubernetes cluster. In fact, the tenant admin uses the same APIs to create, for example, resource quotas, security policies, and ConfigMaps for their applications, and those resources live in that virtual cluster and are only visible and available to that tenant. The same is true for CRDs, custom resource definitions, but now you have two types of CRDs. You have tenant CRDs, which the tenant admin creates for their virtual cluster, and those CRDs, again, are only visible and available to that tenant. And the cluster admin can create system CRDs, which live in the system space; once installed and created, they are available to all tenants. So this provides a nice way for the cluster admin to create common CRDs or objects that are available to all tenants.

So the space concept isolates resources between tenants. How about the network? I think most of you who know Kubernetes know that the Kubernetes network model is designed as a flat network: every pod in the cluster can talk to every other pod, because they share a single IP address space and a single DNS. So there is no notion of tenant from the network model perspective. It does provide network policies for you to control which pod can talk to which pod for network isolation, but we think that is really soft isolation, and it's not enough. So we introduced the network object and network controller, which were shown in the earlier architecture diagram.
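Before going into the network objects, here is a rough illustration of the space and tenant concepts just described: a tenant object representing one customer, and a tenant-scoped Deployment carrying the tenant field in its metadata. This is a sketch only; the exact Arktos API groups, kinds, and field names may differ.

```yaml
# Sketch: a tenant object representing one customer (lives in the system space).
apiVersion: v1            # assumption; the real API group/version may differ
kind: Tenant
metadata:
  name: acme
---
# Sketch: a Deployment created by that tenant. The extra "tenant" field in the
# metadata identifies the space the object belongs to and is used for
# authentication/authorization, isolating it from other tenants.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
  namespace: default
  tenant: acme            # Arktos-style tenant field (illustrative)
spec:
  replicas: 3
  selector:
    matchLabels: { app: web }
  template:
    metadata:
      labels: { app: web }
    spec:
      containers:
        - name: web
          image: nginx:1.21
```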
With networks, before a tenant creates a pod, they have to create a network. The network has its own IP address space and its own DNS, and you can create multiple networks, each of which is basically a subnet. Pods within the same network live in the same IP address space and can talk to each other; pods on different networks, by definition, cannot talk to each other, which provides strong isolation for pod communication in the network model. Pods belonging to different tenants are, of course, on different networks, so they cannot talk to each other either, which provides another layer of isolation between tenants.

Now, the third change we introduced in Arktos is the unified API and runtime to support both VMs and containers on the same platform. Basically, there are two approaches to supporting VMs in Kubernetes. The first is the add-on approach, where you define separate CRDs, implement separate operators or controllers, and therefore have separate APIs for VMs than for containers. The second approach, which we take and call the native approach, uses exactly the same API, extended, and reuses all the existing controllers and the existing scheduler with almost no change. In order for that to work, we first introduced a virtual machine type in the pod definition, as you see in this diagram. Now a pod can represent not only containers but also virtual machines. The pod is a very nice abstraction already defined in Kubernetes, and we simply leverage the pod definition. In the future we intend to support other types of resources, such as bare metal or other kinds of resources. With that pod definition extension, all the controllers, including the Deployment, ReplicaSet, Job, Service, and Endpoint controllers, now work for VMs, what we call VM pods, and no controller changes are needed to support them. Everything is the same. There is a small, minor change in the scheduler: the current Kubernetes scheduler implementation looks for containers, so we abstracted that so it recognizes pods instead of containers; now it can recognize both containers and virtual machines.

But the main change to support the unified platform for VM pods and container pods is the runtime infrastructure. As you know, this is the runtime architecture on a Kubernetes node, and we extend this runtime infrastructure to support VMs by extending the CRI, the container runtime interface, with VM-specific properties, as you see in this diagram. Then we added a VM runtime, a modified version of Virtlet, and registered this runtime in the runtime registry. We also introduced the pod converters, on the right side of the diagram, which convert the pod specification into the format that the VM runtime recognizes. With that, we're able to go end to end, from the API server all the way down to the runtime on the actual host node, to start and manage VM instances. So lastly, Arktos introduces actions on the VM pod.
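Before turning to those actions, here is a rough sketch of how a tenant network and a VM pod might be written. This is illustrative only; the actual Arktos kinds, field names, and the way a pod is attached to a network may differ.

```yaml
# Sketch: a tenant-scoped network (essentially a subnet with its own
# IP range and DNS); pods on different networks cannot reach each other.
apiVersion: v1             # assumption; the real API group may differ
kind: Network
metadata:
  name: net-a
  tenant: acme
spec:
  cidr: 10.10.0.0/16       # illustrative address range
---
# Sketch: a "VM pod" - the same Pod abstraction, but carrying a
# virtual machine spec instead of containers, so the existing controllers
# and the scheduler can treat it like any other pod.
apiVersion: v1
kind: Pod
metadata:
  name: my-vm
  tenant: acme
  labels:
    network: net-a         # attach the pod to the tenant network (illustrative)
spec:
  virtualMachine:          # illustrative field name for the VM workload type
    image: ubuntu-20.04-cloudimg
    cpus: 2
    memoryMiB: 4096
```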
There is a certain set of actions that can be performed on a VM but not on containers, such as starting or stopping a virtual machine, rebooting it, taking a snapshot, or attaching and detaching devices. Those are actions specific to VMs, so we introduced action objects in Arktos to support them, and we also extended the kubelet to listen for actions from the API server. For example, if you want to reboot your VM, you use an API YAML file to create an action object, or action request, and the kubelet on the host, which listens for actions, picks it up and performs the action to reboot your VM.
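As a rough sketch, such an action request YAML might look like the following. The kind and field names here are illustrative assumptions, not the exact Arktos schema.

```yaml
# Sketch: an action request asking the kubelet to reboot a VM pod.
apiVersion: v1            # assumption; the real API group/version may differ
kind: Action
metadata:
  name: reboot-my-vm
  tenant: acme
spec:
  podName: my-vm          # the VM pod from the earlier sketch (hypothetical)
  action: reboot          # other possible actions: start, stop, snapshot, attach/detach device
```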
By now I have talked about three major changes made to Kubernetes for Arktos, and these are all core, fundamental changes. The fourth change is to extend the architecture to support edge clusters, and Peng Du from our team will talk about that. So with that, I conclude our deep dive into the Arktos project. Again, it's an open-source project, and if you're interested you can go to the Centaurus website or the GitHub link we will provide at the end of this talk and get more information from there. Now I will hand over to Dr. Peng Du, who will talk about Mizar and the architecture on the edge cluster side.

Okay, thank you, Dr. Xiong. Let me move forward. Hi, my name is Peng Du, and today I'm here to talk about the Mizar and edge projects. Since this is a tutorial and we have limited time, I will not go too far into the details, but I want you to walk away with why we're doing these projects and what the benefit is of the way we're doing them. If you have further questions, there will be more information provided. I'll start with Mizar, which Dr. Xiong and Moni have already discussed a little. One way to think about the connection between Mizar and the rest of the Centaurus effort is multi-tenancy. For example, if we have one cluster with different tenants, we provide multi-tenancy so that each user thinks they own the cluster and can do whatever they want; of course there are limitations, but they feel like they own the cluster. Mizar controls the network side of this. There are several benefits of the Mizar architecture. Scalability, as we said: we want to support a very large number of endpoints. When you see the word endpoint, you can think of it as just a pod; we also support VMs and other kinds of endpoints, but as a mental image a pod works. Of course we also want to provide higher performance, with high throughput and low latency, and extensibility, which is important especially for a single cluster and for the edge; I will talk about this more in a little bit. And versatility: we want to support VMs and we want to support pods, and Mizar is able to support both. These four points may feel a little abstract, so again, think of it this way: we have multi-tenancy in a cluster, and the Mizar network model is what you get when you translate that multi-tenancy concept into the network.

So I want to start with this so that you have a better idea of what we are talking about. We said we can support multi-tenancy; here I'm showing two tenants, tenant one and tenant two. From the VPC address space, you can see that they actually have the same address space, yet they're not supposed to talk to each other; but inside each VPC range, pods can talk to each other using their IP addresses. In fact, a pod in tenant one's VPC can have the same IP address, within that range, as a pod in the other tenant's space, and they still cannot talk to each other. The point here is that the VPC is a virtualized network: VPC ranges can be duplicated, and they don't talk to each other. Also, if we dive into one single VPC, we see that within that IP range we can further divide it into smaller ranges, which we call subnets. On the left is the first subnet; from all those trailing zeros you know its IP range is 192.168.0.x, and on the right is a second subnet that starts with a different range. This is within one VPC, and keep in mind one VPC is for one tenant. So each tenant can decide, based on what they are trying to do, that one subnet is for a certain kind of work and another subnet is for a different kind of work, and they can pick and choose where to put things. This becomes even more important when we talk about the edge, because so far this is just for one cluster; with the edge, things are more distributed, and the subnets can be distributed as well, so just keep that in mind.

Putting the concepts of VPC and subnet together for one cluster, we can have one VPC as shown here, with two subnets, and we can have endpoints. Here the endpoints are simply two pods with their IPs; just pay some attention to the IP ranges. The pod on the left belongs to the first subnet, the pod on the right belongs to the second subnet, and essentially all we want is for them to be able to talk to each other through their VPC IPs, the IPs allocated from that VPC IP range. As simple as that. So this is the logical model we're trying to achieve.

With this model, we can now talk about how we implement it and what benefit we get from our implementation. Since we're still talking about one cluster, here is a typical Kubernetes cluster, shown as an Arktos cluster. We have the master and the worker nodes; I'm showing very abstract components, not all the details. The master has the control plane, the Kubernetes control plane if you will, with the scheduler, controllers, and all that; on the worker node, one of the most important parts is the kubelet. On top of this we want to install Mizar, and Mizar brings in a few extra components to support the VPC and subnet model. One is the Mizar control plane, which runs as CRDs and controllers; it essentially manages all the objects, for example the VPCs and subnets we talked about, and in terms of implementation those are the objects stored in etcd.
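As a rough sketch, the VPC and subnet objects just mentioned might look like the following custom resources. This is illustrative only, based on the model described above; the actual Mizar CRD group and field names may differ.

```yaml
# Sketch: a VPC for one tenant, with one of its subnets.
apiVersion: mizar.com/v1   # assumption; the actual CRD group/version may differ
kind: Vpc
metadata:
  name: vpc-tenant1
spec:
  ip: 192.168.0.0
  prefix: "16"             # the whole VPC range: 192.168.0.0/16
---
apiVersion: mizar.com/v1
kind: Subnet
metadata:
  name: subnet-a
spec:
  vpc: vpc-tenant1
  ip: 192.168.0.0
  prefix: "24"             # first subnet: 192.168.0.x
# Endpoints (pods, VMs) get their IPs allocated from their subnet's range;
# the Mizar controllers watch these objects and program the per-node
# XDP data plane accordingly.
```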
There is also a workflow that goes through them: for example, if you create a subnet, or if you create another VPC, a lot of things need to happen, and that's why the controllers are there to handle it. On the worker node there are two parts. The first is the Mizar agent, which essentially takes control commands from the control plane and is in charge of turning those commands into actions on the Mizar XDP programs. We run the Mizar XDP program on each node; that is the real control over where network packets go. We run all of this on all the worker nodes, and sometimes on the master as well if we use it as a worker node. So those are the three biggest components coming from Mizar: if you have a cluster and you want to install Mizar, those are the things you will install.

Let me spend a little time on the XDP program. A quick pause: who has experience with XDP programs, or has heard of XDP? Okay, then I'll go into a bit more detail here. This slide shows what runs per node: if we go back to the previous picture showing all the components, with a bunch of nodes, and look into one of the worker nodes, this is what it looks like. You should still recognize the Mizar agent and the XDP programs; in those two green boxes you see that we actually run two XDP programs. If you're not familiar with what that is, think of it as running your code in the kernel, where the kernel provides a sort of virtual machine. It's not the kind you get from EC2, for example, but a virtualized environment in the Linux kernel where you can put your program. Here we are putting the transit XDP program in there. Once it's in there it can run; of course it has to go through a bunch of safety checks, since you cannot just run random things in the kernel, but after that your code sits inside that virtual machine and sees all the traffic, packets coming in and going out. The traffic can come from the pod.
For example, here I'm showing the ingress path. Ingress means a packet comes in from the cloud or from somewhere else and arrives at the network interface. The first thing the packet hits is no longer the network stack: it's intercepted by the XDP program, which has the business logic in it, essentially the implementation of the VPC and subnet, including the logic for the ingress path. Based on where the packet is trying to go, sometimes it's targeted at a pod on this host, and sometimes it needs to be sent to some other host. In the first case, the XDP program sees that the target is a pod on this host, so it does some processing and then passes the packet up to the network stack; from then on it's the same as before, the packet goes through all the layers, eventually reaches that pod, and the pod receives the message. In the other case, the XDP program sees that the packet arrived here for a reason, but the reason is not to reach a pod on this host; it's supposed to go somewhere else. In that case, the XDP program doesn't even bother the network stack: it simply diverts the packet back out with some changes, essentially saying, you tried to talk to me, but at this point you need to talk to some other node. It changes the source and destination based on who is trying to talk to whom and sends the packet back out through the interface.

What's the benefit of this? You're not going through the network stack anymore, which takes time and a lot of compute. Here we take the packet the first time it shows up on this node, do the processing, and send it out, which is much faster and more efficient than going all the way up the stack and then all the way back down. So those are the two cases for ingress, a packet coming in. The other direction, of course, is when we send something from a pod: the pod wants to talk to another pod based on its VPC IP, its IP in its subnet. It works very similarly: another XDP program intercepts the packet, sees it, makes a decision, and sends it to the interface, and the packet goes out. As you can see, the process is very similar coming in and going out: packets go through the XDP programs, and we try to avoid the network stack as much as possible; if we have to go through it, then we go through it.

There is also the eBPF map, which holds the data that determines where a packet is supposed to go: whether it's for another pod here or for some pod on some other node, all that information is stored in the eBPF maps. Think of it as a dictionary, a hash table, or whatever you want to call it. Where does that data come from? It comes from the Mizar agent running on this node. That puts everything together, and if I back out a few slides, that's what's running on each of those worker nodes.

All right, a few more details on how we do this in the back end. We talked about a packet coming in and having to go here or there, but those decisions are based on the VPC and subnet logic, and here is a quick illustration of how we implement that. For example, we have two pods on the left. Sometimes pod one wants to talk to pod two on the same side, in the same subnet.
In that case, the packet goes to one of the bouncer nodes. A bouncer is just a host running the XDP program we just talked about, and it has the subnet's logic in it, so when the packet arrives at the bouncer, it bounces it back to the other pod, or the other host, in the same subnet. Similarly, if we are talking to the pod on the right, which is in a different subnet but in the same VPC, then the bouncer essentially says, you are going out, you are not under my management, and sends the packet to another thing called a divider. A divider is just another host running the XDP program, but it controls a different part of the communication: it sees that it is supposed to send the packet to another bouncer, the one that manages the other subnet. Think of it as sending a message to a guy in a different building: you don't know where he is, you know his office number but you cannot find the building, so you hand the message to your front desk, the front desk sends it to, say, the post office, the post office sends it to the front desk in the other building, and that front desk knows where the guy is. Think of it that way and it will make more sense. And this is roughly how Mizar works. There is more information out there; the introduction here is just to help you understand why we're doing this and what the basic components are.

I'll continue with the edge project. I'm under limited time, so I'll go a little faster. Here are the goals and visions of edge computing. Edge computing actually means different things to different people; it's a very broad term, so I want to give you a moment to think: what do you think the goals of edge computing are? Through our research and development we came up with some of the key things we think an edge computing environment should have, and these are the problems we are trying to address with our solutions. On the right I'm showing a simple edge computing example; with a story it's easier to remember. Think of it as some servers running in the cloud, some servers running off the cloud in some local environment, and some devices connected to them. I'm showing a camera, but it doesn't have to be a camera: it could be a sensor, it could be your phone, it could be any of those things. Measured against the criteria on the left, there's a problem with the architecture you're seeing on the right. For one, with scalability: especially with 5G, there are going to be more and more devices trying to connect to the cloud, and if we want to support that scale, the single cloud point will become a bottleneck. So instead we came up with this hierarchy: we think the framework that supports edge computing should be hierarchical, like a tree with different branches. May I go for another five minutes?
Okay, thank you. So compare this with the previous picture and you can see the difference, and what that difference provides is scalability. Now all the sensors and local devices are not talking to one single server across the whole region; each is talking to, for example, a server that's really close to those devices. By doing that you get better latency: you're talking to someone who's closer to you and who can give you the answer faster. And if you have multiple servers dealing with all these local devices, then you can scale. Think of Arktos as trying to support more nodes, more devices, and more compute in one cluster; think of this as horizontally scaling out to different clusters. So that's another way to scale: scale out rather than scale up.

A few other things I want to quickly go over. One is autonomy, a key goal of edge computing and a key difference: things can fail when you put them outside a data center. In a data center you have cooling, you have staff there; outside the data center you don't, and sometimes you don't even have easy access to the device. So we want to be able to deal with failures, whether a network failure or a single-node failure, and the applications running on the edge should be able to continue to run. One last thing is privacy: sometimes users don't want to send their data to the cloud, so we do the processing as close to the user as possible and send only the processed result upward, and if we have multiple layers of this, we can continue down this path.

Okay, I'll go a little quicker. There are different ways of organizing these clusters; the quick answer to this question is that we want to support both cases. This is a decision made by the user: we want to be flexible, we provide the solution, and the user can make their choice. These are some architecture details; on the left are some detailed components I will not go into, but we are able to expand this so that we can support those hierarchical clusters. And of course, if you have workloads distributed everywhere, there has to be a way to essentially say, I have a deployment, I have a pod, and I need to run it in a certain cluster at a certain level; we provide that facility so you can do that. The last thing I want to talk about is networking. We saw this before for one cluster; now we are expanding it to two clusters, with two subnets spreading out. They're not in the same cluster anymore, but we still want to support that, and our edge solution does. The way we do it, and I'll go over this really quickly, is that we now provide another level of gateway communication between those clusters, so there is not one single point of control; the gateways are distributed, which also meets the goal of scalability. And if we have two pods there, they can talk to each other like that. For more details: we just had our first release, so if you want more, feel free to talk to me afterward, or the slides have our information. Sorry about running over time, thank you.
So thanks, Peng, Dr. Xiong, and Moni, for giving the overview of Centaurus, and thanks to everyone who has joined in person, even during the COVID pandemic. As we all know, we are in open source, and it cannot work without a community and supporters like all of you. I'll just give a quick background on Centaurus: we already know about the project as a technical concept, but how did it actually start, who are the key partners involved in this community, what is our goal, where have we been, and what do we want to achieve; and without your support, again, none of it will be possible. As we discussed, Centaurus officially launched in the Linux Foundation on December 16, 2020. It's still a small baby, but we are growing it with the support of great partners like Futurewei, Click2Cloud, GridGain, the SODA Foundation, Informatics, and a few others. The partners got involved to discuss the strategy, of course together with great community members and industry experts. As we all know, there is huge demand in the industry when we talk about the edge, the growing number of IoT devices we use on a daily basis, the medical and healthcare industry, or even smart cities, with devices and cameras placed all around, all requiring mechanisms to manage and process the data. We found that there is still a huge requirement in the industry, and at the same time a gap that needs to be filled, and that is how Centaurus comes into the picture: we are trying to figure out how to build that distributed cloud, at the same time provide multi-tenancy, and scale the architecture to 30K nodes, and in the future to even more thousands of nodes.

So our entire strategy is about how we can make that possible. We formed a community with a voting system and the support of different industry leaders. We have an advisory committee that includes Dr. Xiong, Professor Hakim, and Chris from CNCF, who advise on the technical and industry requirements and the strategic guidelines for how this community should look. Then we have a technical steering committee of seven members, again elected by the community and industry, including Deepak, Prashant, Stefan, Sneel, Showning, and Nikita. The technical steering committee meets every single month to define the project, identify the missing parts, add new projects, approve the guidelines, and make sure we are meeting the industry expectations coming in the upcoming quarters and in the current phase. Along with that, we have an outreach and marketing committee, which includes Annie, who is not here today, and myself, where we plan and take care of marketing and partnerships, which is a core part alongside the technical capability. That is how our committees look. And again, along with these committees, we have special interest groups, SIGs, for each of the modules that Dr.
Xiong, Moni, and Peng have just introduced: Arktos, Mizar, the edge, and the AI side. Since this project is open source, we want you to be part of this community and of these projects, to start developing or even just give suggestions. All these SIG groups are open: the meetings we conduct on a weekly basis are open, and you're more than welcome to participate and give suggestions or recommendations. Even if you want to propose a project you think is relevant, that is where we normally discuss and communicate. So we have four different SIG groups.

Recently, as I mentioned, even though the project is still so young, we hosted an event in the APAC region last month, where we got a great response from about 12 different countries. A lot of speakers and industry experts joined from China, India, France, and other regions, and they were part of the keynotes. We successfully ran this event: the Centaurus project reached about 50,000 people on social media channels, around 1,300 people registered for the event, 200 colleges in the Asia-Pacific region showed interest in bringing the community project into their curriculum activities, and 700-plus people joined live during the event. We got 40 members out of that event who might become core community members, or core contributors, to the Centaurus project in the future. We announced seven top awards along with cash prizes, and 33 certificates were sponsored by our partner, Click2Cloud. This is the public release and announcement that was published in local and regional newspapers, along with the gifts we announced and distributed, as well as the vision to bring this project to industries from telecom to healthcare and so on. That vision has been recognized by the government of India as well, which was covered in the announcement you can see on the right.

With that, I'm sure you must be interested to know how you can join this community and how you can start contributing and being part of this ecosystem. You're more than welcome to join. We have our booth for the entire four days of this event, our team is there, and you can feel free to reach out and talk about the ideas you have, or give feedback, in case you want to learn more. You can also go to our website to learn more about Centaurus; we have a mailing group, and if you subscribe you'll get new announcements and the latest updates and releases. We have our GitHub accounts where you can fork the code, try it out, do some activities around it, and feel free to share your feedback. Even in that short duration, there have been 1,000 lines of code updated in the past couple of months by community contributors like you all. We welcome all of you to join our group, and also the Slack for any offline communication; we have different Slack channels for each of these projects, and the meeting information is also available there. With that, I'll give the time back to you to enjoy your lunch. Thank you so much for being part of Centaurus, and we welcome you all to grow this community even further. Thank you.