So today, the topic we're going to talk about is the network, and behind that, Kubernetes clusters. Thank you very much for coming to listen to our session. I would like to do a self-introduction first. My name is Michelle; I come from VMware, and now I am working on vSphere stability testing. Before that, I worked on VMware Integrated Containers, and I also helped develop VMware Integrated OpenStack and VMware Big Data Extensions, doing system-level development. Hi, my name is Yuya. I also come from VMware China R&D, in the Beijing office. I work on VMware Kubernetes products, and now I do a lot of Kubernetes. Before that, I was actually focusing on Neutron, OpenStack networking, so it's quite related to networking. So if you have any questions or any topics related to networking, don't hesitate to talk with me. For today's agenda, I'm going to talk about Kubernetes: the components and the standard scheduler, as well as quality of service and some of the network features of Kubernetes. And we're also going to talk about how to enable network QoS in vSphere-based Kubernetes clusters. Kubernetes is the de facto standard for containerized applications, and it has a lot of advantages. In terms of scalability, it can reach 5,000 nodes and 300,000 containers, and those numbers keep increasing. It also provides many kinds of resource separation, so it can easily realize multi-tenant applications. And no matter whether it's public cloud, private cloud, or other environments, the range of options is quite wide, so you can easily realize migration. Here are the major components of Kubernetes. We have the container runtime, which mainly means Docker; the kubelet, the component that manages pods on each node; and etcd, the database, where all the objects are stored.
The API server is the interface for those objects. The Kubernetes scheduler is the component that schedules a workload onto an appropriate node. The controller manager manages the lifecycle of the controllers. And kube-proxy is the component that manages the network through iptables rules; it handles things like cluster communication, network policies, and so on. So let's first look at the Kubernetes standard scheduler. As you can see from this slide, the chart shows how Kubernetes chooses the node for a pod and how it does the scheduling. First of all, it runs a series of filter rules, whose purpose is to filter out inappropriate nodes. Here I have listed three default filters. The first is about volumes: if your pod has some requirement on a volume, and a node cannot match that volume, or it is not in the same zone, the node will be filtered out of the candidate list. The second is about CPU and memory: if there is not sufficient CPU or memory, the node will be filtered out. The third is about specific nodes: if the pod asks for a certain label and there is no matching label on a node, that node will be filtered out. In the second step, the remaining nodes will be scored and ranked. There are several rules to follow. The first one is whether the replicas are distributed evenly across nodes or not. Another ranking principle is the utilization of the node, where we also check whether the resource usage is balanced or not. And another principle is affinity and taint/toleration priority. Then the highest-ranking node will be selected, and the pod will be deployed on it. In Kubernetes, there are three quality of service classes.
The first one is Guaranteed. Guaranteed means the request is equal to the limit, and both are set; this kind of pod will be given the highest priority. The second class is Burstable. That means you have filled in both the request and the limit, but the request is smaller than the limit, so your pod's resources are not guaranteed up to the limit. The last type is BestEffort. That means you didn't fill in any request, so it can only be deployed on whatever resources remain. For quality of service, standard Kubernetes QoS only considers CPU and memory. But here are some bandwidth-sensitive applications: for example, voice-over-IP platforms, web applications, email, online gaming, as well as file servers and P2P. For all of these applications, we have to deploy them on a node with sufficient bandwidth, and we also have to guarantee very low network latency and very high throughput. As my colleague just mentioned, there are applications that are sensitive to bandwidth, for example some telecommunications applications. And we also have to consider that in the current Kubernetes community, there is no open and complete upstream solution for this. So let's review a little bit: what are the current network features of Kubernetes? The first one, which I think you are familiar with, is the CNI plug-in. It is a kind of Kubernetes add-on, including Flannel, Calico, and so on. There are also things like CNI-Genie, which can be seen as a kind of CNI add-on. The CNI plug-in provides layer 2 and layer 3 connectivity. Many CNI plug-ins can also provide native ingress, so you can define ingress yourself natively, and the CNI plug-in will point the ingress endpoint to the corresponding pod.
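As a rough illustration of the three QoS classes my colleague just walked through, they can be expressed as a small classifier over a container's requests and limits. This is a simplified sketch with hypothetical dict inputs; real Kubernetes evaluates every container and every resource in the pod:

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Simplified Kubernetes QoS classification for a single container.

    requests/limits map resource names (e.g. "cpu", "memory") to amounts.
    """
    if not requests and not limits:
        # No requests at all: scheduled onto whatever resources remain.
        return "BestEffort"
    if requests and requests == limits:
        # Requests set and equal to limits for every resource: highest priority.
        return "Guaranteed"
    # Requests set but below the limits (or limits missing): Burstable.
    return "Burstable"

print(qos_class({"cpu": "500m", "memory": "1Gi"}, {"cpu": "500m", "memory": "1Gi"}))  # Guaranteed
print(qos_class({"cpu": "250m"}, {"cpu": "500m"}))  # Burstable
print(qos_class({}, {}))  # BestEffort
```

The point of the classification is only priority under contention: Guaranteed pods get their full reservation, Burstable pods get at least their request, and BestEffort pods get whatever is left over.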
The community has also been talking about multi-NIC support, for example CNI-Genie and Multus, provided by Huawei and Intel, as well as Network Service Mesh; all of these can give us multiple-NIC features. By default, CNI only gives a pod one NIC, but with these new types of CNI plug-ins, we can support multiple NICs. Also, natively we have Network Policy, which lets us define NetworkPolicy resources and provides micro-segmentation for pods. Another one that is frequently seen is the service mesh, which can provide layer-7 traffic management. These are some of the most frequently seen network features in the Kubernetes community; however, we don't really have anything for network QoS. On the other hand, in virtualized infrastructures, we already have ways to solve network performance problems. For example, we can use PCI pass-through to pass the host's hardware through to the guest. We can also use SR-IOV to improve network performance, and we can use virtual switch hardware acceleration as well. Similarly, if we want to solve this issue for applications running in containers, we have to take the same considerations into account. The discussions in the community basically fall into two directions. The first solution is device plug-ins. If you check Intel's work at the lower layers, they talk about things like the DPDK plug-in and the GPU plug-in. Those are all device plug-ins aimed at allowing a pod to use hardware resources, so if we want to improve network performance, this is one solution we can consider: a device plug-in can match hardware resources to the container. Another way is Macvlan, where we can let the pod leverage the host network and use the NIC directly. So those are some of the discussions in the community. However, we are going to propose another solution, which is not listed on the previous slides: taking network QoS requirements into the scheduling itself. We think that in this way we can provide a sufficient networking solution for pods from a more comprehensive dimension. As Yuyang just mentioned, within the community there are already several network features, for example hardware acceleration and different solutions to improve performance. So today I want to talk about using another way to improve network QoS. We need to build a QoS model, and what we do are two things. The first thing is traffic shaping: on each worker node, we have to reserve sufficient network resources. We also need to do ingress and egress control on the pod: ingress is policing, and egress is shaping. And in terms of Kubernetes scheduling, we also need to take the network resource into consideration, so a prioritizer and a filter will be added into the calculation. At the physical level, in a bare-metal deployment, as you can see, the bandwidth of the physical NIC is quite fixed. Since the bandwidth is fixed, we don't need to reserve any extra space; however, we need to conduct ingress and egress control on the pods. We already have Linux traffic control, which gives us traffic-shaping queues. For each Guaranteed pod there is a branch, and for each Burstable pod there is a branch, while the BestEffort pods share one branch. With this kind of shaping, the guarantees can be satisfied: a Guaranteed pod's branch is configured according to the requested guarantee.
Traffic can be sent out for a Burstable pod too. Since its request is smaller than its limit, we guarantee that the request will be satisfied; however, how far it can go beyond the request toward the limit really depends on the remaining bandwidth. Each branch can borrow from the rest of the tokens: when the other branches are idle, that traffic can go out. In a virtualized deployment, the worker nodes are mostly VMs. On this slide, vSphere plus NSX, you can see the Kubernetes architecture. Here are the master nodes, and you can see NSX NCP, which constantly watches Kubernetes objects. It connects to NSX through the NSX management plane and control plane and sets up the network. On the worker node, you can see we can manage the traffic, and the management traffic and the data traffic are separate. All of the pods are connected to OVS, and there is a specific network card connected to OVS through a VDS. With this kind of architecture, you can connect the pods to the virtual bridges on the network card and then manage the traffic of those connected pods. The core principle for us is to manage the ingress and egress bandwidth of the node. For example, when a user requests a pod, apart from the CPU and memory in the metadata, he or she can add something about the bandwidth needed. Then the scheduler calls the scheduler extender: first, the nodes without available resources are excluded, then the rest of the nodes are scored, and we select the node with the highest score and bind the pod to it. Then the NSX node agent watches the vSwitch; NSX registers the pod, tells the vSwitch how much bandwidth the pod needs, and reserves that bandwidth in the vSwitch.
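To make the per-pod shaping branches described above concrete, here is a rough sketch of the kind of Linux `tc` hierarchy this implies, assuming an HTB (hierarchical token bucket) qdisc, which is the standard way to get per-class guaranteed rates with borrowing. The device name, class handles, and rates are all hypothetical, and the sketch only builds the command strings rather than running them:

```python
def htb_commands(dev: str, pods: list) -> list:
    """Build hypothetical `tc` commands for per-pod egress shaping.

    Each pod is a dict like {"name": ..., "qos": ..., "rate_mbit": ..., "ceil_mbit": ...}.
    Guaranteed/Burstable pods each get their own HTB class (branch);
    BestEffort pods all share the default class.
    """
    cmds = [f"tc qdisc add dev {dev} root handle 1: htb default 99"]
    # Shared branch for BestEffort pods: minimal guaranteed rate, high ceiling.
    cmds.append(f"tc class add dev {dev} parent 1: classid 1:99 htb rate 1mbit ceil 1000mbit")
    classid = 10
    for pod in pods:
        if pod["qos"] == "BestEffort":
            continue  # falls into the shared default class 1:99
        rate = pod["rate_mbit"]              # guaranteed rate = the pod's request
        ceil = pod.get("ceil_mbit", rate)    # Burstable pods may borrow up to the limit
        cmds.append(
            f"tc class add dev {dev} parent 1: classid 1:{classid} "
            f"htb rate {rate}mbit ceil {ceil}mbit"
        )
        classid += 1
    return cmds

for cmd in htb_commands("eth0", [
    {"name": "db", "qos": "Guaranteed", "rate_mbit": 600},
    {"name": "web", "qos": "Burstable", "rate_mbit": 100, "ceil_mbit": 300},
    {"name": "batch", "qos": "BestEffort"},
]):
    print(cmd)
```

Setting `rate == ceil` for the Guaranteed pod mirrors "request equals limit", while the Burstable pod's gap between `rate` and `ceil` is exactly the borrowing from idle tokens described in the talk.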
On this slide, you can see the user has selected a node with 2G of bandwidth. There are already two pods existing on it, taking 900 Mbps, and there are still 1300 Mbps available. Here are the detailed steps. The first step: on the worker node, you need to reserve sufficient bandwidth with the vSphere feature Network I/O Control. We can set the same share for each of the nodes; a share means a priority weight. When there is congestion in the network, you need to think about who gets the better priority in the virtualization layer or in the cloud; it's not only about running Kubernetes, it's also about running other applications, so there is a question of weight. The reservation and the limit should be set to the same value; then we can guarantee that the bandwidth is truly reserved. And on the node, we should add a label to state what maximum bandwidth this node can support. The reason we list the Cluster API on this slide is that, as my colleague mentioned, we can automate this. One way is to control the bandwidth through NSX, but who will help the cluster see that there is a request to reserve bandwidth? One option is to use the Cluster API. Just as was introduced this morning, there is a management cluster which runs an add-on, the Cluster API control plane, and then you can define what the clusters look like based on different cloud providers. In these providers we can define the bandwidth. On GCP or GKE, there is already a definition, a profile for each of the worker nodes with the memory and CPU, and now we can extend it with the bandwidth. With this kind of mechanism, the Kubernetes cluster running on VMs can be made into what you have been desiring. And this is also about the bandwidth.
This is a screenshot of part of how to define the workload in the provider, and there is a property for the bandwidth. When you add a dedicated bandwidth, a cloud provider such as the vSphere provider can perceive that bandwidth property, call the relevant interface, and set the VM to a specific bandwidth. So these are the steps for setting a fixed bandwidth, and I have just explained the principles. The Cluster API is mainly for the purpose of automation; it is a new project aimed at automation, and with the Cluster API you can set those figures more easily, but of course you can also do it manually, just as my colleague mentioned. What we need to do next is, after labeling those nodes, when you apply for the pods you need to tell the scheduler how much resource, network resource specifically, you need. Here we have listed the ingress bandwidth and ingress bandwidth burst, as well as the egress bandwidth and egress bandwidth burst, and there is a class, which is about priority: when the network is congested, it decides whose packets are output first. Next we need to extend the Kubernetes scheduler. Kubernetes allows extending the standard scheduler with additional filter and prioritize rules through an external URL that the Kubernetes scheduler will call: first the filter and then the prioritize, and you can see the weight here. From this slide you can see that the candidate set changes step by step as you exclude the unsuitable nodes. I introduced the three default filters just now, and if you extend Kubernetes, it will then call the external filter to continue filtering. After the filtering we have the prioritization: in each of the boxes you can see the rules, and the final score is the sum of the separate scores; it will also call any external prioritize rules.
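As a sketch of what such an extender's filter and prioritize callbacks might compute for bandwidth, here is a simplified version in plain functions. The node names, the dict-based inputs, and the `usable_fraction` default are made up for illustration; a real extender would receive and return JSON payloads over HTTP from the scheduler:

```python
def filter_nodes(nodes: dict, request_mbit: int, usable_fraction: float = 0.75) -> list:
    """Keep nodes whose remaining bandwidth can hold the new pod's request.

    nodes maps a node name to {"label_mbit": ... or None, "used_mbit": ...};
    usable_fraction schedules against only a portion of the labelled bandwidth.
    """
    keep = []
    for name, info in nodes.items():
        if info["label_mbit"] is None:
            continue  # unlabelled node: no bandwidth reservation, filter it out
        capacity = info["label_mbit"] * usable_fraction
        if capacity - info["used_mbit"] - request_mbit >= 0:
            keep.append(name)
    return keep

def priority_score(capacity_mbit: float, used_mbit: float, weight: int = 10) -> float:
    """(capacity - sum of requests) / capacity * weight: more headroom, higher score."""
    return (capacity_mbit - used_mbit) / capacity_mbit * weight

# Roughly the four-node example from the talk (using the full labelled capacity):
nodes = {
    "node1": {"label_mbit": 1000, "used_mbit": 300},
    "node2": {"label_mbit": 1000, "used_mbit": 500},
    "node3": {"label_mbit": None, "used_mbit": 0},
    "node4": {"label_mbit": 1000, "used_mbit": 0},
}
candidates = filter_nodes(nodes, request_mbit=600, usable_fraction=1.0)
print(candidates)  # ['node1', 'node4']
scores = {n: priority_score(1000, nodes[n]["used_mbit"]) for n in candidates}
print(max(scores, key=scores.get))  # 'node4'
```

The idle, fully labelled node wins, matching the example walked through on the slides.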
Now, about the algorithm for the extended rules. First we need to calculate the capacity of the node: we take the bandwidth from the label we just added and multiply it by 75%, which is the recommended best practice. Then we need to know the requested capacity of the new pod, which is the maximum of the ingress bandwidth burst and the egress bandwidth burst. For the remaining capacity, we take the total node capacity, minus the existing pods' capacity, minus the new pod's request. Then you check the value: if it is smaller than zero, it means we have insufficient resources, and you remove this node from the candidate list; if it is equal to or larger than zero, the node can remain a candidate. For the prioritize rule, the priority score is the node capacity minus the sum of the bandwidth requests, divided by the node capacity, and then multiplied by the weight: the more remaining resources, the higher the priority score. These are quite straightforward rules, but we can of course add more complicated rules based on other signals. Here we have listed a case. There is a newly requested pod, and it requests 600 Mbps of bandwidth. There are four nodes in this cluster; we didn't label one of the nodes (node three), and the other three nodes have been labeled, all with 1000 Mbps capacity. On node one, 300 Mbps have been consumed; on node two, 500 Mbps have been consumed; and on node four, nothing is running. After the first round of filtering, the second and third nodes are removed: node three does not have any reservation of bandwidth, and node two would exceed 1000 Mbps, since 500 plus 600 is larger than 1000. Then we score the remaining nodes; the more remaining resources, the higher the score. On node one the rest is 700, and on node four the rest is 1000, so node four has the highest score: multiplied by the weight, it has a score of four, and node one only has one. So finally we select node four to place the pod. And here are some of the figures. If you don't manage the traffic for the pods, the result is unpredictable; maybe the key task has its traffic occupied by some other unimportant pod. On the right-hand side, in the spec of the pod, I have defined the bandwidth needed, and the measured traffic is in line with the requested bandwidth: the blue line shows 100 Mbps and the orange one shows 50 Mbps. So this is our presentation. Thank you. Do you have any questions?

We have done something similar before, and we had some other shared resources which cannot be isolated, and I have several questions. The first one is: you are using the scheduler extender function, right? But we also have the multiple-scheduler mechanism and the scheduler framework; can we consider using that?

Yes, there are two options. The first is that when you install the scheduler you can define the policy and add some rules, and the second option is to set up multiple schedulers; when you create the pods, you can dedicate which scheduler is to be used.

And because the network bandwidth resource is more similar to CPU, since it is compressible: for a Guaranteed pod you have allocated the reserved bandwidth, and if the pod is not occupying this bandwidth, will it be released temporarily?

You are completely right, because this is compressible.

There was another question, but I can't remember it, so that's it.

You just mentioned the comparison with compressible resources; however, in your chart you actually filter nodes out by a fixed slice. I mean, when you are applying the 500 and the 600, this is not quite right: at the first step you already filter the node out by the request value, so the bandwidth will not be allocated. I mean, if it is just a request, then after you filter it out, you cannot really make use of the compressibility.
I mean, the total bandwidth is fixed, and you calculate against the requests on this node.

Well, as you just mentioned, our bandwidth is compressible. When you create a pod and make a pod request, we have to guarantee it, for example the 600 in this case. However, when you are actually running inside the pod at a very low workload, your real network traffic may be far below 600. We do it this way because we want to guarantee the performance: when you make this request on a node, we want to satisfy your request, and I cannot compress that, because I don't know when the peak in traffic will come. If I deployed only according to a compressed level, then during peak time I could not really make sure of the performance of another pod. So what we do here is: I may have some idle bandwidth, but even if I have no idle bandwidth, I can still give each pod a total bandwidth that is not lower than its request. So yes, we calculate according to the requests, and the requests of all the pods added together will not exceed the node capacity. That means I cannot schedule another pod if I cannot make sure that I can guarantee my reservation. This is the strategy we take.

Hi, hello. I think the question they are asking is very reasonable. You said that you want to provide a guarantee for all the pods, and that is quite intuitive. But when you reserve the full requested bandwidth, do you think that this kind of strategy is too conservative? Would you consider a smarter or more aggressive strategy?

Yes, we are still thinking about whether there are smarter solutions. So far I couldn't tell you that I have found a smarter solution, because if you guarantee the requests, you probably also need to consider the Burstable and BestEffort pods. I mean, if I could make the reservations and allocations in a smarter way, that would be really good, but so far there is no way, not yet, for us to find a better way. Basically, with our current solution, as our presentation already showed, each pod sits on an interface of Open vSwitch, and for this interface we can reserve some traffic, but beyond that I cannot guarantee that every pod's interface will have more than its fixed traffic guarantee.

Yes, I totally understand. I'm just very curious, because when you identify your request, we all know that for clusters and containers it's changing a lot, it keeps changing all the time. So since you do not really know your workload, would you consider using the historical data as a threshold, or using it as a way of training?

Well, yeah, I agree: probably we can develop a very smart algorithm and train it on the historical data of the bandwidth of our deployments. We actually considered this kind of approach: probably we can use some very smart AI algorithms. Maybe not yet for networking, but we have actually tried to use such algorithms on the public cloud; we've tried CPU and memory estimation, and there are a lot of similar algorithms. But for applying this kind of algorithm to the bandwidth domain: so far we know that the bandwidth quota is very big, not just hundreds of megabits, so I'm not really sure whether it is feasible or meaningful for us to use an AI algorithm to estimate the allocation of the bandwidth. I mean, so far it's not necessary; the bandwidth case cannot really reuse the CPU AI algorithms as they are.

Well, I think that is a very good topic; probably we can discuss this later. And for the compressible question: if the actually allocated traffic differs, I mean for the Guaranteed pods, if they do not essentially use the allocated traffic, you can actually provide that traffic to others.

Thank you very much. Thank you very much for participating in our session.