Now let's start my presentation. I didn't expect so many people to join our session; thank you very much for attending. My name is Yang Chunbin, and I'm from IBM Systems. I'm focused on Istio in two ways: one is contributing to the Istio community, and the other is operating Istio on IBM Cloud. Today's talk has four parts: the problems we have with Istio in large-scale clusters, what we have done about them, some best practices, and some tuning guidance.

Let's look at the topology of our test environment on IBM Cloud. In this graph you can see different node roles: master, management, proxy, and worker. This is IBM Cloud Private, and we define the roles as follows. The master nodes run the Kubernetes API server and etcd. The management nodes run logging and monitoring components. The proxy nodes run the ingress components that provide the proxy functionality. The worker nodes are dedicated to user applications; the applications run on these workers. Different roles carry different taints to guarantee that applications only run on their own nodes, so each role has its exclusive set of nodes. On IBM Cloud Private, the Istio control plane also runs on the management nodes: Pilot, Mixer, Citadel, and Galley all run there, while the proxy nodes host the ingress gateway. That is the topology structure.

Our test runs a simple online application with two parts, a frontend and a backend. The frontend receives the user's requests; the backend processes them and returns data to the user.

We wanted to test 10,000 pods and 4,000 services: 4,000 pods and 1,000 services across 100 namespaces not managed by Istio, and 6,000 pods and 3,000 services across 100 namespaces managed by Istio. In real-world scenarios, existing Kubernetes clusters often have some microservices managed by something like Spring Cloud. Spring Cloud requires the services to be based on Java, which is not very flexible, and that is one reason to use Istio: it is non-intrusive, so within an existing environment a new application can be managed by Istio without touching the old ones. Our test case simulates such a user environment.

First, after deploying the 4,000 pods and 1,000 services that are not managed by Istio, we could already see that the memory usage of Pilot and Prometheus was very high. If you are familiar with Istio, you will know the reason: whether or not a service or pod is managed by Istio, Istio collects its endpoint and service information, and that information also ends up in Prometheus. That is why Pilot's and Prometheus's memory was occupied so heavily.

For the remaining 6,000 pods and 3,000 services, we enabled sidecar injection before deployment, so those applications are managed by Istio. After this deployment, Pilot was scaled out by HPA, which is why we ended up with five Pilot instances.
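As a concrete illustration of the role exclusiveness described above, here is a minimal sketch, not our exact configuration: the `dedicated` taint key, the node name, and the pod are all hypothetical. The node is tainted, and the workload carries a matching toleration plus a node selector:

```yaml
# Hypothetical sketch: reserving nodes for the proxy role.
# First taint the node, e.g.: kubectl taint nodes proxy-node-1 dedicated=proxy:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: ingress-proxy
spec:
  tolerations:              # only pods tolerating the taint may land on proxy nodes
  - key: dedicated
    operator: Equal
    value: proxy
    effect: NoSchedule
  nodeSelector:             # and the selector pins the pod to those nodes
    dedicated: proxy
  containers:
  - name: nginx-ingress
    image: nginx:1.17
```

The same pattern, with a different taint value per role, keeps master, management, proxy, and worker workloads on their own exclusive nodes.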
We could then see that the memory utilization was not normal: Pilot memory could be as high as 57 GB. Yet despite such high memory usage, the number of live Envoy connections was small. Based on our design there should have been around 6,000 connections, but there were only about 20. In such a large-scale environment, the default configuration simply does not work.

This was Istio 1.1, which introduced a number of features, and one of them is namespace isolation. Imagine you run applications on a Kubernetes platform: when you design them, you also consider which namespaces can see an application, and which applications a namespace can call. A namespace is a coarse-grained concept; it typically maps to a group or department within an organization, which means a given group of applications will be called by specific departments or teams. All of this information is known in advance, and through Istio's configuration it can be expressed. By doing this, we can reduce the amount of data each Pilot instance has to maintain, instead of Pilot maintaining a very large amount of data for the whole mesh. If we can tell Istio that namespace 1 will only be called by namespace 2, that information is captured in the platform, and the scope becomes very clear.

With Istio 1.1, the release package lets you enable this isolation for all namespaces: you create a default Sidecar resource in the istio-system namespace, which acts as global isolation, so that each application namespace can only be accessed within its own scope. That was the first change. The second change is that we deploy an ingress gateway per namespace. If all traffic flows through the single ingress gateway in istio-system, it becomes a bottleneck under heavy traffic, so we create an ingress gateway for each namespace to reduce that bottleneck.

After making these two changes, we could build the environment with 10,000 pods and 4,000 services. With this environment running, we saw five Pilot instances, each using less than 2 GB of memory. If you check this graph, which shows data from Prometheus, you can see the xDS connection data from Pilot: the number of connections is now close to the expected 6,000. With namespace isolation enabled, this large-scale cluster can be built successfully, and Pilot can maintain its connections and push its data out to all the Envoys. As the graph also shows, Pilot is scaled by HPA: if Pilot's load rises above the default threshold of 80%, a new Pilot instance is created, and that instance takes over part of the Envoy connections. That is how we ended up with five Pilot instances in this service mesh environment.

So that is the experiment we did based on namespace isolation, which I wanted to share with you. Many people are also concerned about the sidecar itself. We know it is non-intrusive to the application, but what do its CPU and memory consumption look like in a large-scale environment? If you look at the istio-proxy container, its CPU usage is very low and its memory is only 20-plus megabytes. In other words, the sidecar does increase resource utilization, but not by much.
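For reference, here is a minimal sketch of the global isolation described above, using the Sidecar resource introduced in Istio 1.1. The egress hosts shown, the workload's own namespace plus istio-system, are the commonly documented default pattern:

```yaml
# Global default: applied in the root namespace (istio-system), it takes effect
# for every namespace that has no Sidecar resource of its own.
apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: default
  namespace: istio-system
spec:
  egress:
  - hosts:
    - "./*"             # a workload only sees services in its own namespace...
    - "istio-system/*"  # ...plus the control-plane namespace
```

A namespace that genuinely needs to call another namespace can override this with its own Sidecar resource listing the extra `namespace/*` hosts, which is exactly the "namespace 1 is called by namespace 2" idea above.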
We also ran load tests to measure QPS. For these tests we used JMeter to simulate real-world concurrency. While submitting requests, we observed the istio-telemetry resources: telemetry receives all of the metrics, and it is a CPU-intensive component. It does not consume much memory, but it consumed about 3.5 vCPU.

Now for the test results. Our QPS was about 1,000, while the Istio community reports about 2,500. Why is our value lower? Our worker nodes are not very powerful, about 2 vCPU and 4 GB of memory each, which is why our absolute numbers are not as good as the official results. So we also used JMeter to access the service directly, without going through Istio, to establish a baseline QPS, and got roughly 1,300 to 1,735. Comparing with and without the Istio sidecar, our QPS drops by about 25%, which is better than the official figure of 35%, and this is with per-namespace ingress gateways. That is why our relative result is better than the official one.

In the community, Istio 1.1 is out and 1.2 has just been released, and each version of Istio keeps improving here; the community pays close attention to this. For Istio to be adopted by organizations, scalability and performance must be addressed first. For the 1.1 release, the measured added latency is about 8 milliseconds. Mixer telemetry and policy matter a great deal for QPS, and they are significant in terms of CPU consumption, which is why the community is migrating them, to lower the CPU overhead.

Now I will share some of our best practices based on these test results.

First, in a large-scale cluster, use the namespace isolation feature; you must turn it on, because this functionality is essential at scale. The release includes a file for the global sidecar configuration; you can simply apply it and it takes effect for every namespace, isolating them from each other. If you want to modify it, you can edit that resource and add the namespaces you actually want to expose.

Second, install an ingress gateway for each namespace in a large-scale deployment; you can take the reference configuration from the community.

Third, telemetry is CPU-intensive. If it runs on a shared node, it has to deal with a large volume of requests, will occupy a lot of CPU, and will affect the performance of everything else on that node. That is why I advise giving telemetry a dedicated node, so that it runs there exclusively. In terms of resource requests, to run 6,000 pods and 3,000 services managed by Istio, we advise six Pilot instances with 4 vCPU and 4 GB of memory each, and 4 vCPU and 4 GB of memory for telemetry, as sketched below.
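Expressed as Helm values for the Istio chart, the sizing above might look like the following sketch. I am assuming the 1.1 chart's value names (`pilot.autoscaleMin`, `mixer.telemetry.*`), and the `dedicated: istio-telemetry` node label is hypothetical:

```yaml
pilot:
  autoscaleMin: 6            # six Pilot instances for 6,000 pods / 3,000 services
  resources:
    requests:
      cpu: 4000m             # 4 vCPU per instance
      memory: 4Gi
mixer:
  telemetry:
    resources:
      requests:
        cpu: 4000m
        memory: 4Gi
    nodeSelector:
      dedicated: istio-telemetry   # hypothetical label for the dedicated telemetry node
```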
Fourth, policy. As of the 1.1 and 1.2 releases, the policy component is disabled by default, because it affects QPS. Mixer is divided into two parts, telemetry and policy. Telemetry sits on the reporting path, so it does not affect request latency. Policy, however, is on the request path and does have a latency impact: it can perform secondary checks based on request content, forwarding the request to another component to decide whether it is legal, whether it complies with your rules, and rejecting it if not. That extra hop makes a difference to every request. If you do not use this functionality, we advise disabling the policy component to increase throughput.

I mentioned Prometheus earlier and shared some of its data. Going back to that slide, you can see the Prometheus status: with 10,000 pods and 4,000 services, its CPU and memory consumption is very high, with memory around 66 GB, because Prometheus collects so much data. If you need to tune Prometheus, there are two parameters to adjust. The first is retention: how long Prometheus stores your data. I think the default is six hours, so you can shorten that time to reduce Prometheus memory, because currently all of the data is held in memory-backed storage. The second is the scrape interval: how often the data is gathered. Adjusting these two parameters reduces the Prometheus overhead.

If you follow open source, you know there is a lot of operator-related work now, and one new direction is to manage the add-ons (Prometheus, Grafana, Jaeger: these are all Istio add-ons) with their corresponding operators. Then Istio does not need to decide whether to build a Prometheus cluster or where to store which data in which database; Istio only has to care about its own control plane. I remember that as of 1.2, the Jaeger operator is already supported.

Now some more tuning guidance. The first knob is keepaliveMaxServerConnectionAge. In the community, people hit problems without this value, because Pilot is supposed to be load-balanced via HPA. When an instance's load is already above the configured 80% threshold, a new Pilot instance is created; but you will observe that although there are new Pilot instances, Pilot memory and CPU usage stay very high, and in Prometheus you can see from the xDS metrics that 99% of the connections are still on the first Pilot instance, with very few on the rest. This can be considered a bug, and that is why keepaliveMaxServerConnectionAge exists. The default value is 30 minutes: a connection older than 30 minutes is closed, and the Envoy has to reconnect. That ensures good load balancing, because a reconnecting Envoy can build its connection with a new Pilot instance; if an Envoy kept its connection to the first Pilot instance forever, it would never transfer to another instance.
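As a sketch, assuming the Helm chart exposes this under `pilot.keepaliveMaxServerConnectionAge` (mapping to Pilot's corresponding command-line option), lowering the connection age makes Envoys reconnect sooner and spread across Pilot instances:

```yaml
pilot:
  # Close Pilot<->Envoy gRPC connections after this age so that reconnecting
  # Envoys can land on newly scaled Pilot instances (default: 30m).
  keepaliveMaxServerConnectionAge: 10m
```

And for the Prometheus tuning mentioned above, the two parameters are the standard Prometheus retention flag and the scrape interval; the values here are illustrative:

```yaml
# prometheus.yml (the config bundled with Istio): gather data less often
global:
  scrape_interval: 30s      # the bundled default is 15s
# Retention is set on the Prometheus command line, e.g.:
#   --storage.tsdb.retention=6h   (shorten it to reduce memory)
```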
The second knob is concurrency. The default is 0. Concurrency is an Envoy setting that controls how Envoy handles its work, and by default the worker-thread count is derived from the node's CPU cores. To give an example, if the node is an 8-core node, then every sidecar, every Envoy, gets 8 worker threads to deal with whatever it needs to deal with, and that affects the node's performance overall. By setting the concurrency value you gain the flexibility to balance the load: if you set it to two, Envoy will have only two worker threads for the requests it needs to handle.

The third is the telemetry filter. In Istio 1.1 the stdio filter is shut down, and access logging is also shut down. If you have tried 1.1, you will have noticed there is not much data coming from Envoy, because those filters are off; otherwise there would be a huge amount of data, which also increases Envoy's memory usage. Our recommendation for the telemetry filter is this: by default it reports all requests, no matter whether they succeed or fail. In the rule definition you can add one more match expression so that it only collects 404s, or a certain class of errors. That significantly reduces the amount of data while remaining helpful for future troubleshooting; see the rule sketch after this section.

The fourth is tracing. The default is disabled. The first reason is that the default deployment is the all-in-one Jaeger. In detail: all tracing data is reported to a collector by Envoy. Jaeger implements an open tracing protocol and is an open-source CNCF project, and it consists of three parts: the Jaeger agent, the Jaeger collector, and the Jaeger query service. In a plain Jaeger deployment, a Jaeger agent runs on every node, gathers all the data on that node, and reports it to the collector; the query service then reads from the collector's storage. That is the whole tracing pipeline. In Istio, however, Envoy already implements the tracing protocol, so Envoy takes over the Jaeger agent's role: you do not have to install an additional agent, because Envoy reports the data straight to the Jaeger collector. As for persistence, deciding which data to save and where to save it is Jaeger's concern, not something Istio wants to manage, and it is outside Istio's scope. The bundled default is the all-in-one image, which keeps all of the data in memory, so when the Jaeger pod restarts, all the data is erased. That is why this needs tuning. The other reason is that in production the Jaeger collector needs a certain number of instances and the query service needs a certain number of instances, and none of that is managed by Istio; this is where the Jaeger operator helps, and the Jaeger operator is only supported starting with 1.2.

The fifth is the HPA thresholds for telemetry and the gateways. The default is 80%, but HPA has a check interval and also a tolerance: it accepts a deviation of about 10%. In a production environment, I recommend pre-setting the replica counts of telemetry and the gateways rather than relying on the HPA feature, because HPA has around three minutes of latency by default. When a very large burst of requests arrives, Pilot or the ingress gateway will not be able to absorb the additional load in time, and that leads to latency.
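To make the telemetry-filter recommendation concrete, here is a sketch of a Mixer rule whose match expression only reports failed requests. The handler and instance names are placeholders for whatever your telemetry configuration defines:

```yaml
apiVersion: config.istio.io/v1alpha2
kind: rule
metadata:
  name: report-errors-only
  namespace: istio-system
spec:
  # Only report requests that failed; successful traffic is skipped.
  match: context.protocol == "http" && response.code >= 400
  actions:
  - handler: prometheus-handler   # placeholder handler name
    instances:
    - requestcount.metric         # placeholder instance name
```

And for the last point, instead of relying on HPA during bursts, the replica counts can be pinned up front. Again a Helm-values sketch, assuming the chart's autoscale value names, with illustrative numbers:

```yaml
gateways:
  istio-ingressgateway:
    autoscaleMin: 4   # pre-set enough gateway replicas for peak load
    autoscaleMax: 4
mixer:
  telemetry:
    autoscaleMin: 6
    autoscaleMax: 6
```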
That's all for my speech, and now I'm open for questions.

Question from the audience: You talked about having an ingress gateway in every namespace. If I instead keep a single ingress gateway in istio-system and just scale it up, is there any difference?

Answer: Well, I'm more focused on the large-scale environment; in our tests we scaled it to a maximum of six instances. If you scale the single gateway up a lot, that can also solve the problem; however, you still have to consider how to handle it as a single entry point.