Good afternoon everyone. Today we would like to share with you our work on Istio for containerized Cloud Foundry, some of our experiments, and a demo later on. My name is Tui Xiang, and this is my colleague Zhang Gong (Grace). We are from the IBM Beijing Development Laboratory, where we have been engaged in cloud platform development and related open-source tooling, and containerization is also part of our work. Have you heard about Istio, or Cloud Foundry (CF)? Since this is an open-source summit, we would like to give you a brief introduction to Cloud Foundry first.

This is our agenda. At the beginning we introduce the project background, give a brief introduction to Istio, and then the motivations for the CF and Istio integration. Grace will then present the work we have done, including our experience and lessons learned, and the future vision. The last part is the demo and the use cases.

Here is a brief introduction to Cloud Foundry. It is a typical PaaS: a platform where you push your applications and the platform runs them. The highlight is that you push your code and the platform stages it with buildpacks from its catalog, so you don't need to do the compiling and packaging yourself. At IBM, the Cloud Foundry effort was established in 2013, and in 2014 IBM Cloud launched its first public PaaS platform, at a large scale: the platform now runs on more than 1,000 VMs. Based on Cloud Foundry we also provide small and medium-sized companies and individual enterprises with dedicated platforms, for example at the 10-VM level, for isolated running of their applications.

This is the workflow. From this chart you can see that the PaaS platform has complex components; it is not a simple microservice architecture. Each component has different jobs, and each job has its own functions, so it is different from microservices. A request comes in from the left at the Gorouter and is routed onward for handling and logging. The auctioneer, which is the scheduler, finds the appropriate container in the background for staging (compiling) or deployment. Garden is the container technology specific to the CF platform. There are different cells, and each cell's capacity is different; some of them may already be fully used. The compiled binary package is stored as a droplet, so for future use you do not need to go through staging again; you can use the droplet directly for scaling. For log collection, the Loggregator collects the logs, and through the Traffic Controller you can attach a subscriber to receive them.

Since 2014, for the past several years, we have been running on this platform. And since the year before last, we have wanted Cloud Foundry to be offered not as a platform you deploy, but as a service. With 50 or 60 VMs in such a big environment, a customer cannot make a deployment on their own; it requires an in-house team to deploy it. We wanted to migrate it to Kubernetes so that clients can use our portal for deployment. In this way it is a one-button deployment: you just choose the data center and select the size, for example the number of cells, and then with one button the deployment is done in two to three hours.
Underneath there is a Kubernetes cluster, and on the cluster you deploy the containerized CF. So the whole process takes only about two to three hours; in the past it took several days. These are some of the core features. It is isolated: each client gets their own cluster individually. We also enhanced it so it can use private services, including private network connections to databases and storage, and the users can manage these on their own; the operations team doesn't need to get involved.

This is an introduction to Istio. Service mesh has become a hot topic in recent years. An important component is Pilot. Pilot is the basis for traffic management; it routes traffic with fine-grained control. There are many practices, for example retries, timeouts, or circuit breaking: when a component is overwhelmed, circuit breaking can protect it. For maintenance it is also very helpful. For upgrading, it can provide blue-green (A/B) upgrades and canary upgrades, depending on your data center capacity. If the capacity is large, you can run a double-size deployment for a blue-green upgrade and switch the traffic over. If a double size is not possible, you can move 10% of the traffic to the new version and ramp up gradually. So it depends on the specific needs. Fine-grained traffic management and security are the major highlights.

For security, here you can see the whole component. It provides service identity management and certificate generation: if you do not provide your own CA, it can issue the certificates for the deployed components. Traffic goes through the sidecar proxies, which distribute the certificates and apply mutual TLS between the services. If a service is an external service, outside of the service mesh, then traffic to it does not go through mutual TLS. And there is a security mechanism protecting the certificate distribution for mutual TLS. A minimal sketch of these two cases follows.
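To make the two cases concrete, here is a minimal, hypothetical sketch (the names api-service, cf-system, and couchdb.example.com are ours, not from the talk): a DestinationRule that turns on Istio mutual TLS for an in-mesh service, and a ServiceEntry that registers an external service, which is reached without Istio's mutual TLS.

```yaml
# Hypothetical sketch: Istio mutual TLS inside the mesh vs. an external service.
# Service and namespace names are illustrative placeholders.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: api-service-mtls
  namespace: cf-system
spec:
  host: api-service.cf-system.svc.cluster.local
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL   # sidecars exchange Citadel-issued certificates
---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: external-couchdb
  namespace: cf-system
spec:
  hosts:
    - couchdb.example.com   # external service, outside the mesh
  ports:
    - number: 443
      name: tls-couchdb
      protocol: TLS         # passed through; no Istio mutual TLS applied
  resolution: DNS
```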
Now, the motivations for containerized CF with Istio. Traditionally, for VM-based Cloud Foundry, we had an in-house operations team to do everything: deployment or solving problems was basically under manual control. When the platform turned into a service, customers can deploy many service instances, so it is impossible to allocate individual operators to them. We want to help our users achieve self-management; this is one of the motivations. We found that Istio provides capabilities in traffic management to cover the failover scenarios and A/B upgrades, which helps reduce the burden on our teams so that our customers can self-manage and self-upgrade. That is the main motivation.

These are the tools we used during the experimental stage. We used Jaeger for tracing and Prometheus for monitoring, and Grafana and Kiali for displaying the data as images, so that we can see problems very clearly, for example that at one point the traffic volume is high, or at another point there are errors or job failures. Here is a summary of the operational functions we used; both metrics and tracing are very important.

In terms of open tracing, we ran some pilot projects based on Jaeger and other products. We needed to introduce some code: we cannot rely only on Envoy or Istio to assemble the trace, because the trace headers have to be delivered from upstream to downstream. This requires code intervention in the application, and it is kind of complicated. Inside CF there are goroutines for job distribution, and publish/subscribe messaging is used for delivering jobs. We did not have much time to research this deeply, so we want to focus on the trace context headers: take them from the received request and attach them when delivering to the next request, chaining the spans together to present a good trace map. Then when problems occur, we know where it failed, or which step took longer than expected: a step that should take milliseconds but takes a few seconds can make the job queue fail. On the VMs, we found some services encountered networking issues, and the job queue kept waiting for a problem that could not be solved in the short term, so jobs failed afterwards. By using open tracing we want to solve these kinds of problems.

For external services, we use a CouchDB, so some persistent state is recorded in a remote database. With Istio we want to monitor it, to see whether the network to the database works well, whether the speed is acceptable, and whether the service is stable. Certificates are also platform-related. Some certificates are already managed quite well, but with Istio we want to see if we can do the certificate rotation automatically; otherwise someone needs to rotate the certificates every three months by hand. We want Istio to make that work automatically. And now I will give the floor to Zhang Gong.

Good afternoon, I am Zhang Gong. Thank you. Matt talked about the PaaS platform and the containerized CF, and why we integrated Istio with CFEE, our containerized Cloud Foundry. Now we want to see what new values Istio can bring us: the problems we had to solve when we started to use Istio, and, working with the CF operators team, how we learned from their experience and used Istio to address real CF operating scenarios.

If you know Istio, on the official website there is the Bookinfo sample based on microservices, with four microservices. In reality, not all services are like this. CFEE is based on the traditional CF, containerized: inside CFEE, not all the services are designed as microservices, and many of them were not built that way in the past. In the VM-based platform, several jobs can be packed onto one VM, so as a first step of containerization a decoupling was applied. One Kubernetes deployment can back several services, but Istio expects one deployment per Kubernetes service. Then, in order to get the most out of Istio, the official website gives some principles: name the ports of each service so that the protocol of each port is understood, and define the deployments by adding labels. Then traffic distribution and tracing can be applied smoothly afterwards. In CFEE we also redefined our services accordingly; a rough sketch of these conventions follows.
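As a sketch of those conventions (the component name, port, and image below are illustrative placeholders, not the actual CFEE manifests): service ports carry a protocol prefix so Istio knows how to treat the traffic, and deployments carry app and version labels so telemetry and per-version routing work.

```yaml
# Minimal sketch of the Istio naming conventions mentioned above.
apiVersion: v1
kind: Service
metadata:
  name: cloud-controller
spec:
  ports:
    - name: http-api           # protocol prefix tells Istio this port speaks HTTP
      port: 9022
  selector:
    app: cloud-controller
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloud-controller-v1
spec:
  selector:
    matchLabels:
      app: cloud-controller
      version: v1
  template:
    metadata:
      labels:
        app: cloud-controller  # canonical "app" label used by Istio telemetry
        version: v1            # "version" label enables per-version routing
    spec:
      containers:
        - name: cloud-controller
          image: example/cloud-controller:v1   # placeholder image
          ports:
            - containerPort: 9022
```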
Another issue is that CFEE itself is very complicated; it is a very large platform with complex traffic flows, based on previous designs. For internal services, theoretically Kubernetes workloads should access those services through the internal service DNS. However, due to the historical design, some traffic goes out externally, through the load balancer, and then back to the internal service. There is also reliance on external services that use different protocols.

With Istio 1.0 we spent a lot of effort here, because of how traffic to external destinations is treated by default. We fed this back to the community, and in version 1.1 the behavior was changed slightly so that you can better enable different kinds of services at the same time (a sketch of the related mesh setting appears at the end of this part). In CFEE, as we just said, there is also ongoing work with the CF design to smooth out the traffic flow.

Another point is that CFEE runs on the IBM Kubernetes platform, and many functions rely on its features. One example: IKS itself is a tailored, extended Kubernetes, and we used to rely on its features. Istio has its own gateway, and many of the functions overlap, so when we started to use Istio we needed to decide whether some functions would be affected. There is an ingress chart that exposes some services through the ALB; with Istio there could be a Gateway instead, so you need to decide whether to use it or not. On the Kubernetes platform there are also tools to manage certificates, which probably overlap with Istio's certificate management, so again we had to decide which to use. About tracing: Istio provides many good features that are non-intrusive for the apps, but for tracing there is more to it, which we will talk about later.
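Returning to the outbound-traffic default mentioned above: in the Istio generation we used, this kind of behavior is controlled by the mesh-wide outbound traffic policy. The following is our interpretation as a minimal sketch, not the exact setting discussed in the talk.

```yaml
# Sketch: mesh-wide handling of traffic to destinations outside the service registry.
# REGISTRY_ONLY blocks unknown external destinations; ALLOW_ANY passes them through.
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio
  namespace: istio-system
data:
  mesh: |
    outboundTrafficPolicy:
      mode: ALLOW_ANY
```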
Those were some problems we ran into before; now, our future vision. CFEE is based on containerized, open-source Cloud Foundry and is managed by us at IBM. In the background we are enabling and contributing to the community, and the CF community is trying to make improvements as well. As we said, it is not a cloud-native design. For example, the deployment currently uses Helm charts to manage the whole of CFEE, and the community is trying to improve this by using a Kubernetes operator to do the deployment, because the deployment itself is difficult to operate. During an upgrade, the operator can wait for the v2 version to get ready and then delete the v1 version. Such capability and Istio's capabilities will be further integrated.

The second direction is the Eirini project. As we just presented, CFEE takes the CF control plane and deploys it on the Kubernetes platform; with Eirini, the applications themselves can also be scheduled onto Kubernetes. Eirini is being rearchitected by the community as a cloud-native design, so inside Eirini we want to enable Istio, so that Istio can further serve it.

Another direction is Istio mesh expansion, a new feature. Simply put, the Istio control plane is deployed on Kubernetes, and we can use it to also manage workloads on VMs. This is valuable for CF, because the community's CF is VM-based: the experience and best practices we get can be re-applied to the VM-based CF.

During the integration, we discussed with the operators to sum up the pain points and how to use Istio to solve those problems. First of all, as an operator or manager, during the operation of CFEE they want to know its condition. Four tools are already embedded in a default Istio installation: Jaeger, Prometheus, Grafana, and Kiali. Kiali is a visualization tool which can show the topology of the full service mesh. I am going to present an example. First, I want to know the condition of the service mesh. This is the topology of traffic inside the service mesh, and you can see the services and the traffic between them. As for pain points: when a lot of traffic is going to one service, you can locate the hot-spot service. In Kiali you can see the details, for example the specific protocols; on the right side you can see the requests and the response results. It gives you an intuitive image of the failed requests, and you can also open the specific details of a request. Kiali is based on the Prometheus data, so these views can be tailor-made; but if we want to do further detailed troubleshooting, then we need more detailed data, that is, data about the overall status.

So for Istio, how can we get the metrics, logging, and tracing data? At the base layer, for the VMs and Kubernetes, you can use the platform capabilities to get the monitoring and logging data. At the upper level, the components of the traditional CF also have component metrics and logging data. Istio provides the data in between, among the different services, ready for use; it fills up the tracing gap.

This next demo is about metrics. With the default installation you can see the preset metrics, for example the request totals: for every request you can see from which service to which service it went, and the response code; you get all this information. That is preset, but to help with real problems you can customize the metrics, for example define in a metric instance which features or properties to capture. Here is a customized metric: we want to capture the request and the service it came from, and you can define all the attributes, for example the services and the key parts, yourself. A sketch of such a customization follows.
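What follows is a minimal sketch of such a customized metric in the Mixer-based telemetry of that Istio generation, adapted from the Istio metrics-collection example; the metric name and dimensions are illustrative, not our production configuration.

```yaml
# Sketch of a customized Mixer metric (Istio 1.0/1.1-era config.istio.io API).
apiVersion: config.istio.io/v1alpha2
kind: metric
metadata:
  name: cfrequestcount          # illustrative name
  namespace: istio-system
spec:
  value: "1"                    # count each request once
  dimensions:
    source: source.workload.name | "unknown"         # capture the calling service
    destination: destination.workload.name | "unknown"
    response_code: response.code | 200
  monitored_resource_type: '"UNSPECIFIED"'
---
apiVersion: config.istio.io/v1alpha2
kind: prometheus                # handler that exposes the metric to Prometheus
metadata:
  name: cfrequesthandler
  namespace: istio-system
spec:
  metrics:
    - name: cf_request_count
      instance_name: cfrequestcount.metric.istio-system
      kind: COUNTER
      label_names:
        - source
        - destination
        - response_code
---
apiVersion: config.istio.io/v1alpha2
kind: rule                      # wires the metric instance to the handler
metadata:
  name: cfrequestrule
  namespace: istio-system
spec:
  actions:
    - handler: cfrequesthandler.prometheus
      instances:
        - cfrequestcount.metric
```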
Next, troubleshooting with traces. Here is a process, a cf push process. In the past, the SRE team had to keep the whole CF architecture in their heads, which can be confusing. With Istio's integration, you can open Kiali directly and clearly see the auctioneer service failure; then we can further open Jaeger to find the service and look at the previous hour. There you see precisely the trace of the failure; you open it, and you get the details. This trace was captured when we sent an HTTP request to create a new task: we pushed for new task creation and an error was returned.

The slides mentioned another issue: getting the trace enabled. On the left side is the trace from Istio's official website, the example I showed you just now. You can clearly see, from when the user sends a request to the final response, the spans of each call, and the duration of each step helps you to locate and define the problem. By default you get the spans created as the request goes through the sidecar proxies; if the app does nothing, you only have those spans. But within a service, if the service is complex and you want a function-level trace, then you need to build the spans in the app from the request information, and when you call the next service, all the trace headers must be assigned to the next request: with Istio these are headers such as x-request-id, x-b3-traceid, x-b3-spanid, x-b3-parentspanid, and x-b3-sampled.

Another use case is about elasticity, or resilience. We can use Istio's circuit-breaking capability. The principle of circuit breaking is that when a service becomes very slow, the calling services keep retrying, which may cause a failure of the whole chain and a very bad experience. So you set up a circuit breaker, and when it is triggered and too many requests come in, the excess requests get a 503 error back immediately. I will show you a demo. We use a client against the API service, creating 10 connections to send 2,000 requests. Without any circuit breaker, the API returns 100% normal 200 responses. Then we apply the circuit breaker; in Kiali you can see the symbol on the service, and you can see the configuration in detail: the TCP maximum is five, so five connections at most, and the pending requests maximum is one. Now I show you the results: I use the same traffic to access the API server, and you can see the breaker is triggered; 99% of the requests still succeed. We raise the concurrency to 20, and as we can forecast, fewer requests get a 200 return; increasing the concurrency to 50, only 30% of the requests get a 200 return. The real use case is that on the traditional CF platform we had critical-impact problems, for example one client did a massive creation of users, and when all those users accessed the API to get their information, everything became very slow. So we use the Istio circuit breaker to protect the API server and give it some time for repair and recovery. A sketch of the rule applied in this demo follows.
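Here is a minimal sketch of a circuit-breaking rule matching the limits shown in the demo (five connections, one pending request); the host name api-service is our placeholder.

```yaml
# Sketch: DestinationRule circuit breaker with the limits from the demo.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: api-service-breaker
spec:
  host: api-service                  # placeholder service name
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 5            # at most 5 TCP connections, as in the demo
      http:
        http1MaxPendingRequests: 1   # at most 1 queued request, as in the demo
        maxRequestsPerConnection: 1  # assumption: companion setting from the Istio task
```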
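The final use case, described next, upgrades the CF API service with Istio routing. The rules applied in that demo look roughly like this sketch; the subset labels and the 90/10 split follow the talk, while the names are placeholders.

```yaml
# Sketch: version subsets plus a 90/10 traffic split for the canary rollout.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: api-service-versions
spec:
  host: api-service          # placeholder service name
  subsets:
    - name: v1
      labels:
        version: v1          # matches the deployment's version label
    - name: v2
      labels:
        version: v2
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: api-service-canary
spec:
  hosts:
    - api-service
  http:
    - route:
        - destination:
            host: api-service
            subset: v1
          weight: 90         # 90% of traffic stays on v1
        - destination:
            host: api-service
            subset: v2
          weight: 10         # 10% canary traffic to v2
```

Completing the upgrade is then just updating the VirtualService to send 100% of the traffic to the v2 subset, which is the zero-downtime step at the end of the demo.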
The last use case: we want to use Istio's capability to support CF upgrading. After deploying a CF, the upgrade process has to make sure everything stays normal: the applications already deployed remain accessible, and new applications can still be deployed. You can rely on the Kubernetes rolling update, where each time only one pod is rolled out, and that does support this kind of deployment; however, it has some dependencies: you need to keep correcting the labels, and it depends on the base layer for traffic control. So we want to use Istio for a simple and intuitive upgrade, with fine-grained management.

This is the demo of the Istio-driven API service upgrade. First I show you the API deployment with only the v1 version; from Kiali you see 100% of the traffic distributed to the v1 API. Then we deploy v2, so at the beginning there are the API groups v1 and v2, and from Kiali you can see the v2 version is deployed. I did not need to modify any deployment resource: with plain Kubernetes, from the service's point of view the pods would simply be rotated, so from v1 to v2 the traffic would split roughly 50/50. Instead, we use Istio for a canary rollout. We define the DestinationRule, which is the concept of how the API upstream is accessed and which defines the version subsets, and we also define the Istio VirtualService; the virtual service decides, when a request and its traffic come in, how to do the further routing. After applying the definitions, like the sketch shown before, in Kiali we also see the two versions intuitively: for example, v2 has 10% of the traffic distribution and v1 has 90%. So we can run some tests: the production traffic sends 10% to v2 for a full test, and when the tests pass we can roll it out further; we can also send self-customized test traffic to the v2 service. When everything has passed the tests, we upgrade fully to v2: we update the Istio VirtualService to distribute the traffic to the v2 version with zero downtime. That is the upgrade demo.

OK, that's all from us; we are already running out of time. If you have any questions, you can come to us later. Thank you.