Hi everyone. Welcome to IstioCon Virtual, and welcome to our talk. We will be talking about expanding horizons: advanced deployment strategies in multi-cluster Kubernetes environments. We are Eduardo Bonilla Rodríguez and Francisco Perea Rodríguez, and we are going to present the agenda for today. First of all, we are going to give a brief introduction of who we are and where we work. Then we are going to talk about the why: why are we giving this talk, and which problem are we trying to solve with it? Then we will go through all the technologies that we have used to develop this solution, which is the concepts deep dive, and after that we are going to talk about the how-to: how have we implemented this solution? Then we will go through a live demo, and finally we will present the conclusions. So let's start with the introduction: who we are. Hi everyone, it's a pleasure to be here as a speaker at IstioCon. First of all, I would like to thank the organization for the opportunity to be here, and also thank the attendees of this session. Let me introduce myself: I am Francisco Perea Rodríguez, Cloud Consultant at Red Hat. I'm focused on service mesh and GitOps projects, and I'm passionate about these technologies and other cloud-native projects as well. Today I want to discuss, alongside my colleague Edu, how to integrate service mesh and GitOps tools to create an environment with complex deployment capabilities. Thanks, Fran. So, I am Eduardo Bonilla Rodríguez, Customer Success Engineer at Solo.io. I am focused on Istio and Envoy in cloud environments, and I am also passionate about the CNCF projects. That's why we brought this topic, in which we will use Istio as the key point of a solution integrating several CNCF projects, following a GitOps approach. So let's go.
Well, I would also like to thank the organization for letting us be here, and also all the people attending the session. So let's go with the next topic, which is the why. Here we will be talking about which problems we are trying to solve with this presentation. Imagine that we are a huge company, for example a bank, and we run different clusters in different cloud providers. We also have on-premise clusters in which our development teams are working to provide a service in production to our end users. We have a high-availability architecture: we have several clusters deployed across several availability zones because we don't want any outage for our end users. The development teams of the bank have been working on the Hello World application, which is currently running version v1, and they are facing some challenges because they want to upgrade it to v2. This is a tough process at the moment, because they have to upgrade the application individually in each cluster, which is error-prone since they don't have a centralized process for doing it. It is also very time-consuming, because they have to repeat the process in every cluster. So let's look at the current process. Imagine that they now deploy the v2 version of the Hello World app and this version is in progress. What happens here? They currently have a load balancer in front of the application, so once version v2 is ready, they can get rid of version v1. So let's switch the service to v2. As we can see in the cloud provider cluster, everything went fine: we are serving production traffic to our end users. But as this is a separate process on every cluster, we can run into errors in some of them.
For example, in the on-premise cluster, we now have the v2 version in red, which means that we found an error and this application is not able to serve production traffic to our end users. So how can we avoid that? How can we have a centralized procedure to treat every cluster as if it were only one, no matter where our applications are running? The bank has started thinking about this, so they decided to install some CNCF technologies, mainly Istio. They deployed Istio in both clusters, along with the ingress gateways and east-west gateways, providing multi-cluster communication both north-south and east-west. Let's look at the flow of communication here: north-south traffic enters through the ingress gateways, east-west traffic flows through the east-west gateways, and all the clusters are connected to each other. We also get an extra layer of security using Istio: as you can see from the lock icon, the communication between our services is now secured. Apart from the multi-cluster communication, we need a hub cluster, which the bank also deployed, to make all these processes automated and synchronized. That's why they decided to take advantage of all the Istio metrics and federate them into a Thanos instance running in the management cluster. They also installed Argo CD in multi-cluster mode in order to sync all the applications and Istio resources across all the clusters. The key point here is that we will use Argo Rollouts, taking advantage of its tight integration with Istio, to read the metrics that we federated in Thanos and make the rollout process automated and tested. Argo Rollouts will take the metrics from Thanos and run automated analyses in order to upgrade the application to v2 in every cluster at the same time, looking at the same metrics. So we will now have a centralized, automated and tested process. That's a bit of our architecture and how we will solve this problem.
Having said that, we will move on to the next section, the concepts deep dive. We will be talking here about the technologies that we have implemented in our solution. First of all, let's talk about Kubernetes. Who doesn't know Kubernetes nowadays? Kubernetes is an open-source system for automating deployment, scaling and management of containerized applications. It is planet-scale: it's designed to allow companies to run billions of containers a week, and huge companies use it because it can scale without increasing their operations teams. You never outgrow it: it doesn't matter if you run it locally or on the cloud; it offers a lot of flexibility and consistency, so it's very easy to move between environments. This leads us to the last point: Kubernetes runs everywhere. Here we have a bit of the architecture of our three clusters: the hub cluster, and cluster one and cluster two, which are the workload clusters. From the hub cluster we have the central management, the monitoring, and the application deployment via Argo CD in multi-cluster mode, so all the metrics and the management live there. The workload clusters are installed in different availability zones to provide high availability; in them we will deploy Argo Rollouts, and we will install Istio to handle all the application networking. The key point here, as we said before, is Istio, which will provide all the traffic management, both east-west multi-cluster traffic and north-south traffic. It will also provide us with all the metrics and observability that we will later use to automate our tests, plus an extra layer of security. Okay, thanks, Edu. Let's talk now about Argo CD. Argo CD is a declarative GitOps continuous delivery tool for Kubernetes. Argo CD was accepted into the CNCF in March 2020 and is at the graduated project maturity level.
The Argo project is a suite of tools for deploying and running applications on Kubernetes. These tools are Argo Workflows, Argo CD, Argo Rollouts, and Argo Events. Argo CD provides the ability to manage applications and infrastructure in a multi-cluster environment, a feature that we will see later in this presentation. We continue with Argo Rollouts, which is a Kubernetes controller and a set of custom resource definitions providing advanced deployment capabilities such as blue-green, canary, and canary analysis, among others. Argo Rollouts integrates with ingress controllers and service meshes, using their traffic-shaping abilities to gradually shift traffic to the new version during an update. Although this is an optional feature, it's impressive when working in environments with Kubernetes, a service mesh, and a GitOps model, and we are using it in this case. Regarding monitoring, we should talk about two projects: Prometheus and Thanos. I'm sure you all know Prometheus. It's a monitoring project which was accepted into the CNCF in May 2016 and is at the graduated project maturity level as well. Prometheus is a real-time time-series database with a query language designed to provide aggregated insights from data series. On the other side, Thanos is a highly available Prometheus setup with long-term storage capabilities. It was accepted into the CNCF in July 2019 and is at the incubating project maturity level so far. In this case, we use both Prometheus and Thanos to set up a full monitoring stack in a multi-cluster environment, and Istio, Kiali, and Argo Rollouts will use this monitoring stack, as we will see later in this presentation. So, how have we done it? As we saw before, the top-level architecture comprises three Kubernetes clusters: a hub cluster and two workload clusters. On the left side, we find the hub cluster, which hosts Argo CD and Thanos. Argo CD has the Git repository configured, and the applications and application sets are created there.
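As a reference for the monitoring side of this stack: with the Prometheus Operator, a per-cluster scrape target is usually declared as a ServiceMonitor. The following is only a sketch with hypothetical names (the `helloworld` namespace, labels and port name are placeholders, not the exact resources from our repositories):

```yaml
# Sketch: tells the in-cluster Prometheus to scrape the Hello World service.
# Namespace, label selector and port name are hypothetical placeholders.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: helloworld
  namespace: helloworld
spec:
  selector:
    matchLabels:
      app: helloworld
  endpoints:
    - port: http-metrics   # named port on the Service exposing /metrics
      interval: 30s
```

The in-cluster Prometheus scrapes these targets, and its Thanos sidecar makes the series available to the central Thanos in the hub cluster.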
Argo CD's application controller creates the user applications on the workload clusters. Besides this, Thanos stores the metrics received from the Prometheus instances deployed in the workload clusters. On the other side, the workload clusters have Istio, Prometheus and Argo Rollouts installed. Istio is installed to achieve the multi-cluster setup, adding a security layer by default with mutual TLS, along with traffic-shifting abilities. Prometheus generates application metrics, which are stored in Thanos, as I mentioned before. And finally, Argo Rollouts will deploy the user application, in our case the Hello World application. About Istio: in this case, Istio is installed in multi-primary, multi-network mode. Thus, we have set up a single mesh with more than one primary cluster on different networks. From the point of view of Istio, the control plane in each cluster has visibility of the API server of the other cluster. Service workloads across cluster boundaries communicate indirectly: this communication is done via dedicated gateways for east-west traffic, and the gateway in each cluster must be reachable from the other cluster, obviously. Argo CD is a Kubernetes objects generator: it leverages Git workflows and tools like Helm, Kustomize and plain Kubernetes manifests to manage infrastructure and applications in Kubernetes. Argo CD is installed in the hub cluster, and it's connected to the Git repositories, which are the source of truth. So how does Argo CD work in our case? By using the app-of-apps pattern in addition to application sets. We create a main application, which is a bootstrap application that generates two application sets: an application set which deploys the Istio resources used in this case, that is, the gateways, the virtual services and the destination rules; and another application set which deploys the rollout object in both workload clusters. Finally, the objects created in Argo CD look like this: here we see an app-of-apps architecture.
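To give an idea of what one of these application sets could look like, here is a sketch using the cluster generator. The repository URL, paths and cluster labels are hypothetical placeholders; the real manifests are in the repos we link at the end:

```yaml
# Sketch: stamps out one Argo CD Application per registered workload cluster.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: helloworld-rollouts
  namespace: argocd
spec:
  generators:
    - clusters:              # one Application per cluster matching these labels
        selector:
          matchLabels:
            type: workload   # hypothetical label set on the cluster secrets
  template:
    metadata:
      name: 'helloworld-{{name}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/example/gitops-repo.git   # placeholder
        targetRevision: main
        path: helloworld/rollout
      destination:
        server: '{{server}}'   # API server of the generated cluster
        namespace: helloworld
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```

With this pattern, adding a new workload cluster to Argo CD automatically deploys the same application there, which is exactly the "treat all clusters as one" behavior we are after.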
On the left side is the bootstrap application. The bootstrap application creates two application sets, and each application set creates multiple applications that deploy the resources in the workload clusters. The monitoring stack is configured with Thanos federating Prometheus: Thanos is installed in the hub cluster, and the metrics are stored and centralized there. Both Argo Rollouts and Kiali will use Thanos to query metrics. Prometheus is installed in both workload clusters, and it's used to scrape metrics of the applications and the Istio components deployed in each workload cluster. Finally, two additional resources are created to scrape metrics and send them from Prometheus to Thanos: the pod monitors and the service monitors. These resources are created in both workload clusters as well. Now, let's talk about Argo Rollouts and the rollout process. A rollout resource is created in each workload cluster. The rollout is the definition of the Hello World application, which is the application used in this case. This resource is used by Argo Rollouts, and it's very important to bear in mind the highlighted analysis template name, because it means that the rollout will use an analysis template to roll out the application. How does Argo Rollouts work? Once the Hello World rollout is created, Argo Rollouts creates the Kubernetes resources defined in the rollout: the pods and the services (stable and canary); a virtual service has already been created by Argo CD. The next step is to update the rollout. In this case, the rollout is changed to use version 2 of the Hello World application. At this point, Argo Rollouts starts deploying version 2 of the application, taking into account the configured analysis template. For this, both the version 1 and version 2 applications need traffic, because we are measuring the istio_requests_total metric, and both applications are receiving traffic.
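A rollout resource of the kind described here could look roughly like the following sketch. The image, service names, step weights and analysis template name are illustrative, not the exact values from our demo:

```yaml
# Sketch: canary Rollout that shifts traffic via an Istio VirtualService
# and gates progression on an analysis template.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: helloworld
spec:
  replicas: 2
  selector:
    matchLabels:
      app: helloworld
  template:
    metadata:
      labels:
        app: helloworld
    spec:
      containers:
        - name: helloworld
          image: example/helloworld:1.0   # bumping this tag starts the rollout
          ports:
            - containerPort: 8080
  strategy:
    canary:
      stableService: helloworld-stable   # Service pointing at the current version
      canaryService: helloworld-canary   # Service pointing at the new version
      trafficRouting:
        istio:
          virtualService:
            name: helloworld       # VirtualService whose weights Rollouts manages
            routes:
              - primary
      steps:
        - setWeight: 10
        - analysis:
            templates:
              - templateName: istio-success-rate   # hypothetical template name
        - setWeight: 50
        - pause: {duration: 1m}
        - setWeight: 100
```

Changing the image tag in Git is what triggers the whole canary process in both clusters at once, since Argo CD syncs the same rollout everywhere.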
The virtual service is updated by Argo Rollouts to split traffic between both versions. If version 2 of the application starts successfully, Argo Rollouts configures the networking resources to route all the traffic to this new version. Finally, the rollout process finishes and all the traffic is routed to the new version of the application in both workload clusters. Thanks, Fran. We are now going to go through the demo to show you this scenario in real time. We have prepared a video for you. As you can see, we are using tmux with three windows. In the top window, we have the hub cluster, in which we are going to apply the application set. As you can see here in Argo CD, this application set will deploy the Hello World app in the workload clusters and also the Argo Rollouts objects. This is deployed as an app of apps, and we will see it now in the workload clusters. In the left window, you can see the Hello World application running version 1, and in the right window you have workload cluster 2, which also has the Hello World running version 1. We are going to leave a watch running on the Argo Rollouts object, in which you can see that it is stable, in green, and both workload clusters are at revision 1, which means that they are running the Hello World v1. In the top window, we are going to leave requests running continuously, and you can see that we are hitting the different pods from cluster 1 and cluster 2. So with the help of Istio multi-cluster, we are distributing the traffic between both clusters. From Kiali, you can see with better visibility the traffic that is going from one cluster to another. We have here the ingress gateways and the east-west gateways, and we have the traffic being split between them. You can see the percentages here, and you can also see the lock icon, which means that the communication between clusters is encrypted.
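The traffic split seen in Kiali is driven by an Istio VirtualService whose weights Argo Rollouts rewrites at each canary step. A simplified sketch, with hypothetical host, gateway and service names:

```yaml
# Sketch: weighted routing between stable and canary Services.
# Argo Rollouts updates the weight fields as the rollout progresses.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: helloworld
spec:
  hosts:
    - helloworld
  gateways:
    - helloworld-gateway   # hypothetical gateway binding
  http:
    - name: primary        # route referenced from the Rollout's trafficRouting
      route:
        - destination:
            host: helloworld-stable
          weight: 90
        - destination:
            host: helloworld-canary
          weight: 10
```

At the start of the rollout the stable destination holds 100% of the weight; by the final step the canary destination holds 100%, and Argo Rollouts then promotes it to stable.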
So we did this to have better visibility, and what we are going to do now is change the Argo rollout from version 1 to version 2. Okay, thanks, Edu. So now we are going to change our code to use version 2 of the Hello World application. We just commit and push the code to the Git repository, and then we go to the Argo CD interface to force a refresh of our application. Even though this process is automatic (Argo CD refreshes the application every three minutes by default), here I'm forcing it for demo purposes. At this moment, after the refresh action, we can see in Argo CD a new AnalysisRun resource in Progressing state. This means that the rollout process has started. Now we can see at the top of the terminal that version 2 of the Hello World has started receiving traffic, and also that the Argo rollouts are in progress in both clusters; we can see it at the bottom of the terminal. See that the revision of the rollout is 2 and its status is canary, so the traffic is being routed to version 2 gradually. Also, in Kiali, the graph has been updated, showing version 2 of the application and the percentage, which will gradually increase: here we can see a 5, a 6, even an 8 percent. What's happening now? Both versions of the Hello World application in both clusters are receiving traffic. Argo Rollouts is querying Thanos to fetch the istio_requests_total metric from both clusters, so the rollout process progresses through the configured steps; we have 8 steps in our rollout. And Argo Rollouts is configuring the Istio virtual service to change the weight for each version of the application. Now, we just have to wait for the rollout process to finish. So far, the rollout process is at step 7 of 8. It's very important to see that this process is running simultaneously on both clusters. Now, we see that the application in Argo CD is in Synced status, so the rollout process has finished.
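The analysis gating these steps can be expressed as an AnalysisTemplate whose Prometheus provider points at the Thanos Query endpoint (Thanos exposes the Prometheus HTTP query API, so the provider works against it unchanged). The address, threshold and query below are a sketch, not our exact configuration:

```yaml
# Sketch: succeed while the non-5xx request ratio stays at or above 95%.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: istio-success-rate
spec:
  args:
    - name: service
  metrics:
    - name: success-rate
      interval: 30s
      successCondition: result[0] >= 0.95
      failureLimit: 2
      provider:
        prometheus:
          address: http://thanos-query.monitoring.svc:9090   # hypothetical endpoint
          query: |
            sum(rate(istio_requests_total{destination_service_name="{{args.service}}",response_code!~"5.*"}[2m]))
            /
            sum(rate(istio_requests_total{destination_service_name="{{args.service}}"}[2m]))
```

Because both clusters query the same federated Thanos, both Argo Rollouts controllers evaluate the same global success rate, which is what keeps the two rollouts in lockstep.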
At this point, all the traffic is being routed to version 2 of the Hello World application. Let's check in Kiali the progress and the traffic distribution. Here, we must give Kiali some time to refresh its state, but we can see the percentage increasing bit by bit towards the new version 2 of the Hello World application. Also, we can check that both pods, version 1 and version 2, are running in each cluster, but only version 2 is absorbing traffic. This is because the Istio virtual service has been configured to route all the traffic to this new version 2. Finally, after some time, Kiali refreshes its graph, and version 1 of the Hello World will disappear shortly. Yeah, as Fran said, we are making this whole process synchronized in every cluster. As you can see, it happened exactly simultaneously in both clusters, as they are both federated using the same metrics: both Argo Rollouts instances, even though they run independently, are taking the same metrics from the multi-cluster Thanos. An idea that Fran and I had to improve this process even further is that we created an issue in the Argo project in order to also have Argo Rollouts natively synchronized for multi-cluster deployments. And as we can see now, Kiali has finished and all the traffic is going to the version 2 application. So, moving on to our conclusions, we have left you here some links to blogs that we have written. These blogs describe how to deploy this solution on both EKS and KVM, and it's very easy to adapt it from one environment to another. We have also left you the GitHub repos in case you want to deploy this in your own environment. And yeah, we wanted to show you how we can treat a multi-cluster environment with one single process, as if we had everything in one cluster, using Istio as the key point between multiple technologies, making this process automated, tested and easy to manage.
So thank you very much to everyone for attending this session; we really hope that you have liked it. Thank you very much for the comments. I hope you find our session interesting and helpful, and feel free to reach out to us on social networks to discuss any topic of this presentation with us. Thank you very much.