Hey everyone, welcome to this lightning talk on using Argo Workflows to curate chaos engineering with LitmusChaos. I'm Sairna, I work at Harness as a senior software engineer, and I'm also one of the core contributors to LitmusChaos.
To start off, we need to know what resilience is. Resilience is basically a system's ability to sustain a fault and bring itself back up. Let's say a pod gets evicted from a node. What is its state afterwards? Is it healthy or not? If it comes back healthy, then it is resilient, and the time period between going down and coming back up is how we measure that resilience.
Moving ahead, downtimes are expensive, not just in terms of money, but in other aspects as well. There are secondary factors that are just as important as money: loss of customer confidence, damage to brand integrity, loss of employee morale, and a lot of others. Keeping all these factors in mind, we definitely want to avoid downtime at any cost, and one of the ways to achieve this is adopting the practice of chaos engineering.
With that, I would like to introduce chaos engineering, which is the process of testing a distributed computing system by injecting faults intentionally. The goal here is to identify the weaknesses in the application through controlled experiments and check whether it can withstand unexpected behavior or not.
So how is it done? First of all, you need to identify the steady-state conditions. The steady state is the desired behavior of the application in a given scenario when it is healthy. So first you identify that, then you introduce a fault intentionally, and then you check whether the SLOs continue to be met or not. If yes, then that's very good. If not, then we fix the issues, come back, and continue the same cycle again and again. So yeah, moving ahead.
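The cycle just described (check the steady state, inject a fault, check the steady state again) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ToyDeployment`, `run_experiment`, and `experiment_on` are hypothetical names invented for this sketch, and nothing here is part of LitmusChaos or its API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyDeployment:
    """Hypothetical stand-in for an application that runs several replicas."""
    replicas: int

    def is_serving(self) -> bool:
        # Steady state for this toy: at least one replica is up.
        return self.replicas >= 1

    def kill_one_replica(self) -> None:
        # The "fault": remove one replica, never going below zero.
        self.replicas = max(0, self.replicas - 1)

    def restore_one_replica(self) -> None:
        # Recovery: bring one replica back.
        self.replicas += 1


def run_experiment(
    steady_state: Callable[[], bool],
    inject_fault: Callable[[], None],
    recover: Callable[[], None],
) -> str:
    """Run one chaos cycle and return 'aborted', 'pass', or 'fail'."""
    # 1. Do not inject anything if the system is already unhealthy.
    if not steady_state():
        return "aborted"
    # 2. Inject the fault intentionally.
    inject_fault()
    try:
        # 3. Check whether the steady state still holds under the fault.
        held = steady_state()
    finally:
        # Always undo the fault, even if the check raises.
        recover()
    return "pass" if held else "fail"


def experiment_on(deployment: ToyDeployment) -> str:
    """Convenience wrapper that wires a ToyDeployment into run_experiment."""
    return run_experiment(
        deployment.is_serving,
        deployment.kill_one_replica,
        deployment.restore_one_replica,
    )
```

With two replicas the toy application survives losing one, so the cycle reports a pass. With a single replica the same fault breaks the steady state, which is the kind of weakness the cycle is meant to surface before it is fixed and the experiment is repeated.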
With that, we can introduce LitmusChaos, which is an open source, cloud native chaos engineering framework. It also has cross-cloud support. It is now a CNCF incubating project with adoption across several organizations, and its main mission is to help SREs and developers find weaknesses in both Kubernetes and non-Kubernetes applications.
Lastly, I would like to talk about why we shifted to Argo Workflows in Litmus 2.0. Before that, I will talk about what was happening in Litmus 1.0, the first version. In Litmus 1.0, chaos injection was basically done by applying the Litmus CRDs, such as the ChaosExperiment manifest and all the other components. We just did a kubectl apply of these manifests. There was no proper visualization, and chaos injection was done only one experiment at a time. Keeping all these facts in mind, we decided to integrate Argo Workflows with LitmusChaos.
These are some of the features, the main reasons why we chose Argo Workflows. First, we can define workflows where each step in the workflow is a container. As I said earlier, in 1.0 we could execute only one chaos experiment at a time. With this, we can have multiple chaos experiments put together as one single workflow, which is considered a chaos scenario, and each chaos experiment can be a single container. Moving on to the second point, proper visualization of the workflow, or the chaos scenario, can be done using a graph, which is also provided by Argo. Next, an individual template in the chaos workflow can be tuned independently, so we can tune each individual template without affecting the whole chaos scenario, the whole workflow.
And then the sequence of steps can be changed as per the requirements. Let's say you added experiment one first and then experiment two, but later you decided you want to alter the sequence. We can do that with Argo: you can move experiment two to the first position. And lastly, this is also one of the important features, the experiments can be run either in parallel or sequentially, which was not there in Litmus 1.0. So these are some of the important features and reasons why we decided to shift to Argo Workflows. And that's it. Now Amit will be giving a brief demo of how chaos engineering can be done with Argo Workflows and LitmusChaos. Thank you.
Hi everyone. I'm Amit Kumar Das. I'm also a senior software engineer at Harness, and I've been contributing to LitmusChaos for the past two years. I'll be giving a short demo on how we can use Argo Workflows with ChaosCenter to curate chaos engineering. So let's get started.
I have installed ChaosCenter and a banking application, which is the Bank of Anthos application, on my GKE cluster. We'll try to target a few of its services, like the balance reader service, the transaction history service, and some other services as well. As you can see, we have installed the Bank of Anthos application in the bank namespace, and we have a few services running here.
From ChaosCenter, we'll go to create a new scenario, and here we can select our agent. In the next step, we can select the ChaosHub. ChaosHub is basically a marketplace where we can find a lot of experiments, which can be used as per a user's requirements. So we have selected the ChaosHub option. I'll provide a name here. I'll be deleting the balance reader service of the Bank of Anthos application. And here we can add the experiment and tune it.
So I have added the pod delete experiment, and I'll target the bank namespace along with the balance reader label. This will target the balance reader deployment, which will basically delete the balance reader pod, and we should see that the balance is not displayed here. So let's go ahead and click next.
Here we can add different types of probes. I'll just go with the default configuration and click next. In this step, we can tune our chaos experiment. I'll make some changes: I'll increase the total chaos duration to 60 and keep the other environment variables as they are, and click next. In this step, we can make some advanced ChaosEngine configurations. I'll keep everything as default and finish this. So I've made some changes in my chaos experiment that should be visible when I inject chaos.
If I want to see the manifest, I can see that this is a basic Argo workflow, and we have several steps over here, like the install chaos experiment step, the pod delete step, and some other metadata related to these steps. If I go to this particular step, the pod delete step, I can see that we have the ChaosEngine, and we have the target application details, which are the bank namespace and the labels that we are providing to target our application, as well as the environment variables that we provided from the UI.
I'll continue and click next. Here I have an option to run the scenario now, or we can schedule it for later. For this, we are using Cron Workflows, which are also provided by Argo, and we have several options to tune the recurrence of the Cron Workflow. For the demo, I'll be scheduling it now. I'll click next. This is the final step, where we can see all the details that we have provided in the previous steps. So I'll just click finish.
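As a rough picture of what the generated manifest carries, the sketch below builds a ChaosEngine-style resource and an Argo-style step list as plain Python dictionaries. The field names follow the ChaosEngine custom resource and the Argo `steps` layout as commonly documented, but several values are assumptions made for illustration and were not shown in the demo: the `app=balancereader` label, the `litmus-admin` service account, the `litmus` namespace for the engine, and the template names. The helper names `build_pod_delete_engine` and `execution_order` are hypothetical.

```python
import json
from typing import Dict, List


def build_pod_delete_engine(app_namespace: str, app_label: str, duration_s: int) -> dict:
    """Return a ChaosEngine-style manifest for a pod-delete experiment."""
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        # Assumption: the engine itself lives in a namespace called "litmus".
        "metadata": {"generateName": "pod-delete-", "namespace": "litmus"},
        "spec": {
            "engineState": "active",
            # Target application details: namespace, label, and workload kind.
            "appinfo": {
                "appns": app_namespace,
                "applabel": app_label,
                "appkind": "deployment",
            },
            # Assumption: service account name, not shown in the demo.
            "chaosServiceAccount": "litmus-admin",
            "experiments": [
                {
                    "name": "pod-delete",
                    "spec": {
                        "components": {
                            # Tunables are passed as string-valued env variables.
                            "env": [
                                {"name": "TOTAL_CHAOS_DURATION", "value": str(duration_s)},
                            ]
                        }
                    },
                }
            ],
        },
    }


# Argo "steps" are a list of lists: the outer list runs in sequence,
# and the entries inside one inner list run in parallel.
SCENARIO_STEPS: List[List[Dict[str, str]]] = [
    [{"name": "install-chaos-experiments", "template": "install-chaos-experiments"}],
    [{"name": "pod-delete", "template": "pod-delete"}],
]


def execution_order(steps: List[List[Dict[str, str]]]) -> List[List[str]]:
    """Return step names grouped by stage, in the order the stages would run."""
    return [[step["name"] for step in stage] for stage in steps]


if __name__ == "__main__":
    engine = build_pod_delete_engine("bank", "app=balancereader", 60)
    print(json.dumps(engine, indent=2))
    print(execution_order(SCENARIO_STEPS))
```

Reordering the outer list changes the sequence of the scenario, and placing two experiments in the same inner list would run them in parallel, which mirrors the reordering and parallel execution features mentioned earlier in the talk.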
So a new chaos scenario is created. If I click here, we can see that the run has started, and I can visualize the overall workflow. In this workflow, we'll be targeting the balance reader service. If I go here, I can see that the balance reader service is up and running. And if I go back to my litmus namespace, we can see that a pod delete runner has been created, which will target this balance reader service. So we should see this balance reader pod getting terminated. And yes, it is happening now. If I refresh this page, I should see that the balance is not visible.
So we have successfully induced chaos on the balance reader service of this Bank of Anthos application. If I try to deposit some amount, the deposit will go through, but we will not be able to see the balance. From an end user's side, you don't want to see transactions going through while you are not able to see the overall balance. So this is how chaos engineering is performed using LitmusChaos and ChaosCenter, and we are using basic Argo Workflows to curate chaos engineering.
Let's wait for a while to let this experiment complete. We can see all the logs over here. These are the logs which are available with the chaos experiment. We can see that the workflow is completed, and we have the overall logs here. We also got the chaos results: the phase is completed and the verdict is passed. We got the probe success percentage. If we go back, we can see that the balance reader pod is back in its ready and running state. And if we go back to the application and I do a quick refresh, I should see the balance coming up. Yeah, so the balance is coming up. So we have successfully injected chaos on one of its services, which is the balance reader service. And we can also try to inject chaos on some other services as well.
All of this can be performed with the ChaosCenter of LitmusChaos. As part of the demo, we have seen how we are using Argo Workflows, as well as Cron Workflows, to inject chaos on different microservices. That was pretty much it from my side. So thank you, everyone. I hope you liked our lightning talk. And yeah, thank you so much.