Hello and welcome to this webinar on chaos engineering with LitmusChaos. I am Uma Mukkara, head of chaos engineering at Harness. With me is my colleague Karthik, a principal engineer at Harness. Together we created the LitmusChaos project around four years ago, and it is now a CNCF incubating project.

Here is the agenda. We'll first look at why and what chaos engineering is and its relevance to cloud native. I'll also talk about how chaos engineering generally matures in an organization that practices DevOps. Then I'll delve a little into an introduction to Litmus, its features, and its use cases. There have been a lot of learnings about how Litmus is used by the community and by enterprises, so I'll talk a little about the newly found use cases for Litmus. And at the end, Karthik will do a demo of how to get started with Litmus; he'll show an example of running chaos using Litmus, connecting new targets, and so on to the Litmus-based control plane.

So let's start with some facts about IT today. We all know that digital services are growing fast and that digital traffic has grown at an alarming rate in the last few years, fueled by the digital transformation that's happening. One of the reasons this transformation is happening at this rate is the enablement given by Kubernetes; the transformation of IT into the digital world is fueled by the adoption of Kubernetes. Because of this, there is a change happening in DevOps as well. There is a subsection of DevOps that we call cloud native DevOps, which differs from traditional DevOps in that it's faster: deliveries happen in a more automated way, and with the new set of tools, the automation of applying configuration, upgrades, and the overall delivery, everything happens fast.

Service reliability is extremely important for businesses. Less reliability really means you're facing service downtimes or outages, which is not good; businesses can face financial and reputational damage. So we all know the importance of service reliability. In general, all services are built to provide maximum reliability: the components, applications, and microservices are generally built with redundancy as a goal. Even with that, we are seeing outages happen. However carefully you build, there is always something new that fails, some corner case you have not taken care of, so outages will still happen. That's why it becomes a question of how many nines of reliability you have: 99.999 percent, or one more nine on top of that.

There are general metrics that have evolved over time when you talk about reliability: mean time to failure, how often you fail, where the gap between failures has to grow bigger and bigger; and mean time to recovery, how fast you recover when a failure does happen, so that service downtime is reduced during an outage. This is the context of reliability with respect to the business and its reputation. So there is a need to increase reliability or reduce downtime, and how do you do that? We have seen DevOps continuously focusing on various degrees and types of testing, including functional, system, load, and failure testing. A well-architected system with good QA provides good reliability, or fewer outages.
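For reference, here is a back-of-the-envelope view of what those nines and those metrics mean; this is standard arithmetic, not something specific to Litmus:

```latex
% Availability expressed with MTTF (mean time to failure) and MTTR (mean time to recovery):
\[
  A \;=\; \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}
\]
% Allowed downtime per year at a given availability (525,960 minutes in a year):
%   five nines (A = 0.99999)  -> (1 - A) x 525,960 min ~ 5.3 minutes per year
%   six nines  (A = 0.999999) -> roughly 32 seconds per year
```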
But there is a limit to how far all these types of testing can guarantee the quality or reliability of a product or service. So the next option, or innovation, in increasing the reliability or resilience of a service is to introduce chaos testing, or to practice chaos engineering.

What is chaos engineering? In simple terms, it is introducing faults on purpose and observing the system. If a fault can happen, it will happen, so why not cause it right away? You want to increase your responsiveness to downtime, so let's actually introduce these faults, see if there are weaknesses, and then reduce the recovery times or fix the issues. In general, it's an iterative process. In chaos engineering, you pick some systems or applications, then you pick certain chaos experiments or scenarios, run them in a focused manner with a low blast radius, and see whether the outcome matches your expected results, your steady-state hypothesis. If not, you have a learning; you go and fix that issue, and then you keep doing it. What that means is you start small, then you start covering various services and various components within services, the degree of randomness of the faults increases, and you start covering more complex faults that could happen. That is what chaos engineering is.

Its relevance keeps growing in cloud native ecosystems simply because you have a proliferation of microservices. Developers focus on their code within a microservice, which is now the tip of a pyramid. Unlike legacy systems, where you had tight integrations with the stack underneath or around you, in the microservices or cloud native world you operate totally independently: you design your code to run independently within a container, as a microservice. Because of this you are shipping faster, and instead of shipping every quarter, we now see shipments happening every week or daily, sometimes multiple releases in a day. In effect, you have multiple microservices to deal with, and you are receiving updates to these microservices at a phenomenal rate. Together, that means more dynamism, and the question becomes: how can you make sure service reliability stays intact if there are failures either within these microservices or in the infrastructure that hosts them? The answer is that chaos engineering can help. Because of this increased dynamism, chaos engineering has become the obvious choice for dealing with the reliability problem of cloud native business services.

For a chaos engineering practice in cloud native, we've been advocating certain principles. Generally, cloud native rides on open source, so your chaos engineering stack or solution could be open source based. The chaos experiments need to be well tested, rugged, and flexible, so they're better built through the community or placed in an open marketplace. Chaos experiments are like any other software code: you keep changing them, there are various versions, and you need version control and management around them, so it's better to have chaos operators and chaos experiments as custom resources in cloud native (see the sketch below). And then, obviously, the real problems happen at scale, so you need to ensure reliability at scale.
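To make the "chaos as custom resources" point concrete, here is a minimal sketch of a Litmus ChaosEngine manifest; the target application labels and names are illustrative, and exact fields vary a little between Litmus versions:

```yaml
# Minimal ChaosEngine sketch: the Litmus chaos operator watches this custom
# resource and runs the referenced experiment against the selected application.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: sample-app-chaos        # illustrative name
  namespace: default
spec:
  engineState: active           # set to 'stop' to halt chaos
  appinfo:
    appns: default
    applabel: app=sample-app    # label of the target deployment (assumed example)
    appkind: deployment
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-delete          # an experiment installed from the chaos hub
```

Because it is just another Kubernetes manifest, it can be version-controlled in Git and rolled out through GitOps like the rest of your configuration.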
That really means chaos engineering itself has to be applied to systems at scale, and you have to scale up your chaos engineering practice as well. GitOps is one possible answer to doing chaos engineering at scale. The last principle is observability. Observability is very important in chaos engineering: you introduce faults, but you always observe through your established observability practices, so the context of chaos has to be placed alongside, or on top of, your observability system. What that really means for cloud native chaos engineering is that you'd better have open APIs that are easy to integrate with, and the chaos engineering tools or framework should provide APIs to push or pull chaos metrics from the chaos engineering platform. Together, these principles cover the requirements of a good cloud native chaos engineering platform. One such platform, of course, is LitmusChaos, which I'll talk about in a bit.

Before I do that, let's also see how chaos engineering starts and matures within an organization. First of all, chaos engineering is relevant in any of the functions: QA, Dev, or Ops. It's not necessary that chaos engineering is only for Ops; you can shift right or shift left. You generally start chaos engineering at the infrastructure layer with simple experiments, and as you gain experience running chaos tests at that layer, you go one level up into the middle layer: message queues, API servers, and so on. Then you go into stateful services, databases and data services, and finally into your own applications, where you can start introducing simulated faults. When you reach that level, you can say you have reached the maximum level of chaos engineering maturity in your organization. At that point, you can generally observe whether chaos engineering maturity is expressed in terms of SLOs and SRE metrics, whether those are linked to chaos engineering, and whether developer metrics are linked to chaos engineering as well.

As you can see, chaos engineering plays a core role in cloud native DevOps, and a good chaos engineering solution within cloud native DevOps spreads the chaos culture across all functions: in Dev, QA, and Ops, practiced by developers and SREs alike. Developers run chaos in their pipelines; SREs run SLO validation using chaos and introduce randomized chaos in pre-production or in production through game days. Similarly, the test teams introduce deeper chaos tests into their test systems and into their CD pipelines, pre-CD and post-CD. When you introduce chaos engineering in all of these functions, the deeper tests that get developed mature over time, and you are validating the products being shipped and run against more error scenarios, so your services generally become more resilient.

LitmusChaos is a CNCF incubating project that has seen high adoption by popular companies, and it has a large community of around 2,000 users. Usage is growing: it has grown 30x in the last few quarters, and we are seeing around 2,000 installations per day of the Litmus operators.
Let's talk about the Litmus use cases that are generally practiced. From a vertical standpoint, Litmus is used most in banking, retail, and e-commerce services, which is directly related to digital traffic. We have also seen Litmus being used in edge computing scenarios. In general, Litmus usage is happening more and more wherever cloud native DevOps is commonplace. The use cases come in many forms: primarily various game days to start with, then slowly getting integrated into your CD systems, pre-CD or post-CD, and auto-triggered chaos for continuous deployment. One more level of maturity is to include Litmus in the CI pipelines themselves. If you have good scale and performance testing, Litmus can help validate the current strategy of your scale and performance testing. Similarly, if you have invested in good observability systems around cloud native, chaos engineering based on Litmus can help validate whether those observability systems will really help when outages happen.

So how does LitmusChaos work in general? You can start very small and grow in a distributed fashion, as it's a Kubernetes application. You start with a simple Helm chart; when you install Litmus, you get the Chaos Center, where users can go and create, orchestrate, and manage chaos scenarios. The platform comes with a bunch of experiments: chaos experiments for Kubernetes resources, plus some cloud chaos experiments as well. It also comes with a good SDK and a way to introduce your own chaos logic into the platform; we call it bring your own chaos. You take all these chaos experiments, put them into chaos scenarios or chaos workflows, and stitch them together however you want to make a meaningful chaos scenario. Once you have a meaningful scenario, you can trigger it through GitOps, trigger it directly, or schedule it using the Chaos Center, and then apply it to any use case.

These are the general chaos experiments around Kubernetes. The platform provides good chaos coverage for various Kubernetes resources, they are highly tunable, and if you know how to manage Kubernetes resources, you can manage the chaos experiments the same way. The other very powerful feature of Litmus, apart from the various experiments, is probes. Probes are a way to declaratively define the steady-state hypothesis of a given experiment; you can use probes to determine the result of a chaos experiment or to implement SLO validation. There are many types of probes, such as the HTTP probe, command probe, Kubernetes probe, and Prometheus probe, and you can run these probes in different modes: continuously, at the edges of the experiment, or only during the chaos window. With probes, you can think of a chaos scenario, define it declaratively, and keep moving; see the probe sketch below.

This is how the Litmus landscape looks when it comes to CI, CD, and observability. In CI, what is possible is that in your pipelines you can log in to the Chaos Center, create a workflow that produces a manifest, and include that manifest in a stage of the pipeline. We have tested this CI integration; there are tested examples available for Jenkins, GitLab, GitHub Actions, and Harness Drone. Similarly, in CD you can trigger chaos pre-CD, post-CD, or based on an event happening at deployment time; for example, when a pod version changes, you can run some chaos tests.
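Coming back to the probes mentioned above, here is a hedged sketch of how an HTTP probe is declared inside an experiment's spec; the URL and thresholds are illustrative, and field names can differ slightly across Litmus releases:

```yaml
# Declarative steady-state check: poll an endpoint and expect HTTP 200.
probe:
  - name: check-frontend-availability
    type: httpProbe
    mode: Continuous                    # other modes include SOT, EOT, Edge, OnChaos
    httpProbe/inputs:
      url: http://my-app.example.com    # hypothetical endpoint
      insecureSkipVerify: false
      method:
        get:
          criteria: "=="
          responseCode: "200"
    runProperties:
      probeTimeout: 1000                # units (ms vs s) depend on the Litmus version
      interval: 1
      retry: 1
      stopOnFailure: false              # abort the workflow on probe failure if true
```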
It's well tested with Argo, Flux, Spinnaker, and the Harness continuous delivery module, and similarly, on the observability side, it has integrations with well-known observability platforms. In general, the way you integrate observability with these platforms is by using the Litmus HTTP probe: you trigger an HTTP request around the chaos, interact with the observability platform to read or verify the SLOs, and then observe the result in the observability platform.

There are some new use cases that have evolved for Litmus in recent years, primarily around Kubernetes becoming a configuration control plane, like Google Anthos, Azure, or Crossplane, or even Kubernetes managing OpenStack. The usage of Kubernetes is increasing, and that really means Kubernetes becomes really, really critical; the reliability of Kubernetes becomes critical, and there are many users using Litmus to validate such platform implementations by introducing Kubernetes faults. The other one is hybrid infrastructure: even though you start Litmus with Kubernetes experiments, you will see the need for integrated chaos around bare metal, switches, rack servers, load balancers, and so on. And nowadays you also have multi-cloud scenarios or deployments where you're using various cloud services, whether database services, app services, or serverless systems, and where you will start using other tools in conjunction with chaos, for example load testers: a chaos tool plus a load tester together can simulate a chaos scenario for you. These are some of the newly formed use cases of Litmus.

To summarize the benefits of Litmus: with Litmus, or with chaos engineering in general, you increase your capability to inject faults at will, so your mean time to inject or identify a fault decreases; whenever faults or service outages do happen, you recover faster, so your MTTR decreases; and because of those two, you are more agile, you are fixing issues and weaknesses, and the failures eventually reduce. These are the general benefits of chaos engineering, and they are especially true with Litmus. We also have a hosted service: a Litmus chaos engineering control plane as a service. You can get your own control plane by signing up at LitmusChaos Cloud, connect your targets, run chaos experiments, and observe resilience or find opportunities to improve it. With that, let me turn it over to Karthik for the demo.

Hello, everyone. With this demonstration, let us take a look at how you can install the LitmusChaos platform and how you can get started by running a simple chaos workflow against a sample application and observing the impact of the chaos. My name is Karthik; I am one of the maintainers of the Litmus project. We will get started by installing the Litmus Helm chart on an EKS cluster and then go on to add another cluster, this time on GKE, to our control plane in order to carry out our chaos experiments. We will use the docs at docs.litmuschaos.io to help with our installation. Let's copy the Helm chart related commands, create the litmus namespace, and go ahead and install the chart using the release name chaos. Now let us watch the progress of the installation. Our deployments are up. We now need to access the Chaos Center; for that, we'll just make use of a simple NodePort for this exercise. Let me identify the node IPs. Alternatively, we could even use a LoadBalancer; it is left to the user's choice.
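For reference, the install steps just described look roughly like this; see docs.litmuschaos.io for the exact, version-specific commands:

```sh
# Add the Litmus Helm repository and install the chart with release name "chaos"
helm repo add litmuschaos https://litmuschaos.github.io/litmus-helm/
helm repo update

kubectl create ns litmus
helm install chaos litmuschaos/litmus --namespace=litmus

# Watch the control-plane deployments come up
kubectl get pods -n litmus -w
```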
You could also use ingress along with certificates in order to access the service over a specific hostname with TLS. I'm going to edit my front-end service in order to access my dashboard and obtain a LoadBalancer endpoint here; this is also something you can do as part of your Helm values. We now have an external IP available to us, so let us go ahead and access it. Now that we have our Chaos Center up and available and the LoadBalancer is working, let us log in with the default credentials, which are admin and litmus.

Each user in the Chaos Center is allocated a dedicated chaos workspace, also called the chaos project, which is where they perform chaos workflow management: creation of workflows, visualization of workflows, creation of new teams with invited members, comparison of workflows, and so on. We will take a quick look at what is possible in terms of configuration in the chaos project. Before that, now that we've logged in, let us take a look at our dashboard. The chaos dashboard is the first page that greets us once we log in to the Chaos Center. We do not have any executed workflows yet; this was our first login. But one of the most important things to check immediately after logging in to the Chaos Center is whether we have a chaos agent connected.

As you're already aware, the Litmus microservices can be logically separated into control plane microservices and execution plane microservices or components. We took a look at what the control plane services are: the Chaos Center front end, the GraphQL-based chaos server, MongoDB, and the auth server. The execution plane components are the ones that actually carry out and participate in the execution of a chaos workflow, that is, in the implementation of the chaos experiment business logic. In Litmus, you can connect different clusters to the control plane and carry out chaos against microservices residing in those clusters. The entity that helps you connect your clusters to the control plane is the chaos agent, also known as the subscriber. The subscriber takes instructions from the chaos control plane and executes them on the execution plane, that is, the target environment or cluster that you added. In this case, we see that a self-agent has automatically registered itself in our Chaos Center, and it is in an active state. What this means is that the cluster where we have installed the control plane microservices automatically qualifies as an environment in which you can do chaos. In our demonstration, we will be connecting an external cluster which houses our sample application. The self-agent is there for you if you have microservices on the same cluster that you want to run chaos against; it helps you execute that.

The next useful thing to look at in the Chaos Center is the chaos hub. The chaos hub is an open catalog of faults, or chaos experiments, which you can piece together to form a chaos workflow. There are different types of experiments, totaling around 50, in the chaos hub. The chaos hub is also the place where you have some demo or illustration workflows that you can use to kickstart your evaluation of the Litmus project.
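Going back to the front-end service edited at the start of this step: switching it to a LoadBalancer can also be done from the command line, roughly as below. The service name shown is the usual one created by the Litmus Helm chart; confirm it first with kubectl get svc -n litmus.

```sh
# Expose the Chaos Center front end via a cloud LoadBalancer instead of a NodePort
kubectl patch svc litmusportal-frontend-service -n litmus \
  -p '{"spec": {"type": "LoadBalancer"}}'

# Fetch the external IP once the cloud provider has provisioned it
kubectl get svc litmusportal-frontend-service -n litmus
```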
The experiments are categorized based on their nature and the environment they apply to: generic Kubernetes faults, faults on AWS using SSM (the AWS Systems Manager), faults executed against GCP environments, Azure, VMware, as well as certain experiments crafted for popular applications like Kafka and Cassandra. The Chaos Center also provides you with a view of how your workflow executions have gone over a period of time; once you have workflow runs, you can compare them to see whether your resilience trends are on an upward curve, or whether they are falling and action actually needs to be taken.

The settings page in the Chaos Center is useful for managing your account and creating new users on the platform. As an admin, you can create new users, and you can also create teams in each project, where you invite members that already exist on the platform. That is, the admin can go ahead and create different users, providing them with a username and password; the respective users can log in with their credentials, and once they get into their chaos project, they can go to settings, open the teams tab, and invite members into their own project with a specific role. This provides RBAC at the Chaos Center level: users are classified as owners, editors, and viewers, each with the desired set of permissions for managing chaos workflows.

The other option you have under settings is to enable the GitOps feature. You can store the workflows that you create on the Chaos Center locally in Litmus, which is the default option, or you can choose to push these workflows to a Git source. The Git source can be a public or a private Git repository; whatever workflows are constructed on the platform will be committed into the Git repository, and there is also the option of changing your workflow specs in this Git source and having them reflected in future iterations of the workflow run on the cluster, via your Chaos Center. The repository used for pushing your workflows can be the same as the one that powers your chaos hub, or it can be a different repo altogether.

Talking about the chaos artifact source, the chaos hub: you can create your own chaos hubs backed by your own Git repositories. These repositories are expected to house the chaos experiment artifacts in a specific folder structure, the details of which are available in the documentation (a rough sketch follows below). You can create a public canonical source for the chaos hub, or it can be a private chaos hub. If you're operating in air-gapped or disconnected environments, or have restrictions on how you can consume artifact sources, you can use the private chaos hub feature to create private Git repositories in your own environment and pull them into the Chaos Center dashboard.

There are a few other settings available on the Chaos Center, typically used as part of the day-two operations of a chaos practice. For example, you can influence where the images for the chaos workflow pods come from: you can use the default docker.io registry and the litmuschaos repository, or you can use your own registries from which the experiment and workflow pods are pulled. The usage statistics provide a very quick view of the executions so far; you can see how many experiments you've run, how many workflows you've run, how many users are on the platform, and so on. This is something that is useful for the admin to generate helpful reports.
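As a rough illustration of the folder structure a custom chaos hub repository follows, mirroring the public litmuschaos/chaos-charts repository; treat the exact file names as an assumption and check the documentation for the authoritative layout:

```
charts/
  generic/                                   # a chart category
    generic.chartserviceversion.yaml
    pod-delete/                              # one experiment
      experiment.yaml                        # the ChaosExperiment custom resource
      engine.yaml                            # a sample ChaosEngine for the fault
      pod-delete.chartserviceversion.yaml    # metadata shown in the hub UI
```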
With this introduction to the various screens in the Chaos Center, let us go ahead and connect our target environment. We spoke about a GKE cluster on which our sample app resides, so let us connect that to the Chaos Center, add it to the control plane, and then begin with our first chaos workflow. We talked about the agent that helps us connect external clusters to our control plane. The installation of the agent on a remote cluster today is aided by a tool called litmusctl. I have just opened up the readme page for litmusctl in the Git repository litmuschaos/litmusctl. We're first going to pull this particular binary; you can do that from the releases page, or you can just use the table on the readme to pick the litmusctl binary for your OS and architecture. I've pulled it already, and I'm going to follow the steps for the interactive execution of this CLI to set up my agent.

The first step is to set the account on the workstation where you have installed the litmusctl tool. So let me go ahead and perform the litmusctl set-account step; this sets up the right keys and auth to access the control plane with your credentials for your project. We provide the endpoint where Litmus is running; let us take a look at our control plane endpoint, and this is it. The next step is to provide our user details. I'm performing all these operations as admin, so I'm going to provide the admin credentials. Now my configuration is set; you can take a look at the configuration in the litmus config file and see that there are some tokens available there.

My next step is to create the agent. Let us take a look at the instruction for that: litmusctl create agent is the command. So let me go ahead and create the agent. We're doing this in interactive mode, so it asks us a few questions. I'm going to select the admin project. I have the option of setting up the agent in a cluster-wide mode or a namespaced mode. Following the cluster-wide approach, you will be able to run chaos workflows targeting microservices that reside across different namespaces in your cluster, which is ideal if you have autonomy over the cluster. Or you can connect it in namespace mode, which means you will only be able to target microservices running in the namespace where the agent is installed; this is helpful when you're running in shared environments where users have their own test environments in different namespaces. I'm going to select the cluster mode. I can provide some metadata about the agent and the cluster where we are installing it; I've just gone ahead and provided some information. It asks whether we want to do some checks against the control plane in terms of SSL, and whether we want to install the agent on a specific node; we do not have those requirements at this point. And we are going to install the agent in the default namespace, using the default service account, litmus. You could choose an existing namespace and an existing service account if you would like to do so. It provides us with a summary of the options that we selected. With that, let's go ahead and install our agent. The agent installation progress can be tracked on the console, or you can also take a look at the Chaos Center and check whether our agent is active.
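Here is a hedged sketch of the litmusctl flow used here; flags and prompts differ a little between litmusctl releases, and the endpoint and credentials below reflect this demo's setup:

```sh
# 1. Point the CLI at the Chaos Center and authenticate
litmusctl config set-account \
  --endpoint "http://<chaos-center-ip>:<port>" \
  --username admin --password litmus

# 2. Register the current kubeconfig context as a chaos agent
#    (interactive mode asks for project, cluster-wide vs namespaced install,
#     namespace, and service account)
litmusctl create agent

# 3. Verify the agent components on the target cluster
kubectl get pods -n <agent-namespace>
```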
It is in a pending state right now. The deployments are in place; we're just going to wait for them to become active and establish connectivity with the control plane. They're all running right now, and we should have this active in a few seconds. The subscriber, as you can see, is installed with a set of other components, most of which are Kubernetes controllers: the chaos operator, the event tracker, and the workflow controller are custom Kubernetes controllers that act upon different custom resources and participate in the chaos execution process. The exporter is a Prometheus exporter that provides chaos metrics. Our agent is active right now, and this means that our demo environment is ready to take chaos.

Let's take a look at how you can construct chaos workflows now. Now that we've connected our agent successfully, let us go ahead and create our first workflow. Like we said, this is going to be a simple pod-kill chaos experiment, along with an availability check against the website, the podtato-head application website. Let us choose our agent; this is the agent that we just connected, the GKE cluster named demo. We have different options for creating workflows: we could use a predefined workflow, we could clone an existing workflow template, or we could import something that is handcrafted. When I say predefined workflow, that is one available to us from the chaos hub, which is where you might have pushed your workflows earlier. Since this is our first workflow, we will construct it by picking experiments from the chaos hub; I've selected the default embedded chaos hub. We'll just name it as a custom workflow.

We will pick an experiment. You can search for the desired experiment or fault here; I'm selecting the pod-delete one, and now we'll go ahead and tune it. First, some context or metadata around the experiment: I am looking for a deployment in the default namespace carrying the podtato-main label. You can see that it provides some app discovery features where you can see the microservices that are available on the system. I'm going to select the deployment that carries the podtato-main label; this is the microservice that pieces together the different components and gives you the podtato-head rendering on the webpage.

Now that we've selected the microservice against which we want to run chaos, let us go ahead and add our availability check, or constraint. Hypothesis validation as part of experiment runs is carried out using something called probes. We're going to make use of an HTTP probe for this particular demonstration, and we're going to run this probe in continuous mode. We're going to set up a polling interval of one second: we will look for the availability of the website every second. There are some properties that you can provide; I'm providing minimal values: just one second for the polling interval, and just one retry if the response code is not 200. And we're going to begin this a second after the experiment execution begins. You also have an option to abort the workflow in case the constraint is not validated: you could set stop-on-failure to true, but we will set it to false for this run. I'm going to provide the URL of the podtato-head application in my probe, and I'm going to provide a pretty aggressive timeout; this is in milliseconds, by the way. And we're going to be using a GET operation.
We're going to look for a 200 response when we hit the URL. With these conditions, let us go ahead and create our probe. I've added my probe; now I'll take a look at the other tunables supported by this experiment. You could run this experiment for a total duration of 30 seconds with a 10-second interval; that means a pod is going to be picked and killed every 10 seconds, with the total chaos duration of 30 seconds as the upper bound. We just want one iteration of chaos, so I'll change the duration to a much smaller value. One aspect to note is that the duration here refers to the chaos duration alone, not the entire experiment: the experiment in Litmus performs a pre-chaos and a post-chaos check to ensure the system is left in a healthy state, and that takes a few seconds over and above the chaos duration you have specified. You could also add more environment variables depending on your needs; let's finish it and keep it simple. We have the option of cleaning up the chaos resources immediately after the run, and we have the option of keeping them as well; in our case, let's keep them, because that helps us look at the experiment pod logs.

Let us click next and provide a weightage to this experiment. This helps with the calculation of a resiliency score that you can use to compare workflow runs. We're going to give all the points to this experiment, as we have a single fault chosen. We have the option of scheduling it repeatedly or scheduling it once, immediately, which is the option we will take. This is a summary of all the options we selected while constructing the workflow; now let's say finish. This triggers the workflow, so you have the first entry on the workflows page right now.

Let's take a look at what is happening as part of the workflow run. You can use the workflow visualization graph to track the progress of your chaos workflow. For each experiment that you pick within a workflow, you typically have two steps performed, though that is easily customizable. The first step installs the fault template from the chaos hub; this carries low-level details of the fault, such as the images used for the pods that carry out the fault business logic, and other such details. The next step, the one you see right now called pod-delete, is when the actual chaos invocation happens: the ChaosEngine resource, which is the user-facing resource of Litmus, where you add your probes, durations, and a lot of other tunables, gets created, and the custom controller called the chaos operator, which we saw when we set up the agent, picks up the ChaosEngine and runs the fault.

Let's move to the Lens dashboard to see what's happening. You can see the podtato-main pod has been killed and a newer one is coming up. Let's go ahead and look at what's happening on the web page; I've just refreshed it. It should be noted that this particular application carries an init delay, that is, it is going to take some time for the application to be ready. It has come back up in fairly quick time, but let's take a look at what happened during the run in the dashboard. You can see there was a slight dip in the probe success percentage, and there was a small spike in the access duration as well.
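To tie the tunables set in the UI back to the underlying resource, the pod-delete section of the generated ChaosEngine looks roughly like this with the values used in this run; this is a sketch, assuming the Litmus 2.x field names:

```yaml
experiments:
  - name: pod-delete
    spec:
      components:
        env:
          - name: TOTAL_CHAOS_DURATION   # chaos window only; pre/post checks add a few seconds
            value: "10"                  # reduced from the default 30 for a single iteration
          - name: CHAOS_INTERVAL         # how often a target pod is picked and killed
            value: "10"
```

The weightage chosen on the next screen feeds the resiliency score, essentially a weighted average of each experiment's probe success, and that score is what the observability page later compares across runs.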
The red area that you see on the Grafana dashboard comes from an annotation that we added, the annotation source being a metric provided by the chaos exporter and available in Prometheus. So we have the metric here; it is one of the litmuschaos-prefixed, experiment-scoped metrics. There are other useful metrics that you can use to construct your own dashboards. We've used it to find out the exact period when the chaos, or rather the experiment, has been active. When the experiment was active, you saw that the probe success percentage dipped a little and the access duration spiked, and then it came back to a normal state. We had set a threshold on our access duration, which is why you see the broader red area the moment it crossed 25 milliseconds.

Now let's take a look at what really happened with the workflow. As expected, the workflow has failed. Our initial hypothesis was that there should be no dip in the probe success percentage and no spike in access duration. That did not hold, but we did brace ourselves for this outcome, because there was just a single replica of the podtato-main microservice. You can see, as you scroll down in the logs, that the probes have failed: we did have an aggressive check against our application availability, and it failed. What is the mitigation? You could scale up your podtato-main microservice to multiple replicas and repeat the same fault, in which case the workflow will succeed as per the initial hypothesis, the desired case where you do not see any downtime in the probe success percentage or access duration.

Once you have workflow runs executed like this, you can go to the observability section, take a look at the workflows, compare them, and download reports. The workflow observability section is helpful for checking how your application holds up to a particular fault across different builds, environments, or releases. This was a very quick execution of a simple chaos workflow against a sample microservices application residing in a remote cluster that we connected using litmusctl. We took a look at how you can use the Grafana dashboard and the Litmus metrics to see when the chaos is actually active, and how you can observe the details of that run using the observability page and the analytics for the workflow. So that is a wrap-up of the Litmus demonstration. I hope you get a chance to try out the tool, and do provide us your feedback. Thank you.

That was a great demo by Karthik. Thank you, Karthik, and thank you very much for watching this webinar. We are reachable on the Kubernetes Slack community, in the Litmus channel. Please do reach out to us. Thank you.