So today, this talk is specifically about cultural shifts and how you can leverage chaos in platform engineering, which is something Raj and I wanted to tap into. Unfortunately, Raj could not be here because of some issues, but rest assured, he has provided a recording, about five minutes, where he talks in depth about the architecture, how he's using it, and so on. So we'll see all the good stuff. But let's jump right into it.

A little bit about me: I'm a senior software engineer at Harness, and I'm also an engineer on the Litmus project, which is a CNCF incubating project. Raj is a senior enterprise architect at FIS with more than 20 years of experience, so definitely somebody you would want to listen to.

Now, moving on to the agenda of this talk. You are platform engineers, so I'm not here to talk about the basics of platform engineering. We'll cover the core components of an IDP, talk a bit about the cloud native problem, which is why we are introducing chaos engineering, the what and the why, the chaos-first principles, and then we'll see how we can introduce chaos, with a demo, and talk about the vision and the tools in the market that you can use. Of course, the tools are just an abstract view of things; you can integrate your own tools. It's not vendor-specific or tool-specific. And then we are going to execute chaos and observe the impact on a Grafana dashboard, hopefully, if the demo gods are happy.

Now let's talk about the problems the cloud native era puts on all of us. We have shifted from a legacy, very simple architecture to something as complicated as cloud native. Our DevOps workflows are self-service, they are policy-driven, we have zero-trust environments. So much extra overhead has come into building a single piece of software, which used to be so simple and now has so many components. This leads to something we call the DevOps problem: you're shipping your containers, shipping your applications, ten times faster. And to manage all of that, we introduced platform engineering, to have everything in a single layer.

So, the core components of an IDP. These are some of the high-level components I've mentioned: application configuration management, infrastructure orchestration, environments, deployments, and role-based access control. These are the basic pillars you might have in your IDP, and based on these, you might extend it and do individual things at a more granular level for each of these core components. So this is something we are already familiar with when it comes to platform engineering. What we are not familiar with is how we can frame this entire scenario in the form of a chaos engineering delivery model.

So, the cloud native problem. Like I mentioned, we are moving from legacy DevOps into cloud native DevOps, and we are shipping not every quarter but every week, which means ten times more of my services being shipped, ten times faster, into thousands of different environments. It's very easy to miss something at each individual layer. You might not be able to test everything, you might miss things here and there, and that might be the reason for your outage, right?
So if you look at the pyramid, you have your application, you have the other layers like your Mongo, your Kafka, and then you have the cloud native services layer, the Kubernetes layer, the platform layer. Not everything is tested. We should test it all, but we don't get enough time.

So what's the solution to this? Or actually, let me first tell you why we even need this. It's downtime, right? We hate downtime. We don't want downtime. This is what it does to our customers, to the people, to our users, and we definitely hate it. These are some examples, and of course I don't want to be harsh, but these are real examples: loss of customer confidence, damage to integrity, loss of employee confidence, and so on. Downtime is something we definitely want to avoid, which is why we want to introduce chaos engineering.

Now, what is it? I'm sure you've heard of it. It's a practice where we deliberately break our systems, typically in production, though it doesn't have to be in production, to ensure they can withstand unexpected disruptions. You see this model of chaos engineering, and this is the core principle: we select chaos experiments to test our applications, we run a set of targeted experiments, we observe the impact, we use the learnings to make our application more resilient, and then we select systems to test again. That's the loop, one, two, three, four, five, the cycle we keep repeating, and these are the core principles of chaos engineering.

Now, what is the chaos-first principle? This talk is about the chaos-first principle, because we might or might not be doing chaos engineering, but we are definitely doing platform engineering. So what is this principle? It's a principle, in the context of platform engineering, which advocates that you go chaos-first rather than being afraid of it: what might happen to my system, should I even break my production, should I even do it, it's costly, and so many other reasons to not actually go ahead and proactively do it. This principle advocates that you design your system with the expectation that failure is fundamental and bound to happen. So we deliberately introduce chaos and disrupt the platform infrastructure to proactively identify the weaknesses.

Now we are coming back to the platform engineering layer, once again, the core components of the IDP. Chaos engineering can be introduced into every single component of platform engineering. For role-based access control, you can check who has permission to run what kind of chaos. You can check your application configuration, which is of course the most common use case. You can also check whether the environment variables you're passing as part of the platform are actually getting through, whether there's some disruption there, whether there's a missing env that can break your system. For infrastructure, you can target your VMs, bare metal, Kubernetes, the execution layer, things like that, and you can also target your deployments. So in each and every layer, each and every pipeline of platform engineering, you can introduce chaos. That's the idea I wanted to show. Now, how it plays a pivotal role.
So chaos engineering in the context of platform engineering has three main pillars. The first is capacity planning and scaling: by introducing controlled chaos, you can understand what the maximum is you can go to, at what point you can expect failure, how far you can stretch it. The second is the general idea that you are shifting your system towards resiliency by adopting this chaos-first principle; by doing this, you naturally embrace failure as part of your system, rather than being afraid of, oh, I had an outage, what do I do? The last is continuous improvement: this promotes a mindset of continuous improvement within your teams, rather than being scared the day it happens and running into a sev zero. So yeah. Awesome.

Now, these are some of the tools we'll be using. This is of course just for the demo; you don't have to use them. If you have your own choice of tool, go ahead and use it. The two tools we'll be using are Backstage and Litmus. Backstage is an open platform for building developer portals, as you might already know; there was a BackstageCon yesterday as well. Litmus Chaos is the other platform, which is a CNCF incubating project. Litmus Chaos is open source, and Backstage is also open source, and you can use them combined to realize the idea I just talked about. By the way, this is just an example; you can also use other tools, and use the open APIs if you have them. But yeah, that's the idea. Cool. Now let's actually hear from Raj.

Hello everyone. My name is Raj Uadha Raju, and I work with FIS. Sorry I'm not joining you in person; I have some family commitments. I want to share some of my thoughts with you. I hope you'll have a great session at KubeCon, and good luck with your presentation on this topic.

The couple of thoughts that I want to share are about the vision that we have at FIS on how we integrate chaos into platform engineering. As you can see on the screen, there are four pillars that we are envisioning. One is define and execute chaos experiments, which is very basic and foundational, in the sense that you want to define chaos experiments that fit your needs and execute them, probably manually. The key thing is you need to identify the appropriate scenarios that fit your application's needs. In our case, we have a wide variety of applications: cloud native and legacy applications, applications running on Linux, Windows, and mainframe platforms, et cetera, and different kinds of banking applications. So it's important to define and execute, mainly the definition of those experiments.

The second step in the journey that we are envisioning is offering chaos as a service, so that it becomes self-serviceable, easy to enable and disable for your applications or platforms. The third step is, once you have that level of automation or have enabled chaos as a service, you want to repeat this chaos engineering at regular intervals; more or less, you want to make it a repeatable process. How do you do that? Our idea is to integrate chaos into a CI/CD platform so that with the push of a button we can execute chaos experiments. So we want to essentially make it a repeatable process and automate it. That's the goal of that pillar.
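To make that "push of a button" idea concrete, here is a purely hypothetical sketch of how such a stage could be wired up in a YAML-defined pipeline, laid out in a GitHub Actions style. The job, step names, scripts, and secret name are placeholders I'm assuming for illustration, not FIS's actual pipeline.

```yaml
# Hypothetical resilience stage - names, scripts, and secrets are illustrative placeholders
name: resilience-gate
on:
  workflow_dispatch:                          # the "push of a button"
jobs:
  chaos-under-load:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Start background load           # e.g. a NeoLoad or JMeter scenario
        run: ./scripts/start-load.sh           # placeholder script
      - name: Inject chaos experiment          # e.g. trigger Litmus through its APIs
        run: ./scripts/run-chaos.sh            # placeholder script
        env:
          LITMUS_API_TOKEN: ${{ secrets.LITMUS_API_TOKEN }}   # assumed secret name
      - name: Evaluate pre/during/post-chaos SLOs   # e.g. a quality-gate evaluation
        run: ./scripts/evaluate-slos.sh        # placeholder; fails the job if SLOs are breached
```

The point of the sketch is only the shape: load, chaos, and evaluation run as ordinary pipeline steps, so a failed evaluation can block promotion like any other failed stage.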
And that step in the journey is integrating into CI/CD systems. Then there is the fourth pillar, which is the kind of underlying, important pillar: enabling appropriate observability and making the chaos evaluation automated. Without observability, you cannot clearly measure whether your chaos experiment was a success or a failure. And that evaluation needs to be automated, so that you remove the toil of the manual effort involved in chaos engineering and can scale chaos engineering across the organization for hundreds of applications, which is the case with FIS. It could be hundreds of applications, or hundreds of components within applications; that's the larger scale I'm talking about here.

So this is the vision we have, and what we have done is put together a blueprint architecture to realize this vision, which is presented here. There are five or six components depicted in this blueprint architecture, or ecosystem. At the top you can see we have a CI/CD pipeline: with the push of a button you should be able to trigger the load, trigger the chaos experiment, and then evaluate that chaos experiment, whether the chaos you just conducted was a success or not. On the top left you can see NeoLoad, or JMeter; these load generator tools are an integral part of this ecosystem, because many times the resilience issues you face in production environments happen under load. What we want to do is simulate that type of behavior in a pre-production environment using a load generator such as NeoLoad or JMeter, and inject chaos experiments while the application is under load, so that you understand the resiliency of the application. At the bottom you can see the observability tools, which again are an integral part of this ecosystem: Dynatrace, Splunk, and Prometheus are what we currently support, and there may be other tools that come along the way. On the right side, you have Litmus, which is the chaos engineering tool we have in the toolbox; we also have ChaosBlade that we use in some scenarios, but Litmus is the primary tool. Litmus is our chaos engineering tool which exposes APIs through which we inject the chaos experiments into the target application, depicted in the bottom left corner of the screen.

And we use Keptn, a CNCF project, to evaluate these experiments. The way we are envisioning it, we have a pre-chaos phase, a during-chaos phase, and a post-chaos phase, and we have hypotheses for each phase that we feed into Keptn in the form of SLIs and SLOs. Once we conduct the chaos experiment, we tell Keptn: here are my time periods, the first five minutes is my pre-chaos period, for five minutes I injected chaos, and the last five minutes is my recovery period; think of a 15-minute load test scenario. And here are my SLO definitions for how the resources and other things should look, such as the golden signals, the error rate, response time, throughput, and things like that; go evaluate it.
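As a concrete illustration of what gets fed into Keptn, a quality gate is typically expressed as an slo.yaml along the lines of the hedged sketch below. The SLI names and thresholds are assumptions for illustration, not FIS's actual hypotheses; the real SLIs would map to Dynatrace or Prometheus queries in a companion sli.yaml.

```yaml
# slo.yaml - hedged sketch of a Keptn quality gate over the golden signals
spec_version: "1.0"
comparison:
  compare_with: single_result
  include_result_with_score: pass
  aggregate_function: avg
objectives:
  - sli: error_rate
    pass:
      - criteria:
          - "<=1"          # at most 1% errors while chaos is injected
  - sli: response_time_p95
    pass:
      - criteria:
          - "<=+10%"       # no more than 10% slower than the pre-chaos baseline
          - "<600"         # and under 600 ms in absolute terms
    warning:
      - criteria:
          - "<=800"
  - sli: throughput
    pass:
      - criteria:
          - ">=-5%"        # throughput should not drop more than 5% under chaos
total_score:
  pass: "90%"
  warning: "75%"
```

Keptn evaluates each objective over the time window you hand it, pre-chaos, chaos, or recovery, and rolls the results up into an overall score.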
Then Keptn goes and talks to the downstream observability tools, pulls the metrics, and gives me a pass/fail result, a pass/fail Boolean value, and I can bake that into my CI/CD pipeline, which is depicted at the top, to make the decision on whether a particular application was successful in terms of the chaos engineering practice and whether it's ready to deploy to the next phase. So we want to automate this end-to-end lifecycle so that chaos engineering can be a repeatable process. We have most of these pieces together in our environment, but we don't have that end-to-end pipeline built yet; we are in the process of building it, we are very close, and then we'll repeat that pipeline across the hundreds of applications that we have within FIS. With that, thank you, thanks for the opportunity, thanks for listening to me; hopefully this is helpful. Thank you.

Awesome. "Hello everyone, my name is Raj..." Okay, let me move that along a second, guys. Okay, cool.

I don't want to do a demo without crediting the folks who made it awesome. So this is Namkyu Park; he's been a contributor with us through LFX, and he's been great. So Namkyu, if you're watching this: Namkyu has been great with Backstage, particularly for Litmus, because Backstage is a platform we were planning to integrate with so that we can enable this observability, the platform engineering side of it. He has been proactive in proposing the Backstage plugin for Litmus; it's already published, and you can go to the Backstage plugin link or just search GitHub and you should find it.

So before jumping in, these are the important configuration pieces of Backstage that we need to add. In this case, the app-config YAML, which is on the top left, is where you add your Litmus URL, wherever you've deployed it, and then you have the Litmus auth token, which you need to add so that Backstage can talk to the product you're using. On the right, you have the entities YAML; this is where you add your Litmus entities that you can use to visualize and see things. You're seeing a kind of GIF here, but I'll show you this live. These are the two main things you need to do in order to actually use Backstage with Litmus.

Now, let me come over to Backstage, which is here. Once you do that, you should see something like this: you have a Backstage Litmus demo. Of course, "my company catalog" is a generic thing; it's your company, your org, and then you can add the Litmus component to it. Once you do that, you will see something like this. This is the overview of the platform we are integrating, in this case Litmus. It's just the bare minimum, so you can of course extend it to as many things as you want: you can add the owners, the systems, the tags, and the things you want to connect it with. Right now I have Litmus deployed already, so you see things like the ChaosHubs, how many chaos infras are connected, the experiments, and whether GitOps is enabled or not. And there are some subcomponents, if you want to introduce those. But the interesting part is this Litmus tab, specifically. In here, you get all the dev information: things like the experiment docs, the API docs, the ChaosHub, things about the community. If you have more than one hub, you will see it here.
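Since I moved over the configuration quickly, here is roughly the shape of those two files. Treat the key names as assumptions from memory; the plugin's README on GitHub is the authoritative reference, and the URL, token, and project ID below are placeholders.

```yaml
# app-config.yaml - point Backstage at your Chaos Center (key names assumed; check the plugin README)
litmus:
  baseUrl: https://litmus.example.com       # wherever you deployed the Litmus Chaos Center
  apiToken: ${LITMUS_AUTH_TOKEN}            # the auth token generated from Litmus account settings

---
# entities.yaml / catalog-info.yaml - annotate the component you want to surface in Backstage
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: backstage-litmus-demo
  annotations:
    litmuschaos.io/project-id: "<your-litmus-project-id>"   # assumed annotation key
spec:
  type: service
  lifecycle: experimental
  owner: platform-team
```

With that wired up, the Litmus tab lights up with the data I was just describing.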
You can see the environments and which chaos infras it's connected to, and at the bottom you see all the experiments that have been run as part of your Litmus pipelines. This is straight out of Litmus, but you're seeing it all in the Backstage platform itself. You can see how many experiments ran, what the resilience score was, who ran it, and how many hours or minutes ago it executed. Everything Litmus-related is shown to you in a single place, without you having to go out of the platform to do it. You can even run an experiment from here if you click "run experiment"; it will actually go live and you can watch the pipeline. You can also go to the Litmus execution directly and see what went wrong or what went right, what the logs were, what the probes were, and things like that. Cool.

So what we have for the demo today is this application called Online Boutique, by Google Cloud Platform. This is the architecture. It's a demo microservices application, by the way, but we are mainly targeting the cart service, which is at the bottom, right above the Redis cache. It looks something like this: it's a live cart application, you can browse products, add them to the cart, and they should show up in the cart; you can do checkout. So it's just a simple microservices application. What we want to disrupt today, right now, is this cart service. Here's how it typically looks: these are our boutique application services that are running, and the cart service has a specific label, app=cartservice, attached to the deployment. I want to target the pod with this specific label so that it goes down, and our application actually goes down, into chaos. This is a very straightforward example, but you can of course elevate it further: use your own use case, add your own use case, create your own hypothesis.

These are all the components of Litmus. Once you deploy Litmus, you will have Mongo for high availability, you will have the front end, the GraphQL server, the authentication, and then you have the Event Tracker, the Operator, the Chaos Exporter, and the Controller. The Event Tracker is used for GitOps. The Exporter is used if you want to export your own metrics. The Operator is actually the one injecting chaos via the Chaos Runner; it checks whether your application labels and targets are actually present, things like that, and all the good stuff. With that, we also have a monitoring setup with Prometheus and Grafana. This is optional, you can choose not to do it, but we provide it out of the box; there is a utility already in the repo, so if you install it, it should be available.

The repo looks something like this: it's github.com/litmuschaos/litmus. There are other umbrella projects in the LitmusChaos organization itself, but this is the main one. If you scroll down, you can see the Chaos Center, and there's a monitoring directory; monitoring is where the Grafana pieces are, and Chaos Center is where you will find all the information regarding installation. You have the Bitnami Mongo, and you have the manifests you can use; you can also do it via Helm, that's possible. You can go to docs.litmuschaos.io to see all of this. But yeah, let me just jump right into the demo. OK, cool.
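One reference point before the clicks: the targeting works off the standard Kubernetes labels on the workload. In the Online Boutique manifests the cart service carries roughly this shape; the sketch is trimmed, the boutique namespace is from this demo setup rather than the upstream default, and the image tag is illustrative.

```yaml
# Trimmed sketch of the target workload - the app=cartservice label is what the fault selects on
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cartservice
  namespace: boutique               # the namespace used in this demo
  labels:
    app: cartservice
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cartservice
  template:
    metadata:
      labels:
        app: cartservice            # pods stamped with this label become the chaos targets
    spec:
      containers:
        - name: server
          image: gcr.io/google-samples/microservices-demo/cartservice:v0.8.0   # illustrative tag
          ports:
            - containerPort: 7070
```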
So before we move on to doing one of the pod deletes: this is the API settings page, where you can generate an API token from Litmus, and then you can use this token in Backstage to connect it. If you go to the account settings, you can generate a new token and use it in Backstage.

Now let me jump over to Litmus, which is here. This is what we have; it's exactly what we saw from the Backstage platform too. What I want to do now is inject chaos. Let me create a new chaos experiment; we'll call it cart-disrupt. I'll select my infrastructure, in this case the chaos infra, which I've already connected. And it's because of this infra that you were able to see the workflow controller, the subscriber, the operator, and so on; all of those dependencies come up when you install the infra. Now I'll select a blank canvas, and I want to apply the individual fault that I want. Here you can see a list of faults: we have AWS, Azure, GCP, Kubernetes. I'm going to select the Kubernetes ones. By the way, these faults are completely open source and the repository is public; it's the ChaosHub, and you can go to hub.litmuschaos.io to find it. I'm just going to use pod-delete to keep it simple. To target it, I'm going to use the deployment type with the namespace boutique, where my application lives, and the label app=cartservice, like I mentioned. You can tune the fault to make it, let's say, a little bit longer so we can see the result; I've already run it, so it won't really matter.

But the probes, this is an important section. What we have added with 3.0 is resilience probes. A probe is a pluggable check that you inject into your chaos, but not as an optional parameter; it's created as a global instance, and you can use it anywhere in your experiments. It's not that you run a fault or an experiment and a probe is optional, something you can go ahead and add if you feel like it; rather, it's necessary, because we want pluggable checks used where they make sense, not probes just for the sake of it. In this case it's an HTTP probe: it just checks a URL and whether it's healthy or not. This URL is the URL of the Online Boutique store; you could also change it to the FQDN of the service you are disrupting, or anything else. So it's really extensible, and this is the kind of principle we want to advocate: a resilience-probe-first approach, where you create the probe first and then add it to as many experiments as you like. So let me add this. These are the different modes of adding a probe: start of test (SOT), end of test (EOT), Edge, which is both, meaning the check happens before and after, Continuous, and OnChaos. I'm just going to do SOT. I'll apply, and I'll save. Once I save, I click on Run, and the experiment should start running.

Now let me go back to the terminal. Here you can see some chaos operations already spinning up. Let me move this aside so you can see it clearly: you see the logs of the application execution on the right, and the terminal here. So it actually spun up this guy, cart-disrupt, and we are targeting this one, cartservice, which has been running for 152 minutes. We'll soon see it go down, because that's the disruption we hypothesized.
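For reference, the same thing the UI builds can be written declaratively. Here is a hedged sketch of the underlying Litmus ChaosEngine for this pod-delete fault plus the HTTP resilience probe; exact field formats vary a bit across Litmus versions, and the service account, durations, and probe URL are placeholders from this demo rather than required values.

```yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: cart-disrupt
  namespace: boutique
spec:
  engineState: active
  appinfo:
    appns: boutique
    applabel: app=cartservice             # the label we identified on the cart service deployment
    appkind: deployment
  chaosServiceAccount: litmus-admin       # a service account with permission to inject chaos
  experiments:
    - name: pod-delete
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION  # tune the fault to run a bit longer
              value: "60"
            - name: CHAOS_INTERVAL
              value: "10"
        probe:
          - name: storefront-availability-check
            type: httpProbe
            mode: SOT                     # start of test; Edge would check before and after
            httpProbe/inputs:
              url: http://frontend.boutique.svc.cluster.local:80   # or the store's external URL
              method:
                get:
                  criteria: ==
                  responseCode: "200"     # probe passes if the store still answers with 200
            runProperties:
              probeTimeout: 5s
              interval: 2s
              attempt: 1
```

A promProbe could sit alongside it in the same list to assert on a PromQL query, which ties into the Prometheus setup we'll look at in a moment.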
Now, once it goes down, our well-functioning application will unfortunately not work. That is the outage we are planning to cause. And this could be anything; it could be a different kind of scenario for your case. You can inject latency; in a platform engineering scenario, you can induce latency in your database, you can induce disruption in your infra, things that are specific to your use cases, and you can do it all via this application. As you can see, it terminated and restarted, four seconds, seven seconds ago, and if I go and reload, it should not work. And that's that: chaos happened, so you can't really do anything. This is the disruption we simulated. Now, this is safe, because you're doing it on your own terms, not in an actual production environment that is down, right? So now you can take your time, figure out what went wrong, look into it, work out the solution, and once you're ready with the fix, deploy it back and check again. That is the power of chaos engineering, and it's doubled when you combine platform engineering and chaos engineering, because you're doing everything in a single place. Awesome.

So that's the demo. Let me show you what the final result looks like, because it would take a bit of time to run; it looks something like this. You get the verdict from the Argo controller and from the chaos results here. It says that your URL responded with the actual code 200, and the expected was also 200, so it passed. The combination of our fault and the probe success percentage accounts for our resilience score, so you always get a resilience score at the end of your experiments. You can see this one is 100%, because everything was successful and the probe was also successful; if not, it would be computed accordingly and you'd get the resulting resilience score.

So yeah, that's all about the demo. One last thing is this monitoring setup. You can see the dashboard refreshing now, because we are just doing some chaos, and we should see some annotations. Oh, all right, you can see some annotations coming up on the right side; it's just starting because we are still running it. But I ran a few earlier, and you can already see the historical data from that: the cart QPS and everything goes down, and we have annotations. It's always useful to see this kind of data when you're monitoring, so you can see exactly what is wrong. You can also use the Prometheus probe in this case to scrape the metrics and then do something with them, with a PromQL query, of course.

So yeah, that's all about the demo. Let me jump back and talk about the future vision. Cool. What's the roadmap for us in the future, what are we planning? There are four pillars to the roadmap that we are looking at in terms of open source contribution and how we envision it in the long run. One is the maturity model: we have to define a maturity model for chaos engineering in a platform engineering context. Of course, there's a maturity model for chaos engineering, but it's not there for chaos in platform engineering; it's a relatively newer concept, which a lot of us have to accept first in order to go ahead and start doing it. Second is industry standards: as a community, we have to contribute to the development of industry standards for fostering a chaos-first platform.
And of course, we have to make people understand that this is important and something we should adopt. Third is defining guardrails. This is an interesting point, because oftentimes people question the permission requirements, the RBAC requirements, for chaos: anybody could go and break something, who is stopping you? This is where we need to define guardrails around chaos and ensure it's safe and compliant, and that not everybody has permission to do everything. You need to provide a safe environment, either through a separate staging environment, or in production by giving certain RBAC or a certain context to specific developers so they can do it. This is something we need to create and build a proper solution for, so it's open to the community. And then we have the budgeting problem, because chaos is expensive. If you're doing a CPU hog, a memory hog, a node drain, there's load on your infra; your infra gets overloaded, spiking your chaos resource consumption and your budget. So it's definitely an expensive thing to do, and we need to be mindful and approach it as a framework rather than an expensive, exploratory experiment.

So yeah, that's about it from my side. Thank you guys for joining in and listening to the talk. You can scan the QR code and leave some feedback. Thank you so much. Awesome. I'm open to taking questions, but before that, I also have one of the maintainers of Litmus with me, so I would like to introduce Shubham to join me, and we can both take Q&A. Shubham. Cool. Do you guys have any questions? We are here. If not, it's fine, I don't mind. We do? Hello, I've got a question. Over on your right. Where are you? On your right. Hello. OK, hi.

You showed the Litmus stuff running as part of what looked like a CI/CD pipeline, so you had what I'm guessing is an existing environment set up where the application would be deployed, and then you run all this stuff in it. How efficient do you find that, at least that's my understanding of it? Is it affecting lead times for how much stuff gets through CI/CD? If something fails chaos, do you still deploy it and just say, this isn't massively resilient, but it'll be fine? What's the attitude and setup around that?

So the first thing is, it's not actually a CI/CD pipeline. We have made it look like a CI/CD pipeline, but it's not; it's actually your control plane. You're deploying components to your execution plane, and then you're executing chaos in there. You have your dynamic execution plane, and you can choose which cluster you want to run it against. But the idea here is that once you have a use case, you want to go for it and see the result in the chaos result CR that you get. It might not give you the exact log or the exact fault scenario you planned out, and you might decide this isn't very effective and needs to be changed to something else. Pod delete is a very simple example, to be honest, but you'd generally do things like node-level disruption, or something that might actually go wrong under a tremendous load. Like Raj mentioned, he used JMeter, a load testing tool, on top to simulate exactly that kind of behavior. Also, the APIs are completely open, so you can use the APIs with your own architecture and plan it out accordingly; that way it's more meaningful for your individual use case. Yeah. Sorry, go ahead. OK. Do you test in production?
Or do you stick to test environments? OK, so that's interesting. We don't test in production; we have it available in production, but we don't run game days. We test in pre-prod. Generally, that's the safest, we think, because you definitely can break stuff in production, but, you know. Cool. I probably have more questions about Litmus, but I'll cede my time. Sure, go ahead.

Thank you for the talk. How do you recognize the experiment in your traces? You mean on the observability side? Yeah. So. Yeah, but how can you distinguish whether it was your experiment or a real failure? Can you? I mean, do you want to specifically see who ran the experiment in the trace, or do you want to see the result, or what went wrong in the experiment, through the trace? Let's say by the end of the month, if you look at all the failures, some of them were your experiments and some of them were real failures; can you find that in traces? Yes. We do have a general availability panel here at the top, but you can customize it. You have, not this one, the probe success percentage; it's not shown in this dashboard, but you generally have to build your Grafana dashboard such that you can actually count the number of failures. And you should not keep it the same as your production; your production should definitely have a different dashboard. And can the platform be your production? You definitely can do it on production, but generally you'd use a pre-prod, a replica of your production, so you can definitely try it there; you can break production too, but that's on you. And you can replicate the Grafana dashboard there. So this is more of a testing dashboard, where you're seeing testing data from chaos, from Litmus, not the production dashboard. Yeah, thanks. Sure. Yeah.

Hi, I have a question. Is it possible to declare these tests using YAML? Yes, it definitely is. In Litmus, where we edit on the blank canvas, the entire YAML is visible to you, so you can edit your YAML and declare it with as much granular detail as you want. Also, if you want to create your own experiments, you can do that with the Litmus SDK; it will help you bootstrap and get started, and you can create your own. And for the ChaosHub, where you see all these faults, those are also declarative, so you can take these faults and create your own thing, not just here. These are just some templates we have, but of course you can pick these templates and extend them, and use the SDK to make it more declarative. Right.

And can you lock down the UI so that users don't create things from there, and only through, for example, Flux or Argo? For that use case, there are two things you can do. You can of course use the RBAC capabilities of Litmus in the project setup, with members; you can give your members permission or access for the kind of control they should have. The second is you can use the Litmus APIs, so that only specific people have access or permission to run those tests. But we can't really lock down the UI, because it's an admin panel; you can, however, give access to whoever needs it, at whatever level of granularity you want. That's nice.

When it comes to RBAC and end users: as a cluster admin, I want my end users to have limited accessibility, so namespace-based access, and so I would expect some sort of service account impersonation to be in place.
Is that planned, or what implementation are you thinking of to make this more multi-tenant, so to speak? I think Shubham can answer that. Can you repeat the question? Yeah, so if we have tenants in a cluster, we need to make sure that even if Litmus is running with cluster-admin permissions, when it's running tests for a specific user, or tenant so to speak, it should be limited to that tenant's permission set. Yeah, so in the YAML itself we have an option to provide the service account, so there's a service account which is bound to that user. Can you show us that? Yeah. So if you use a service account which does not have those permissions, then it will not run that experiment, if that user does not have those permissions. OK, because what I'm thinking of is that a user could perhaps reference a service account that they should not be able to use. So is it possible to restrict this service account to a specific namespace? Yes. I mean, this is a good requirement; we can add a validation in the UI itself. If a user does not have permission, we allow only the specific users, because we have user management as well, so we allow only those users. We can whitelist the namespace as well for a specific user; that's something we can add to the future roadmap, but right now we do not have it. Yeah. Those are just some ideas that come to my mind.

And another idea, like what this lady mentioned here about the status and identifying whenever you have had an outage: what I'm looking at is, for example, Kyverno, which returns a result of the scans it has performed; same with Tekton, it gives you a PipelineRun, a specific run of a pipeline, as a CRD, so you see what has run, with which parameters, and when it ran. So in a similar fashion, implementing everything using CRDs, it could be worth taking inspiration from those design patterns. We also have a CRD called ChaosResult; there you will see all the reports and everything, the target details and the historical data as well. Thank you. And if you look at the Grafana dashboard as well, there are annotations, so if there is some variation in the data, you can see it. I mean, can you show it? So with those annotations, you know that a spike is because of the chaos; if there is no annotation, it means it was not during the chaos, so it's some natural chaos or something. Cool. Thank you. Thanks, yeah. But these are great additions; we're definitely looking into them. Also, what she mentioned about Grafana, we could also add a plugin, something built in, so it's easier, yeah. Cool. We are a little out of time, so I would like to wrap up, but we can definitely discuss offline. Yeah. Thank you. Thank you guys so much for the talk. Thank you. Thank you.