Good morning, good afternoon, good evening, folks. Welcome to this webinar. Just a quick introduction to myself: I'm the VP of Engineering here at Harness, and I've helped build many parts of the software delivery platform. Prior to this, I spent most of my career building infrastructure, at VMware and Nutanix. I was part of the team that built virtualization at VMware, and we took that a notch above at Nutanix.

Today we're going to talk about engineering efficiency at scale: what do you need to look at when you really want to achieve your business objectives as an engineering organization, and how do you do it at scale? How do the DORA metrics fit into this, and what else do you need to look at? One thing I want to put out straight away: I want to make this interactive, so please feel free to ask questions at any point by posting in the Q&A box. After every couple of slides, I'll take a look and see if I can answer them live.

Great, so let's get started. Straight away: what are the DORA metrics, why do we care about them, and what are the caveats you really need to think about? The DORA metrics really just tell you what your ability is to ship a change, or a set of changes, to your customer, and whether you can do it without breaking things. That's what they help you measure, and they help highlight where the problems are. What they don't tell you is whether you are achieving your business goals, and that's something we'll touch upon. One thing a lot of people miss about the DORA metrics is that they don't just apply to the services you're hosting; they apply to your tool stack as well, specifically your DevOps tool stack. If you don't have the ability to make changes to your tooling quickly and reliably, that's just as bad as your services not meeting the DORA metrics. The other thing I'll point out is that excellent tooling is very important for doing well on the DORA metrics, but culture is also really important, and you need to use your tooling to embed that culture. In fact, I'll show you a few examples of how we do it at Harness, to give you more insight into the process.

I think most of you are familiar with the DORA metrics; if not, I'll go over them fairly quickly here on the slide. There are four key DORA metrics. The first is deployment frequency: how often can you ship code? As you can see, the top organizations can do it on demand, any number of times a day, while for other organizations it varies from once per day to maybe once every few months. Now, just because you can doesn't mean you should keep doing multiple deployments every day. This metric in particular should be driven by your business needs: what are your customers asking for, and what is the right thing for your business? Pushing changes your customers don't need multiple times a day is not necessarily the right thing for a lot of people.

Then there is lead time for changes: how long does it take you to go from committing code to putting it in production? This is exactly where tooling is extremely important. Is your tooling efficient? Is it available? If that part isn't working, this particular metric will start lagging.

Then there is time to restore service: if something were to go wrong, how long would it take you to restore the service? In the modern SaaS world especially, we're all striving for at least three to four nines of availability, and the best organizations are at six nines. The only way you achieve that is if your downtime is very low; you pretty much have to be in the elite tier on this metric to get anything above four nines.

And finally there's change failure rate: how often do your deployments fail? In my opinion, this should be reasonable, but there's no need to aim for elite status here. The reason is that catching failures early is a good thing, versus putting something in production and only then realizing something is wrong. What should be encouraged is the ability to catch failures, rather than slowing down to the point where there are no failures but you're impacting your business goals. The balance is really important.
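By the way, if you wanted to compute these four metrics yourself from your own deployment records, the logic is simple. Here's a minimal sketch in Python; the record fields are hypothetical, and in practice your CI/CD tooling derives all of this from your pipelines automatically:

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical deployment records. Each record: when the change was
# committed, when it reached production, whether it failed there, and
# when service was restored after a failure.
deployments = [
    {"committed": datetime(2021, 10, 4, 9, 0), "deployed": datetime(2021, 10, 4, 15, 0),
     "failed": False, "restored": None},
    {"committed": datetime(2021, 10, 5, 11, 0), "deployed": datetime(2021, 10, 6, 10, 0),
     "failed": True, "restored": datetime(2021, 10, 6, 11, 30)},
]
window_days = 7  # reporting window

# Deployment frequency: deployments per day over the window.
frequency = len(deployments) / window_days

# Lead time for changes: commit-to-production, median across deployments.
lead_time = median(d["deployed"] - d["committed"] for d in deployments)

# Change failure rate: share of deployments that failed in production.
change_failure_rate = sum(d["failed"] for d in deployments) / len(deployments)

# Time to restore service: median outage length across failed deployments.
restores = [d["restored"] - d["deployed"] for d in deployments if d["failed"]]
time_to_restore = median(restores) if restores else timedelta(0)

print(frequency, lead_time, change_failure_rate, time_to_restore)
```

The hard part isn't the arithmetic; it's having systems that record these events reliably in the first place, which is the theme of the rest of this talk.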
So let's keep moving on and talk a little bit about how you bake this whole process into your tooling: how do you make it part of your DNA? This is an exercise we went through ourselves at Harness. We really wanted quality, security, and the ability to verify that our deployments are working to be baked into our pipelines. And not only did we want to bake them in, we wanted to mandate that there be no pipelines without these steps.

First, let's look at where they fit in. On the left-hand side of the slide, where we describe the continuous integration process, you of course have build and test, and you generate artifacts as a result. But really important is the quality and security scanning, especially static scanning. We incorporated steps so that every PR coming through has to pass the quality and security checks; otherwise, we don't generate artifacts. Let's assume it passes and an artifact is generated. As part of the deployment stage, we then want to make sure the generated artifact is actually secure, followed by some sort of provisioning and a release strategy. There are many types of release strategies you can follow, the classics being rolling, canary, or blue-green, but there's now another way of rolling out code as well. Of course you need the artifacts to go to your production, test, or pre-prod environment, but you can also put the feature itself behind a feature flag. Once your artifacts are in place, you decide how you're going to roll the feature out to your customers: to a smaller segment based on geography, or to a predetermined percentage of users. That gives you a lot of flexibility, especially with what we call testing in production, because as you become a large organization, you often can't replicate all of production in your staging or pre-production environments. So the only way you can really test some things is in production.
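To illustrate the percentage-and-geography rollout idea, here's a hand-rolled sketch; this is not any particular vendor's SDK, and the flag name is made up. The trick is hashing each user into a stable bucket so the same user always gets the same answer as the rollout widens:

```python
import hashlib

def flag_enabled(flag_name, user_id, percentage, allowed_regions=None, user_region=None):
    """Stable percentage-plus-geography rollout decision for one flag."""
    # Optional geography gate: serve the feature only in chosen regions.
    if allowed_regions is not None and user_region not in allowed_regions:
        return False
    # Hash flag+user into a stable bucket from 0 to 99, so the same user
    # always gets the same verdict for the same flag as the rollout grows.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < percentage

# Roll a hypothetical feature out to 20% of users in two regions first.
print(flag_enabled("retirement-planner", "user-42", 20,
                   allowed_regions={"us-west", "eu-central"},
                   user_region="us-west"))
```

A real feature flag product adds the management layer on top of this: who can see flags, who can change them, and an audit trail, which, as you'll hear later, is where our own home-grown solution fell over.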
So, coming back to the flow: what you do before going to production is basic sanity testing, and then the feature itself gets tested in production. You'll also do a DAST security scan, most likely in a staging or pre-prod environment, before heading to production. And the last thing, which is really important, is what we call deployment verification. This is something we use internally: we monitor some key metrics, page load time for some of our apps, for example. The minute you start a deployment, say you rolled out your artifacts to 20% of your deployment infrastructure, we start measuring these metrics before and after the deployment, specifically on the 20% that actually changed. If your page load times are higher than they were before, clearly that's a red flag, and we initiate an automatic rollback based on it. If things look good, we automatically roll forward, and the percentage of the deployment keeps increasing until it's deployed to every node in your infrastructure.
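In spirit, that check looks something like the toy version below. The real system builds a statistical baseline with ML rather than using a fixed threshold, so treat this purely as an illustration of the decision:

```python
from statistics import mean

def verify_canary(baseline_ms, canary_ms, tolerance=0.10):
    """Compare a key metric (here, page load time in ms) on the changed
    slice against its pre-deployment baseline and pick the next action."""
    base, canary = mean(baseline_ms), mean(canary_ms)
    # A degradation beyond the tolerance is the red flag: roll back.
    if canary > base * (1 + tolerance):
        return "rollback"
    # Otherwise keep rolling forward to a larger slice of the fleet.
    return "roll_forward"

# Page load times sampled before the deploy vs. on the 20% that changed.
print(verify_canary([410, 395, 430], [405, 399, 415]))  # roll_forward
print(verify_canary([410, 395, 430], [650, 700, 690]))  # rollback
```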
Does that make sense? Any questions, folks, at this point before I head further? Because we're going to get a little deeper into this. All right, let's keep going.

Now, all this is great, and I would say a lot of companies figure out how to do it at a smaller scale. When you're a 30-person or 50-person company, it's fairly easy, and that's the same journey we went through at Harness: in the last two to three years, we went from roughly 50 people to 500 people. The key question is, how do you do all this at scale? What do you need to care about at scale? And by the way, this is the important part: of all the software engineering organizations I've been part of, almost nobody does this. So it's really important to look at it, understand what the modern practices are, and think about what matters when you're doing software delivery at scale.

Number one for any software organization is velocity: how do you keep your velocity even at a very large scale? As anybody in a reasonably sized organization knows, organizations slow down significantly as they grow. So how do you keep it? Second is quality. As you onboard new people into the company, not everybody will be aware of what breaks things. You really need to make sure everybody knows how things work, and this is where governance comes in: the quality gates, or whatever else you care about, need to be part of your pipelines and hard-coded in there so that nobody can override them. You need guardrails so people don't fall off and do something that affects your quality or availability. Finally, you need to do all this extremely efficiently. It's great that you've figured out your pipelines, but what if it costs so much that it's impacting your profit margins? You really need to make sure you're optimal in how you're doing this, and you need to measure that too; that's also very important.

So there's a question in the Q&A from Shafali; let me take a look. "When you're implementing at scale, what are the key metrics you start with, versus enabling and showing everything at once?" Okay. I'm not sure I completely understood the question, but what it really comes down to is that the metrics are always going to be there; the key ones you want to start with are exactly the DORA metrics. Those are the basics, and that's why they're the top ones to look at. The other ones I would start adding are the efficiency metrics; those are extremely important as well. It's great that you have the availability and whatever else you need, but what if it's costing you a lot? What if it makes your business unviable? It's extremely important to look at those too. If that answers the question, great; if there's a follow-up, please put it in the Q&A box.

So let's talk about value stream management a little. What does this mean, and how does it differ from the DORA metrics you just saw? What the DORA metrics help you understand is whether you have the ability to ship code quickly, and whether you can do it with reasonable quality. What they don't help you understand is whether you're achieving your business objectives. Business objectives are typically defined by OKRs; some people use KPIs, but OKRs are the new gold standard, in my opinion. And there are different types of OKRs. There are product-related OKRs: for example, you might want to ship a new product, or you might want to provide a new service; that's usually defined by one or more OKRs. Then there are people-related OKRs, something a lot of organizations ignore, but at scale it's extremely important: are you burning out your team? Is your team happy with the work they're doing? Are they growing in their careers? How do you measure all of this? Then there's quality: how are your customers perceiving the quality of your product or service? What has your uptime been like? What has your availability been like? How long do you take to restore things when something breaks? Then there are cost OKRs: the budget for your engineering org and the associated infrastructure is more or less fixed; it's a percentage of what your company can afford, or how much money it's willing to reinvest in the business. You need to know whether you're fitting into that budget, or how you would fit into it. Finally, there's compliance. Depending on your industry, there's a lot you need to take care of: if you're selling to the feds, you need to be FedRAMP compliant; if you're in the medical business, you need to be HIPAA compliant; if you're operating in Europe, you need to be GDPR compliant. How do you make sure you're not breaking from those norms?

So these are the different OKRs that get defined. The most important thing for achieving these OKRs is that you need to reduce friction for your engineering team, to help them deliver value to customers. This is absolutely the number one thing you need to take care of. How do you do that? I have some key takeaways here, and I'll dive into each one.
Number one: your tooling should always work. This is a top problem I've noticed even in my previous companies; engineers get extremely frustrated when tooling does not work. It affects their efficiency, it undermines the mission of the organization, and more importantly, it starts taking a toll on their morale and their ability to deliver. Second: remove inefficiencies. There are some obvious ones and some not-so-obvious ones; I'll walk through a couple of examples in the next few slides. Third: measure and look for bottlenecks. If you measure, you will always find the bottlenecks; if you don't, you won't know what's wrong. It's very important that you do this. And finally, I already talked about people-related metrics: always watch out to make sure the people in your organization are happy.

All right, let's talk a little about "tooling should always work." There are a couple of examples here from the journey we went through at Harness. Number one: we were using Jenkins for Harness CI, and in a second I'll talk about the issues we had with Jenkins and why we had to move to another solution. The second was the custom feature flag solution we were using, which was pretty rudimentary; we had to move to a production-grade, enterprise-class feature flag solution, and I'll cover how that impacted us as well. These were both big decisions, and they really improved our capabilities, as you'll see in some of the metrics I showcase later.

First, what was broken with Jenkins? There were quite a few things; let me highlight the top ones. The number one issue was difficulty in troubleshooting. When something went wrong with Jenkins, a special team had to jump in and figure it out; it's not like any developer could jump in and see what was going on. That doesn't scale: it means you have to grow an organization that specifically takes care of Jenkins as your team grows, which means linearly adding resources to it, and that's not very efficient. Second, the maintenance overhead was high. The team dealing with this was a significant percentage of our engineering; still in the single digits, but significant in the sense that it wasn't directly adding value to our customers. And finally, the test cycles were really long, with no obvious ways of optimizing them. Just as important, governance was a big deal: we simply could not mandate that every pipeline had to have all these steps, and we could not properly mandate approval stages. That forced us to watch things manually every single day, instead of having the tooling automate these best practices. By moving to Harness CI, the enterprise product we built ourselves and now ship to others, we overcame many of these issues.
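As a toy illustration of the kind of guardrail I mean, think of governance as a policy check that refuses to run any pipeline definition missing the mandated steps. Real platforms enforce this with policy engines; the step names here are made up:

```python
REQUIRED_STEPS = {"static_scan", "unit_tests", "security_scan", "deploy_verification"}

def validate_pipeline(name, steps):
    """Refuse to run any pipeline definition that skips a mandated step."""
    missing = REQUIRED_STEPS - set(steps)
    if missing:
        raise ValueError(f"pipeline {name!r} is missing required steps: {sorted(missing)}")

# This one passes; drop "security_scan" from the list and it is rejected
# before anything runs, instead of someone having to notice it manually.
validate_pipeline("payments-service",
                  ["static_scan", "unit_tests", "build", "security_scan",
                   "deploy", "deploy_verification"])
```

The point is that the rule lives in the tooling, not in a wiki page, so nobody can override it as the team grows.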
Now let's talk a little bit about what was broken with our custom feature flag solution; this is another interesting example. We had our own custom feature flags, and it actually worked pretty well up to about 75 people, because we knew exactly who had access to it: just two or three people. If you needed to know whether a particular customer had access to a particular feature, you sent one of those people an email or a Slack message, and they'd say, "let me take a look." And if it was past their bedtime, you just waited a few hours and they'd respond in the morning. The minute we grew to five products and 200 people, this just started breaking. We were too dependent on one or two people, and then we overreacted and gave access to a lot more people, but without proper RBAC controlling who could do what. So we released features that should not have been released, we struggled to figure out who had access to what, and the right people didn't have the right visibility or the right permissions. Finally, we decided we had to solve this problem for ourselves and for everybody, so we built our feature flags product and addressed a lot of these issues. A side benefit was that we could now follow complex workflows for releasing a feature, not just turn it on or off: you can do a gradual rollout, target a specific geography, roll out to a specific user, whatever it may be.

I'll pause here for a second or two for any other questions, and then we can move on. Okay, there it is: "How do we get visibility into all the feature flags?" That's an excellent question. I don't want to advertise a specific product here, but if you look at the commercial feature flag offerings, they have SDKs that publish the specific feature flags you're using, and you can go to the GUI and look at the list of all the features that are present and to whom they've been released. Most importantly, you can control who sees this, which feature flags each person has the power to change, and who has the power to roll a flag further out or roll it back. I hope that answers the question; please type any follow-on questions and we'll take them in a second.

Right, so let's talk a little bit about removing inefficiency, and here are a couple of examples. Okay, there's an anonymous attendee with a question: "Do Kubernetes-based systems need feature flags?" Absolutely; everybody needs feature flags. Kubernetes is just infrastructure; feature flags are what go into your application. Let's take something really basic. Say you're TurboTax, and you roll out a new feature that helps users plan their retirement better. Kubernetes cannot help with that. What you need to think about is your users, how you're going to roll features out to them, and who has control over those features. Okay. Now, let's see, there's another follow-up question: "Can you build multiple images, one per client, and deploy them in their tenants?"
Okay, I'm not sure what that question actually means, so if you can elaborate and give me more context, maybe I can help answer it. Thank you; we'll take it in a second. Let me cover this slide first.

Let's talk a little bit about the migration to Bazel that we had to do. We used to use Maven for our build and test systems. This is an actual graph of how long it took us to run all of our different types of tests, with data points taken once a month over a four-month period. The time for most of our test suites almost doubled or tripled over that window. The reason was that our team grew a lot during that period, and we were adding new services and new tests along with them. One of the problems with Maven in a monorepo is that you can't pick and choose which builds and tests run for each developer's change. By moving to Bazel, we were forced to create a dependency graph: when people make code changes, we only build the necessary parts, using a global cache for everything else, and the tests that run are the ones associated with the specific component you're building.

Okay, that's great, but it's not sufficient. What if you made a five-line change that really only affects one test, even within that specific service? For each of our services, the tests take anywhere between 10 and 15 minutes. How do you handle that? For that, we used something called Test Intelligence, and let me walk you through how it fits into the whole process. Here's what a typical PR process looks like: a developer creates a local working branch, starts coding, creates a pull request, and goes through the review process. They make changes, send them for review, get feedback, make more changes, go through build and test again, and submit the review again; they might get more feedback, and then it's code changes, build, and test once more. What we found at Harness is that we do about 700-odd pull requests per month, and for each PR we do about 3.5 PR checks per merge; that is, we repeat that cycle of code changes, build, and test about 3.5 times per PR. That results in about 2,500 PR checks a month. What we do with Test Intelligence is dynamically instrument our code to figure out the exact set of tests impacted by the specific change you made. For example, if you changed only one line of code and there's only one test that covers it, we can figure that out: if you had 100 unit tests, we'd run that one instead of all 100.
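Conceptually, and this is a simplification of what the dynamic instrumentation actually does, it behaves like a reverse coverage map from source files to the tests that exercise them. A sketch, with made-up files and tests:

```python
# Reverse coverage map built from instrumented test runs: which tests
# exercise which source files. (A real system tracks this at method or
# class granularity and keeps the map fresh on every run.)
coverage_map = {
    "billing/invoice.py": {"test_invoice_totals", "test_invoice_tax"},
    "billing/discount.py": {"test_discount_rules"},
    "auth/login.py": {"test_login", "test_session_expiry"},
}

def select_tests(changed_files):
    """Run only the tests known to exercise the changed files."""
    selected = set()
    for path in changed_files:
        if path not in coverage_map:
            # Unknown file: we can't prove what it affects, so be
            # conservative and run the entire suite.
            return set().union(*coverage_map.values())
        selected |= coverage_map[path]
    return selected

# A one-file change runs 1 of the 5 known tests instead of all of them.
print(select_tests(["billing/discount.py"]))
```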
So what we found, once we started using this module we call Test Intelligence, is that we were saving about 35% of test time; that's the share of test running we could avoid with Test Intelligence. Let's look at the math: the average test time without Test Intelligence was 43 minutes, so a 35% saving is about 15 minutes per check, and across roughly 2,500 PR checks a month, that works out to about 3.5 person-years saved per calendar year with Test Intelligence.

Okay, let me try to answer this question: "Kubernetes systems are deployed based on container images, and say they're deployed multi-tenant with isolated environments. Can you build an image specific to a customer, containing only the modules they're supposed to have, instead of a single system with massive feature flags and figuring out what features are deployed for each client?" Thanks for clarifying your previous question. What this really comes down to is: yes, if you're doing more of an on-prem solution, it seems like you could make that work. But if you're a SaaS solution shared among tens of customers, which is what a lot of people run, it makes more sense to use feature flags. And I really think you should plan for scale. Using Kubernetes and per-customer images for isolation, as described here, works better at a smaller scale. Ultimately, managing the features is the bigger problem, not the mechanism you use to achieve it; you could technically use a different mechanism, but how you manage those feature flags on top is the most important part. Doing it at the infrastructure or image level becomes extremely complicated at scale: you really have to know what's running where, and if you have to debug, it becomes really, really hard. Yes, your code might look a little clunkier with feature flag SDKs built into it, but with a good system it stays much more readable, and the amount of time you'll spend debugging anything with more than 100 users will drop dramatically. Thanks; that's a great question.

So, let's talk about how we actually measure and look for bottlenecks. These are real examples of some of the metrics we look at at Harness. Look at our deployment frequency: this is from the past week, and it would look much the same if you went back a few months. On some days, we do over 100 deployments. Again, as I said, you need to look at your business and decide what deployment frequency you need. The reason we have so many deployments is that we have a number of services and products that we ship, all deployed independently, and that's what this highlights. Some days it's lower, some days it's higher; the most important thing is that you have the ability to do this. That's part one. The second thing here: if you see October 16 and 17, that happened to be the weekend, so not much there. And if you look at the 15th, that's thanks to some of our awesome execs at Harness.
We actually have the second Friday of every month off, in addition to other holidays, and it's just beautiful that we get that day. Our deployment frequency was much lower that day, but it's still surprising that we have 10-plus deployments even on an off day. This is something we're extremely proud of: how smoothly this part works for our organization.

Now let's look at a number that's probably not where you'd think it should be: change failure rate. We're at 25%, and typically when you talk about an elite organization, you're talking less than 15%. This is why it's really important to have dashboards like this, to figure out exactly where the problems are. If somebody told me we have a 25% change failure rate, my first reaction would be: but why? You immediately jump into a report like this, sort by the projects with the highest deployment failure rate, and you can see where the problem is. In this case, most of the projects with the highest deployment failure rates were actually test projects, which is fine; they're probably not even deploying to production. We were quickly able to figure that out and say, this is something we can ignore, and maybe in a future dashboard we just filter these projects out so that we only look at the ones that really matter.

Finally, how long does it take to do the deployment itself? Here you're really looking for anomalies. If you look at this graph, we have, I think, two projects with this 24-hour pattern, where the deployment runs for an extremely long time, almost a day, and then fails. So you know something is wrong with those two projects. But if you look at everything else, the mean duration is really good: less than 30 minutes for us to do a deployment. This immediately helps us focus on which projects are the problem areas, work with the team leaders, and they can go fix it themselves.

Finally, cost. I've talked about it a few times, and it's extremely important that everybody pay attention to it, because this is the runaway train that can derail all your projects. In our case, our logging cost was pretty high and increasing significantly. I don't have this graph for more than the period shown here, which is about a quarter, but it had been growing slowly over the last year. This is our logging cost in GCP. We looked into it and said, we have to take care of this; it's disproportionately high for what we are doing. Somewhere in August, we started working on it, and you can see the trend downwards from there as we address it.

So, to summarize: looking at these metrics is extremely important. This is how you know what is wrong; this is how you identify bottlenecks. Please make sure you're building systems that actually allow you to measure and identify these bottlenecks, so you can then do the other things I talked about. In our case, we moved to Bazel and we built Test Intelligence; these are all efficiencies we gained after realizing what was wrong with our process.
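And just to show how simple the change-failure triage I described is once the data is flowing, here's a sketch with entirely made-up per-deployment records: compute the failure rate per project, sort worst first, and filter out the test projects that don't matter:

```python
from collections import Counter

# Hypothetical per-deployment records: (project, succeeded).
records = [
    ("test-sandbox", False), ("test-sandbox", False), ("test-sandbox", True),
    ("payments", True), ("payments", True), ("payments", False),
    ("search", True), ("search", True),
]

totals, failures = Counter(), Counter()
for project, ok in records:
    totals[project] += 1
    failures[project] += 0 if ok else 1

# Failure rate per project, worst first, with test projects filtered out
# so the view only shows what actually matters.
rates = {p: failures[p] / totals[p] for p in totals if not p.startswith("test-")}
for project, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{project}: {rate:.0%} change failure rate")
```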
That concludes my presentation; I'm here to answer any questions from here on. I see a good question here: "Is there a good resource to better understand the correlation between different metrics?" That's an extremely good question, and it depends on which part you're talking about, so let me give you a couple of examples. If you look at the deployment verification part of our product, which is called Continuous Verification, what we try to do is map the specific change you made, whether a deployment or an infrastructure-related change, to a specific failure, and show you the correlation between them. That's one way you can correlate metrics. In general, this is pretty hard to do precisely. The key is correlating a change with a specific metric going wrong; that's usually more useful than correlating different metrics against each other, because what you're really trying to do at the end of the day is root-cause why some metric is not where it should be. Hopefully that answers the question.

Next question: "What are you currently working towards in terms of roadmap?" There's a lot going on, and I can't talk about everything, but as a company, what we're focused on is everything in the software delivery space: everything from the point you write code to putting it in production and making sure it works as you expect there. Anything in that realm is what we do. One difference between us and what's out there from a competitor standpoint is that we are a deep company: we don't build something just because it's supposed to be there. When we pick a project, we want to make sure we're addressing really strong pain points and that our offering is differentiated against the best in that segment. For example, in continuous integration, we incorporate Test Intelligence; in feature flags, we incorporate the end-to-end pipeline; in cloud cost, we focused on Kubernetes, and nobody was doing that well when we started. We don't want to do shallow projects. I hope that answers the question.

"What was the biggest return on investment for Harness as you looked at these metrics at scale? Where would you look first, in your opinion?" I think the ability to look at all the metrics I just showed, especially at the end, was huge. Really having those metrics, knowing how your deployment frequency is trending, was a really big deal. The second investment I'd call out is moving away from Jenkins, because it really lifted the spirits of the engineering organization; we had been hearing constant pain for about 18 months over how we were unable to scale, and people were extremely frustrated about it. That would be the second one that was a really big deal for us.

"Does Harness provide integrations with third-party services like GitLab?" Yes, we do; we integrate with pretty much anybody. We are a platform, so absolutely.

"At Harness, do you have metrics you track directly against individual engineers, or do you stay at the team level?" Great question, Sean. We actually do have metrics even for individual engineers, but we don't use them for day-to-day operations.
The reason we don't is that I don't think they're that useful; only in very rare circumstances are they useful. Different engineers bring different things to the table: some are really good at design, some are really good at collaborating, and some are just really good at producing code. You can't measure everybody against the same thing, and some of these aren't quantitative measures; they're qualitative. So I don't think it's that important. Measuring at the organization level and looking for inefficiencies in the system matters more than measuring at the individual level, if that answers the question. Okay, I think we have a few more minutes, so I'm just going to hang around here, if that's okay, Marisa, to answer questions.

There you go, another great question: "Can you use Harness with GitHub Actions, and how do you deal with overlap?" We actually connect very well: you can use any CI solution with our continuous delivery, continuous verification, or feature flags products, and that works pretty well. On the other hand, you could just use GitHub as your SCM and use our CI solution. If you were to use GitHub Actions, we have all sorts of trigger events where you can tell us when an artifact has been generated, and we take it from there and help you with the rest of the deployment. So yes, we integrate really well with most SCM and other CI solutions out there.

There's a question from Taha: "How can we make action process and get failure of process?" I'm actually not sure I got the question; apologies for that. If you could clarify, I'd appreciate it. Any other questions here, or a clarification on the previous one?

"How do you deal with rollbacks?" Excellent question again. There are two things here. One is that the rollback strategy itself really depends on the app; you need to provide how the rollback would be done. But what we have developed is a system to automatically figure out whether a rollback should be initiated. For example, I gave you the example of page load times, but it could be any metric you care about: we monitor those metrics over a period of time and create a baseline of how your app behaves. When you start rolling out a new artifact, or even a new feature, we look at those metrics in the context of that change and see whether they deviate from the well-known baseline. If they do, we automatically initiate a rollback; if they don't, we automatically roll forward. That level of automation is what we can help with and what we do internally ourselves. But the steps for how you perform the rollback definitely depend on the app and have to be taken care of by the application owner.

Okay, thanks, Taha, for clarifying your question. Yes, absolutely: there are a lot of actions you can take in the CI/CD pipeline at different stages and on different events, and it doesn't even need to be a failure. You can define the event on which you'd like to trigger a failure response, a notification, or an approval step, for that matter.
You can do all of this, and yes, it all integrates with Slack: you can act on an approval, a failure, or even a success, whatever it may be, in Slack or some of the other tools. We have pretty deep integrations with a number of other tools out there, including Jira.

"Can you give more examples of key engineering metrics you can get transparency into?" Yes, there are a lot of metrics, and I'd say there are two sides to this. One is how the team is doing in terms of productivity; the second is how well things go from the point code has been produced to how it reaches production, and what the quality of that code is. The former is all about the agile metrics: how many tickets can the team take on, and what's the size of those tickets? We don't want to measure different teams against each other; that's not correct, because how you size tickets really varies per team. What you can look at is the velocity of a team over time, compared against its own history, because each team has its own system. Those are important metrics for seeing the productivity of the team. And then there are the people metrics I talked about, which are extremely important: measure engagement. For example, we use a survey tool that gives us these insights every month. Those are the metrics we measure to figure out how efficient our teams are and how happy they are. The second part is what we focused on today: once the code is written, what does it take to get it into production, how much time does it take, and what issues are we running into? Hopefully that answers your question.

Okay, just waiting for some more questions; I'll give it a few minutes. There it is, a question from Jonathan: "Is it a good practice to assign symbolic values to each metric? For example, for each job failure or failed deployment, the company lost a certain amount of money." This is an excellent question, John, and by the way, it's a pretty debated one. Let's look at it the other way: can we assign a dollar value to every commit a developer makes? What we found is that it's not simple, because on a per-commit basis you cannot really figure out the impact; two commits together are probably worth more than the sum of what each would be worth by itself. So abstracting a level higher, to a service or product level, helps a lot more than going down to each commit or each failure. In the case of failures, it's a little easier to assign a dollar amount, but even then, I would say the right way is to look at the total downtime you had in a given quarter and associate the money lost with that, rather than with a specific failure, because it's really hard otherwise. Maybe you lost a customer because of one failure, or maybe you lost a customer because you had a lot more downtime in a quarter.
I would say the latter is more likely: people get frustrated by repeated failures more than by a single failure. Okay, another question from Taha: "Any AI and ML in Harness?" Yes, we use AI and ML for our Continuous Verification product, the deployment verification I just talked about in the slides. That is how we figure out whether your metrics are deviating from the pattern they followed in the past: we learn and build a mathematical model using ML, and compare against it to figure out whether your deployment is going well or not. "Supported programming languages?" For CI, we are agnostic to programming languages, so yes, we support those.

Another question: "What have you seen make the biggest impact on your teams in your journey building the platform?" I just mentioned this: the Jenkins move was a really big deal, and it made a huge impact. The second thing we're seeing make an impact is the Test Intelligence part. Even in our very first version, we're saving 35% of our developers' time on average. We are very conservative: if we cannot figure out that only a specific subset of tests needs to run, we just run everything. So there are lots of runs where we run everything, but on the runs where we don't, we save the team a lot of time; for example, instead of running 100 unit tests, we only run one or two, and that happens quite a bit. That's how we end up saving 35% of the time. In my opinion, that has really encouraged our engineers: they feel good about how fast they can move without context-switching every time they go through the PR check process. One more thing: on the feature flags side, the product management team especially is extremely happy. They really feel that the visibility and control they have is so much better than what they had before. On the engineering side, sure, they're happy it's more systematic than what was there before, but I think the big impact has been on the product side and the customer success side.

Okay, folks, let's give it one more round of questions. Okay, perfect, looks like we are done. Awesome; thank you so much for the great questions. Oh, there's one more question, let me take it: "Can we integrate multiple clouds, like AWS and Azure?" Yes, the answer is yes. All right, great. Thank you so much for attending the webinar; it was a pleasure to take all the wonderful questions. I look forward to more webinars like this. Okay, thank you so much. Bye, have a great day.

Thank you, Srinivas, for your time today, and thank you, everyone. Just a quick reminder that this recording will be on the Linux Foundation's YouTube page later today. Thanks so much, and have a wonderful day, everyone.