Hello, everyone. Thanks for joining our session today. My name is Fabio Oliveira, and I'll be presenting together with Srinivasan Parthasarathy. Let me start by saying a few words about us. We are passionate about what we call live experiments, because we believe that live experiments can help organizations answer the fundamental question: how can you know whether or not every single new code release that you make delivers value to your customers and to your business? We co-founded a project called Iter8 that embodies our vision for live experimentation of Kubernetes applications and how it can help businesses, and we have experience creating solutions to help SREs and to automate DevOps practices.

Before we go into the specifics of live experimentation, let's take a step back and look at current trends in cloud-native code release practices. We extracted these trends from a recent CNCF survey report that was made publicly available in November of 2020. That survey found that the majority of organizations release code weekly or more frequently, and that there is widespread use of CI and CD tools in production. Those findings do not come as a surprise. The survey also revealed a 7% increase in the number of organizations that release code daily, sometimes multiple times per day, which is also not surprising. Paradoxically, however, that same survey uncovered a trend that was surprising to us: from 2018 to 2020, there was a big decrease, a 21% decrease, in the number of organizations that adopt fully automated release cycles. So organizations seem to be releasing code more frequently, but automating less. That is not intuitive, so why is it happening? There are a few possible explanations. Perhaps there are newcomers that are just trying out automation solutions to see what works. Maybe some organizations feel the need to control a few aspects of code releases themselves. Or perhaps there is a lack of trust in the tools currently used for cloud-native release automation. What can we do to change that? We are on a mission to change this state of affairs.

Here are the goals for this presentation and how they relate to the trends we just saw. First and foremost, we want to raise awareness of facts, misconceptions that we have seen, and pitfalls that organizations might encounter when adopting fully automated solutions for code releases. We want to elevate the practice of release automation. In the process, we will introduce the notion of live experimentation of Kubernetes applications and show cases where it can bring value to your business. We will frame common practices, such as canary releases, A/B and A/B/n testing, conformance tests, dark launches, et cetera, as live experimentation. And we will show you a live demo of live experimentation that is applicable to any Kubernetes-based stack.

From this point on, the presentation is divided into two parts. In part one, we cover what we call the principles of trustworthy release automation. In part two, we will see a live demo that embodies best practices for fully automated code release cycles. The first thing to notice is that if you want to adopt a fully automated solution for code releases, it must be data-driven. You want to know whether or not your release is going well, and why. Therefore, in order for you to trust a solution, that solution must be able to interpret the data properly, so that you can understand what is going on every step of the way.
There are two important properties that must be satisfied by a solution that provides you with fully automated code releases. Those properties are accuracy and repeatability. Let's talk about accuracy first. Suppose you have an oracle that knows, for instance, that your canary release is good and satisfies your acceptance criteria. An accurate solution would consider that canary a success. Similarly, if you have an oracle that knows that, across three versions in an A/B/n scenario, version two is the best one, then your solution must tell you that that version is the winner. Repeatability is equally important. If you do a rollout or run an experiment today and get a result, then running it tomorrow with exactly the same code must give you the same result. If your solution starts flip-flopping, telling you today that the canary is good and tomorrow that the canary is bad even though the code hasn't changed, then something is not quite right. We did some research and observed that some solutions for progressive rollouts suffer from the problem of not meeting both accuracy and repeatability.

So what does a solution need in order to satisfy those two properties? First and foremost, it needs statistical rigor. It needs to consider all data points, all observations since the beginning of a rollout or an experiment, so that it does not make premature decisions about success, failure, or which version is best. And it needs to be capable of using that data to perform sound assessments of all the versions considered. In other words, it looks at the accumulated data and makes sound judgments and assessments, and those assessments get updated every step of the way. Statistical rigor also empowers a solution to be resilient to high metric variance. Let's say, for instance, that you care about latency as one of the metrics in your acceptance criteria. If the latency is all over the place because of conditions external to the code, you still want your release automation solution to do the right thing and not be overly sensitive to that high variance. That brings us to the next point: a solution needs to be able to distinguish what is noise from the cloud, or a result of resource constraints, from what is actually caused by the code's behavior. Finally, a solution needs to be able to adjust the traffic split across all the versions considered, based on the statistically sound version assessments that it makes every step of the way.

Now, it is important to remember that organizations don't release code just for the sake of releasing code. There is always an underlying business need, and when you release code, you actually have an opportunity to deliver business results as you roll out new versions. In order to do that, you need to consider two things. Number one, there are your success criteria in terms of service-level objectives; typically, organizations rely on metrics such as mean latency, tail latency, error rate, exception rate, et cetera. In addition, you need to consider business reward metrics, such as those related to user engagement, for example conversion rate, and those related to revenue, that is, monetary metrics. In order to consider these two elements, which enable organizations to deliver business results as two or more versions are rolled out, a solution for release automation needs to enable the following question to be answered.
Among the versions that have acceptable performance, which one actually benefits the business the most? That is the winner version. To answer that question, a solution needs to separate two key elements that affect user experience, elements that are usually conflated but should not be. One is how attractive each of the competing features is to your users. The other is how the versions of the application are performing with respect to your SLOs. Those are separate concerns that need to be taken into account together, but also kept distinct. And in order to learn as you roll out, and earn as you roll out, a solution needs to be able to perform progressive traffic shifting throughout the entire process.

One note about terminology. We have observed in the cloud-native community that the term A/B testing is sometimes misused. Sometimes it is used interchangeably with canary release, and sometimes it is equated with traffic shifting via HTTP headers and cookies. That kind of traffic shifting is just a mechanism; it is not A/B testing per se. Let me draw a culinary analogy here. If I tell you that I have water, flour, and yeast, can I tell you that I have bread? Do you believe that I have bread? All you know is that I have ingredients; I might not know what to do with them. Similarly, just because one can do traffic shifting using HTTP headers and cookies does not mean that A/B testing is actually supported. There is more to A/B testing than traffic shifting.

Traditionally, A/B testing is thought of as solving the following problem: you want to compare two versions of your application, and you want to identify the one that provides more user engagement or delivers more revenue. Typically this is done in the following way. You split the traffic between a control group and a treatment group, and you gather data. You look at the data, the experimentation system hopefully tells you which version wins eventually, and then you decide what to do with that information; maybe you want to roll it out. This is a practice that has been done in the front-end domain, web and mobile applications, for a while. In the cloud-native world, we like to think of A/B and A/B/n experiments as including the following things: the ability to incorporate service-level objectives, in terms of performance and correctness metrics for instance, not only business rewards but in addition to business rewards; the ability to optionally and progressively shift traffic towards the winning version; and the ability to optionally roll forward or roll back at the end of the experiment, when the system actually makes a decision.

So what is needed from a release automation solution to achieve that? First and foremost, it needs to use the underlying platform to implement traffic splitting and user segmentation; you are going to see those two things in action in our demo shortly. It also needs sophisticated algorithms for comparing and assessing all the versions competing in an A/B or A/B/n experiment, continuously adjusting the traffic split based on those assessments, if that is what you want to do, and, importantly, deciding when a winner can be declared with statistical confidence. Those are the ingredients, and this is the secret sauce that enables you to truly support A/B and A/B/n scenarios.
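As an illustration of that distinction, here is a minimal sketch of header-based traffic shifting using an Istio VirtualService; the host, subset, and header names are hypothetical, and the DestinationRule defining the subsets is omitted. On its own, a resource like this only splits traffic; it performs no version assessment and declares no winner.

```yaml
# Minimal sketch: route requests carrying a particular header to the candidate
# version, and everything else to the baseline. This is only the traffic-shifting
# "ingredient"; nothing here compares versions or decides a winner.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app                  # hypothetical name
spec:
  hosts:
    - my-app.example.com        # hypothetical host
  http:
    - match:
        - headers:
            x-user-group:       # hypothetical header set by the front end or derived from a cookie
              exact: treatment
      route:
        - destination:
            host: my-app
            subset: v2          # candidate version
    - route:
        - destination:
            host: my-app
            subset: v1          # baseline version
```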
Now we are going to switch to part two of our presentation. I will hand it over to Srinivasan Parthasarathy, who will take you from there.

Thank you, Fabio. Hello, everybody. My name is Srinivasan, and I'm going to show you a demo using this tool called Iter8 that we are developing to enable cloud-native experimentation. A little bit about Iter8. Iter8 is an open-source, cloud-native release automation and experimentation platform. If you are developing Kubernetes applications and machine learning models, then when you release new versions of those apps or models, you may be interested in optimizing business metrics, validating SLOs, that is, ensuring that your versions satisfy SLOs, and protecting the end-user experience. Iter8 makes it easy for you to achieve all of these goals. Concretely, Iter8 provides a new Kubernetes resource called an Experiment. You can declaratively specify A/B, A/B/n, canary, or conformance experiments using Iter8. And since it is a Kubernetes resource that you create, you can embed Iter8 experiments as part of your CI/CD pipelines or as part of your GitOps pipelines. What Iter8 delivers for you is this: it will compare the different versions of your application, pick the best version, and automate the rollout of that best version.

As part of its experiments, Iter8 uses metrics, and it can use metrics from any REST API. So if you are collecting metrics for your versions in, let's say, New Relic or Prometheus or Sysdig or Google Analytics, any REST API that provides metrics for your versions, as long as you can query it, Iter8 can use it. You can use Iter8 on top of any Kubernetes technology stack. The demo that I'm going to show you uses Knative, a serverless platform on top of Kubernetes, but you can use it with, say, KFServing, which enables ML model serving. You can use it with the Istio service mesh, or any other service mesh or ingress technology of your choice. Behind the scenes, Iter8 uses AI and statistically rigorous algorithms to evaluate the different versions of your application and achieve progressive delivery and automated rollouts.

All right, on to the demo. The demo that I'm going to show you is an A/B/n experiment, and as I mentioned, I'm going to use the Knative serverless platform along with Istio. As part of the demo, we will also use metrics from New Relic and the Prometheus metrics backend. The demo will involve optimizing a business metric along with ensuring SLOs. There will be three competing versions of the application in this demo, and we will also use traffic segmentation and progressive traffic shifting. These are some of the features that you will see as part of this demo. Incidentally, this demo and several other demos are documented at https://iter8.tools, so you can head over there, look up the demos, and do them yourself in under five minutes.
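As a rough picture of the application side of such a setup, the three competing versions could be exposed as revisions of a single Knative Service, along the lines of the following sketch; the names, images, and tags are hypothetical and are not the exact manifests from this demo.

```yaml
# Hypothetical sketch: one Knative Service whose three revisions play the roles
# of V1, V2, and V3. The baseline initially holds all the traffic; the experiment
# and the routing layer later adjust the split among the tagged revisions.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: sample-app                       # hypothetical name
  namespace: default
spec:
  template:
    metadata:
      name: sample-app-v3                # newest revision under test
    spec:
      containers:
        - image: registry.example.com/sample-app:v3   # hypothetical image
  traffic:
    - revisionName: sample-app-v1
      percent: 100                       # baseline keeps all traffic until the experiment shifts it
      tag: v1
    - revisionName: sample-app-v2
      percent: 0
      tag: v2
    - revisionName: sample-app-v3
      percent: 0
      tag: v3
```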
Okay, so how is this demo going to unfold? There will be three competing versions of the application, as I mentioned: V1, V2, and V3. And we are going to run the experiment only for a particular user segment, specifically users from Wakanda. If you are from Wakanda, you may see V1, V2, or V3; but if you are not from Wakanda, if you are from the rest of the world, then you will only see V1, and this experiment does not affect you at all. The goal of the experiment is to find a winner, and there are two things that I want to ensure while finding the winner. I want to ensure that the winner satisfies SLOs, in particular latency and error rate SLOs, and I want to ensure that the winner maximizes user engagement. So if there are multiple versions that satisfy the SLOs, then among those versions I want to pick the one that maximizes user engagement. That is the winner of the experiment. Let's say V3 is the winner of this experiment. What I would also like to do as part of the experiment is progressively shift traffic towards V3. I might have started at 5% for V3, but as soon as I know that V3 is winning, I would like to start increasing it, say move it to 25%, then 45%, and so on, and eventually make sure that all users end up with V3. This is how the experiment will be automated by Iter8.

So, manually, I am going to deploy V1, V2, and V3. In this demo I am just going to use kubectl, but in production you will probably embed these steps in your CI/CD pipeline or GitOps pipeline. So I deploy V1, V2, and V3; those are my application versions. I will also use an Istio virtual service that achieves traffic segmentation for me, Wakanda versus rest-of-the-world traffic segmentation. I will define the Iter8 metrics that will be used as part of this experiment, and I will launch the experiment. Once I launch the experiment, Iter8 will periodically query Prometheus and New Relic, get the metrics for each version, assess the versions, determine a winner, and, over the course of the experiment, progressively shift the Wakandan traffic towards the winner. This is the process that is automated for me: the evaluation, the traffic shifting, and the decision making are what get automated through the Iter8 experiment.

All right, with that introduction, let's head over to the actual experiment instructions. I am going to start by creating the different application versions. Next, let me go ahead and create the Istio virtual service that will enable traffic splitting, and let's look at what this virtual service delivers for us. As I said, the rest of the world is unaffected by this experiment: if you are not from Wakanda, 100% of the traffic is going to end up with V1. However, if you are from Wakanda, right now all the traffic is still going to end up with V1, because the experiment has not started yet. So as far as all users are concerned, the only version of the application that exists is V1. This is the state of affairs right now. In production, these versions would receive traffic from actual end users, but for the sake of this demo I am just going to simulate traffic; I am going to create some synthetic traffic right here.

Okay, let's go ahead and define the metrics that will be used as part of this experiment, and then look at the metrics we just defined. Notice that I am just copying and pasting these commands, and they are really kubectl apply and various other forms of Kubernetes resource creation and manipulation commands. All right, let's look at the metrics we just created. Here is an Iter8 metric, and it is called user-engagement. This is the business metric that we want to maximize, and I am getting this metric from New Relic. So I am having Iter8 fetch a New Relic metric called user engagement, and I have supplied a query template that pretty much looks like a New Relic REST query.
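For illustration, a metric like the one just described might be declared roughly as follows. This is a sketch, not the exact resource from the demo: the apiVersion, field names, and the NRQL-style query only approximate what is being described, and the account ID and attribute names are placeholders.

```yaml
# Illustrative sketch of an Iter8 metric backed by New Relic; consult the Iter8
# docs for the exact schema. The idea: the metric names a REST endpoint plus a
# templated query, and Iter8 fills in per-version details when it queries.
apiVersion: iter8.tools/v2alpha2
kind: Metric
metadata:
  name: user-engagement
spec:
  type: gauge
  provider: newrelic
  params:
    - name: nrql
      # hypothetical NRQL-style query; $name and $elapsedTime stand for the
      # version name and the time elapsed in the experiment
      value: >
        SELECT average(engagement) FROM pageViews
        WHERE appVersion = '$name' SINCE $elapsedTime seconds ago
  jqExpression: ".results[0] | .[] | tonumber"   # extract the numeric value from the JSON response
  urlTemplate: https://insights-api.newrelic.com/v1/accounts/<account-id>/query
```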
Similarly, I have defined other metrics, for example the mean latency metric. This time I am getting the metric from Prometheus, and I have defined a Prometheus query template; Iter8 will fetch this and other metrics from Prometheus.

Okay, so I have defined the metrics, and now I am ready to actually launch the experiment. Let's go ahead and launch it, and then take a look at the experiment specification. This is an Iter8 Experiment resource. As I said, I am interested in performing an A/B/n experiment, so the testing pattern in this experiment is A/B/n; there are other testing patterns that you can use, but this one uses A/B/n. There are three versions that we will use as part of this experiment, V1, V2, and V3, and they are specified here. Let's look at the criteria that will be used to judge these different versions. First of all, there is the reward criterion: the user-engagement metric that we just defined is what we will use as the reward, and I would prefer the reward to be high. There are also the SLOs: mean latency, tail latency, and error rate are the metrics that I will use for defining the SLOs. When I define an SLO, I specify the metric, and I can also specify upper or lower limits associated with it. Interestingly, there is a section in the experiment called strength. As part of this section, I can specify things like how sure Iter8 needs to be before it declares a winner. Here I am specifying that Iter8 needs to receive 100 requests for each version before it can confidently declare a winner, and, not only that, Iter8 needs to be 99% sure that it has found the best version before it can declare it the winner. This is the way for me to ensure statistical rigor and robustness in the experiment. There is also the duration of the experiment itself: I have said that the experiment lasts for 100 iterations, and between successive iterations there is a gap of five seconds, so the total duration of the experiment is approximately 500 seconds.
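For orientation, an experiment of this shape might look roughly like the sketch below. This is not the exact manifest from the demo: the apiVersion and field names only approximate the resource being walked through, the version names and numeric limits are placeholders, and the request-count and confidence settings just mentioned are omitted because their exact fields are not shown here.

```yaml
# Illustrative sketch of an A/B/n Iter8 experiment combining a business reward
# with SLOs and a progressive traffic shift; consult the Iter8 docs for the
# exact schema and for the fields that control statistical strength.
apiVersion: iter8.tools/v2alpha2
kind: Experiment
metadata:
  name: abn-sample-experiment            # hypothetical name
spec:
  target: default/sample-app             # hypothetical target application
  strategy:
    testingPattern: A/B/N                # the A/B/n testing pattern described above
    deploymentPattern: Progressive       # progressively shift traffic toward the winner
  criteria:
    rewards:
      - metric: user-engagement          # business reward; higher is better
        preferredDirection: High
    objectives:                          # SLOs with placeholder limits
      - metric: mean-latency
        upperLimit: 50
      - metric: 95th-percentile-tail-latency
        upperLimit: 100
      - metric: error-rate
        upperLimit: "0.01"
  duration:
    iterationsPerLoop: 100               # 100 iterations ...
    intervalSeconds: 5                   # ... five seconds apart, roughly 500 seconds in total
  versionInfo:                           # the three competing versions
    baseline:
      name: sample-app-v1
    candidates:
      - name: sample-app-v2
      - name: sample-app-v3
```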
Okay, let's go ahead and take a look at what the experiment has done so far. First of all, an experiment is just another Kubernetes resource, so we can watch the experiment resource. The experiment is proceeding: 13 iterations have completed so far, now 14, and it is running. Let's also look at the traffic split. Remember, everything was going to V1 at the beginning, because the experiment had not started. Looking at the state of the traffic split at the moment: first of all, the rest of the world is still going to V1. That is how we wanted it; they should not be affected by the experiment. However, if you look at the Wakandan traffic, traffic is shifting from V1 to V3. It seems Iter8 has determined that V3 is a good version to move the traffic to from V1. V2 is also getting a little bit of traffic, 3%, but that is just minimal, residual traffic for exploration purposes. All the traffic was on V1, but now it is beginning to shift to V3.

Let's also take a look at the SLOs and metrics in depth. Here I am using a tool called iter8ctl. It is a command-line utility that comes with Iter8 and makes it easy for me to look at the SLOs and metrics for the different versions as the experiment proceeds, in real time. The tool is telling me that 21 iterations have been completed, 22 now, and that the version recommended for promotion is V3. Why is that? Because, first of all, V3 is satisfying all the objectives. All versions seem to be satisfying the objectives, but in addition, V3 is maximizing the user-engagement metric. My definition of the winner in an A/B/n experiment is: among the versions that satisfy the objectives, pick the one that maximizes my business reward. That is exactly what V3 is doing, and that is why V3 was picked as the winner by Iter8 during this experiment.

So, this experiment featured several of Iter8's powerful features, including the ability to declaratively specify experiments, progressive traffic shifting, SLOs, traffic segmentation, and so on. There are several other features that we did not demonstrate here, and you can try them out yourself. There are other types of experiments, such as conformance and canary experiments, that you can perform. There are also other traffic patterns that you can use: you can use a user-specified traffic split instead of a progressive traffic split, fixing the split yourself, and you can use dark launches. You can include SLIs, service-level indicators, as part of the experiment, and Iter8 will enable you to observe SLIs in addition to SLOs and metrics. You can use traffic mirroring. And, interestingly, you can integrate Iter8 with Helm, Kustomize, or just plain YAML/JSON application manifests and use it to automate version promotion at the end of the experiment.

As I mentioned, all of these demos are available at iter8.tools, so please do check it out and try the demos yourself. You can join the Iter8 Slack workspace, and also head over to the Iter8 GitHub repo to get answers to your questions or start other discussions. We look forward to seeing you there. Thank you.