All right, hello. My name is Alan Chang. I'm a software engineer at IBM Research, and today I'd like to talk to you about Iter8, an open-source project that I've been working on.

So what is the core problem? When you're developing an application or an ML model and you want to release a new version, how can you ensure that the new version is performant, that it works as intended, and that it won't affect users negatively? How can you ensure that it is truly better than the original version?

Our solution is Iter8, a metric-driven release optimizer. It's a tool that can automate SLO validation as well as A/B and A/B/n testing. Iter8 makes it easy for you to ensure that new versions of your apps and ML models perform well and maximize business value. With Iter8, you can perform experiments that carry out a variety of tasks, such as collecting metrics, validating those metrics against service-level objectives (SLOs), and recommending the best version, among many other things. It's made for DevSecOps, MLOps, SRE, and data science teams, and it's easily extensible to work with your apps and ML models, the tools that you use, and whatever additional functionality you may need.

This open-source project was started at IBM Research, but it's being developed in the open. We're working with different communities, like Knative, KServe, Seldon, and Litmus Chaos, and that's what has allowed Iter8 to become what it is today. So I hope after this talk you'll find our project interesting and you'll check it out.

All right, so here are some of the things you can do with Iter8. You can do load testing with SLOs. Iter8 has built-in mechanisms for generating load and collecting latency- and error-related metrics, and later on in the talk I'll give a demo of how you can do this with HTTP and gRPC services.
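To make that idea concrete, here is a rough sketch in Python of what "generate load, collect latency and error metrics, then validate SLOs" amounts to. This is not Iter8's actual implementation; the metric names and limits are placeholders chosen to mirror the demo later in the talk:

```python
import random
from statistics import mean

def percentile(values, p):
    """Nearest-rank percentile of a sorted copy of values."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, round(p / 100 * len(ordered)) - 1))
    return ordered[index]

# Simulated results of load generation: (latency_ms, is_error) per request.
random.seed(0)
results = [(random.uniform(5, 40), False) for _ in range(100)]

latencies = [latency for latency, _ in results]
metrics = {
    "error-rate": sum(err for _, err in results) / len(results),
    "latency-mean": mean(latencies),
    "latency-p97.5": percentile(latencies, 97.5),
}

# SLOs: metric name -> upper limit, in the spirit of the HTTP demo below.
slos = {"error-rate": 0, "latency-mean": 50, "latency-p97.5": 200}
satisfied = all(metrics[name] <= limit for name, limit in slos.items())
print(satisfied)
```

Iter8 does all of this for you (and much more robustly); the point is only that an SLO check is a comparison of collected metrics against declared limits.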
Iter8 can also utilize custom metrics. You can easily define what metrics to collect and how to collect them, and then you can use those metrics in your experiments. I'll be giving a demo of how you can do this with metrics from Istio.

You can also do A/B and A/B/n testing. This is actually a feature that we're in the middle of porting from a previous version of Iter8. I don't want to go into too much detail, but basically the previous version used a CRD model, whereas now we're using a local, namespace-scoped model. The difference between the two is that with the current model, the local namespace-scoped one, we don't require you to install any custom resources. So you don't need to be an admin of the Kubernetes cluster; anyone can use Iter8 out of the box. A/B/n testing is still supported in the CRD model, and we're working hard on porting it to the current model.

You can also run experiments locally as well as within a Kubernetes cluster. We have a GitHub Action, so you can include Iter8 in your CI/CD pipeline. And as I mentioned before, it's easily extensible. There are a few ways you can extend it for your application: you can define custom metrics, you can define custom tasks that will run in your experiment, and you can configure Iter8 to fit your CI/CD story.

To give you some more context: an Iter8 experiment is basically just a set of tasks. The two most prominent tasks are collecting metrics and assessing versions, but there are other tasks being developed. For example, we have a readiness check task that ensures your services are running before continuing with the experiment. And under the covers, Iter8 experiments are just Helm charts, which makes them simple, declarative, and reusable.
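Conceptually, an experiment is just an ordered list of tasks run one after another, sharing state. A minimal sketch of that idea (the task names and state keys here are hypothetical, not Iter8's real task runner):

```python
def check_readiness(state):
    # Placeholder: a real task would poll the service until it responds.
    state["ready"] = True

def collect_metrics(state):
    # Placeholder: a real task would generate load and record latencies/errors.
    state["metrics"] = {"error-rate": 0.0, "latency-mean": 12.3}

def assess_versions(state):
    # Placeholder: a real task would compare all metrics against declared SLOs.
    state["slos-satisfied"] = state["metrics"]["error-rate"] == 0

def run_experiment(tasks):
    """Run tasks in order, threading shared experiment state through them."""
    state = {}
    for task in tasks:
        task(state)
    return state

result = run_experiment([check_readiness, collect_metrics, assess_versions])
print(result["slos-satisfied"])
```

Packaging such a task list as a Helm chart is what makes Iter8 experiments declarative and reusable: the chart's values parameterize the tasks.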
And we have already created a few different kinds of experiment charts that you can use right away, but of course you can easily create your own.

All right, so I have a few demos for you today, four in total. The first two have to do with load testing for HTTP services; I'll show you how you can do this both locally and in Kubernetes. Then I'll show you how you can perform load testing for gRPC services. And lastly, I'll show you how you can use Iter8 with traffic mirroring and custom metrics. During these demos, you'll be introduced to the concept of experiments, as well as the Iter8 CLI, which will help you run these experiments and visualize results.

So let's go to our website, iter8.tools. We'll go to the tutorials and take a look at the first one, benchmarking and validating HTTP services. You'd be interested in this demo if you wanted to see how well your HTTP service is performing. To do this, we'll be using the load-test-http experiment chart. We have a diagram that shows what happens during an experiment. As I mentioned before, Iter8 has load-generation capabilities, so Iter8 will be making requests to your HTTP service, and ideally your HTTP service will be responding accordingly. Iter8 also has built-in metrics: it uses these requests and responses to create latency- and error-related metrics. And lastly, it validates those metrics against any SLOs that you have defined. I already have a sample service running in the background.
And launching the experiment is very simple: you just use one command. Let me explain what's going on in it. At the top we're running iter8 launch -c load-test-http, which says we want to use the load-test-http experiment chart. This chart has one required parameter, the URL, which is the endpoint of your sample service. In addition to that, we also have some SLOs defined: we want the error rate to be 0, we want the mean latency to be less than 50 milliseconds, and we want the latencies at certain percentiles to be under 100 and 200 milliseconds, respectively.

And we've started it. You can see it ran two tasks: the gen-load-and-collect-metrics-http task, as well as assess-versions. Now we can look at the experiment report. If I run iter8 report, we get a nice summary of what occurred during the experiment. As you can see here, the experiment has completed, there were no failures, both tasks completed, and all our SLOs were satisfied. We also get a list of additional metrics that were collected as part of the experiment. Note how during this demo we didn't need to set up a database or anything like that; we just used Iter8's built-in mechanisms.

All right, now let's try running the same experiment, but in Kubernetes. You can see here we have the HTTP service as well as Iter8 running within the Kubernetes cluster. You might be asking why you would want to do this. There are a few good reasons. Firstly, your HTTP service may not be exposed externally; in that case, running Iter8 within your Kubernetes cluster may be your only option.
Secondly, there are many different ways to start up a service within Kubernetes. You can use a Deployment or a StatefulSet, but you can also use custom resources; for example, you might have a Knative serverless service, or a KServe or Seldon machine-learning service. Iter8 can work with all of these. This is part of the reason why Iter8 is so powerful and flexible.

So let's give it a try. I already have everything set up in the background, with Minikube running. Let me first clear this so we start from a clean slate. You can see I have my httpbin service running. If I want to repeat the experiment here, I need to make a few changes first. The first change is that I need to run iter8 k launch instead of iter8 launch; this makes sure that Iter8 runs within the Kubernetes cluster. The second is that I need to change the URL so that it points to the service running within Minikube, so I'll be using the in-cluster endpoint, httpbin.default. And I think that's it, so let's give it a try.

And it's launched. You can see that a job has started; that's basically Iter8 running. You can use iter8 k assert with the completed condition to check whether the experiment has finished. And it has. If we look at the Iter8 report, it's as expected: everything has completed, our SLOs were satisfied, and we also get some metrics.

All right, let's take a look at our next demo, which is on benchmarking and validating gRPC services. This will look pretty familiar; it's similar to the previous two demos. It would be useful if you want to see whether your gRPC service is holding up well. To perform this experiment, we'll be using a slightly different experiment chart, the load-test-grpc chart. And I already have a sample service running in the background.
And so all we need to do, again, is run one simple command. You can see here that we're going to launch the load-test-grpc experiment chart. This chart has a few different parameters: host, call, and protoURL. These are all gRPC-specific parameters associated with this experiment chart. And we also have some SLOs. You can see this experiment actually has three different tasks: a task called run, then gen-load-and-collect-metrics-grpc, and assess-versions. Now we can run iter8 report, and we can see that the experiment has completed. Unfortunately, we did not satisfy all the SLOs, probably because I have a lot of stuff running on my computer to support all these demos. The only SLO we satisfied is the error rate, but not the latency SLOs. And we also get some additional metrics.

All right, let's go to our last tutorial, which is about traffic mirroring and custom metrics. Before I do that, I need to switch my Kubernetes context; you won't see any visible changes on screen.

If you're not familiar with traffic mirroring, it's the idea that you can duplicate traffic for another service. Let's say you have a version 1 service and a version 2 service, and you want to know if the version 2 service is ready for release. You can set up traffic mirroring with Istio so that the version 2 service gets a copy of all the traffic sent to the version 1 service, and then you can monitor the second service. What this allows you to do is see how the version 2 service runs with real traffic, without affecting end users. You're testing the v2 service in the environment it's intended for, but safely, with minimal risk.
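In Istio, mirroring is configured on a VirtualService, but the semantics are easy to sketch: the mirrored copy is fire-and-forget, and only v1's response ever reaches the user. A toy illustration with stand-in functions (this is not Istio, just the behavior it implements at the proxy):

```python
def call_v1(request):
    # The stable version that users are served from.
    return f"v1 response to {request}"

def call_v2(request):
    # The version under test; it may even crash without harming users.
    raise RuntimeError("v2 is broken")

mirrored_errors = []

def handle(request):
    """Serve from v1; send a best-effort copy of the request to v2."""
    response = call_v1(request)           # the response users actually get
    try:
        call_v2(request)                  # fire-and-forget mirror of the request
    except Exception as err:
        mirrored_errors.append(str(err))  # observed by operators, invisible to users
    return response

print(handle("GET /"))
```

Even though v2 fails on every mirrored request here, users only ever see v1's responses; the failures show up in v2's metrics, which is exactly what the experiment will inspect.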
And that's exactly what we'll be doing in this demo. In addition, we'll also be demonstrating custom metrics. As all of this runs, Istio is generating metrics about how these services are performing and storing them in a Prometheus database. So instead of relying on the built-in metrics that I described earlier, this experiment will query the Prometheus database for those Istio metrics and use them for the experiment.

To do all that, we'll be using a new experiment chart, the Istio SLO-validation chart. It's also a bit different from the other experiment charts I showed earlier, in that it runs its task on a schedule. The logic behind this is that when you're dealing with traffic mirroring, you're working with real traffic, and real traffic is unpredictable; it may not be consistent or readily available. So this experiment should run over a period of time, during which you can see how the version 2 service is performing.

All right, let me first clear this and run kubectl get all. You can see I've changed my context, and now we have a version 1 service and a version 2 service, with traffic mirroring set up as I described earlier. So I think we're ready to run the experiment. I'll take the command, add one more thing, a no-download option, and run it.

What's going on here is that we're launching the Istio SLO-validation experiment chart, and it has different parameters from the previous ones. We have a cron schedule, which determines how often the task will run. We also have a provider URL; this is the URL of the Prometheus database that we'll query for metrics. We also have the destination workload and destination workload namespace, which let you select the service receiving the mirrored traffic. And we also have some SLOs defined.
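For context, "provider URL plus destination workload" boils down to issuing PromQL queries against Prometheus over Istio's standard request metric, istio_requests_total, whose labels include the destination workload and namespace. Here's a sketch of how such a query string might be assembled; the label values are placeholders, and the exact queries Iter8 issues may differ:

```python
def error_rate_query(workload, namespace, window="5m"):
    """Build a PromQL expression for the fraction of 5xx responses
    reaching a workload, using Istio's istio_requests_total metric."""
    selector = (
        f'destination_workload="{workload}",'
        f'destination_workload_namespace="{namespace}"'
    )
    errors = (
        f'sum(rate(istio_requests_total{{{selector},'
        f'response_code=~"5.."}}[{window}]))'
    )
    total = f'sum(rate(istio_requests_total{{{selector}}}[{window}]))'
    return f"({errors}) / ({total})"

query = error_rate_query("httpbin-v2", "default")
print(query)
```

A query like this would then be sent to the provider URL's query endpoint, and the returned value compared against the error-rate SLO, just as with the built-in metrics.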
All right, so if we run iter8 k report, we can see that everything worked well: the experiment completed, we had no task failures, we satisfied the SLOs, and we got all these metrics. Actually, there's one thing I forgot to show you earlier, so I'll do it now: you can also generate an HTML version of this report. Let's see if it works... report.html, open report.html. There it is. You get this nicely rendered version; unfortunately, this particular experiment doesn't have any nice histograms, but you get the picture.

All right, let me go back to my slides. To recap, I showed four demos, a good variety of experiments, so you can see how simple or complex Iter8 experiments can be. I showed you how you can use Iter8's built-in metrics to load-test HTTP and gRPC services quickly and easily; we didn't need to set up a database or anything like that. I also showed you how you can run Iter8 both locally and in Kubernetes. The latter is important for a few reasons: for example, your service may not be exposed externally, and Iter8 can work with a whole range of Kubernetes resources. And lastly, I showed you how you can use Iter8 with traffic mirroring and custom metrics, which allowed us to use metrics from Istio.

And here's the roadmap, what we have planned for the future. I mentioned this before: porting the A/B and A/B/n capabilities. We also want to add notifications, so the user is notified when an experiment has completed, along with any data they might be interested in; we could create a Slack notification, or some other kind of notification that triggers a CI/CD process. We also want more custom metrics for other service meshes and databases.
Today I showed you custom metrics for Istio, where we were able to collect Istio's metrics from Prometheus, but I've also been working on templates for KServe, OpenShift, and Linkerd, and there are many more service meshes and databases we can support. We also have an AutoX service in the works; the idea is to automatically launch experiments after detecting changes in new versions of services. And maybe we'll also create traffic-mirroring templates: in today's demo I showed you how to use Iter8 with traffic mirroring already set up, but we could also create templates that help you set the mirroring up. Before we do so, though, we want to hear what you think about Iter8 and whether this is something you'd like to see.

So feel free to scan this QR code; it will bring you to our GitHub page. Take a look, join our Slack, maybe give us a star if you feel like it, and join the conversation. I hope you enjoyed my talk. Are there any questions?

At IBM Research we're always working on cool ideas, and we think that A/B and A/B/n testing and SLO validation should be core to any kind of project. Any other questions?

So the question is, where is this being used? As I mentioned, we've worked with the Knative, KServe, and Litmus Chaos communities, and some of them are using Iter8. I think we have something on our GitHub that lists some of our adopters. Any other questions? All right, well, that concludes my talk. Thank you for coming.