Hello, my name is Alan Cha and I'm a software engineer at IBM Research, and today I'll be telling you about Iter8 and how it can benefit you as a Prometheus user. So before anything, we need to answer the question: what is Iter8? Iter8 is an open-source Kubernetes experimentation platform. It makes it easy to ensure that your Kubernetes apps and ML models perform well and maximize business value.

So what are some of the things that Iter8 can do? Well, here are some of the use cases. Iter8 can conduct performance tests and SLO validation for your HTTP and gRPC services. And for your performance tests, you can use custom metrics from external databases, such as from Prometheus. It can also perform A/B and A/B/n testing, it can generate load for your tests, and much, much more, which we'll go into in more detail later.

So how does Iter8 do all that? Well, Iter8 introduces the notion of an experiment, which is a list of configurable tasks that are executed in a specific sequence. For example, let's take a look at this diagram. Perhaps you have multiple versions of an application and you would like to know which one is best. You can create an Iter8 experiment composed of three tasks, collect metrics, assess versions, and promote winner, to help you figure that out. When you run this experiment, it will collect metrics, perhaps from Prometheus, and using those metrics, it will assess the different versions by comparing them to your SLOs. And lastly, it will recommend the best version.

This is a very simple experiment, so keep in mind that Iter8 can do a lot more. For example, we also have the concept of a single-loop and a multi-loop experiment. A single-loop experiment will only run the tasks once, whereas in a multi-loop experiment, the experiment will run the tasks periodically. Perhaps you want to run your experiment over a long period of time. Maybe you want to see how one version of your application handles live traffic, where the amount of traffic depends on the time
of day, or even over the course of the week. This would be one reason why you would want to run a multi-loop experiment.

Recall that an experiment is composed of a number of tasks, so let's take a look at what kinds of tasks are currently available. We have the ready task, which checks if your deployments or services, or whichever Kubernetes objects you are using, are ready. We have the gRPC and HTTP tasks, which generate requests for your gRPC and HTTP services and also collect latency and error-related metrics. We also have the assess task, which allows you to define SLOs and checks if your app versions satisfy them. We have the slack task, which allows you to send a message to a Slack channel. We have the github task, which allows you to trigger a GitHub workflow. And lastly, we have the custom metrics task, which fetches metrics from a database or from a REST API. I think this is the task that will be most interesting to all of you who are attending Prometheus Day, because many of you already have your own way of collecting metrics and are using Prometheus to store said metrics, and the custom metrics task will allow Iter8 to plug into all that pre-existing data to help you ensure that your apps and ML models are performing well.

We can use all of these tasks to compose an experiment that best fits your use case. For example, you can create an experiment where you first ensure all of your resources are running using the ready task. Then, using the custom metrics task, you can fetch metrics from your database. And after assessing your different versions using the assess task, you can trigger a GitHub workflow, which will start some process for your GitOps project based on the results of the experiment. In this manner, Iter8 experiments are declarative, easy to use, and don't require any additional programming. You just need to list the tasks you want to use and provide whatever additional arguments are needed for those tasks.

So, let's try out Iter8. First, we can do a basic
experiment where we do load testing on a Kubernetes HTTP service and validate some SLOs. To summarize what we will do: we will have some HTTP service running, we will check if the app is ready, then generate HTTP requests and collect latency and error-related metrics, and then finally, we will validate those metrics against SLOs.

So, to start, I will need to create some sample HTTP service. In this case, I will create the httpbin service. Then, all I need to do is launch the experiment. This experiment has three tasks: ready, http, and assess. The ready task will ensure that the httpbin deployment and service are ready within 60 seconds. The http task will generate load against the app and collect latency and error-related metrics. The assess task will ensure that the mean latency is less than 50 milliseconds and the error count is zero. And lastly, setting the runner to job means that this is a single-loop experiment, where the tasks will run just once. I will also add a no-download flag, and this will ensure that I do not need to reinstall Iter8 resources.

After waiting for a bit, we can get the report. As you can see, the experiment has completed. There are three tasks, and we completed all of them. The app has met all of the SLOs, and we can also see additional metrics that were collected as part of the experiment. We can also produce a nice HTML version of this report, and this is just the same data.

Now let's try an Iter8 experiment where we use the custom metrics task to query metrics from Prometheus. This tutorial describes how you can utilize metrics from Istio, specifically metrics that Istio generates and stores in Prometheus via its addon. We can see here that in this experiment, instead of generating load and collecting latency and error-related metrics, we are getting metrics directly from the Prometheus database. This will also be a multi-loop experiment, where the tasks will run periodically. To set up the experiment, I will need to create another sample HTTP
service, as well as generate some load, so Istio can start capturing metrics. Now we will generate some load. Okay, so all that's left is to launch the experiment.

In this experiment, we have two tasks: custom metrics and assess. The custom metrics task also requires you to define the metrics that you would like to capture, as well as how to do so. This is done by providing a link to a configuration file. We provide a template for getting Istio metrics from Prometheus, and we plan on providing more templates in the future, but it is simple to create your own configuration file; it is simply a list of HTTP requests. The assess task will ensure that the error rate is zero and the mean latency is less than 100 milliseconds. The runner is set to cronjob, which means that this will be a multi-loop experiment, and in addition to that, we need to provide a cron schedule. This means the experiment will run the tasks periodically, every minute. And as I mentioned before, I will add the no-download flag.

We can see the experiment report here. As you can see, we had two tasks, they were completed, and we were able to meet the SLOs.

So, to summarize, we saw how easy it is to get started with performance testing and SLO validation with Iter8. We learned about tasks and how they can be composed in order to create an experiment. And we got to see how Iter8 can also utilize metrics from an external database, such as from Prometheus. We have some fun stuff in the works, such as an A/B/n SDK, as well as a way to automatically launch Iter8 experiments after detecting version changes. I hope you enjoyed this talk, and I hope you enjoy the rest of Prometheus Day. Thank you!
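To make the assess step above more concrete, here is a minimal sketch of SLO validation in plain Python. This is an illustration of the idea only, not Iter8's actual implementation; the metric names, the `assess` helper, and the sample values are all hypothetical.

```python
# Minimal sketch of SLO validation, loosely modeled on an "assess" task.
# Metric names, thresholds, and values below are illustrative, not Iter8's
# real schema.

def assess(metrics, slos):
    """Return a dict mapping each SLO name to whether it is satisfied.

    `slos` maps a metric name to an upper bound; the SLO is satisfied
    when the observed metric value does not exceed the bound.
    """
    results = {}
    for name, upper in slos.items():
        value = metrics.get(name)
        # A missing metric counts as a violation rather than a pass.
        results[name] = value is not None and value <= upper
    return results

# Metrics as they might be collected by a load-generating task (made-up values).
metrics = {"latency-mean": 42.3, "error-count": 0}

# SLOs from the demo: mean latency under 50 ms, zero errors.
slos = {"latency-mean": 50, "error-count": 0}

verdict = assess(metrics, slos)
print(verdict)                # {'latency-mean': True, 'error-count': True}
print(all(verdict.values()))  # True -> all SLOs satisfied
```

In a multi-loop experiment, logic like this would simply run on each iteration of the schedule, re-checking the freshest metrics every time.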
This is a pre-professional certification designed for an engineer or application developer with a special interest in observability and monitoring. The main areas the exam covers are the following: observability concepts, Prometheus fundamentals, PromQL, instrumentation and exporters, and alerting and dashboarding.

Why should I prepare for and take this exam? Well, there are good reasons for that. I think it strengthens the foundation, and it motivates you to delve deeper into learning more about the cloud-native monitoring ecosystem, besides getting a level of credibility under the belt.

So, this is where it all started: the opening remarks and keynote announcements at KubeCon Europe 2022. Following my interest registration, I was invited to take the online proctored, multiple-choice Prometheus Certified Associate beta exam in July.

Now, figuring out where to start could be daunting. Following a proper curriculum eliminates most of the trepidation that comes along with learning a new skill, as it helps to not miss any of the core concepts. As I prepared for the exam, I created a publicly accessible repository on GitHub, where I have linked curated resources for each of the exam's topics, mainly from the official documentation of Prometheus itself, in addition to some excerpts from the Prometheus: Up & Running book by Brian Brazil.

Before starting with Prometheus, it's worth getting more acquainted with observability concepts and terminology, such as metrics, service discovery, and the service level objective, to name a few. Prometheus, beyond doubt, is a major pillar of the cloud-native stack. This is certainly not Prometheus in two minutes, but more of an overview of the Prometheus ecosystem: what capabilities it has and what to use each for. The Prometheus server components are the retrieval component, which scrapes metrics from endpoints offered by the discovered targets; the time-series database, where the metrics data gets stored; and the Prometheus HTTP server, which offers a REST endpoint
allowing other systems to query the metrics data.

Regarding instrumentation, Prometheus scrapes metrics from instrumented jobs, either directly or via an intermediary Pushgateway for short-lived jobs. To collect application-specific metrics, a client library needs to be added to instrument the code; client libraries are available in various programming languages. To instrument a third-party system, an exporter can be installed on the host server or next to an application. Another aspect is alerting: the Prometheus Alertmanager is capable of routing alerts to different channels, such as email or Slack. For data visualization, Grafana can be used. Additionally, there is Prometheus remote write, which is basically sending data to other systems, like long-term storage, and this is where Thanos, for instance, comes into the picture. If you haven't been working with Prometheus for some time now, I would recommend taking any getting-started course, or you can also watch the Prometheus introduction at KubeCon Europe 2020 by Julius Volz, the co-creator of Prometheus.

As for PromQL, it amounts to more than one-fourth of the exam questions, hence the increased importance of knowing how to use it properly, also beyond the exam. Here's an example of a query to get the error ratio of an HTTP server for the last 10 minutes. A shout-out to PromLabs for the PromQL cheat sheet. Needless to say, this is not to be memorized, but it does help to get more familiar with the query structure.

And last but not least, alerting and dashboarding. Open-source tools like Grafana bring life to data through intuitive visualizations of the different metrics being monitored. Grafana makes it feasible to see the current state of collected metrics as a graphical representation and to create custom dashboards. Monitoring is often paired with alerting tools that notify engineers when metrics have exceeded or fallen below certain thresholds, specified in accordance with the service level objectives. A tool like the Prometheus Alertmanager can be used for this purpose, and it also gives ways
to filter alerts before sending notifications.

So what happens after taking the exam? The results will be emailed to the exam taker within 24 hours, and if passed, a certification and a badge will be issued. How cool is that? I received the results about 5 weeks after taking the exam, as it was a beta one; that is no longer the case, as the exam was made generally available last month.

I hope you find this information and these resources useful, and I wish you all a great and fruitful KubeCon. Thank you!
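To make the error-ratio query discussed earlier more concrete: a typical PromQL expression for it divides the rate of failed requests by the rate of all requests over a window, such as `sum(rate(http_requests_total{status=~"5.."}[10m])) / sum(rate(http_requests_total[10m]))`. The small Python sketch below walks through the same arithmetic with made-up counter samples; the values and the `rate` helper are illustrative, not Prometheus internals.

```python
# Rough Python analogue of a PromQL error-ratio query such as
#   sum(rate(http_requests_total{status=~"5.."}[10m]))
#     / sum(rate(http_requests_total[10m]))
# The counter samples below are made up for illustration.

WINDOW_SECONDS = 600  # the [10m] range

def rate(counter_then, counter_now, window=WINDOW_SECONDS):
    """Per-second rate of a monotonically increasing counter over the window."""
    return (counter_now - counter_then) / window

# Counter values 10 minutes ago and now (hypothetical).
total_then, total_now = 10_000, 16_000   # all HTTP requests
errors_then, errors_now = 120, 180       # 5xx responses only

# errors: 60 more in 600 s -> 0.1/s; total: 6000 more -> 10/s; ratio 0.01.
error_ratio = rate(errors_then, errors_now) / rate(total_then, total_now)
print(error_ratio)  # 0.01 -> 1% of requests failed over the last 10 minutes
```

Note that PromQL's `rate()` additionally handles counter resets and extrapolation at the window edges, which this sketch deliberately ignores; the point is only the shape of the calculation.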