Hi everyone, thanks for your patience. Let's start now. So, welcome to this talk about Keptn: a deep dive and an update on this CNCF project. We have been an incubating project for one year now. I am Hannah, one of the maintainers. Next to me here, invisible, is Giovanni, who is actually at home sick right now. So yeah, sad. But hey, he is with us in spirit. So what can we do? Let's move on. What do we do today? We talk today about Keptn. I don't know if you are familiar with what Keptn is, so I want to start from: why Keptn? Why should you be interested in Keptn? You have probably been at the booths until now, checking out CNCF projects. We have been there multiple times with our project too, and we like to listen to people and to see what people need, right? Especially DevOps engineers and SREs. And it's out of this that we came up with a few tools and a few things that you can use as an SRE or a DevOps engineer to make your life easier. So here are three reasons why Keptn. First of all, it is built to support SREs. Second, it is actually a smorgasbord of things: there are a lot of tools that you can pick and choose from, according to your use case and your needs. And finally, we don't reinvent the wheel. That's the joke with the helm, actually: we don't reinvent the helm. We are based on Kubernetes primitives, so we are something you should already be used to, and it should be very easy, very simple, and immediate to adopt if you find it interesting. Today I want to go through three typical use cases that we have talked with people about and that you may be interested in. These are observability, metrics, and SLOs. For each of these use cases I will present a part of the toolkit that you may use. I will go a little bit into the details with the dirty YAMLs and such, but I will try to keep this as fun and lightweight as possible, because let's face it, it's very late in the afternoon.
But please, after the talk, feel free to come and discuss all the nasty details. Let's start with observability. Why do we need observability? What have we seen in observability that may be useful for DevOps and SREs? I want to zoom in on a very typical scenario, which is the typical black-box view of your deployment. You may have your repo, where your developers are pushing their fresh new code. You may have your tool of choice, such as Argo or Jenkins or whatnot, that turns your code into something that is running in your cluster. And then you have the cluster, right? And what you would like is that if Argo, for instance, sees a change in what is in the repo, then GitOps kicks in: you get a sync, and your cluster becomes what the repo wishes it to be. But this is not always the reality, and sometimes it's very difficult to understand what is happening in between all of these layers. Second problem: this is what an application typically looks like nowadays. It is not a single application you have on the Kubernetes cluster. It's not the one little deployment you put there; it's maybe two or three deployments. It's maybe a backend service and a front-end service and many other things. And you would like to have a view of these things together, right? And then when one of these services changes — say you go from version 2.0 to version 2.1 of the front-end service — you would like to have an idea of what is happening to the whole application. So if the front-end is failing, you want to be able to see, oh, this is due to the backend, and so on. So if you want this to be observable, you would like this kind of connection to be there. This would make your life easier, I guess. So, these problems — you were paying attention, right? How can we solve them? We can solve them with observability. Yeah, you were paying attention, right? Good, good. I haven't lost you yet.
What does Keptn bring you for observability, to solve these two problems? The first thing is tracing of the lifecycle of the application — day one, let's say. So with Keptn, out of the box, you install it and you have to do nothing else; you will get information about day one. That is, what happens right before the deployment, at deployment, and right after the deployment. And if you have multiple workloads that are connected into a single application, you will see them together in a single trace. You don't have to go and look around everywhere — just a single one. And if there is an error or something, you will see it in the trace, and maybe your engineers won't go crazy trying to debug a front-end error that was actually a backend one. And of course, since we have this sort of information, why not visualize it in a nice graph on a dashboard? No, I'm joking. Of course, we also provide you DORA metrics: information about the difference between this deployment of the entire application you just did and the one you had two weeks ago. Is this good? Should you do something about it? Is it as fast or as slow as before? So we give you an overall view. And we integrate with dashboarding tools. We use OpenTelemetry as the intermediary, so you can bring your own visualization tool into the mix — Grafana, right? You may be thinking: what is this magic about? I have a little diagram here. I don't take ownership of this beautiful image, but it represents what Keptn does in your cluster. It is an operator. It sits there in between your Argo, or whatever tool of choice, and your applications. And when you install it there, because it's on the cluster, it can act immediately, start the trace, and give you what I showed you in the two previous images. So, tracing of day one, and also nice DORA metrics. How do you configure Keptn? What do you have to do to get these things? It is simple.
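As a preview of the configuration walked through next, a rough sketch of the two custom resources involved. This follows the shapes in the Keptn docs, but hedged: API versions evolve, and the namespaces, collector URL, and image are placeholder assumptions.

```yaml
# 1) Tell Keptn where to send its telemetry (URL and namespace are example values)
apiVersion: options.keptn.sh/v1alpha1
kind: KeptnConfig
metadata:
  name: keptn-config
  namespace: keptn-system
spec:
  OTelCollectorUrl: "otel-collector.observability.svc.cluster.local:4317"
---
# 2) The deployment itself: only the standard recommended labels are needed
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podtato-head-frontend
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: podtato-head-frontend
  template:
    metadata:
      labels:
        app.kubernetes.io/name: podtato-head-frontend  # workload name
        app.kubernetes.io/part-of: podtato-head        # groups workloads into one app
        app.kubernetes.io/version: "0.2.1"             # version shown in the trace
    spec:
      containers:
        - name: frontend
          image: ghcr.io/example/podtato-head-frontend:0.2.1  # placeholder image
```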
So the operator is installed, and you configure it with only two custom resources. The first custom resource is the one that tells it where it should send the telemetry data: okay, my OpenTelemetry collector is here, please send everything there, so that I can see it nicely. And the second one is about the deployment itself. Do you have to change anything about the deployment for Keptn to work? If you were playing nice and following the Kubernetes recommendations, no. The only things we look at are the recommended labels: name, part-of, and version. From those we know what is part of your app, and we can collect the trace in a unified way for you. And this is the first use case. So, do you want tracing, and do you want DORA metrics for the day-one deployment of your application? We can help you with that. Let's move on to the second use case, which is metrics. Here, for metrics, the issue is that sometimes we have so many tools that can help us collect metrics, right? And maybe they don't live in the cluster; they are in their own silos, and each of them has its own collector that you have to install in your cluster and configure and whatnot. And if you like open source and you like things to be flexible, maybe this is not the way you would like to go. So let's take a simple, typical scenario. Let's say you would like to use your metrics in your cluster to configure a horizontal pod autoscaler, so that you can connect with the Kubernetes API and tell your deployment: hey, grow or shrink in size based on my super fancy metric. What you would like is to be able to do this with any provider your developers like. Maybe some teams use Prometheus, some teams use Dynatrace, some teams use Datadog. Or maybe you want to migrate from Datadog to Dynatrace, so for a period you want to use both of them. Unfortunately, this is not really possible today.
You are kind of locked in with one, because in each cluster you can only configure one exporter for your metrics. So this is not really easy to achieve as things are now. We tried to find a solution for you, to give you something to achieve this via another operator: the Keptn metrics operator. It basically sits there like a little broker where you can register multiple providers. So you can have Dynatrace, you can have Prometheus, you can have Datadog, and you can use all of their metrics together for your HPA, for instance. Again, YAML files: what do you have to do for this to work? A little bit more complicated, but not too fancy. First of all, we need to register a provider — for instance, Prometheus. How does this look? My provider has a type, prometheus, because I may have multiple Prometheus instances and I may want to use several of them. And where do I go to run my query? The URL — how do I connect to it. Easy. And then I can define my metric. So my metric, for instance my CPU throttling, will be coming from — look at the yellow thing — my provider. So it will come from Prometheus. Here is the query, a raw Prometheus query. I want this to be fetched every 10 seconds, and I want the range to be one minute. Easy peasy; I think this is self-explanatory. So this is how you get the CPU throttling. Remember the name: this is the only thing you will need later on for the horizontal pod autoscaler to work. So, back to our scenario. Let's talk HPA, just to make my example complete, and let's see how it looks. Let's say I want to have my podtato-head application scaled up and down — a typical example use case. So I will have my horizontal pod autoscaler for my podtato-head application. It's a bit gigantic and redundant, like here — probably not very readable — but let's zoom in on three little details. What you will need is a target reference.
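Putting this section's pieces together — the Prometheus provider, the metric, and the HPA whose details are walked through next — a minimal sketch. The resource shapes follow the Keptn metrics-operator docs, but API versions, namespaces, and the query itself are illustrative assumptions:

```yaml
# Register Prometheus as a metrics provider
apiVersion: metrics.keptn.sh/v1beta1
kind: KeptnMetricsProvider
metadata:
  name: my-prometheus
  namespace: podtato-kubectl
spec:
  type: prometheus
  targetServer: "http://prometheus-k8s.monitoring.svc.cluster.local:9090"
---
# Define the metric: a raw query, fetched every 10s over a 1m range
apiVersion: metrics.keptn.sh/v1beta1
kind: KeptnMetric
metadata:
  name: cpu-throttling          # this name is what the HPA refers to
  namespace: podtato-kubectl
spec:
  provider:
    name: my-prometheus
  query: "avg(rate(container_cpu_cfs_throttled_seconds_total[1m]))"
  fetchIntervalSeconds: 10
  range:
    interval: "1m"
---
# Scale podtato-head between 1 and 3 replicas based on that metric
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: podtato-head-hpa
  namespace: podtato-kubectl
spec:
  scaleTargetRef:               # the target reference: what to scale
    apiVersion: apps/v1
    kind: Deployment
    name: podtato-head-frontend
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Object              # the object deciding the scaling: the KeptnMetric
      object:
        metric:
          name: cpu-throttling
        describedObject:
          apiVersion: metrics.keptn.sh/v1beta1
          kind: KeptnMetric
          name: cpu-throttling
        target:
          type: Value
          value: "0.05"         # scale up when throttling exceeds 0.05
```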
So my deployment is podtato-head. I want it to grow from one to three replicas when I have a certain condition. And then you define the object that will be used to decide about the scaling. And this object is — see the highlighting again — the Keptn metric we defined before. Nothing more, nothing less, and you put the value there. So if the throttling is above 0.05, hey, please scale up. As simple as that. Cool graphic — well, this one is horrible. It's supposed to say SLO, but yeah, you can read it there anyhow. So this is the second use case: you can have metrics from any provider at any time, and you can use them in your cluster, for instance for horizontal pod autoscaling. The final thing I would like to talk about is this. This is old stuff; I'm pretty sure everybody here in the room knows about it. But since it's old, give me the satisfaction of saying a few words about it. I guess most of you have customers to whom you promise things. This is the SLA, the contract you have with them. It would be cool if you were able to actually measure, every time you deploy something, whether this thing is actually satisfying that contract. So you maybe have some metrics you monitor for this, and you may have a goal these metrics should reach, so that you're sure your SLA is satisfied. How does Keptn try to help you with this? Well, in our world, this translates to this nice little cloud of blue. This is again the Keptn metrics operator. What we give you is the possibility to combine multiple metrics from multiple providers into your own objectives, so that you can then use them to back your SLA, basically. So you will have the AnalysisValueTemplate resource, which is the one where you connect the provider — for instance, Prometheus — to your analysis. And you will have a definition, which is a list of objectives, each of which refers to a template.
And then you will have the Analysis, which is the snapshot: is this good or not good, is the SLA passing or failing? This is what the metrics operator will calculate for you. So, moving back: what is the service level indicator in the Keptn CRD world? The service level indicator is again a CRD, and it is what we call the AnalysisValueTemplate. In the template, you again refer to a provider — we have seen this before, the yellow name; this is the Prometheus provider from before — and we have a query. One thing that is important here is to remember the name of the template: that's how you refer to it when you are listing your objectives. The other thing to notice is that in the query we have this little thing going on. It probably looks familiar, like a Helm template thingy — a Go template. I'm sure you know how to use it. It gives you the possibility to declare one single value template for a certain query and to reuse it in multiple analyses. So this is how you define the metric; here, for instance, we have a sample value. And then we have the goal. For the goal, we have the AnalysisDefinition. Again, you need the name to refer to it in the Analysis. And what the AnalysisDefinition is, is a list of objectives. See? The objective refers to the template from before and tells you: fail if the value is less than five, give a warning if the value is more than two. And of course — you see the little dash — you can have a list of these objectives, to combine them. And you can give information about whether one is more important than the others: you can weight them, say whether it's a key objective, and yada, yada, yada. And at the end, you can compute what the pass criteria will be: my overall score should be, say, 75% satisfaction of my objectives. So finally, the SLA is enforced; you calculate whether this is a pass or a fail by using the Analysis.
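Before the walkthrough of the Analysis fields, a rough sketch of the three resources side by side: the AnalysisValueTemplate (the SLI), the AnalysisDefinition (the SLO), and the Analysis itself. The shapes follow the Keptn metrics-operator docs, but the thresholds, query, timestamps, and API versions are illustrative assumptions:

```yaml
# The SLI: a reusable, Go-templated query against a registered provider
apiVersion: metrics.keptn.sh/v1beta1
kind: AnalysisValueTemplate
metadata:
  name: response-time-p95       # referenced by the objective below
spec:
  provider:
    name: my-prometheus
  query: 'histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{job="{{.workload}}"}[1m])) by (le))'
---
# The SLO: objectives over templates, with weights and an overall pass score
apiVersion: metrics.keptn.sh/v1beta1
kind: AnalysisDefinition
metadata:
  name: my-sla
spec:
  objectives:
    - analysisValueTemplateRef:
        name: response-time-p95
      target:
        failure:
          greaterThan:
            fixedValue: 500     # fail this objective above 500 (illustrative)
        warning:
          greaterThan:
            fixedValue: 300     # warn above 300 (illustrative)
      weight: 2                 # weigh objectives against each other
      keyObjective: false       # a key objective fails the whole analysis on its own
  totalScore:
    passPercentage: 75          # overall pass at >= 75% of the weighted score
    warningPercentage: 50
---
# The snapshot: run the SLO over a time frame, filling in the template args
apiVersion: metrics.keptn.sh/v1beta1
kind: Analysis
metadata:
  name: analysis-sample
spec:
  timeframe:
    from: 2024-03-19T15:00:00Z
    to: 2024-03-19T15:05:00Z
  args:
    workload: "podtato-head-frontend"   # substituted for {{.workload}}
  analysisDefinition:
    name: my-sla
```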
In the Analysis, you pass the time frame you want to compute this in — you can pass a time range or a specific time — plus the arguments you want to fill in (this args thing is how the template from before gets filled in at analysis time; anything can go here), and then the reference to the definition that I showed you two seconds ago. And in the status of the same CRD you applied, you will see the object that contains the whole result of the analysis. So it's queryable, it's easy to access, and you can decide what to do with it afterwards. It will give you pass or fail, but it will also tell you, for each of the objectives, whether they pass or fail and why. And that's it — that was the third use case, SLOs and SLIs. I have a little summary slide here for you. It's more or less a call to action from my side. Here you can find each of the three use cases with a link, and if you prefer, the QR code is the website, so you can access the documentation. We have a getting-started guide, so you can play with this in a few minutes. And, yeah, we are at the booth tomorrow, at the entrance: you go to the left, and it's one of the first booths you find. I'll be there, and many other maintainers will be there, if you want to chat about this — if you're interested, if there's something you'd like to add or contribute, or if you just want to have a chat. And speaking of QR codes, if you have any feedback, this is the conference QR code; feel free to scan it for me. And, yeah, that's it from my side. Do we have any questions? They told me to go over there to the microphone, because I guess they are recording and I want you guys to be heard. There is this karaoke setup for you. Hello. Hi. It works. Hey, one quick question on the SLO analysis. Can you go back one slide, to the template? One more back, sorry. The template. Template. Oh, there are so many animations; I exaggerated with this. It's an awesome slide, by the way. Really cool animation.
But I'm wondering here: did you think about also reusing KeptnMetrics for the query? Because you're defining KeptnMetrics with a query, with the use case of bringing metrics back, and then here you again have queries. Wouldn't it be another option to say: I'm just referencing an existing KeptnMetric? Yeah, so we were thinking about this, and part of the reasoning was that you would like your SLA to be something you can always recalculate, right? You want it to be something that is always the same, that you can come back to and try again, and it will give the same result. A metric in Keptn is something that is constantly evolving, because it keeps refreshing. So its value would change every time, and then the whole idea of the SLA kind of breaks down. No, I really just thought about at least referencing a KeptnMetric so that you are reusing the query. I know the query needs to be executed, but at least I wouldn't have to specify queries in two different places. Oh, that's a valid point. True, true, true. That could be a good improvement, I agree with you. Up to now we kept it simple and we duplicated the query. We have introduced the template functionality only for the analysis — it's not available in the metric — but I see that there is potential for these two things to become the same. Yeah, I agree. And then one other question, again on the analysis: that means the only thing I need to do is create an Analysis object, and then the Keptn metrics server automatically sees it as a new analysis request? Yes. And then it gives me the result on the same CRD? You should be up here explaining it instead of me. Yes, exactly. Cool, awesome. Thank you. First of all, I love the animations — thanks for the animations. They are a bit screwed up by the laptop, but I won't complain. So I see that there are a couple of performance geeks — I'm watching Paul Beilog, who is in front here.
And I was thinking: if I run a load test, do I have an option to say, I want to apply this specific analysis definition right now, after the load test, so that I can get feedback on the score? You could do that easily if you combine the two operators I've shown you, because the lifecycle operator allows you to run any kind of job right after the deployment, and in that job you could apply the Analysis resource, for instance. So is the analysis definition running every x seconds, evaluating continuously in the cluster? No, the analysis is a snapshot; you apply it once to get one result. It's the metric that stays continuously updated instead. So in the end, I can still use the analysis definitions in my case if I need to do that. Okay, cool. Thanks.
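To illustrate the answer above — combining the two operators so an Analysis runs right after a load test — a post-deployment task could apply the Analysis resource. This is a sketch only, assuming the container-runtime KeptnTaskDefinition and the post-deployment annotation from the Keptn docs; the image, file path, and names are placeholders:

```yaml
# A task the lifecycle operator runs after the deployment phase
apiVersion: lifecycle.keptn.sh/v1beta1
kind: KeptnTaskDefinition
metadata:
  name: run-analysis
spec:
  container:
    name: apply-analysis
    image: bitnami/kubectl:1.29                     # placeholder image
    args: ["apply", "-f", "/config/analysis.yaml"]  # applies the Analysis snapshot
---
# Hook the task to the workload via an annotation on the pod template
# (excerpt of the Deployment's pod template metadata)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podtato-head-frontend
spec:
  template:
    metadata:
      annotations:
        keptn.sh/post-deployment-tasks: "run-analysis"
```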