Now we have Matthias and Sergiusz to talk about the Kubernetes metrics APIs. All right, thanks.

Hi, I'm Sergiusz. By the way, can you hear me? Works? Okay, cool, great. I'm Sergiusz, a software engineer at Red Hat. Traditionally I come from the Kubernetes ecosystem, all things around that topic, and now I'm working on all things Prometheus inside Kubernetes at Red Hat, together with Matthias.

Hi, I'm Matthias. I'm a software engineer at Red Hat as well. I do all kinds of Go, Kubernetes, and Prometheus work, and I also organize the Prometheus Berlin meetup. So if you ever happen to be in Berlin, we might as well have a Prometheus meetup.

This is the agenda. While you're reading it, a quick show of hands: who uses Kubernetes? All right. Who uses Prometheus? All right, same amount. So you should all be potential users of what we're presenting. Who uses the Prometheus adapter? Okay, so maybe you don't know it yet. We'll quickly go through the history; we don't have much time, so let's get going.

The history of Kubernetes metrics was kind of rough in the beginning. Metrics collection was simply hard-coded into Kubernetes, starting with CPU and memory metrics for each pod and node. This was basically the architecture: we had Heapster in the middle, which would go to the kubelet and cAdvisor, get the metrics from there, and push them into some storage backend that wasn't really decoupled from Heapster. So Heapster suffered from quite some feature creep. These were the problems: it was a push-based model, it became a dumping ground for vendor integrations, the tooling abstractions were opinionated, and what about Prometheus? We want to use Prometheus, so where is it?

These were the goals for refactoring this: decouple it and introduce an abstract API schema, so that we can actually use Prometheus, or even something else if we want to.

So, meet the metrics APIs inside Kubernetes. We have the resource metrics API, which covers CPU and memory, what you're already familiar with and what was in Kubernetes right from the beginning. Then we have custom metrics, which are metrics you define yourself and which are bound to a Kubernetes object, for example a pod, a deployment, or a job. And then we have external metrics. External metrics are simply something coming from your cloud provider, for example, which are not really tied to anything Kubernetes-specific, and we can still leverage them inside Kubernetes.

This is the architecture. We have the API server at the very top; kubectl, the HPA (the horizontal pod autoscaler), and the scheduler talk to that API. In our case, the Kubernetes Prometheus adapter runs behind it, and the adapter, which Sergiusz will explain, transforms the API calls coming into Kubernetes into something Prometheus understands, so that we can use both together.

There are a couple of example implementations of these metrics APIs. Something that is often shipped with Kubernetes, so to say by default, is metrics-server, which implements only the resource metrics, that is, CPU and memory. We're going to take a deeper look at the Prometheus adapter: as you can see, its external metrics support is work in progress, but it already has custom metrics.
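Each of these APIs is served by whichever adapter registers itself for the corresponding API group. As a rough illustration of that mechanism, here is a hedged sketch of such a registration; the service name and namespace are assumptions for illustration, not something from the talk:

```yaml
# Sketch: registering an adapter as the provider of the custom metrics API.
# metrics.k8s.io and external.metrics.k8s.io register the same way, each
# with its own APIService object, which is what makes mixing and matching
# implementations per API group possible.
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  group: custom.metrics.k8s.io
  version: v1beta1
  groupPriorityMinimum: 100
  versionPriority: 100
  insecureSkipTLSVerify: true
  service:
    name: prometheus-adapter   # assumed Service name
    namespace: monitoring      # assumed namespace
```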
So, going into detail now: resource metrics.

As you learned, we have three, hopefully orthogonal, metrics APIs inside Kubernetes. Let me take a step back to this slide, because I really like it. The important message of this slide is that the Prometheus adapter is not the only provider of custom metrics. You can actually mix and match implementations: you can use the Prometheus adapter to host your custom metrics while still using your old metrics-server to host resource metrics. Or you could use the Prometheus adapter to host resource metrics inside your cluster, but a totally different adapter to host custom metrics for your provider. Two implementations we also listed here are for Azure and Stackdriver. I'm not sure if one exists for AWS; we simply didn't find it on Google, which is probably why it's not on the slide. So the important point is that you can mix and match these per API group, namely metrics.k8s.io, custom.metrics.k8s.io, and external.metrics.k8s.io. You are not married to the Prometheus adapter at all.

So, resource metrics, the first of the three APIs. How do we find out whether our cluster supports it at all? There is an APIService object inside Kubernetes that you can query, and if you do a kubectl get apiservices and look for metrics.k8s.io, it will tell you whether the API is available. It becomes available once you deploy the Prometheus adapter and register it as a so-called API service.

Resource metrics have two well-defined resources. As you learned, one is all about nodes and the other is all about pods. Node metrics are obviously not namespaced, since nodes are ubiquitous across the whole cluster, while pods are obviously deployed on a namespace basis. These two first-class resource objects are available inside Kubernetes, and you can query them the same way you query any other Kubernetes object, just as you would say kubectl get deployments or kubectl get pods. You can do kubectl get pods.metrics.k8s.io and it will list all available resource metrics inside your cluster. In this example we simply listed the Grafana resource metrics, and you get an output like this, which should look very familiar from any other Kubernetes object.

So now the question is: we query resource metrics for Grafana, we get the output, we see two containers, one named grafana, the other unnamed, one occupying 10 millicores of CPU and some memory. How do these metrics get in there with the help of the Prometheus adapter? Oh, before we go to the next slide: this is probably the command some of you know, kubectl top pod. Under the hood it does exactly the same thing I just did manually with kubectl get against the metrics API.
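For reference, the raw object behind that output looks roughly like this. The pod name, namespace, and usage values here are hedged stand-ins for the Grafana example on the slide:

```yaml
# Hypothetical output of:
#   kubectl get pods.metrics.k8s.io grafana-abc123 -n monitoring -o yaml
apiVersion: metrics.k8s.io/v1beta1
kind: PodMetrics
metadata:
  name: grafana-abc123        # assumed pod name
  namespace: monitoring       # assumed namespace
timestamp: "2019-05-20T10:00:00Z"
window: 1m
containers:
- name: grafana
  usage:
    cpu: 10m                  # 10 millicores, as on the slide
    memory: 24Mi              # illustrative value
- name: ""                    # the unnamed (pod-level) entry from cAdvisor
  usage:
    cpu: 1m
    memory: 2Mi
```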
The configuration of the Prometheus adapter is pretty straightforward. It's a simple little config map inside Kubernetes, and it follows a fairly simple structure, at least for resource metrics. As you can imagine, when you do a kubectl top pod, at the end of the day some Prometheus query must be executed, and that's all the config map is about. In this configuration you specify rules for retrieving resource metrics: one rule for querying CPU metrics per container, which is essentially a very simple rate-based query against Prometheus, and another that specifies the query for retrieving metrics about nodes. As you can see, all of this is totally configurable.

When you deploy the Prometheus adapter that is shipped in the repository, it comes with predefined queries for you, but you can tweak all of this yourself, and there is a pretty simple Go-template-based system where all the parameters needed to instrument the query are dependency-injected for you.

So the query is executed and some metrics come back from Prometheus. How does the connection work between a metric series that comes back with values and the actual object inside the Kubernetes world? The other section inside that configuration is called resources, and it exists because we want to associate label names coming back from Prometheus with the Kubernetes resource objects we are querying against; in the case of resource metrics, these are either nodes or pods. What you see here is a simple map-based representation, which is called an overrides configuration: when a label named node is present on the metric, we map it one-to-one to the Kubernetes resource node. This resource name comes from the names you are used to when you execute kubectl commands, so we could equally write nodes, the pluralized form; the Prometheus adapter normalizes this behind the scenes. For the node and namespace labels, these map one-to-one with the resource names from Kubernetes. One exception, and I know there are some dangling pull requests out there, at least some effort to normalize this: cAdvisor unfortunately exposes pod metrics with a pod_name label, so we have to map that to the resource type pod manually. All of this is also configurable, so if you have other label names that don't have a perfect impedance match with the Kubernetes world, you can configure them here via overrides. One more thing: as you saw, we have per-container metrics, so you also have to provide a mapping telling the adapter which label identifies the container.

And that's the last part of the resource metrics configuration. I specified this query, container_cpu_usage_seconds_total, as you remember. One little hint for discovering yourself what these metrics look like in the Prometheus server: there is a nifty API call, /api/v1/series, which you can give a selector, and it returns metadata about the matching series. Question: who didn't know this API call before? Okay, that's cool. Keep it in mind; it's very nice for debugging and finding out about the structure of the metrics present inside Prometheus. For instance, you get back all the label names: as you may remember from the previous slide, we have the label container_name, which we associate with the container, and the labels namespace and pod_name, which we associate with the corresponding resource names from Kubernetes using the overrides configuration.

The same goes for memory; the same principle applies. As you learned, resource metrics consist of CPU and memory metrics. Same stuff, no magic: instead of a CPU metric we invoke a sum over container_memory_working_set_bytes from cAdvisor. And one last bit: you have to tell the Prometheus adapter the window, because that's what the Kubernetes API expects, the window you took the measurements over, and it must correspond to the window you specified in the rate function. So take care that those two match.
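Putting those pieces together, the resource metrics section of the adapter's config map looks roughly like this. This is a trimmed sketch following the adapter's documented resourceRules format, built from the queries named in the talk; treat the exact overrides and the 1m window as illustrative:

```yaml
resourceRules:
  cpu:
    # Rate-based CPU query per container; <<.LabelMatchers>> and <<.GroupBy>>
    # are dependency-injected by the adapter's Go templating.
    containerQuery: sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)
    nodeQuery: sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>, id='/'}[1m])) by (<<.GroupBy>>)
    resources:
      overrides:
        node: {resource: node}
        namespace: {resource: namespace}
        pod_name: {resource: pod}   # cAdvisor's pod_name label, mapped manually
    containerLabel: container_name
  memory:
    containerQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>}) by (<<.GroupBy>>)
    nodeQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>, id='/'}) by (<<.GroupBy>>)
    resources:
      overrides:
        node: {resource: node}
        namespace: {resource: namespace}
        pod_name: {resource: pod}
    containerLabel: container_name
  # Must match the range used in the rate() queries above.
  window: 1m
```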
Custom metrics. Resource metrics were quite easy, but custom metrics are a little bit of a different beast. Resource metrics are predefined in the Kubernetes API; custom metrics are something you define entirely yourself. You can invent totally custom metrics, and based on them you can instruct the horizontal pod autoscaler, for instance, to scale up your pods. The process here, unfortunately, is not that easy. The custom metrics API is, as Matthias mentioned, always bound to a very concrete Kubernetes type or resource: pods, services, jobs, ingress objects, whatever is present inside your cluster.

Same thing as above: you find out whether custom metrics are enabled by calling kubectl get apiservices v1beta1.custom.metrics.k8s.io and seeing whether it's available. Unfortunately, since the metrics coming from the custom metrics world are not first-class Kubernetes resource types, they are not as easy to query as kubectl get pods.metrics.k8s.io. Fortunately, there is a raw command that lets you explore the Kubernetes API internals, and if you do kubectl get --raw /apis/custom.metrics.k8s.io, it will list all the available custom metrics inside your cluster that were auto-discovered. In this case, what I prepared for you is the canonical example, a requests-per-second metric that I configured inside my cluster, and you get a nice representation of it.

So, exactly the same question as above: how does this metric get there? The Prometheus adapter has a four-step process to accomplish this. First, it has a configurable method of discovering the metrics inside your Prometheus, and you will learn how that works. Second, once we have found our metrics, we have to associate them with Kubernetes resource types. Third, we need to provide a naming for those metrics inside our cluster. These names may vary: for instance, if you apply the rate function, the name http_requests_total doesn't make sense anymore, because the result of rate is a per-second value, so you may want to give the metric a per-second name instead, and you want to configure that. Finally, as before, you want some way of configuring the actual query for those custom metrics against Prometheus.

Again, it's a config map. Instead of resource rules you specify a rules section, and to configure metrics discovery you set up a so-called series query. This series query will be executed against the very same series API endpoint of Prometheus that I showed you before, and I recommend trying it there first, because that's exactly what we've been struggling with: what does the selector match, and which metrics are actually returned when I configure this thing? When you take this http_requests_total query with a non-empty namespace selector and stitch it into Prometheus's series discovery endpoint, you get a nice list of the series that the Prometheus adapter will fetch. In this case it's http_requests_total, and in the selector we required the namespace label to be non-empty, in the hope that these metrics are somehow related to Kubernetes. It's up to you to make those selectors as explicit as possible, so that you don't get false positives and fetch metrics that have nothing to do with Kubernetes at all.
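In config map terms, that discovery step is just the head of a rule. A minimal sketch, assuming the http_requests_total metric from the demo; the same selector can be pasted into Prometheus's series endpoint to preview what the adapter will discover:

```yaml
rules:
  # Step 1, discovery: the adapter finds candidate series by running this
  # selector against Prometheus's metadata endpoint, effectively
  #   GET /api/v1/series?match[]=http_requests_total{namespace!=""}
  # Keep the selector tight to avoid pulling in non-Kubernetes series.
- seriesQuery: 'http_requests_total{namespace!=""}'
```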
Second point, association. Again we have the resources section. In the previous example you learned about the so-called overrides configuration; the second method of configuring the association between label names and resource names is a very simple template. What we did here is set a resource template, and we simply assume that each label name maps one-to-one to a Kubernetes resource name present in your cluster. So you have both possibilities for configuring these resource associations, and you can mix and match both inside this configuration. In this example you actually see a nice false positive: the namespace label gets associated with namespace objects, pod with pods, service with services, and job is actually a false positive here. That's why we very often specify the overrides directly to do these associations, because it's not always obvious that every label name really refers to a Kubernetes object.

Third point, naming. Once you have your http_requests_total metric, you probably want to apply a rate function to it, so we must change the name that is exposed inside Kubernetes. Here we have a very simple regex-based approach: in this case we match the _total suffix, take capture group one, and append _per_second. You have quite some flexibility for this conversion of names; http_requests_total becomes http_requests_per_second.

Finally, once you have everything in place, you have discovered your metrics, associated them with Kubernetes resource objects, and converted the names, you specify the actual query. In this case we simply have a rate-based query, again with some Go templating, so that a very concrete metric value is retrieved.

For custom metrics we don't have as nice a way to read values back as with resource metrics, but you can execute the kubectl get --raw command against the custom.metrics.k8s.io endpoint, in this case with some namespace, then some pod, then a slash and the metric name you configured in the Kubernetes world. If everything works out nicely, you get a value that ultimately comes from Prometheus. And just as a proof: in my test setup I was simply executing five requests per second against the podinfo pod, and when you take the query we configured in the very last step of the configuration, it should show the very same value. So if you do configure the Prometheus adapter, I recommend doing these kinds of sanity checks to see whether the query is executed correctly and the metrics are discovered correctly. This is a nice way to check it all out.
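Assembled, the four steps make up one rule in the adapter's config map. A sketch along those lines, assuming the demo's http_requests_total series; the extra pod selector and the two-minute rate window are assumptions, not from the talk:

```yaml
rules:
  # 1. Discovery: which Prometheus series back this custom metric.
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  # 2. Association: map label names to Kubernetes resources. Explicit
  #    overrides are used here; the template form assumes labels match
  #    resource names one-to-one and can yield false positives like "job".
  resources:
    overrides:
      namespace: {resource: namespace}
      pod: {resource: pod}
  # 3. Naming: http_requests_total becomes http_requests_per_second,
  #    since the exposed value is a rate.
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  # 4. The actual query, instrumented via the adapter's Go templating.
  metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)
```

Reading the value back then matches the sanity check described above, along the lines of kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/<some-namespace>/pods/podinfo/http_requests_per_second.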
Future plans. As you see, this is quite an elaborate configuration setup, and we spent some time at Red Hat getting it configured correctly for our own purposes; that was actually the motivation for this talk, because we were sometimes confused by our own tools, and we want to improve things on the Prometheus adapter side. First of all, there is a very nice pull request from the community to add external metrics support. Currently the project is hosted under the username of its initial author, Solly Ross, and we would like to move it to the Kubernetes organization. Hopefully, a year from now I won't need to spend so much time explaining the configuration of the Prometheus adapter anymore; we want to tackle this quite complex configuration. And since this is a pretty abstract discovery mechanism, we have recognized some scalability issues with the current approach and would like to tackle those as well, so we do plan a pretty beefy refactoring of the adapter in the future. Nevertheless, for resource metrics we feel it's stable enough to be used as of today, and for custom metrics, hopefully you've learned how to do the mapping yourself, so you can use it as of today too. The recommendation would be: write rigid selectors to reduce the number of metrics exposed inside the Kubernetes cluster.

Finally, HPA configuration; we'll spend just one slide on it because we don't have a lot of time. Once you have your metrics ready and exposed, you can obviously use them like any other metric in a horizontal pod autoscaler configuration. In this case, I simply scaled the podinfo pod based on a target average value of one request per second per pod; a sketch of such an HPA follows at the end of this transcript. So that's about it. Yeah, over to you.

If you want to get into running the Prometheus adapter and use everything we just explained, the easiest way is probably kube-prometheus, which lives inside the Prometheus Operator repository. There we ship the Prometheus adapter preconfigured for resource metrics; you still have to do some manual work for custom metrics due to the limitations we were just talking about, and you can also check out the custom metrics examples in the Prometheus adapter repository. So for getting started, just kubectl apply the manifests in kube-prometheus. Oh, and for the custom metrics there is a deploy subdirectory inside the Kubernetes Prometheus adapter upstream repository itself, if you want to play around with that. But again, your mileage may vary; you may also want to mix in different adapter implementations depending on your environment.

How to get involved: there are SIG Instrumentation and SIG Autoscaling, the special interest groups in Kubernetes, and the Prometheus adapter in particular is part of SIG Instrumentation. Both have meetings, bi-weekly I think; I'm not sure, SIG Autoscaling sometimes meets every three weeks. There are also mailing lists, and you can go into the Kubernetes Slack and ask questions about these topics. So yeah, I think that's it. If you have any questions, feel free, and we'll stick around afterwards as well. Questions?

Cool. As you head out, if you find any trash, please put it in the bins there. Oh, we have a question.

Hi, thanks, but my question is not related to this. I'm interested in gopass; you are mentioned as one of the creators of gopass. Are you using it in production?

Yeah, I'm using it on a daily basis, and I know maybe 20 people who use it too. But we can discuss it afterwards.

Okay. Does it have a metrics endpoint?

Not yet.

Okay, no more questions. Thanks, folks.
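As referenced above, a rough sketch of the HPA described on that last configuration slide, assuming the podinfo deployment and the http_requests_per_second metric from the custom metrics walkthrough, written against the era-appropriate autoscaling/v2beta1 API:

```yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 1            # assumed bounds, not from the talk
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      # The custom metric exposed via the Prometheus adapter.
      metricName: http_requests_per_second
      # Scale so that each pod averages one request per second.
      targetAverageValue: 1
```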