Hello, next up is Michael, who is going to talk about monitoring Kubernetes using Kubernetes. Hello everyone. As Karl already said, I want to show you one possible way to monitor Kubernetes with Prometheus, which you have probably heard of already, and the OMD Labs Edition. You will get to know what the OMD Labs Edition is in a couple of minutes during this talk. First, let me introduce myself. I'm Michael. I have been doing monitoring for about 12 years now, mainly using plain old Nagios. As we learned this morning, Nagios is kind of old-fashioned, but it still works, somehow, sometimes. My focus is open-source monitoring; I only do open-source monitoring with the current tool set. I am a senior monitoring consultant at Munich-based ConSol — just to drop the logo here, they paid my expenses today. And now to the real thing, the background. I am a Nagios guy — what should I do with Kubernetes monitoring? Obviously, Nagios does not work well in such an environment. But our DevOps team at ConSol had the chance to implement a proof of concept of a Kubernetes environment at a customer — to be more precise, an OpenShift environment, but from a monitoring point of view it is quite comparable — at some unnamed Munich car manufacturer. So it was quite a nice chance for us. We already have monitoring instances running there, so it was obvious that we should try to monitor Kubernetes there as well, but we did not have a clue how to do that, especially not the Nagios way. This is where Prometheus comes in. Prometheus seems to be a natural choice for Kubernetes monitoring. It has an integrated service discovery for Kubernetes, and it retains labels between Kubernetes and Prometheus: for example, when you define a label or an annotation in a Kubernetes definition, you have a good chance of having it retained in your Prometheus metrics. Another reason for using Prometheus is that there are a lot of resources, tutorials, and blog posts available on the net, for example Brian's Robust Perception blog or Fabian's CoreOS blog, and there are tons of examples on GitHub you can browse through. So I want to start with the actual implementation we tried. The first building block of this implementation is, as already mentioned, the integrated Kubernetes service discovery of Prometheus. A very simplified configuration in your scrape target definitions within Prometheus would look like this. Basically, in my demo environment I used the Kubernetes YAML example directly from the Prometheus GitHub account and just left out some parts which are not necessary for the demo. So basically it looks like this diagram: a Kubernetes cluster running with some cluster nodes, and a Prometheus instance deployed inside it. This instance talks to the Kubernetes API and gets metrics, for example from cAdvisor or from the API itself. The gathered metrics contain API server data, container CPU usage, kubelet and etcd status, and so on. This is the first part of the implementation. The second part is the node exporter. We already heard about the node exporter: it is an exporter from the Prometheus project itself that exports hardware and OS metrics, for example those provided by the kernel. In our Kubernetes deployment here, the node exporter is deployed as a DaemonSet. As a DaemonSet, it runs by default on every available cluster node, so if someone decides to add another cluster node, it will automatically get started there as well.
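To make the discovery part a bit more concrete, here is a minimal sketch of such an annotation-driven scrape job, modeled on the official Kubernetes example configuration; the job name is just an example, and the annotation convention is the prometheus.io/scrape one used there:

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'        # example name
    kubernetes_sd_configs:
      - role: pod                      # discover all pods via the Kubernetes API
    relabel_configs:
      # only keep pods that carry the annotation prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # retain some Kubernetes metadata as Prometheus labels
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: kubernetes_pod_name
```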
And with this annotation, prometheus.io/scrape set to true, Prometheus will automatically find this new node exporter. So when adding new cluster nodes, no one has to change anything in the configuration; the new instance simply appears in Prometheus. So it looks like this: here are the node exporter instances, started as a DaemonSet on the nodes, and the running Prometheus discovers them and scrapes their metrics. This provides us with a lot of additional metrics from the individual cluster nodes, for example CPU, disk, and file system metrics. The third component is kube-state-metrics. It is a project from the Kubernetes organization itself. It is focused on the health of the Kubernetes objects, not on the API or cAdvisor metrics that Prometheus already scrapes. It exposes metrics for, for example, deployment health, node health, and pod health — all these internals of a Kubernetes or OpenShift deployment. Here as well I set the annotation prometheus.io/scrape to true so that it gets discovered automatically. This is the diagram for that component: it is deployed in the cluster and gets automatically discovered. And here again are some of the metrics that are now exposed by it. So now we have three sources of metrics: the Kubernetes API and cAdvisor themselves, the node exporters for the node metrics of the cluster nodes, and kube-state-metrics. In a little demo environment here, I can show you these metrics. The demo environment is based on Minikube, a subproject of Kubernetes for running a sample Kubernetes cluster on your laptop. You can get the sample config on my GitHub account; the links and the presentation slides you will find, by the way, on the FOSDEM schedule page for this talk, whenever you need them. So try it out. So I have a Minikube running here — its status, for example: it is running. I have already opened the dashboard, where you see I have two deployments, kube-state-metrics and Prometheus, a DaemonSet for the node exporters, and here in Minikube there is only one cluster node. Therefore, for the pods, I have only one running node exporter pod, one pod for kube-state-metrics, and one pod for Prometheus itself. Let's have a look at Prometheus. This is the Prometheus instance running on Kubernetes in the Minikube demo environment right now. As you see, it already ingests a lot of metrics from this running Minikube. To be honest, I don't know what to do with this bunch of metrics, but the operations team of a Kubernetes cluster probably will. I am fine with providing these metrics for someone else who knows what to do with them. So the list is pretty, pretty large. In the targets configuration you will see the scrape targets: API servers, nodes, Kubernetes pods, service endpoints, and Prometheus itself. As already said, this configuration is basically the same as the example available on the Prometheus GitHub account. If you try that out using my repository, you get some deployment files for your Kubernetes setup — this is on my GitHub account — plus three example dashboards as JSON files for Grafana, and later the definitions for the OMD side as well. The Prometheus config map, for example, contains the configuration for running Prometheus inside the cluster, following the official, or semi-official, examples.
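For illustration, a node-exporter DaemonSet carrying that annotation could look roughly like the following sketch; the image, port, and apiVersion are illustrative and may differ from the actual manifests in the repository:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
      annotations:
        prometheus.io/scrape: "true"   # picked up by the relabel rule shown earlier
        prometheus.io/port: "9100"
    spec:
      hostNetwork: true                # use the node's network namespace so node-level metrics are meaningful
      containers:
        - name: node-exporter
          image: prom/node-exporter
          ports:
            - containerPort: 9100
              name: metrics
```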
So, back to the slides. Now we have a Prometheus running and ingesting metrics, and we are at a point where I don't really know what to do with it yet, because we would need many more components to have a complete and useful monitoring system. We would need persistent storage for the time series database and, for example, for the configuration files. We would need a separate instance of an Alertmanager with its configuration. Obviously, we would need Grafana. Maybe we would need a Pushgateway as well. This is what it would look like in a Kubernetes environment. Deploying all of that inside Kubernetes — I thought, that's not my kind of stuff. We already have that; we already have many of these components in our classical OMD Labs Edition monitoring. The OMD Labs Edition is basically monitoring in one package. OMD stands for Open Source Monitoring Distribution, but 'distribution' is kind of misleading: it is not a separate Linux distribution, for example. It is a platform where we bundle stuff that has proven useful for us consultants to get a monitoring up and running with a customer in almost no time. One big point is, of course, that it is completely open source. It is based on the experience gathered around Nagios monitoring: Nagios is in this package, as well as Icinga, Icinga 2, and Naemon as separate monitoring cores — all the forks are available. It bundles, as already said, the best practices of many years of consulting experience, because we were simply doing the same stuff over and over again at customers. You don't need root access to run a monitoring site; you only need it for the installation of the package. And it is the Musterlösung — the reference solution — for monitoring projects at this Munich car manufacturer. The OMD Labs Edition, as the name implies, is developed by ConSol Labs. ConSol Labs, you may have noticed, is a platform for ConSol employees who do open-source work, on the job and in their spare time, but we are open for other contributors who want to work with us and do something great. And OMD is not the only project there: for example, Citrus and Sakuli for end-to-end testing are other projects, and there is more available on ConSol Labs as well. But OMD is not just a single package for monitoring — that alone would already be a nice, fixed starting point. It has one more super nice feature, I think. Here you see a small part of what we ship together with OMD: you can recognize Icinga, Thruk as an alternative user interface, InfluxDB, as well as Prometheus, and Grafana is also built in there, plus some monitoring plugins that have proven useful. We also ship Ansible, because we regularly need it for monitoring-related tasks, so we decided to put it into the same package to minimize the configuration needed. And here is the other feature, the killer feature I think, of OMD: sites. A site is a separate monitoring instance that runs off the same binaries on the file system. Basically, you simply create a site and have an environment where you can set, for example, your monitoring core to Nagios, Icinga, or Naemon, and you can enable Prometheus or Grafana per instance. You can also copy, for example, the prod instance to a stage instance when you want to test larger changes in your monitoring or want to do version upgrades of OMD. You can deploy several versions of OMD in parallel on your server and selectively upgrade every site to a specific version. So, for example, you could copy prod to stage, update this new site, test whether all of your monitoring still works, and if it does, make this copy the new production site.
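A rough sketch of that workflow on the command line; the site names are just examples, and the exact commands may differ between OMD versions, so check the omd help for your installation:

```sh
omd cp prod stage      # clone the production site into a new "stage" site
omd update stage       # move the copy to the newly installed OMD version
omd start stage        # start it and verify that the monitoring still works
# if everything is fine, retire the old production site:
omd stop prod
omd rm prod
```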
Disabling or deleting the old production environment is also just one command, so you are quite safe when doing upgrades and when doing tests. And, as already said, you have the possibility to run different configurations on one machine, or different customer installations — for example customer 1 and customer 2 — and they are completely separated, so you don't have to worry about customer 1 seeing monitoring information of customer 2. More information about the OMD Labs Edition you can find on labs.consol.de. So, back to the implementation. When setting up OMD, we already have Prometheus there, so we could scrape the Kubernetes metrics directly from the OMD side. I decided not to do so. Why? It is quite hard to access pods inside Kubernetes from the outside: the SDN makes it quite hard to reach a non-privileged port, as I found out. And it is also hard to access the API from outside Kubernetes. This is especially true for OpenShift, where you have to have the cluster-admin role to access the API, and exposing that to the outside with TLS certificates and tokens is quite complicated. So I decided to use federation for that purpose — I will come back to that later. Federation basically exposes the metrics of the inner Prometheus, and with a regex match an outside Prometheus can scrape these metrics. I thought this was a great idea to get something like that running: we have this Kubernetes environment with the inner Prometheus, and the outer Prometheus in the OMD environment federates all Kubernetes metrics from the inside to the outside. And I found it worked. So let's see it. For that I start an OMD environment. So, my OMD is now running. Some of you will kind of have a flashback — it is a little bit more modern interface for Nagios, but it should be very familiar. In this example I have some monitored hosts with classic Nagios checks, for example. And then I can access the OMD Prometheus, and here I have the Kubernetes federation running with the metrics of the inner Prometheus. Two scrape targets are set up, they are in state up, and they are delivering metrics. You should see the same huge list of metrics in the outside Prometheus. Besides Prometheus, we also have Grafana here, which has access to the OMD Prometheus. For example, I have added a node dashboard, which I got from Grafana.net, which shows you the metrics and health state of the cluster node as gathered by the node exporter: system load, memory, disk usage. Another dashboard is the Kubernetes dashboard, which shows the running pods and the CPU usage — you see Prometheus is quite high here, but I found out it still works — and this is the network I/O. And I bundled an extended dashboard as well, where more metrics are provided, for example system services and container CPU usage. These three dashboards are also available in the GitHub repository with the demo data, for you to play with. So at first glance, this looks nice. We have a Prometheus with data in it. We have a Prometheus running inside Kubernetes that we don't have to care much about: it is simply deployed, it is simply running, and there is no persistent storage there. We have an outside Prometheus running within the OMD site, which gathers its metrics via federation. And at first glance, it looks like everything is working.
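For reference, the federation job on the outer, OMD-side Prometheus looks roughly like this — a sketch, not the exact production configuration; the job name, target address, and match expression are examples:

```yaml
scrape_configs:
  - job_name: 'kubernetes-federation'
    honor_labels: true                 # keep the labels as set by the inner Prometheus
    metrics_path: /federate
    params:
      'match[]':
        - '{job=~".+"}'                # pull everything, as in this proof of concept
    static_configs:
      - targets:
          - 'kube-node.example.com:30090'   # however the inner Prometheus is exposed
```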
But there are some issues. Two weeks ago, in a blog post, Brian told me: no, this is not what federation is meant for. You should aggregate and minimize the data when federating, for example by pulling only the relevant metrics out of the other Prometheus instance; pulling all the metrics could cause race conditions or similar problems. But I decided to try it anyway. It has been running for a couple of months now, to be honest, in our proof of concept as well as in our internal testing environment, and I did not notice any problems. So, yes, let's see. Another issue we are facing, especially at this customer: the inner Prometheus is currently open to be accessed by everyone. In a proof of concept we can do this, but it is not allowed in the production environment. So I still have to figure out how to secure this inner Prometheus while still being able to get metrics out of it. This is an open point; I have not looked into it yet, but we know we cannot go live with this issue. Another issue when combining the traditional OMD and Prometheus is alerting: how should we handle alerts when we mix them, and how do we avoid an overhead of configuration? When using Nagios, should we route its alerts through the Alertmanager, or the other way around — should we gather metrics through Prometheus and create Nagios alerts out of them? This is kind of an open task, too. As we learned, long-term storage for Prometheus is in the making — I think Julius is working on something with InfluxDB; I read something about it, but I am not sure. Our customer wants long-term storage for at least some of its metrics. We could try to do that with federation of aggregated metrics; for that, the customer needs to know what he wants to store long term. But that could be a viable way of using federation for long-term storage for the time being, until another solution exists. And the last issue is coverage. In our internal environment, our DevOps team had a severe outage last week: simply no pods and no deployments worked, nothing worked anymore. And they did not see anything until the Kubernetes cluster was up and running again, because the internal Prometheus was down as well and no metrics were delivered. So they decided they additionally want an external monitoring of crucial components — for example, moving the node exporter out of Kubernetes onto the machine itself, or monitoring machine health traditionally using Nagios; we have not yet decided what to do. They also want important services like etcd monitored, as well as some basic API queries that indicate the overall health state of a Kubernetes cluster. This is the main point of the lessons learned here: I think you will need some kind of external monitoring. The approach of having an inside Prometheus and gathering data there is nice for the customers of your Kubernetes environment, who need to know what their deployments do and whether their services are running correctly. But from the point of view of the operations side of the Kubernetes cluster, I think the approach of an inner Prometheus alone is not really sufficient. You need an external monitoring as well — be it Prometheus, be it traditional Nagios, that doesn't really matter; for us, OMD contains all of these components, so it doesn't matter. So, thanks for watching. Questions? So the system is running right now still as a proof of concept? Yes, this is running as a proof of concept in our internal testing environment. There is nothing deployed to production yet.
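As a side note, referring back to the long-term-storage idea above: a hedged sketch of how that could look is a recording rule on the inner Prometheus plus a federation match that only pulls the aggregated series; the rule name and expression are illustrative, not taken from the actual setup:

```yaml
# recording rule on the inner Prometheus (YAML rule-file format):
groups:
  - name: kubernetes-aggregation
    rules:
      - record: namespace:container_cpu_usage_seconds:sum_rate
        expr: sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace)
```

The outer Prometheus would then federate only something like '{__name__=~"namespace:.*"}' instead of every raw series, which is closer to what federation is intended for.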
And your external Prometheus server is just one server right now? You are running one external Prometheus server? Yes. Our OMD environment normally runs on a virtual machine at most of our customers, and that is the case here as well; OMD runs fine on VMware or something like that. Next question. Why do we need two Prometheus instances, one inside the Kubernetes cluster and another one inside OMD? Why can't we just have the Kubernetes Prometheus instance feed directly into Grafana? Sorry, I'm just searching for... so the question is why not just have the inside Prometheus? Which slide? This one. Yes, you could do this as well. But then you have the same problems: when your Kubernetes cluster doesn't work reliably anymore, you won't see any metrics — you won't have any monitoring in the worst case. So of course it is possible to deploy Grafana and the Alertmanager in here; I think that is the most common way to do it. But this problem is not gone, even if you have two instances of Prometheus, right? One outside as well — your problem is still not gone. If Kubernetes is down, you still have the same problem. So the question is why we need another one, if the same set of problems still exists? The external one runs separately. So if this cluster dies, you lose the metrics provided by the inner one, but you would still have the outer one, which could scrape the cluster nodes directly, or monitor crucial components with Nagios. So if you lose everything inside Kubernetes, in the best case it doesn't matter. Next question, going once, going twice. Thank you very much.