My name is Leon Barron, I'm a Solutions Architect at Tigera, and in this video we're going to run through the steps needed to take a bare-bones Kubernetes cluster to one that has Calico, Prometheus, and Grafana deployed and running, with the Calico components exposing metrics for Prometheus to consume. Here's how we'll do that: we'll take a Kubernetes cluster that has no CNI installed and, using Helm, deploy Calico Open Source and the Prometheus stack. We'll then enable the Calico components to expose metrics and make sure Prometheus discovers them. Finally, we'll create a sample dashboard in Grafana to display those metrics.

Before we begin, let's take a look at the lab environment we have set up. We have a three-node Kubernetes cluster installed using kubeadm and a bastion host from which we'll run all of our commands. There is one master node and two worker nodes, and the node network is 10.0.1.0/24. The pod network is 10.48.0.0/24, and the service network is 10.49.0.0/24.

The Calico components whose metrics we're interested in are calico-node (specifically Felix), calico-typha, and calico-kube-controllers. As a brief overview of what each of these components does: calico-node runs as a DaemonSet on every cluster node, and Felix, the component we're interested in, is a core process running inside calico-node. One of Felix's main responsibilities is realizing and enforcing Calico security policies on the data plane. calico-typha is a caching datastore proxy that sits between the calico-node pods and the Kubernetes API server, and its main function is to allow Calico to scale. Finally, calico-kube-controllers monitors the Kubernetes API and performs actions based on cluster state.

I've now SSHed into the bastion server, and to confirm the bastion server's network information we can run ip addr. We can see that the bastion server's IP address is 10.0.1.10/24, which is in keeping with the lab architecture. It's from the bastion server that we're going to run all of our kubectl and Helm commands to build out this infrastructure. To get a feel for where we currently are, let's run kubectl get nodes. We can see that we have three nodes deployed, one control-plane (master) node and two worker nodes, and the IP address information is again in keeping with the lab architecture. You may also notice that these three nodes are in a NotReady status. The reason is that there is no CNI deployed here, and we can confirm that further by running kubectl get pods -A. We can see a number of pods that are indeed running, but those pods are reusing the IP address of the node they're on; they are host-networked pods. The two CoreDNS pods are in a Pending state because they're trying to get an IP address from the pod network, and with no CNI deployed there's no way for them to get one.

So let's fix that: let's deploy Calico Open Source, get a CNI in place, and get all of these pods up and running. As I mentioned, we're going to use Helm to deploy Calico Open Source, and the first step is adding the Project Calico Helm repo. Once that's done, we'll create a namespace called tigera-operator.
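As a rough sketch, the commands just described look something like the following; the chart repo URL here is the one published in the Calico docs and may change over time, so check the docs for your version.

    # Add the Project Calico Helm repo and create the namespace we'll install into
    helm repo add projectcalico https://docs.tigera.io/calico/charts
    helm repo update
    kubectl create namespace tigera-operator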
This is the namespace we're going to target our Helm install into. So we run helm install calico projectcalico/tigera-operator; I've chosen the latest version available at the moment, and we're installing it into the tigera-operator namespace. The Helm install has now completed, and we can check the deployment status by running kubectl get tigerastatus. Here we can see the apiserver resource and the calico resource, neither of which is available yet, and the calico resource is still progressing. We can get a deeper understanding of what's going on by running kubectl get pods -A, and here we can see all of the Calico components being deployed: the Calico API server, calico-kube-controllers, the calico-node pods, calico-typha, and the CSI node drivers. Now that the calico-node pods have been deployed, if we rerun kubectl get tigerastatus, we'll see that the calico resource is available, and the apiserver resource is available too. If we rerun the get pods, we now see that everything is running. To confirm completely, a kubectl get nodes shows that all nodes are now in a Ready state.

With Calico Open Source deployed and all the pods up and running, let's continue on to deploying Prometheus. Again, we're going to use Helm, and again the first step is adding the repo. Once that's added, we create a namespace called monitoring, where we'll target the Prometheus deployment, and then run the Helm install of the Prometheus stack into the monitoring namespace. Once this has completed, we can check the status of Prometheus by running kubectl get prometheus -n monitoring. We can see the prometheus-stack-kube-prom-prometheus resource and its version, with one replica desired and nothing ready yet, so it's still coming up. We can check further by looking at the pods in the monitoring namespace. Here we can see Alertmanager running, the prometheus-stack-kube-prom-prometheus pod running, Grafana running, and the Prometheus operator, kube-state-metrics, and the Prometheus node-exporter pods all running. Now that they're all running, we should get a good output from the earlier command, and indeed we now see that it's ready. We can also check Grafana by taking a look at the services in the monitoring namespace. Here we can see all the services deployed, one of which is Grafana, a ClusterIP service listening on port 80.

Now that we have Calico, Prometheus, and Grafana deployed, let's look at how we can expose Prometheus and Grafana so that we have access to their UIs. In this lab we already have an ingress controller deployed, as you can see here, so I'm going to expose both Prometheus and Grafana using this ingress. There's no need to use an ingress, though; you could just as easily use a NodePort or a LoadBalancer service to accomplish the same task. Now, let's take a look at the ingress needed to expose Prometheus. You can see that it's of kind Ingress, we've named it prometheus-service, and we're going to deploy it into the monitoring namespace. Once the ingress is fully up and running, we should be able to browse to this host, or URL, to access the Prometheus UI. And down here is the service that the ingress is going to match to.
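For reference, a minimal sketch of an ingress like the one just described. The host name and ingress class are placeholders (prometheus.example.com, nginx); the backend service name and port should match whatever kubectl get svc -n monitoring reports in your cluster, which in this lab is prometheus-stack-kube-prom-prometheus on 9090.

    kubectl apply -f - <<EOF
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: prometheus-service
      namespace: monitoring
    spec:
      ingressClassName: nginx            # assumes an NGINX ingress controller
      rules:
      - host: prometheus.example.com     # placeholder host
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-stack-kube-prom-prometheus
                port:
                  number: 9090
    EOF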
So if we run kubectl get service on the monitoring namespace, we can see that this service name matches this ClusterIP service, and that the service is listening on TCP port 9090. If we want to look further into that, we can run kubectl get endpoints for that service, and we can see that it's backed by this IP address. If we want to go even further, we can run kubectl get pods -n monitoring -o wide and grep for that IP address, and there we see the pod that backs the service. So this is the pod backing this service, and this pod is the Prometheus server itself.

Okay, let's now apply that ingress. The ingress is going to take a couple of seconds to come up, and we can monitor that by running kubectl get ingress. We can see the ingress that's currently being deployed, but there are no addresses associated with it yet, so if we were to open a browser to this location we wouldn't see anything. If we check again, eventually we'll see addresses attached to it, and once they are, we know we have access. As you see here, we now have these two IP addresses attached to the ingress, so if we open a browser to this location we should now see the Prometheus UI, and indeed we do. We can even check the targets that Prometheus already has, and it's quite interesting: because we deployed the full stack through Helm, it's already monitoring itself. It's monitoring Alertmanager, the Prometheus operator, the Prometheus server, kube-state-metrics, and node-exporter. It's also monitoring some Kubernetes components like the kubelet, the API server, and CoreDNS. This is all great; what we want to do is add the Calico components as targets here. But before we do that, let's expose Grafana as well.

The next step is to expose Grafana using our ingress, and it's going to be very similar to how we exposed Prometheus. Let's take a look at the ingress YAML needed for Grafana. We can see it's of kind Ingress, we're calling it grafana-service, and we're deploying it into the monitoring namespace. This is the host, or URL, that we're going to use to view the Grafana UI once the ingress is fully up and running. And again, we can see that this ingress matches the service named prometheus-stack-grafana. We can check the services in the monitoring namespace and see that this matches this ClusterIP service, which is listening on TCP port 80. We could trace through the endpoints and the pods backing the service, but for now let's just apply the Grafana ingress YAML and take a look at the Grafana UI. This will take a few seconds, so to monitor it we run kubectl get ingress -n monitoring. We can see that prometheus-service of course already has IP addresses attached to it, but the grafana-service we just applied has nothing yet, so let's wait until that's populated and then look at the Grafana UI. Now we can see that both grafana-service and prometheus-service have addresses associated with them, so let's open up this URL, and we're presented with the Grafana UI. Here we can take a look at a few things, for example the data sources.
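A similar sketch for the Grafana ingress just described, with the same caveats: the host is a placeholder (grafana.example.com), the ingress class assumes NGINX, and the backend service name and port come from kubectl get svc -n monitoring, here prometheus-stack-grafana on port 80.

    kubectl apply -f - <<EOF
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: grafana-service
      namespace: monitoring
    spec:
      ingressClassName: nginx            # assumes an NGINX ingress controller
      rules:
      - host: grafana.example.com        # placeholder host
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-stack-grafana
                port:
                  number: 80
    EOF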
Looking at the data sources, what's interesting is that we're already connected to the Prometheus data source; because we deployed this with Helm, everything is aware of everything else, so this Grafana is aware of the Prometheus it was deployed with and has its data source configured. We can also take a look at some built-in dashboards, such as the CoreDNS and Kubernetes API server dashboards, and if we open one of them we can see that we're already getting metrics. What we want to do now is allow Prometheus to pull in the Calico Open Source metrics from Felix, from Typha, and from kube-controllers, and once those targets are discovered and metrics are being pulled in, we'll create a dashboard for Felix and for Typha and take a look at the kind of data and metrics we can see.

Let's take stock of what we've done so far. We first deployed the Calico components using Helm, which gave us components such as calico-typha, calico-kube-controllers, and a calico-node pod on every cluster node. We then deployed the Prometheus stack using Helm, which deployed components such as Prometheus, Grafana, Alertmanager, and the node exporters. We then configured an ingress to allow UI access to both Prometheus and Grafana, and from the Prometheus UI we could see that many targets had already been discovered, both from the Prometheus stack itself and from the Kubernetes cluster. Now we want to add the Calico components to those Prometheus targets.

Next, we want to get the Calico component metrics into our Prometheus deployment, and to do this we need to follow three steps. First, we need to enable Prometheus metrics on both Felix and Typha: we'll edit the FelixConfiguration resource to enable Felix metrics, and we'll edit the Installation resource to enable Typha metrics. After this, we need to create services for both Felix and Typha, and it's these services that Prometheus will use to discover the endpoints and which ports on those endpoints it needs to scrape. Do note that these first two steps are already done by default for calico-kube-controllers, so we don't need to do anything further there. Finally, we create ServiceMonitor resources that tell Prometheus about the Felix, Typha, and calico-kube-controllers services; the ServiceMonitors are really what ties all of these pieces together.

So let's now enable the metrics for both Felix and Typha. As mentioned, for Felix we patch the FelixConfiguration resource and set Prometheus metrics to enabled. For Typha we patch the Installation resource, setting the Typha metrics port to 9093. Once those are enabled, we need to create services for both Felix and Typha so that Prometheus can discover those endpoints. Let's take a look at the Felix service we're going to create. You can see that it's of kind Service, it's called felix-metrics-svc, it's in the calico-system namespace, and the label is k8s-app: calico-felix. We've also specified the metrics port as 9091, which is the TCP port Prometheus will scrape to get those metrics. So let's apply that. Let's now take a look at the Typha service that we're going to create. Again, it's of kind Service, we're calling it typha-metrics-svc in the calico-system namespace, the label is k8s-app: typha-metrics, and this time we specify the metrics port as 9093. Let's apply that as well.
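As a sketch of the steps just described: the two patch commands follow the Calico docs, while the service manifests are illustrative; the metadata labels (k8s-app: calico-felix, k8s-app: typha-metrics) are the ones this video's ServiceMonitors select on and the pod selectors follow the Calico docs example, so both may differ slightly in your environment.

    # Enable Felix metrics and set the Typha metrics port
    kubectl patch felixconfiguration default --type merge \
      --patch '{"spec":{"prometheusMetricsEnabled": true}}'
    kubectl patch installations.operator.tigera.io default --type merge \
      --patch '{"spec":{"typhaMetricsPort": 9093}}'

    # Headless services exposing the Felix and Typha metrics ports for discovery
    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: Service
    metadata:
      name: felix-metrics-svc
      namespace: calico-system
      labels:
        k8s-app: calico-felix        # label the Felix ServiceMonitor selects on (assumed)
    spec:
      clusterIP: None
      selector:
        k8s-app: calico-node         # Felix runs inside the calico-node pods
      ports:
      - name: metrics
        port: 9091
        targetPort: 9091
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: typha-metrics-svc
      namespace: calico-system
      labels:
        k8s-app: typha-metrics       # label the Typha ServiceMonitor selects on (assumed)
    spec:
      clusterIP: None
      selector:
        k8s-app: calico-typha
      ports:
      - name: metrics
        port: 9093
        targetPort: 9093
    EOF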
Now, with those services created, we're ready to grab metrics from those endpoints. Let's take a quick look at the services in the calico-system namespace. We can see that we have both felix-metrics-svc and typha-metrics-svc, and notice that although they're of type ClusterIP, they have no IP addresses. We're not really using them from a networking perspective; we're just using them so that Prometheus can discover the endpoints. And we can see the ports are 9091 and 9093. Notice that the calico-kube-controllers metrics service is already exposing its metrics TCP port, so we don't need to do anything there. If we add --show-labels, we can also see the labels that the ServiceMonitors are going to reference to pick up those endpoints.

So now that we've created the services and enabled the metrics, let's create the ServiceMonitors. Let's take a look. Here's the Felix ServiceMonitor. We can see that it's called felix-servicemonitor, it's in the monitoring namespace, and what it's doing is selecting the calico-system namespace and then selecting a service based on this label, which we can see is the same as this one. Something to be aware of here is this label up at the top, release: prometheus-stack. When we create these ServiceMonitors in the monitoring namespace, it's this label that tells Prometheus it can use this ServiceMonitor, and the information within it, to pick up new targets. So let's apply this. If we take a look at the other ServiceMonitors, we'll see they're very similar. Look at Typha: again it's a ServiceMonitor, again it has the release: prometheus-stack label that tells Prometheus it can use it, it's called typha-servicemonitor, it's deployed in the monitoring namespace, it looks at the calico-system namespace, and it selects k8s-app: typha-metrics, which is indeed the label on the service we created. So let's apply that too. And finally, let's take a look at kube-controllers: again it's a ServiceMonitor, again it has the release: prometheus-stack label, we're calling it calico-kube-controllers-monitor in the monitoring namespace, we're matching the calico-system namespace, and here we're matching k8s-app: calico-kube-controllers, which is the same as the label on that service. Let's apply that.

With that done, we should be able to take a look at the Prometheus targets and see whether they've been spotted. This can take a few seconds. We can already see that we have the Felix ServiceMonitor, and now the Typha ServiceMonitor. It might take a little longer for the kube-controllers ServiceMonitor to appear, so let's just wait. After a few more seconds, we now see the calico-kube-controllers monitor as well. So that's everything. We can expand these and see that they're up, and if we look at these IP addresses, they are the pods that Prometheus is scraping: the service has shown Prometheus what the endpoints, the pods, are, and Prometheus is scraping those pods on that TCP port.

To be able to visualize these metrics in Grafana, we need to create a dashboard. You can, of course, create any dashboard you want with the metrics that have been exposed, but for this video we'll create a Felix dashboard based on what's provided on the Project Calico docs site. If we navigate to this URL, we can see a felix-dashboard.json; that's all the configuration needed to create a default Felix dashboard in Grafana.
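Before we move on to the dashboard JSON, here's a sketch of what the Felix ServiceMonitor described above might look like; the resource name, label selector, and scrape interval are assumptions based on this walkthrough, and the Typha and kube-controllers monitors follow the same pattern with their own labels and ports.

    kubectl apply -f - <<EOF
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: felix-servicemonitor
      namespace: monitoring
      labels:
        release: prometheus-stack      # lets this Prometheus instance pick up the monitor
    spec:
      namespaceSelector:
        matchNames:
        - calico-system
      selector:
        matchLabels:
          k8s-app: calico-felix        # label on felix-metrics-svc (assumed)
      endpoints:
      - port: metrics                  # named port on the service (9091)
        interval: 30s
    EOF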
Back in the felix-dashboard.json, all we really need to do is copy all of this information and then change the data sources, because in this example the data source is just specified as Grafana, and we need to point it at our own Prometheus and Grafana instance. So first we change every data source that says Grafana to a data source object with type datasource and uid grafana, and replace all occurrences. Then we can find the other data source, which references calico-demo-prometheus and of course isn't referencing our Prometheus installation, so let's change that too: anywhere it says calico-demo-prometheus becomes a data source object with type prometheus and uid prometheus, and we replace all of those as well. Now we simply copy this, go to our dashboards, and choose import. Here I can just paste the JSON I have; of course, you could also save it and upload it as a file. I'll just load it in here. You can see that it picks up the name Calico Felix Dashboard, and when I import it, we immediately get all of the metrics we need.

To recap what we went over in this video: we first deployed both Calico Open Source and the Prometheus stack using Helm. We then exposed the Prometheus and Grafana UIs using an ingress. We created services for both calico-node (Felix) and calico-typha; remember, the calico-kube-controllers service was already created. These services were then referenced by the ServiceMonitors we also created, which tell Prometheus how to discover the Calico endpoints it needs to scrape. We verified that the targets were discovered correctly in Prometheus, and then we created a dashboard for the metrics Felix exposes, based on the default dashboard provided on the Project Calico docs site. I hope this video has been helpful, and we look forward to seeing you in future videos.