Welcome everyone. My name is Caroline and I will be talking to you about cost-efficient multi-cluster monitoring with Prometheus, Grafana and Linkerd. I'm a senior DevOps engineer at BWI, which is the IT services provider for the German Federal Ministry of Defence, and at BWI my work centers around building a secure private cloud for the German Armed Forces. Before that I worked at Philly Connect for five years; I started there as a software engineer and later on joined the cloud platform team. Apart from my day job I'm working towards my master's degree in computer science at Georgia Institute of Technology.

Today, however, I will talk to you about my previous job at Philly Connect. To be a little more specific, I'll tell you how we found our path to a monitoring solution that fit our needs, and how we used Linkerd to make it work.

A little more context about Philly Connect: Philly Connect is a European financial services provider with offices in Germany, Spain and Italy, offering connectivity to over 3,000 financial institutions and banks in 11 countries. Philly Connect's main product is an open banking platform, which serves as a basis for account aggregation, payment initiation, categorization, and account and portfolio switching solutions. What's also important to say in this context is that Philly Connect is a regulated payment institution, so security and compliance are really big topics that have to be considered at every step of the way.

Now, before we talk about solutions, let me first explain the actual requirements we had for our cloud monitoring setup. Basically, we needed a system to collect application and infrastructure metrics, so we could use those metrics for alerting and also to help with debugging and incident management. The solution should also guarantee secure data transfer and ideally not cost a large amount of money, so it should be rather budget friendly. Based on those points we already figured out that we didn't need any kind of long-term storage for the metrics. I know there are a lot of solutions out there using object storage, systems like Thanos, which you might have heard of, but in our case it just wasn't necessary because we had a rather short retention time of two days.

This is what our initial solution looked like. At that point we only had a single production cluster and another cluster for development, so we had a single central Prometheus deployment that scraped metrics from infrastructure services and the production services, and then a central Grafana that used this Prometheus as a data source. Nothing special; you've probably seen it a lot along your way.

But then, as time passed, Philly Connect's business requirements started to change. Previously we only had two bare-metal Kubernetes clusters on which we hosted all production environments. Now we had to deploy the whole open banking stack to AWS in the Bahrain region, so in the Middle East. Then we also had to migrate some production environments to Google Cloud in Germany and Spain, and on top of that we had to manage two Kubernetes clusters in private clouds on OpenStack and VMware Tanzu. So now, among many other things, we needed to rethink how we provide observability over what was no longer such a small number of clusters.

This is what our first, let's say naive, solution looked like: we basically just replicated the whole stack across all the clusters. The good thing was we could just copy and paste everything, it was easy to deploy. But it got more and more confusing with all the different endpoints, and it needed a bunch of resources. Overall it just wasn't a good and sustainable solution.

We then figured out that it makes sense to centralize certain services that are needed across all the clusters, and our first intuition was to use a central Prometheus to federate all the other Prometheus instances. Federation means that one Prometheus is configured to scrape metrics from another Prometheus instead of from the metric sources directly. But as you might already notice, this can get ugly pretty quickly. First, you have to decide beforehand what kind of metrics you actually need for your dashboards, and then you have to bake that into match filters in your federation config, because if you don't do proper filtering there's a really high risk of completely overwhelming that central Prometheus; it just won't be able to handle all that incoming data at once. Then you have the traffic overhead of constantly shipping metrics from one place to another, and disk space can also be an issue if you have to store everything twice.
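Just to make that concrete, this is roughly what such a federation job on the central Prometheus looks like; the match[] selectors and the target hostnames here are made up for illustration, but they show the core problem: every metric family you ever want to graph has to be enumerated up front.

```yaml
# Sketch of a federation scrape job on a central Prometheus.
# Selectors and hostnames are hypothetical.
scrape_configs:
  - job_name: federate
    honor_labels: true
    metrics_path: /federate
    scheme: https
    params:
      'match[]':
        # Anything not matched by one of these selectors never reaches
        # the central Prometheus, so this list has to be maintained forever.
        - '{job="kubernetes-pods"}'
        - '{__name__=~"node_cpu_seconds_total|node_memory_MemAvailable_bytes"}'
    static_configs:
      - targets:
          - prometheus.aws-bahrain.example.com
          - prometheus.gcp-germany.example.com
```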
So after going back and forth a little, this is the solution we decided to go with: we would drop the central Prometheus and instead use the other Prometheus instances as data sources for a central Grafana. This way you don't need to pre-select metrics and you don't need to maintain a complex federation config. And what's really good: you only pay for the traffic that comes from the queries in your Grafana dashboards. So if your Grafana usage pattern looks somewhat similar to ours, where you look at dashboards every once in a while and you don't run all kinds of crazy queries every day, it's just significantly less expensive. If you add a new cluster, you can simply plug it in as a new data source for the Grafana. And it's also great that you save disk space.

Now, you're probably right if you think that this kind of topology leads to the central services being a single point of failure. So what we did is we used a regional GCP cluster as the central cluster: it was replicated over multiple availability zones and the control plane was also replicated. We tried to make it as stable as possible.

Alright, now how do we make this happen? In theory we would expose every single Prometheus to the internet and create a bunch of DNS entries, and then, because we need secure data transfer, we would generate and distribute certificates everywhere so that we only have mTLS. Even if you're using something like cert-manager, that sounds like a lot of work, right? Managing all those DNS entries can also be tedious, and we all know that naming is one of the hardest problems in computer science.

So we started looking into ways to automate this whole process, and we then realized that with Linkerd we basically already had a solution at hand. We had already been using Linkerd in production for a couple of years, and it has done a great job for us; especially the automatic mTLS has been really helpful and saved us from a lot of pain. And the good thing is, you can also use it for inter-cluster communication.
So Linkerd supports cross-cluster communication with the multicluster extension, which is deployed separately from the control plane. Just like the in-cluster deployment, it gives you a unified trust domain that can be validated at every step of the way. It's designed to be resilient to failure, so losing one cluster doesn't take down connectivity between the remaining ones. And it supports basically all types of networks, whether that's VPCs at a certain cloud provider or some kind of cross-data-center connectivity. So overall you get the same functionality for cross-cluster communication that you get with Linkerd within a single cluster.

And this is what it looks like in our monitoring use case. We have two clusters here: the left one is a workload cluster and the other one is the central services cluster, and the two clusters are linked so that the central cluster acts as the source and the workload cluster acts as the target. In the workload cluster we have a Prometheus instance and in the central cluster we have the Grafana, and both of them are meshed through Linkerd, so we have the proxies running left and right as sidecars. In the central cluster there is a service mirror running, which is one part of the multicluster control plane. The service mirror watches for services in the target cluster that carry the mirror.linkerd.io/exported=true label and replicates those services into the source cluster, so that the Grafana in the central cluster can use the mirrored service for service discovery. Calls to the Prometheus in the workload cluster are routed through the multicluster gateway, which is another part of the multicluster control plane.

And this is what the whole setup actually feels like: it's pretty simple, you basically don't even need a diagram for it. From the Grafana's point of view you can treat the mirrored Prometheus service exactly like any other service in the same cluster. You get a cluster-internal DNS name, which is automatically assigned by the service mirror, and you don't need to worry about certificates at all, because the whole mutual TLS part is handled by Linkerd.

I already mentioned earlier that the Prometheus instances are connected to the central Grafana as data sources, and this is what the data source configuration for the Grafana looks like: for the Prometheus URL you can just use the cluster-internal DNS name and the Prometheus port, and the protocol has to be plain HTTP, because Linkerd handles all the mTLS. And this is an example of what a Grafana dashboard would look like: on the left-hand side you have a dropdown menu where you can select all the available data sources, and you can just choose the Prometheus whose metrics you want to see, in this case the one from the workload cluster.
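To show both sides of this in config, here is a small sketch; the namespace, service name and cluster name (aws-bahrain) are assumptions, not our actual values. The first part is the Prometheus service in the workload cluster with the export label, the second part is a Grafana data source provisioning file in the central cluster pointing at the mirrored service, which the service mirror names after the original service plus the link's cluster name.

```yaml
# Workload (target) cluster: export the Prometheus service by labelling it,
# so the service mirror in the central cluster replicates it.
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    mirror.linkerd.io/exported: "true"
spec:
  selector:
    app: prometheus
  ports:
    - name: web
      port: 9090
      targetPort: 9090
---
# Central (source) cluster: Grafana data source provisioning file.
# The mirrored service gets a normal cluster-internal DNS name, and the
# protocol stays plain HTTP because the Linkerd proxies add the mTLS.
apiVersion: 1
datasources:
  - name: Prometheus aws-bahrain
    type: prometheus
    access: proxy
    url: http://prometheus-aws-bahrain.monitoring.svc.cluster.local:9090
```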
So let's talk about how we deployed the whole multicluster setup. When we looked at our existing Linkerd installations, we realized that we needed to do one more thing before we could start deploying the multicluster components: we needed a shared trust anchor. The trust anchor is the root certificate that signs the identity issuer certificate, which in turn signs all the data plane proxies' TLS certificates. So the identity issuer acts as the CA for one cluster, while the trust anchor is a shared CA across all the connected clusters. In our previous setup we had never connected our clusters, which is why we had separate trust anchors for the Linkerd deployments, so before we could roll out the multicluster components we had to exchange the trust anchors for a shared one.

How do you do this? First of all, it's not as scary as it sounds. I know that certificate rotation can be scary, some of you might have been there, but the good news is that you can do it without downtime by bundling the old and the new trust anchor. There are some really helpful resources in the Linkerd docs on manual trust anchor rotation. What we did is we went through all the steps a couple of times with a few test clusters we set up specifically for that, and then we created our own runbook. In the end we just went through it step by step and it worked really well.

Alright, now that we have a shared CA, we can proceed with the control plane deployment, and there are basically two options for how to do that. The first one is to use the Linkerd CLI with the linkerd multicluster install command, which you can see here together with kubectl. It installs the multicluster control plane, which includes the gateway and a few other things such as RBAC and the custom resource definitions for cluster links.

Now, in my team back then at Philly Connect we pretty much deployed everything in a GitOps way, so it was really important for us to keep it that way. We were running deployments with GoCD or with GitLab pipelines and we basically used Helm charts for everything, so that's what we also wanted to do for the multicluster components. Luckily, there's a Helm chart for that in the official Linkerd Helm repository, it's called linkerd-multicluster, and it's pretty easy to use. A really nice advantage is that you can configure way more things than with the plain CLI command, which is also why the Helm chart is the recommended way for production usage. For example, if you already know that you only have unidirectional traffic from source to target, you can just disable the gateway on the source side and that way save a load balancer IP. So just think about it; I'll show a small sketch of that in a moment.

The next step after the control plane deployment is to actually link the clusters, and of course there's a Linkerd CLI command to do this, which creates the Link custom resources and a few other things to make the cross-cluster connection work. One of the crucial parts here is that you have to run the link command against the target cluster to get the target cluster credentials, which come in the form of a kubeconfig. The kubeconfig of the target cluster then has to be persisted in a secret in the source cluster.

Now, how did we deploy the cluster links? Just like the control plane, we wanted to stick to GitOps and deploy everything in the form of Helm charts. The only little hurdle was that there's no official Helm chart to deploy all the resources that are generated by the multicluster link command. So what we did is we ran the command once in one of the clusters and baked the output into our own Helm chart, which we deployed in the central services cluster. The chart we created takes a list of cluster links where you can configure the external gateway IPs of the target clusters and a few other things. The kubeconfigs for the targets are not part of the chart; they have to be provided through existing secrets. So you can either manually create those secrets before you roll out the Helm chart, or you can use some kind of external secrets management tool, depending on how you manage secrets in your clusters.
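Coming back to the gateway setting I mentioned: here is the kind of values snippet I mean, as a sketch. The exact key has changed between chart versions (older charts use a flat gateway boolean, newer ones nest it under gateway.enabled), so check the values.yaml of the chart version you actually install.

```yaml
# values.yaml for the linkerd-multicluster Helm chart on the central cluster.
# In our topology it only ever acts as a source, so it never receives
# cross-cluster traffic and the gateway (and its load balancer IP) can be skipped.
# Installed with something like:
#   helm install linkerd-multicluster linkerd/linkerd-multicluster \
#     --namespace linkerd-multicluster --create-namespace -f values.yaml
gateway:
  enabled: false
```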
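And this is roughly the shape of the values file for such a cluster-links chart. To be clear, this is a hypothetical illustration of the idea, not the schema of the actual chart behind the QR code coming up in a second; the gateway addresses are documentation IPs and the secret names are made up.

```yaml
# Hypothetical values.yaml for a self-made chart that templates the resources
# normally produced by `linkerd multicluster link` (Link custom resources,
# RBAC, the per-link service mirror). Field names are illustrative only.
clusterLinks:
  - targetClusterName: aws-bahrain
    gatewayAddress: 203.0.113.10      # external IP of the target's multicluster gateway
    gatewayPort: 4143
    # Existing secret containing the target cluster's kubeconfig; created by
    # hand or by an external secrets tool, never templated by the chart.
    credentialsSecret: cluster-credentials-aws-bahrain
  - targetClusterName: gcp-germany
    gatewayAddress: 198.51.100.23
    gatewayPort: 4143
    credentialsSecret: cluster-credentials-gcp-germany
```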
But either way, you won't get around running the multicluster link command at least once to get those kubeconfigs and save them somewhere. If you want to look at an example of this Helm chart, you can scan this QR code; it's a link to a Helm chart on my personal GitHub account.

Alright, we're reaching the end of the session, so let's wrap things up. If you're dealing with a multi-cluster or even a multi-cloud environment, it makes sense to keep your metrics local but to centralize access to them. It can be significantly more cost-efficient, it scales better as you add new clusters, and it will definitely save you some nerves. Especially if you're already using Linkerd as a service mesh, or if you're thinking about using it, you should look at the multicluster extension, because it makes cross-cluster communication just as easy as Linkerd makes communication within a cluster. And the last point: it's possible to deploy the multicluster components in a GitOps way using Helm charts, for the most part. You just need to somehow get the target cluster credentials once, and after that it's just like any other declarative deployment.

Alright, that's it for my part, thanks. I will be joining the roundtable discussion at 12:30, so if you want to talk a little more about multi-cluster monitoring, Linkerd multicluster or using GitOps for Linkerd, any of those topics, just join me there. Thank you.

Thank you. We have a couple of minutes for questions, if anybody has any. Two at once, great. Hang on a second, I'm going to do this one first and then you.

Hi, I have a question: what was the hardest part when you needed to set up the multicluster environment in production?

Sorry, it was kind of quiet, could you repeat that?

The question was: what was the most difficult part about actually setting up the multicluster extension in production in your environment?

The hardest part, I mean the scariest part, was obviously the trust anchor rotation. But we tried that a couple of times, and after that it was just deploying a couple of Helm charts, just like rolling out any other service.

Hi there, I basically have two questions. The first one is: if you link the clusters, does it provide some kind of tunnel so that you can do two-way communication between services in the clusters? And the second part is: is it possible, with the Linkerd tunnel, to access the main Kubernetes control plane on the other clusters? The use case would be, for example, to do the Argo CD multi-cluster configuration and tunnel it through Linkerd. Thank you.

Well, two-way communication is possible. If you export services from one cluster to another, you just have to add this label to the service so it gets replicated to the other cluster, and if you want to do the same the other way around, you do the same thing, you just need gateways in both clusters, and then you have the mTLS tunnel for secure communication. And the other question, the second one was...

The second one seems to be whether you could use multicluster to tunnel the kube API itself.

I think you could do that if you just mirror the service; you can mirror basically any kind of service. I think that's an excellent question to bring up at the roundtable, actually, where we can also double-check with a couple of other folks.
Okay, all right, thank you very much, much appreciated. Thank you.