I think it's actually the first time I'm talking through a microphone, so. Oh, that's fine. This talk is basically about sharing the story of our team when we were trying to apply GitOps principles to maintaining a large number of environments. There are a few important points that are good to know at the beginning. I don't want to sell you anything, right? It's just about showing the way we are working, because we went out to Google to find other, publicly accepted ways to approach this multiple-environment problem. And we didn't find anything, so we came up with our own way. And I want to share this. Maybe we also missed something which is completely obvious, so maybe we're doing it entirely wrong. In that case, I'm happy to hear your concerns afterwards in the Q&A. But let's get into it.

So the agenda of this talk today: first I want to give you a bit of context, like who are we, who's Roche, and what are we doing? Then I'll be going into the technical details of how we use GitOps. There will also be a demo, and I don't know if we have time for Q&A at the end, but I'll try to finish on time.

So first of all, my name is Alex. I'm from Zurich in Switzerland. It's nice to be here. I'm super excited to be talking here and glad you're joining me. By the way, Zurich is this picture there on the bottom left, so I can highly recommend visiting in case you're going to Europe. It's an awesome place. Currently I'm working as a technical lead in the Edge infrastructure team at Roche. And actually for this one, who here has heard of Roche already? Because I think it's the, oh wow, that's actually more than I was expecting. So for those of you who don't know: Roche is a healthcare company. We're doing a lot of stuff, like medication, right? Cancer therapies are one big division. The other thing is diagnostics.
So in case you have heard of COVID: many of the PCR tests which are being run in the labs are being run on Roche analyzers. And what is such a chemical company doing in the world of software, right? The point is that we all know software is becoming more and more important everywhere, also in healthcare. Digital healthcare is a big thing which is coming. One use case which exists, or will exist in the future, is more and more personalized medicine. You can imagine that if you want to customize the medication for every patient, you can't do this with teams of senior scientists, right? There are not enough senior scientists available on this planet. So you need to do a lot of automation there.

I'm working in an internal platform team at Roche where we're trying to make this happen. We are building a platform which makes it easy for Roche to deploy applications into laboratories, hospitals, pharmacies, and doctor's offices. We're doing this by deploying a large number of Kubernetes clusters around the world. The promise we are making is that we manage these clusters, and you as the application development team can just use them: you can just install your Helm chart there.

All right. With this, let's talk about how we use GitOps. The way we use GitOps is basically the way everybody uses GitOps. We have a Kubernetes cluster here on the right. We have a Git repository on the left. And in the middle, in our case, we use Flux and Kustomize to synchronize these two things, right?

Now the interesting part comes here. We don't have one Kubernetes cluster, we have many of them, as I mentioned. And the important part is that all these Kubernetes clusters, in our case, are not handcrafted, different environments for different teams, right?
All these Kubernetes clusters, and we're talking potentially about thousands of them at some point, should look more or less identical. They should have the same stack deployed: Prometheus, Grafana, whatever you want to deploy there. So mostly the same, although there might be small customizations. You can imagine that if these clusters are running in different locations, in different hospitals, you might want to give them a name, for example, so that this name is visible on the Grafana page.

And the question we ran into is: how do we do GitOps in this case, right? The answer we came up with in the beginning was this one. We just had a single Git repository, and we connected all these Kubernetes clusters to the same Git repository using Flux. This is obviously kind of nice because it helps us to keep all these clusters in sync. But over time we realized that we were running into some issues here.

The first one was these customizations, right? If all these clusters are seeing the same files, how do we display a different name in one of the Grafana instances?

The next one: those were planned customizations, right? We knew in advance that we wanted to display different names in all these Grafana instances. But the real world is also messy, so we need to be able to deploy some hotfixes. How do we do this in this case? How can we deploy a hotfix to a specific cluster without affecting all the other ones at the same time? In the same direction goes the question of rollouts: how can we, for example, update Prometheus to a newer version but not deploy this change on all the clusters at the same time?

The next issue we were running into was separating access permissions. I mean, with modern Git hosting tools, right?
Like GitLab and so on, it's kind of easy to separate write access permissions. But in our case these clusters are really spread around the world, and different people are responsible for them. How can we separate, for example, read access permissions on these repositories? That's kind of hard to do in Git.

And last but not least, another problem we had was geographical co-location. Connectivity is not always a given, as much as we would like it to be. So we needed to somehow host these Git repositories close to the actual clusters which are using them.

Now the solution. There are a bunch of solutions to each of these individual problems, right? But in the end, we decided to go a completely different route, which is to create a separate Git repository for every cluster, so that we have a one-to-one mapping between clusters and Git repositories. The nice thing is that we basically inverted this list here on the left, right? We solved all the problems we had before. But now a new issue is popping up: how do we keep all these clusters in sync? And not only how do we keep these repositories in sync, but also how do we initialize them? We don't want a person sitting there writing these files into a Git repository manually, right?

The solution we came up with, hopefully fairly obvious when looking at this, is to use a template. The nice thing about this approach is that here on the right side you still have your standard GitOps approach, right? You can still use all the tools you know and love, like Argo, Flux, or Terraform, whatever. And the new part on the left, we actually internally started calling it Magops, for MetaGitOps, obviously. The question which was popping up here was: what tools exist to help us with this process, right? Is there anything out there we could use?
We tried a few things, but in the end we decided that none of the existing solutions was fully working for our use case. This is why we set out to create a new tool, which we started using internally and which is called FoxOps. This FoxOps tool, which I will demo afterwards, is helping exactly with this, right? It helps with initializing all these repositories. And the nice part is that it also helps to keep all the repositories in sync, by allowing you to make updates in the template and then roll out these updates to the actual, we call them incarnations, without destroying any hotfixes that were made. If we change something in the template and we have some hotfix active in a particular cluster, we can still roll out these updates to this particular repository, as long as the change is not directly in the same location, in which case we would get a conflict.

Now the way FoxOps works is basically the way every web application works. There's an API with a database behind the scenes. And there's also a very basic UI at the moment. All of this is very basic at this stage. The idea is that this API takes templates from Git. Templates are also just Git repositories, and you make versions by creating Git tags. Using the API, you can say: hey, please create me another incarnation of this template at this location. Or: please roll out this update at this location.

The SQL database is there because FoxOps itself keeps a bit of state, an inventory of all the incarnations it has created. So we can use this FoxOps UI, in our case, as the single point for seeing: okay, we have all these Git repositories, and they are all at this and this template version. So it helps us to keep an overview of what we did in the past. Although, just for you to know, in our case we mostly use the API directly, especially for creating. We automated everything end to end.
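As an illustration of the update behaviour described above, where a template change can be rolled out without destroying a hotfix unless both touch the same place, here is a minimal Python sketch of a three-way merge. It is not FoxOps's actual implementation: it works at whole-file granularity rather than on locations within a file, and the function name `update_incarnation` and all file contents are made up for the example.

```python
from typing import Dict, List, Optional, Tuple

Files = Dict[str, str]  # path -> file content


def update_incarnation(
    old_template: Files, new_template: Files, incarnation: Files
) -> Tuple[Files, List[str]]:
    """File-level three-way merge (a simplification, not FoxOps's algorithm).

    old_template / new_template are assumed to be already rendered with the
    incarnation's variables; incarnation is the current repository content,
    possibly containing hotfixes.
    """
    merged: Files = {}
    conflicts: List[str] = []
    all_paths = set(old_template) | set(new_template) | set(incarnation)
    for path in sorted(all_paths):
        base: Optional[str] = old_template.get(path)
        theirs: Optional[str] = new_template.get(path)  # template update
        ours: Optional[str] = incarnation.get(path)     # may hold a hotfix
        if ours == base:
            result = theirs   # no local change: take the template's version
        elif theirs == base or theirs == ours:
            result = ours     # template did not touch it: keep the hotfix
        else:
            conflicts.append(path)  # both sides changed the same file
            result = ours           # leave local content for manual resolution
        if result is not None:      # None means the file is deleted
            merged[path] = result
    return merged, conflicts


# Hypothetical example: the template bumps a provider, the cluster has a hotfix.
old = {"versions.tf": 'random = "3.4.0"', "main.tf": "length = 16"}
new = {"versions.tf": 'random = "3.4.3"', "main.tf": "length = 16"}
hotfixed = {"versions.tf": 'random = "3.4.0"', "main.tf": "length = 32"}

merged, conflicts = update_incarnation(old, new, hotfixed)
print(merged, conflicts)
```

In this example the provider bump is applied and the hotfix in `main.tf` survives. If the new template version had also changed `main.tf`, that path would be reported as a conflict instead.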
So we use only the API, but the UI is always there for some kind of manual intervention if needed.

I'm not going into the details of this slide. By the way, we open-sourced FoxOps. That's also a fairly new topic for us, and it's super, super alpha at this stage, so please don't use it. But here you can see, maybe afterwards if you want to check the slides, how it compares to existing tools like Cookiecutter and Copier, which you might have heard of. I think the biggest difference is that Cookiecutter and Copier are command-line tools which you can run on your laptop, whereas FoxOps is a hosted service which you could host somewhere, and then you get this overview of who created which incarnation.

And I think I'm done with my slides already. Going to the demo. I just want to repeat the disclaimer: FoxOps at this stage is somehow a POC which made it into production, as usual. So please don't use it, or please use it, but only do so if you have some time to also contribute to it, because you will very quickly run into missing features or bugs which will not make this a pleasant experience at this stage.

All right, so let's see the demo. What I want to show you in this demo is a use case similar to the one I was explaining before, right? Let's imagine we're a software-as-a-service company. We are hosting our product for our customers, but for some reason our product is not perfectly multi-tenant capable. So for every customer we need to create a separate instance, maybe spin up some separate cloud resources for this particular customer. What you can see here is that I just prepared a small GitLab group. This infrastructure thing holds all our different customer accounts. These are the repositories into which we want to commit our GitOps infrastructure as code for every customer.
And then here we have a template folder. At the moment there's only one template, and this is the template we're going to use to fill all these incarnations. I'm going to show you the template first, and afterwards I will show you the FoxOps UI. I think we will just be going through two use cases. The first one would be: let's onboard a new customer, so spin up new infrastructure for them. The second one: let's update an existing customer to a newer version of the template.

So looking at this template here, it basically looks the way a template always looks. There is a metadata file, in this case called fengine.yaml. In this metadata file you can define variables that you can use in the template. In this case the template only takes a single variable, which is the customer name, and it has a type of string, right? And then here, in this template sub-folder, are the files which are actually committed afterwards by FoxOps to the incarnation. In this case we want to commit a Terraform folder. Actually, we're a very cheap software-as-a-service company: the only thing we do at the moment for our customers is create them a password, and also a username. And you can see already that this is now the FoxOps templating syntax. Here we're using the template variable we just defined before. By the way, FoxOps is written in Python and internally uses the Jinja2 template engine, so you can do all the fancy templating stuff you want, like loops, if statements, variables, transformations, whatever.

Okay, and the last thing which I can maybe show you here is that templates are versioned, right? We just use Git tags in this case. At the moment we have a version 0.1 and there's also a version 0.2. The only difference in version 0.2 is that I updated a Terraform provider.
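To make the template mechanics described above more concrete, here is a small, dependency-free Python sketch of declaring a variable and rendering it into a template file. It is only an approximation under stated assumptions: FoxOps uses Jinja2, whereas this sketch substitutes `{{ name }}` placeholders with a regular expression, and the dictionary standing in for `fengine.yaml`, the file `terraform/output.tf`, and the function `render_incarnation` are hypothetical.

```python
import re
from typing import Any, Dict

# Hypothetical in-memory stand-in for the variables declared in fengine.yaml.
DECLARED_VARIABLES: Dict[str, type] = {"customer_name": str}

# Hypothetical template file, using Jinja2-style placeholder syntax.
TEMPLATE_FILES: Dict[str, str] = {
    "terraform/output.tf": (
        'output "username" {\n'
        '  value = "{{ customer_name }}"\n'
        "}\n"
    ),
}

# Matches simple "{{ name }}" placeholders only (no loops, filters, or ifs).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_incarnation(
    files: Dict[str, str], declared: Dict[str, type], variables: Dict[str, Any]
) -> Dict[str, str]:
    """Validate the given variables against the declaration, then render."""
    for name, expected_type in declared.items():
        if name not in variables:
            raise ValueError(f"missing template variable: {name}")
        if not isinstance(variables[name], expected_type):
            raise TypeError(f"variable {name} must be {expected_type.__name__}")

    def substitute(match: "re.Match[str]") -> str:
        # An undeclared placeholder raises KeyError, which is fine for a sketch.
        return str(variables[match.group(1)])

    return {path: PLACEHOLDER.sub(substitute, text) for path, text in files.items()}


rendered = render_incarnation(
    TEMPLATE_FILES, DECLARED_VARIABLES, {"customer_name": "customer-c"}
)
print(rendered["terraform/output.tf"])
```

With a real Jinja2 environment the same template could also use loops, conditionals, and filters; the regular expression here only covers plain variable substitution.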
I updated the random Terraform provider from one version to another.

All right, now let's go to the very basic FoxOps UI. This is the overview page of FoxOps. There are not more pages at the moment. What you see here are the two incarnations I have already created before. Let me quickly switch back here, just for you to see this again. This was our structure, right? And these were our different customer accounts. So far we have two or three of these accounts, and let's say we now want to create another account for a customer who just bought our product.

Unfortunately we don't have any auto-completion or selection here. But on this create incarnation page we are saying: okay, this is the target repository in which we want to initialize the template, and in this case that would be infrastructure account C. FoxOps does support creating these incarnations in subdirectories, but in this case we will just do it in the root directory. And here we specify where the template is coming from, and I hope I remembered the correct path here. I think it's demo slash templates, infra templates, yeah. And let's initialize it with version 0.1. Down here we give the variables, right? In this case the customer name, which would be customer C.

Before I click on create, I will just show you that this repository is actually empty, so I'm not cheating here. So, clicking on create, and this should hopefully not take more than... it actually worked, crazy. I'm always scared of these demos. Yeah, the files appeared. Here you can see we have the Terraform files, and if we check this output.tf from before, we can see that FoxOps rendered this value into this particular file.

The second use case would be: let's update one of our existing customers. This is our pilot customer, right? He always wants to get the new versions first.
All we have to do here is click on the incarnation and type in: please now use this template version 0.2. Making an update. And let's see. What you see now is that this button here became active, which takes you to the latest merge request that was created by FoxOps. For all the other ones I just initialized the repository without doing any merge request. This is now the first time we're doing an update here. If I click on this button I get to this merge request. It was actually merged automatically already. There's a feature in FoxOps to auto-merge these updates, and I used it. What you will see here in the changes is the update I made in the template repository: updating the Terraform provider from 3.4.0 to 3.4.3, including the lock file. And I think that's it for the demo already.

Let me see. Oh, I have one more slide. Where could something like this be useful, besides this GitOps use case which I was showing you here? Another use case we're looking into is microservice repositories. You can imagine that if you have a bunch of microservices you're maintaining, you probably have a similar tech stack. You might want to share pipeline configuration. You might want to share build tooling, directory structure, whatever. In this case you could also use one of the other tools, Copier especially. But to keep all these repositories in sync, having a tool like FoxOps is certainly a nice way of doing it. And I think that's all.