At the risk of stating the obvious, compliance can be hard. It's really tough to move fast and maintain a competitive edge while still taking into account all the necessary fail-safes and standards imposed by third parties, whether those are regulatory bodies or internal or external customers. To manage that, you need consistency and automation. And to achieve that, you need some sort of infrastructure as code. Architecturally, there are a lot of different ways to approach that, and our speakers today have tried several of them. Johannes Dienst and Jan Kolas from DB Systel had to build a multi-tenant infrastructure that met both internal standards for process separation and external ISO 20000 service management requirements. And they did so by leveraging a couple of different architectures built on GitLab. So let's have a listen and see what they learned.

Hello and welcome to our talk. We are here from Germany, and I'm joined by my colleague Jan Kolas. I am Johannes Dienst, and we are sharing the journey from managed cloud to GitOps, a journey our team has been on over the last two years, deploying several client clusters, one per customer, in AWS. How it all started, how it's going, and where we are now is what we are sharing in this talk today.

So let's give some context on where we're coming from. We are both employees in the same team at DB Systel. DB Systel is a subsidiary of Deutsche Bahn, the main rail and logistics provider in Germany. Deutsche Bahn operates worldwide with over 300,000 employees, and DB Systel is driving the digitalization of Deutsche Bahn in every aspect. What we are doing specifically, and why it is important that we deploy several client clusters in the cloud, we will share on the next slide, and with that I hand over to Jan.

Thank you. Yeah, we are running and building the DB Content Hub; in detail, it's a headless content management system. We provide it as a service across the whole company, and basically it runs on a whole infrastructure cluster. It consists of several servers running on Linux, running our actual software and of course different other applications, such as NGINX as a reverse proxy (more on that later). And of course we are using the central services of the AWS cloud: RDS, the Relational Database Service, the Elasticsearch service, and so on.

In front of all the customer clusters we are also running a centralized API to streamline data access, to control the request traffic, and of course to count the traffic, because that's a small part of our business model. And the main requirement for this whole infrastructure, due to data privacy protection, was to really separate each cluster from the others: it was definitely not allowed that the data of one customer cluster could reach another customer's cluster. We had to ensure this, and this is the infrastructure we will present to you in a few minutes. In front of it, as I said, sits the central API. It implements a routing mechanism based on an API key per customer, so it interprets the key and routes the traffic to the correct load balancer in the background.
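The talk does not show how the central API is implemented, so the following is a purely hypothetical sketch of the idea: each per-customer API key resolves to exactly one customer load balancer, so tenant traffic can never cross cluster boundaries. All names here are illustrative assumptions.

```yaml
# Hypothetical routing table for the central API; not the team's actual
# configuration. Each API key identifier maps to exactly one customer
# load balancer, enforcing strict tenant separation at the entry point.
routes:
  - api_key_id: customer-a            # opaque key ID, not the secret itself
    load_balancer: https://lb-customer-a.internal.example
  - api_key_id: customer-b
    load_balancer: https://lb-customer-b.internal.example
# A request is authenticated by its key, resolved to its single target,
# and forwarded; there is no route by which it could reach another tenant.
```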
And data privacy protection was not the only requirement we had to fulfill. Deutsche Bahn is a big company. I always call it a corporate environment, and in a corporate environment there are a lot of certifications, specifications, requirements, and constraints. One specific constraint is ISO 20000, and we are focusing specifically on its service management system requirements. I could talk for a few hours about ISO 20000, but we won't do that here, because you want to learn something about infrastructure as code, for example, and how we use it with GitLab.

The main things I want to talk about, in short, are change management and release management. There are specific requirements for both if you are ISO 20000 certified. One requirement, for example, which is easily implemented with a version control system like GitLab, is that you need a request for change if you want something to go into production; this is a merge request, or a pull request, whatever you call it, that is not important. But there are also other requirements. For example, you have to have release notes, and I remember, just a few weeks earlier, writing release notes manually, jotting down the commits and what got changed in the software. With automation and pipelining you can automate release notes to a high degree, or even 100%. What is also important in the prose of ISO 20000, and it's common sense as well as best practice, is testing, and this is also aided by pipelines: you can run specific tests automatically, and automate all of your testing if you are willing to do it. We also have the requirement, and I don't think it's very specific to ISO 20000, but I want to mention it here, that we have to plan every change or every release in a centralized tool, so that every interface partner we have, and every other team which uses our software, knows about a change which is potentially breaking or interrupts the service. You can put this into a pipeline too, if you are able to automate it and put everything into Git.

So, from the requirements to our first try and our first customer. The speciality of our team is that we are not building a project but a product, which is sold to many customers, and for our first try we were the first customer ourselves. We said, okay, we have a managed cloud and we want to deploy something into this managed cloud. What we did was put Ansible playbooks into a Git repository, which is cool, because then a developer from our team can check out the Git repository and deploy the software, or more specifically the configuration of the infrastructure, to the servers. Managed cloud always means less automation potential: you have to buy or configure the infrastructure beforehand, say a relational database and an EC2 instance if you are on AWS. Then you check out the Git repository, as we did here, on your local laptop, and configure the provisioned infrastructure from there.

There is one big downside to that. It's convenient, but you can't be sure that the provisioning or the configuration of the infrastructure is the same as in the Git repository. What you want is that the configuration in the Git repository is the truth and is identical to the configuration on the server. One anecdote I want to share with you: we had a plugin for our CMS there, and it turned out that the configuration, or rather the implementation, of the plugin wasn't the same as in the Git repository. In fact, the person who deployed it had changed it locally on his computer, and we didn't know what he changed, because it wasn't in Git. So when we updated the plugin, we had to re-implement it to match the requirements. This was very tedious.
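As a minimal sketch of what such a laptop-driven run can look like: the host group, role names, and inventory below are hypothetical, and the servers are assumed to be provisioned beforehand in the managed cloud.

```yaml
# playbook.yml; illustrative only, role and host names are assumptions.
- name: Configure the pre-provisioned CMS servers
  hosts: cms_servers            # inventory entries point at managed-cloud VMs
  become: true
  roles:
    - nginx_reverse_proxy       # e.g. template out the NGINX configuration
    - cms_application           # install or update the CMS and its plugins
```

This would be run from a developer's laptop with "ansible-playbook -i inventory playbook.yml", which is exactly the weakness described above: nothing forces the state in Git and the state on the servers to stay in sync.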
And there is also the compliance problem: if you can't be sure what is on the server, then you are not compliant, because you can't say what the truth is, and that's a big problem in our company.

Now, if you get more customers, and not just one (we have three here, and there actually were three), it is no longer possible to provision the servers beforehand and then configure them. So we said, okay, we have to move to AWS for a more cloudy approach. The cloudy approach means we provision our infrastructure from our Git repository, and in fact what we did was use AWS CloudFormation and put all the customer configuration into the Git repository. So there were three configuration directories plus the CloudFormation template. This no longer has the compliance problem, because when you deploy over a pipeline, and only deploy over a pipeline, to the different customer clusters, then you are automatically compliant, unless someone goes into production and changes things by hand, but that's not recommended. So compliance is easy to achieve, and the infrastructure is fully manageable through code: the truth in the repository matches the truth in the cloud, and this is very powerful.

One big disadvantage we noticed with our monorepo, though, was that it was still possible to change the wrong customer's configuration, because every customer's configuration was structured exactly the same way and only the directory name was different. If you were somehow under stress, or had to change something in a hurry, you could edit the wrong customer and deploy it into production, which was not so fun. That's the reason we said, okay, it's doable this way, but it's hard to maintain, so we need a different approach, and I hand over to Jan, who will explain it in more detail.

Yeah, this is our third try, and hopefully the final approach. It's a combination of Terraform and GitLab, or Git in general. Terraform is a descriptive language for infrastructure in the cloud, be it Azure or, in our case, the AWS cloud. We have several assets that we maintain, develop, and enhance: for example, as I said, the different Docker containers, such as the NGINX Docker container; our actual software, the headless content management system, which is also bundled in a Docker container; and all the Docker containers run on the Linux instances. Our infrastructure description itself is an asset, and all our assets are versioned. Beyond that, we have several other assets, such as the plugins we develop for the CMS, and of course the AMIs, the Amazon Machine Images, which are a representation of the Linux operating system that later runs on the servers.

When we develop or enhance a Docker container, for example, the container gets built by a pipeline, and when the build is successful it lands in the internal registry we are using, a JFrog one. After that, the change, the new version, is propagated into a central repository per customer. So we have one repository per customer, which basically contains the configuration of the infrastructure that later gets deployed into the AWS cloud. The new version of the Docker container, or the new version of the infrastructure description, arrives as a merge request in the customer's repository. This merge request is automatically accepted, and the new version is merged into the customer's configuration file; it's just a simple JSON file.
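The talk only says the customer configuration is "a simple JSON file", so the field names in this sketch are hypothetical; the values correspond to the kinds of settings the talk mentions, such as the AMI, the container versions, the Elasticsearch node count, and the RDS instance class.

```json
{
  "customer": "customer-a",
  "cms_container_version": "2.4.1",
  "nginx_container_version": "1.19.0-custom3",
  "ami_id": "ami-0123456789abcdef0",
  "elasticsearch_node_count": 3,
  "rds_instance_class": "db.m5.large"
}
```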
So now, when we want to deploy a new customer cluster, or new customer infrastructure, we just trigger a job or pipeline in this customer repository. The job responsible for that takes the configuration file, translates it into GitLab variables, calls the infrastructure repository, and passes the configuration as GitLab variables into the infrastructure pipeline. In the job that is triggered in the infrastructure pipeline, the GitLab variables are passed on into the Terraform description of the infrastructure. As you can see here, this covers things like the AMI and the Docker containers, but also things like how many nodes there should be in the Elasticsearch cluster, or which database instance we want to use in the Relational Database Service of AWS. Afterwards, the infrastructure pipeline simply runs, executes the whole Terraform stack, and the whole infrastructure is built or updated in the cloud. That is basically the whole magic behind our approach.

One side note: unfortunately, we are not allowed to use GitLab variables for secrets and the like, so we have set up our own secret management. We are using Mozilla SOPS to encrypt the files containing all our secrets. These files are then automatically decrypted and stored in the AWS Secrets Manager. One note: all of this is possible because our GitLab runners have permission to access our account in the AWS cloud; otherwise this approach would not be possible. With this approach the secrets are stored in the Secrets Manager, and when a Linux instance comes up, a cloud-init script (that is the Amazon term for the script run when the server starts in the cloud) takes the secrets from the Secrets Manager and passes them into the Docker containers, for example to get access to the database, the Elasticsearch servers, and so on.

A very important concept behind this, for us, is the Git flow approach. For now we are using three branches for development. When we enhance or change a Docker container, maybe the NGINX Docker container, we create a feature branch, do some coding, change a Dockerfile, whatever. Then the usual review process is done, and if everything is fine, this feature branch is merged into the development branch. Now the build pipeline for the Docker container runs, the container gets uploaded into Artifactory, and a new version of this container is created. This new version is propagated into the customer's repository, and in the customer repository we have the same branch structure, so the new version gets merged into the configuration file on the development branch of the customer.

What we can do now is that each branch is a representation of an environment in the AWS cloud. So we are able to create a cluster, a testing cluster or a development cluster, for this customer, or we are able to create a staging environment, in our case in AWS, or a production environment. We could extend this with as many branches as we need; for now these three branches are enough for us. So when we have done our tests in the development environment and everything is cool, we merge the Docker container change into the main branch, and the same process starts from the beginning: the main version is merged into the customer repository, and now we can pre-test the Docker container for production on the staging environment. And if really everything is fine, on Monday 10 am we push the button and deploy everything to the production environment of this customer.
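The talk does not show the pipeline definitions themselves, so this is a hedged sketch of the handoff under stated assumptions: the project path, variable names, and trigger token are hypothetical, and GitLab's pipeline trigger API stands in for whatever mechanism the team actually uses. The idea is that the customer repository's job reads the JSON configuration, turns it into pipeline variables, and triggers the infrastructure pipeline.

```yaml
# .gitlab-ci.yml in a customer repository. A sketch only: project path,
# variable names, and the trigger token are hypothetical, and the runner
# image is assumed to provide curl and jq.
deploy-development:
  stage: deploy
  when: manual                   # the deliberate "push the button" step
  script:
    # Translate the customer's JSON configuration into pipeline variables.
    - AMI_ID=$(jq -r '.ami_id' config.json)
    - ES_NODES=$(jq -r '.elasticsearch_node_count' config.json)
    # Trigger the central infrastructure pipeline, passing the values along.
    # TF_VAR_* names are used so Terraform picks them up automatically.
    - >
      curl --fail --request POST
      "${CI_API_V4_URL}/projects/infra%2Fcontent-hub/trigger/pipeline"
      --form "token=${INFRA_TRIGGER_TOKEN}"
      --form "ref=development"
      --form "variables[TF_VAR_ami_id]=${AMI_ID}"
      --form "variables[TF_VAR_elasticsearch_node_count]=${ES_NODES}"

# Downstream, the infrastructure repository's pipeline (sketched here only
# as comments) can then run the Terraform steps described in the talk:
#
#   stages: [validate, plan, apply]
#   validate: terraform init && terraform validate
#   plan:     terraform plan -out=tfplan
#   apply:    terraform apply tfplan
```

Terraform reads any environment variable named TF_VAR_name as the input variable "name", which is what makes a GitLab-variable handoff into a Terraform description work.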
Now you can see a real-life example of these pipelines. On the top left you can see how we trigger the customer pipeline; in this case we trigger a development environment in the AWS cloud, so we push this button, and afterwards the infrastructure repository, or rather the infrastructure pipeline, gets called. One side note here: we have two AWS accounts, one for testing purposes, which is where the development environment is created, and one production account, where we create the production clusters of the customers, meaning the staging and the production environments. In this case the development pipeline is triggered, and one click later you see the Terraform pipeline.

In Terraform you go through several steps. At first you validate your Terraform description: can we build this infrastructure at all? The second step is that you plan what has to be done to create this infrastructure. And finally, the whole stack is applied, in our case to the AWS cloud. We first create the environment of our main application, because, as you know, the application needs a running database, needs the Elasticsearch service, and so on. So we first apply the network storage, the Elastic File System, which is attached to each server running our software; the database is created, as I said, with the Relational Database Service; and the Elasticsearch service. Afterwards our server setup is actually deployed, created, and started up.

Okay, we're already nearly at the end, so let's wrap up what we learned today. We had the requirements of ISO 20000 and also the data privacy protection requirements, from the EU but also from Deutsche Bahn, which say that no customer tenant should be able to get data from another tenant. So it was a real multi-tenant system we had to build, and we had to implement change management and release management. As you could hopefully see, we are able to do this with configuration as code, through the customer repository where we store the configuration, but also with a GitLab flow, so that we can do release management and change management very easily.

But those are not the only advantages we got from this approach. With Terraform and configuration as code for each customer, we gained a few more advantages, and I give the word to Jan once more, and we will go over them one by one.

Using our current approach, it is very easy to create and set up new customer stacks. In ancient times it was really, really hard, because we had to do several things by hand and manually, and that takes time; the more customers you have, the more time it takes, of course. Now we can just create a simple repository, where we have scripts to initialize this new customer repository, and then we can just push a button and create the whole production cluster of this customer. And of course Johannes will take the next one.

What we also have, and this is especially important for me (Jan came a little bit later, so he doesn't know these ancient times, he only knows them from my history lessons), and I really find it very satisfying, is rolling updates. As long as you are not changing the relational database, of course, because that needs a downtime, we don't do anything by hand. We just say: pipeline, run with the new configuration, and the new infrastructure configuration just rolls into production, usually without any problem. This is made possible by the pipeline approach we chose.

And of course there are several other things we could talk about here. One is that we have a monitoring stack consisting of Prometheus, Grafana, Elasticsearch, Kibana, and so on, and every machine, every server that comes up, automatically registers itself in this monitoring system. This, too, is possible because of this automated approach.
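The talk doesn't detail how servers register themselves in monitoring; one common way to get exactly this behavior on AWS, offered here as an assumption rather than as the team's actual setup, is Prometheus's built-in EC2 service discovery:

```yaml
# prometheus.yml fragment; hypothetical: region, port, and tag name are
# assumptions. Any new EC2 instance carrying the matching tag is discovered
# and scraped automatically, so servers "recognize themselves" in monitoring.
scrape_configs:
  - job_name: content-hub-nodes
    ec2_sd_configs:
      - region: eu-central-1
        port: 9100                               # node_exporter default port
    relabel_configs:
      - source_labels: [__meta_ec2_tag_Project]  # EC2 tags become meta labels
        regex: content-hub
        action: keep                             # scrape only our instances
```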
We can also deploy new dashboards automatically into Grafana, and we can export the metrics of the Linux system, the CPU load and so on, into Prometheus. That's a cool feature that helps us very, very much.

And the last one to emphasize: the compliance and security approach. With GitLab pipelines, or with pipelines in general, it's very easy, and it's especially easy with GitLab because it's a containerized build, to add stages for compliance checks and security updates. If we need a new tool, for example for open source license checking, we insert a new stage, and this is usually just one little block and a little bit of configuration. So we are easily compliant in that regard. And if there are security updates, with the rolling updates we have no problem deploying them in under one day, very fast.

So that's it for our talk. I hope you got something out of it. I'm pretty sure it was a little bit high level because of the short time; we could talk about this much, much longer if we went into detail. But we are here for questions and answers, so you can ask two questions, or more if you like. If you want to reach us, here are our business email addresses, and I'm also reachable on Twitter. So thank you, and have a nice day. Thank you.
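As an illustration of that "one new stage, one little block" point, here is a hedged sketch of an added open source license check; the scanner image and command are illustrative assumptions, not the team's actual tooling.

```yaml
# Adding a compliance check to an existing pipeline really is one more
# stage and one more job. The choice of scanner here is hypothetical.
stages:
  - build
  - compliance
  - deploy

license-check:
  stage: compliance
  image: licensefinder/license_finder   # containerized scanner (illustrative)
  script:
    - license_finder                    # exits non-zero on unapproved licenses
```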