Hi everyone, my name is Alex Viscrano. I'm working as a full stack developer in the platform team at Kiwi.com. Most probably you all know what platform engineering is about, but for those of you who don't: we provide the building blocks to all our developers, so they don't have to think that much about how to set something up or how to deploy something, and can just focus on the business logic. You can find me as a reverse user.

So what is this all about? I will give a small introduction about how Kiwi.com started, infrastructure-wise, and how we deployed code at the beginning. Then, what made us change? Switching from a data center to cloud-based and then to cloud native is a pretty big change, and we had to have good reasons for doing so. Why did we choose cloud native technologies, and how did we get there?

Some background about Kiwi.com: we are an online travel agency, so we sell flight tickets. When I say that we sell tickets, people usually assume we are a meta search engine. Not really. We actually own the tickets, which allows us to do pretty cool stuff. Our main feature is virtual interlining: we connect flights from carriers that don't usually cooperate, and we are the ones covering the transport. But in the end, this is all to say that we didn't really need any special architecture at first: just a database, something to index those flights so we could search over them, and that's pretty much it.

Everything ran the way it does at almost every startup: a single server. Everything was running in the same place, and if it went down, everything was down. Of course, we didn't really want to pay someone to manage our databases, so we self-hosted them. We had no orchestration at all. There was no deployment pipeline beyond some guy SSHing into the server, pulling the new code, and restarting. And isolation was something we could only dream of.

That didn't last long. After upscaling our databases three, four, five times, every time bumping into new issues because we were ingesting more and more flights, searching also became more complicated for the same reason: we had more data to search over. Elasticsearch didn't really do the trick for us, so we needed to change. And we moved to something that, to use the buzzword, is microservices.

The important part here is that this transition happened naturally. We didn't have to force people to switch to this architecture; people really needed these features. People wanted a single place where they could run their code without mixing it with something else. They didn't want to run on the same server as a database and have the database go down whenever the application used all the CPU. People wanted isolation. People wanted an easy deployment pipeline. People wanted all of this.

So what did we do? We split the responsibility for deployment out to the teams themselves: basically, as long as you know how to run it, it's up to you. We started leveraging platform as a service and infrastructure as a service. Almost every product that Amazon offers, we used, or at least tried. Of course, that increased isolation: we no longer had a single server running everything, but VMs running in specific places, which was already an improvement.
And that's when we switched from GitHub to GitLab, and finally we could have proper CI/CD. This is still our current and best combo: we moved almost everything to Docker, we use GitLab as the main operations platform for as much as we can, and we deployed everything on Rancher 1.6, so not Kubernetes. And this worked well; really, it served us for four years.

But we needed to change, because we keep getting bigger, and Rancher 1.6 was probably never designed to manage as many nodes as we had. So we hit issues, and not only issues in deployment itself, but also features we couldn't have. Autoscaling is not granular enough: you cannot autoscale the way Kubernetes does. You have to have an autoscaling group in Amazon, and then you can have Rancher place a specific number of containers per node, but that's not based on much more than what Amazon can provide. (I'll show a small sketch of this in a moment.) Rancher 1.6 is not really being developed anymore, just small patches and small fixes. And everyone has Kubernetes, so why don't we? There's also the community: every single day there are probably at least five new tools doing something in Kubernetes. Most of them overlap with previous tools, but still, it's a steady flow of new features coming from literally everyone in the world. Huge companies use Kubernetes, and whenever they face an issue, they usually provide some tooling. That was reassuring for us: when we reach the same point, we know there are already people dealing with it, so we can learn from them.

So we started to move somewhere else, in this case Cloud Native Land. And this time it wasn't natural. People were really comfortable with our previous setup. Actually, as there was no easy way to deploy from GitLab to Rancher, we had built an open source tool for that, and everyone was really happy: they just had a button to click to deploy to canary, deploy to production, and everything was fine. People liked clicking in the Rancher UI to add hosts; they were fine with that, and they didn't care that that configuration wasn't stored anywhere. So this move was harder. It was harder to justify to people that, hey, you should move because it's better.

This is our current setup for anything new that we launch, and we are migrating most of our existing stuff to it as well. We still leverage Docker images: everything has its own container, and everything should run isolated. We still use GitLab; everything is fine there. And we swapped Rancher for Kubernetes. We are not yet considering this a huge success; we are getting there.

So why isn't it that easy for us? For that, I have to explain our baggage: where did we start from? The truth is we had hundreds of containerized Python apps that were never really designed to run in some huge Kubernetes cluster with access to virtually unlimited resources, so decisions made during development were not really that good for deploying cloud native. As I said, developers were used to the Rancher UI. They really didn't want to bother with anything; they were used to clicking in the UI and getting a new node. There were no configuration rules. Everything was ad hoc: people who wanted something got it. But that also caused a huge support load for us. Every time there was an issue with some DNS, some host being tricky or whatever, they were just pinging our DevOps team and asking for help. So we needed to enforce some sanity company-wide.
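Here's that autoscaling sketch I promised. With Rancher 1.6, the only real lever was the size of the Amazon autoscaling group under the cluster. This is a minimal illustration assuming boto3; the group name and region are invented:

```python
# A minimal sketch, assuming boto3; group name and region are invented.
# In the Rancher 1.6 world, "autoscaling" meant resizing the EC2
# autoscaling group backing the whole cluster, and Rancher then placed a
# fixed number of containers per node. You scale whole VMs, not workloads.
import boto3

autoscaling = boto3.client("autoscaling", region_name="eu-west-1")

autoscaling.set_desired_capacity(
    AutoScalingGroupName="rancher-workers",  # hypothetical group name
    DesiredCapacity=12,                      # whole VMs, not containers
    HonorCooldown=True,
)
```

Compare that with a Kubernetes HorizontalPodAutoscaler, which scales one deployment at a time based on CPU or custom metrics; that per-workload granularity is exactly what we were missing.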
The good thing we have is that most of our code base is Python, and most of it uses the same approach, so it's relatively easy for us to define standards and enforce them. We also have contributors in the other languages we use, like JavaScript or C++, and we try to come up with these standards per technology.

And what did we do for all this? The main thing is code templates. Right now, if you want to start a new microservice, we provide templates for you. You just take one of them, fill in your information, and boom, you get a server running. It follows all our best practices, uses the tools we want you to use, and has CI already set up. It's perfect.

We provide infrastructure templates. We don't want people to have to figure out how to Terraform something every time, so we provide as much as we can ready to use.

CI/CD templates: having a similar setup, with most of the code base being Python as I explained, allows us to have shared images and shared CI jobs that are just plug and play. You just include them in your CI template and you get static analysis, security analysis, everything for free. You don't have to do anything.

Kustomize: from the beginning we wanted to go Tiller-less in the cluster, so we use Kustomize almost everywhere. We provide remote bases for you, so again you don't have to define your ingresses or, I don't know, the Datadog agent or anything else.

And we even provide automatic monitoring. As I explained, we have shared CI jobs, for example the deploy-to-cluster one, and in that job we also added some cool stuff that, for example, creates a Datadog dashboard with all the information about every workload you have in the cluster, so you get right-sizing information, anomaly detection, everything for free. You just run that job and you get it automatically. (I'll sketch what that job does in a moment.)

And yeah, Terraform everything. Terraform is literally what saves us from having to develop all this stuff ourselves, and we try to use it as much as we can. I have some examples here. As I said, we Terraform everything, so when someone wants a new Google Cloud project, they can just use Terraform. It will automatically send them a Slack message with all the information about where the cluster is located, and even a code snippet that generates all the source files, ready to push to GitLab and have it running. We also generate a README with all the resources explained: how they are used and where they are located. Everything is as clear as possible. And we generate default resources for everything: a Vault path with some common secrets we have, Let's Encrypt certificates so everything works, DNS records. Everything is ready just by using that Terraform module.

Kubernetes namespaces: again, we wanted a common way to define a namespace, and again we use Terraform. In this case it will automatically inject an operator into your cluster that makes sure secrets are synced from Vault all the time. You get a Datadog integration so all the logs are shipped automatically. You get monitoring. Everything is set up for you.

We even Terraform GitLab users. As there is no easy way to have bot users in GitLab right now, we decided to have a Terraform repository as the source of truth for these extra users that are not in our single sign-on. They live in that repository, and everything can only be added or removed through it.
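Here's the sketch of that automatic-monitoring step I promised. Our shared deploy job does, in essence, something like this against the Datadog API. This is a simplified version with invented workload and namespace names and a couple of example metric queries; the real job templates this for every workload it deploys:

```python
# A minimal sketch, assuming the Datadog v1 dashboard API and the
# `requests` library; workload and namespace names are invented.
import os
import requests

workload = "booking-api"  # hypothetical workload name
namespace = "booking"     # hypothetical namespace

def timeseries(title, query):
    """Build one timeseries widget for the dashboard payload."""
    return {"definition": {"type": "timeseries", "title": title,
                           "requests": [{"q": query}]}}

dashboard = {
    "title": f"{workload} (auto-generated)",
    "layout_type": "ordered",
    "widgets": [
        timeseries("CPU usage",
                   f"avg:kubernetes.cpu.usage.total{{kube_namespace:{namespace},kube_deployment:{workload}}}"),
        timeseries("Memory usage",
                   f"avg:kubernetes.memory.usage{{kube_namespace:{namespace},kube_deployment:{workload}}}"),
    ],
}

resp = requests.post(
    "https://api.datadoghq.com/api/v1/dashboard",
    headers={
        "DD-API-KEY": os.environ["DD_API_KEY"],
        "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
    },
    json=dashboard,
)
resp.raise_for_status()
print("Created dashboard:", resp.json().get("url"))
```

Right-sizing and anomaly-detection widgets work the same way; the point is that nobody has to build any of this by hand.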
We use plenty of serverless, and we use, again, Terraform. We have Lambda modules ready for you to deploy, with the minimum set of permissions needed to just deploy a serverless function. We considered all the features that people need, usually Step Functions, DynamoDB, everything that goes with it, and everything is ready and configured automatically. You just create your module, point it at the proper GitLab project, and we automatically fill in all the CI/CD secrets and prepare everything on CloudFormation.

DNS records, same story: just create a Terraform module and you've got a DNS record. Everything is ready for you. And we use Terraform for, again, as much as we can. We Terraform our Pingdom checks. We Terraform our Vault paths. Everything in Amazon is Terraformed; in Google, all the new code is Terraformed. And we even authored some providers when we couldn't find any, as for CloudPassage Halo; we just decided to make it. It's open source, and everyone can use it.

And yes, everything that is deployed right now has to be infrastructure as code. That brings us many advantages. We don't have to guess how something is configured, or want to change something because there was an issue, some leak, and not even know where that thing is deployed. Actually, this is our GitLab deployment pipeline, which we use to deploy GitLab from GitLab itself, and we can deploy a new version of GitLab in three minutes. That's awesome.

These are the main advantages we got from all this. A single source of truth: again, we don't have to guess anything. Whenever you want to know something, you go to GitLab, search for it, and you will find the Terraform definitions, and that should be what is applied. All changes are kept in history: if something went wrong, it's easy to revert the commit, or to know why it was done and ask the person who did it, because maybe there was a mistake. All changes are reviewed, which is also a good thing. It's also easy to deploy changes that affect all the infrastructure: having all these centralized templates and sources of truth allows us, for example, to rotate our Vault configuration across all our infrastructure repositories with a single change. We have centralized security management with Vault; it's used everywhere except GitLab CI, and it will be used there soon too. And centralized network management: we have a specific project that only hosts the Terraform definitions of how our network is laid out. The VPCs are defined there, everyone can just use them, and whenever there is a need for a big change, it's easy to review it properly and make sure it's not affecting the other resources.
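One last sketch before I wrap up, on the minimum-set-of-permissions point from the Lambda modules. Our real modules express this in Terraform; this is just a minimal illustration in boto3, with invented role and policy names, of what a least-privilege execution role amounts to:

```python
# A minimal sketch of a least-privilege Lambda execution role, assuming
# boto3; the names are invented. Our actual modules do this in Terraform.
import json
import boto3

iam = boto3.client("iam")

# Trust policy: only the Lambda service can assume this role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="my-function-role",  # hypothetical
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Grant only CloudWatch Logs by default; statements for DynamoDB, Step
# Functions and so on are added only when the function actually needs them.
iam.put_role_policy(
    RoleName="my-function-role",
    PolicyName="minimal-permissions",  # hypothetical
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
            ],
            "Resource": "*",
        }],
    }),
)
```

And that's it from my side. Thanks, everyone.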