Hello, everybody. Welcome to another ArgoCon presentation about a very exciting topic: how to use GitOps with databases, so not applications and not infrastructure. My name is Kostis. I'm working for Codefresh as a developer advocate. And with me, I have Rotem. He's the CTO at Ariga. So what are we going to talk about today? First of all, we're going to see what people do today about database migrations and how they integrate Argo CD. We will see some solutions which we don't really like. And then we will give you the proper solution, which is also the correct one. And you will also get a demo, like always.

So let's look at some history. First, I'm going to talk about applications. And this slide should be familiar to everybody. In the beginning, people were deploying just by pressing buttons. Then we had scripts. And then people said, OK, scripts are not very good, they break all the time, and we moved to declarative configuration. And then finally, we are here today at GitOps, which is how you should do things. Now, if you don't understand the difference between declarative and GitOps, I'm not going to talk about this today. But you should all know about the GitOps working group. You should visit the website or ask questions. Declarative is just one of the principles of GitOps; you can imagine GitOps as a superset. So that's another interesting topic, but not for today.

And here I have the exact same slide for databases. And things are not very good here. Most people are doing either manual deployments or they have some scripts for database migrations. Almost nobody is using declarative configuration for databases, because not many tools exist. And unfortunately, nobody is doing GitOps for databases, which is something we should change. And this is something I see all the time with companies. They have a great workflow for applications. They say, I can deploy my application five times per day, five times per minute. And when I ask the same question for databases, they say, oh, we do it manually, or we deploy once per week. And as you know, it's always the weakest link in the chain. So if you're deploying to your database once per week and you have database changes, then releases will happen once per week. So this is really sad, because now the database becomes the bottleneck, even if your application deployments are optimized.

So first, if you have seen my previous presentations, you know I'm a bit controversial. I will start with things that you should not do, and hopefully you will not recognize yourself here. The first anti-pattern is obvious: don't do things manually. If you don't like deploying applications manually, you should also not migrate databases manually. I don't have to talk too much about this. If you do anything manually, errors can happen. We are all humans. Nothing is automated, nothing is repeatable. You ask people, how did you do this last Thursday? Oh, I missed this step. Yeah, let's do this step again, and so on, and so on. And it's super stressful, especially for databases, even more so because they contain critical data. So don't do this.

The other anti-pattern that I see at a lot of companies, and I'm the first to admit I have done it in the past, is when you do database migrations as part of your application. So you launch your application, and the application itself updates the database to the latest schema. This was a very popular pattern in enterprise applications, in monoliths, in the Java world. You know how it goes.
As I said, I have done it myself. So this looks like a good idea if you're building a monolith, but not today. You shouldn't do it. First of all, if you have a good security team, they will tell you that shipping database migration tooling inside your deployed application is not a good idea, for security reasons. Also, if the migration fails, the application startup fails as well. But specifically for Kubernetes, and unlike a monolith, in Kubernetes you usually have many instances of your application. So if you launch five pods and all of them try to update the database at the same time, you will have issues. Now I know what you're going to say: the database is built for concurrent connections, and my tool ought to detect that, and so on and so on. But you are solving a problem that should not happen in the first place. Why solve this problem? Don't have it in the first place. So don't do this.

So what should we do instead? It should be obvious: we should automate database migrations. And essentially, we should handle them exactly the same way as application deployments. So if you say that when I'm deploying an application, I have full control over the deployment, I have my artifact, I know where my artifact is, I have a tool for orchestrating everything, then you should do the exact same thing for database upgrades and treat the schema as another artifact. The database migration should be a discrete step. It should not be coupled with the application startup. It should be something separate, with its own workflow and lifecycle. And in the end, database upgrades, and this is why we're here today, should get the same respect as application upgrades. Pay database migrations the same respect that you pay to applications. They should be handled in the same way.

So now let's talk specifically about Kubernetes. What are people doing right now? These are the usual solutions. The first one is obviously the application startup; you shouldn't do this. And then people have seen the Kubernetes features that come built into a cluster, and they try to map those features to database upgrades. So people say, oh, there is an init container, and I do some stuff before the application starts up; let's do database migrations there. Not a very good idea, because, again, you are coupling your application startup with database migrations. You can use Kubernetes Jobs, which, again, works but has some issues. And what most people at this conference would probably do is use Helm hooks or Argo CD hooks. The solution, of course, is to use a dedicated, GitOps-friendly database operator, which we will see today.

Now, if I go over the disadvantages: if you have tried these solutions, nothing is perfect, and the first two points in particular are really important. People just pack a CLI tool for database migrations, and there is no visibility into what this tool is doing or when it has finished. Things are coupled with the application startup, so init containers are not a good idea. With Kubernetes Jobs, maybe you have some more flexibility, because you can run a job before your application deploys, so you have some control over the ordering. But, again, most people just pack a CLI tool in the job, so they have no visibility; they don't know what is happening. And then, if you have a large number of applications and Kubernetes Jobs, you need a way to correlate which jobs ran for which applications and how they finished. And this is probably the most interesting one for Argo CD.
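To make the hook-based approach concrete, here is a minimal sketch of what it usually looks like: a Kubernetes Job annotated as an Argo CD PreSync hook that wraps some migration CLI. The Argo CD hook annotations are the standard ones; the image, command, and secret name are hypothetical placeholders, not something from the talk.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: db-migrate-
  annotations:
    argocd.argoproj.io/hook: PreSync                  # run before the application syncs
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: example.com/my-migration-cli:latest  # hypothetical migration tool image
          args: ["migrate", "up"]                     # hypothetical command
          envFrom:
            - secretRef:
                name: db-credentials                  # hypothetical secret with DB connection info
```

From the outside, all Argo CD sees is whether this Job's pod exited successfully or not, which is exactly the black-box problem described next.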
People are using Argo CD hooks, especially the PreSync hook, for database migrations. And this works for simple cases. But if you look at the Slack channel of Argo CD, almost always people ask: I wanted to run this PreSync hook for the initial application deployment, but now I have a resync for another reason, and I don't want the migration to run again for that sync, because the database migration takes too much time. So they ask the question, how do I disable PreSync hooks for this specific sync? And that should tell you something: you shouldn't be working against your tools. There is also no visibility; the PreSync hook is just a black box for you. You don't know what is happening in the database. Actually, you don't even know if it has finished successfully, or things like that. So even though this looks like a good idea, it's not, and there is a better solution. And this is the social proof. As I said, I'm following the Slack channel of Argo CD. People are asking the same questions: how do I do database upgrades? Or, I'm doing database upgrades with PreSync hooks and it doesn't work, what's the correct solution? So until recently, there wasn't a correct solution. But today, there is. And this is what we are going to talk about: a GitOps operator for database migrations.

Thanks, Kostis, for the wonderful intro. My name is Rotem. And I want to talk about how you can do GitOps for database migrations using a tool that I have had the great privilege to be one of the creators and maintainers of in open source, which is called Atlas. Atlas lets you manage your database schema as code. We open sourced it in 2021. It's widely used by thousands of projects in the industry. And today, I want to talk about one of its most popular integrations, which is the Kubernetes operator. So the Atlas operator installs two CRDs, two custom resource definitions, into your cluster. The first one is called AtlasMigration. It is used to manage versioned migration flows, which you may be familiar with, and we're going to demo it in a bit. And the second one is called AtlasSchema, which is used for the declarative migration flow. We're not going to demo this today, but the basic idea is that you provide the operator with the desired schema of your database, and the operator runs a reconciliation loop to automatically plan, apply, and monitor migrations to your database for you. This is a relatively new concept, but it is definitely supported by the Atlas operator. The operator supports many popular databases, such as MySQL, Postgres, SQLite, and recently even Microsoft SQL Server. To prevent you from doing things that will accidentally cause data loss or all sorts of production issues, the operator comes prepacked with tons of safety features and lets you define different policies about how the operator should behave. This is an advanced topic that we can cover on a different date.

You might be asking yourself, OK, so there is some friction with existing solutions, but why should we go through all of the trouble of installing something new, introducing new CRDs, extending the Kubernetes API, just for database migrations? And I want to refer to the wise words of someone that I really like. Maybe you know him from his famous DevOps channel, DevOps Toolkit. And Viktor says: sure, we can wrap existing schema management solutions into containers and run them in Kubernetes as Jobs. But that is silly. That is not how we work in Kubernetes.
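For reference, since the declarative flow is not part of the demo, an AtlasSchema resource looks roughly like the sketch below: you hand the operator the desired schema and it plans and applies the changes itself. The field names follow the operator's v1alpha1 API to the best of my understanding; the secret name and the table are made up for illustration.

```yaml
apiVersion: db.atlasgo.io/v1alpha1
kind: AtlasSchema
metadata:
  name: myapp-schema
spec:
  urlFrom:
    secretKeyRef:
      name: mysql-credentials   # hypothetical secret holding the database connection URL
      key: url
  schema:
    sql: |
      -- the desired state of the database; the operator diffs and reconciles toward it
      create table users (
        id int not null auto_increment,
        name varchar(255) not null,
        primary key (id)
      );
```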
I want to unpack Viktor's statement and give three examples, three ways in which an operator is superior to simply wrapping some other tool in a Kubernetes Job. So first of all, resilience. When you have a piece of software running in your cluster whose sole responsibility is to reconcile between the state of your cluster and the desired state that you define through the Kubernetes API, it is by definition capable of doing more advanced and smarter things than what a Job can do. A Job, at best, can retry until its retry policy is depleted. But operators can look at the current state and make informed decisions about what to do. For example, if a migration failed halfway, the operator understands the semantics of the database and can make decisions that a simple Job cannot. Secondly, the operator model is all about extending the semantics of the applications that you can define in Kubernetes. Custom resources have a spec, a definition of what we want things to look like, which we can work with, mutate, and analyze with all of the existing Kubernetes ecosystem tools. And the custom resource exports a status, which means that you can observe its current situation: for example, has the latest migration run successfully, or what is the current revision of the database? Finally, the idea of operators, or the reason that they are called operators, is that they are supposed to codify the best practices of each target domain into software. So imagine if you had on your team the best DBA in the entire world, and you could take their workflow, how they analyze a difficult situation, and codify that into software. That is what an operator is supposed to give you.

So, enough talking, let's see some action. Let's show a demo, and everybody please take a moment to pray to the demo gods, may they be with us, and let's hope this works. Okay, so what I have is a local Minikube cluster with Argo CD installed. I've connected my computer to power, because you know what Minikube can do to your battery. I have a single application currently running, which is a database: a simple MySQL container exposed with a service. This will act as the target database that we are applying migrations to. In your case, it will probably be some instance that is managed by AWS RDS or GCP Cloud SQL or something like that. In our case, we're simply using a plain MySQL container. And now I want to show the example application that we're going to be deploying into this cluster. It has two important components. The first one is an AtlasMigration. This is the custom resource that is introduced by the Atlas operator. What it does is basically wrap our migration directory, which contains the database changes that we want to apply. Whenever we make a change to the database schema, we add a script here and it gets added to the deployment. The second component is a simple Kubernetes Deployment, which in our case is just a placeholder for the backend application. The reason we're including it in this demo is that, if you recall what Kostis explained, it's crucial that the database migrations complete successfully before we roll out the next version of our application. If our code expects a new column to exist, it would be very unfortunate if the application starts up, doesn't know that the column was not created, and makes queries to the database that result in failures. The way we achieve this in Argo CD is by annotating our resources with sync waves.
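A condensed sketch of those two components and their sync-wave annotations might look like the following. The AtlasMigration fields are approximate (the operator can also take the migration directory from other sources), and the names, secret, and image are placeholders rather than the actual demo manifests.

```yaml
apiVersion: db.atlasgo.io/v1alpha1
kind: AtlasMigration
metadata:
  name: myapp-migrations
  annotations:
    argocd.argoproj.io/sync-wave: "1"     # migrations run in the first wave
spec:
  urlFrom:
    secretKeyRef:
      name: mysql-credentials             # hypothetical secret with the target database URL
      key: url
  dir:
    configMapRef:
      name: migration-dir                 # ConfigMap carrying the migration directory
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  annotations:
    argocd.argoproj.io/sync-wave: "2"     # the app rolls out only after wave 1 is healthy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
        - name: backend
          image: nginx:alpine             # placeholder backend, as in the demo
```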
So our migration resource is in sync wave number one and our backend application is in sync wave number two. Let's now apply our application to the cluster and see how this rolls out. So you see that initially the nginx pods are not being installed; they are not rolled out until the health check for the migration completes successfully. And you see that once the migration completes successfully, Argo CD concludes sync wave number one and moves on to deploy the application. We can look at the events that are emitted by the operator and see that this is the most recent version that was applied. We can look at the current manifest and see that the status exports the exact condition of our migration and which migrations were applied, and we can consume this with other tools, like Argo CD is doing. Now, just to prove that our migrations ran successfully, Atlas has a nice command line tool with a feature that people seem to enjoy, which lets us create this nice ERD: it basically connects to the database and produces a diagram of the current schema of your database.

Now, let's just show end to end how we deploy a new change to our database. So what we're going to do is add a new column, argocon, and we're going to use Atlas. Sorry, I apologize, we need a running database for this to work. One of the features that people like in Atlas is its ability to automatically plan migrations for us. Let's just see if this will work for us now, and Atlas can calculate the diff and produce the migration for us automatically. Once this happens, we will commit this new migration to Git, let Argo CD sync the application, and see how the operator runs our deployment. So we are now adding a new column, and we're going to push this to Git. Once this is in our Git repo, just to expedite things, we're going to refresh our application. We see that Argo CD goes to Git, looks at what the new desired state of the cluster is, and we can see that our second migration has been applied successfully. Finally, let's just prove that the schema was updated, and we can see that the argocon column was indeed added. So there you have it: an end-to-end, CI/CD, GitOps-based workflow for your database.

Going back to our presentation, the advantages of running migrations with an operator: this is the Kubernetes-native way. We extend the API by adding new resources and writing controllers to manage them. It's super easy to decouple migrations from your app as a discrete step, as we've shown using sync waves. We have plenty of safety features to prevent bad changes from happening. The custom resources expose a clear API that can be consumed by other tools, just like Argo CD did for us, so you can build higher-order workflows on them. And in conclusion, it gives you 100% GitOps automation for your database schema. So I think this is one of the most important slides of the presentation. You know, many people come to us and they say, I love Argo CD, it's great for applications, but I have other stuff that is not Kubernetes. What do I do? So if you want to use Argo CD with infrastructure, the answer is Crossplane. You should check out the project if you haven't seen it. And if you want to migrate databases with Argo CD, the answer is the Atlas operator.
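As a footnote to the demo: the migration directory that the AtlasMigration resource wraps can be shipped to the cluster as a plain ConfigMap, roughly as sketched below. The file names, SQL, and checksum values are illustrative rather than copied from the demo; the atlas.sum file is the integrity mechanism that comes up again in the Q&A.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: migration-dir
data:
  20240101000000_init.sql: |
    create table users (
      id int not null auto_increment,
      name varchar(255) not null,
      primary key (id)
    );
  20240102000000_add_argocon.sql: |
    -- the new column added during the demo
    alter table users add column argocon varchar(255) null;
  atlas.sum: |
    h1:PLACEHOLDER_DIRECTORY_CHECKSUM=
    20240101000000_init.sql h1:PLACEHOLDER_FILE_CHECKSUM=
    20240102000000_add_argocon.sql h1:PLACEHOLDER_FILE_CHECKSUM=
```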
So you have a trinity of tools that allows you to apply Argo CD to everything and have a uniform way of working with all kinds of resources, regardless of whether they are inside Kubernetes, infrastructure outside of Kubernetes, or database migrations. So this is the proper way to work with Argo CD: giving the same respect to everything, not just, you know, applications, but also databases. So what have we seen today? You have found out about the Atlas operator, which is a Kubernetes-native solution for managing databases and doing database upgrades. It defines its own Kubernetes resources, so you have two new Kubernetes resources specifically for database migrations. It's open source, so go and try it right now; this is the URL. There are some features that we haven't shown. You can explain to the operator how your database should look and let it calculate the diffs, or, if you want to do it the old-fashioned way, you can just provide SQL statements and say exactly what you want to happen with the database. It's up to you; both are supported. So from now on, you should know how to treat your database migrations exactly the same way as your infrastructure. We have three minutes for questions. Thank you.

Yeah. Oh, question here. And there's also a mic up there as well, in case people want to start queuing up, but go ahead. Thank you. I just wanted to ask, how do you handle rollbacks? He knew that was coming, looks like. We had a bet. So there are some backup slides specifically about rollbacks, but I think it will take... no, do we answer this or not? You see how well prepared we are. Maybe give a short answer so we can take other questions. Okay, so I'll give a very brief answer and we can take this later and go a bit deeper into it. Most migration tools tell you to pre-plan your down migration, right? You plan the up and the down. However, in our experience, practically no one, not one single person, and I've talked to hundreds of engineers who work with databases, it's my job, nobody uses these down migrations in production. In development, yes, but not in production. Why? Dealing with partial failures. When you plan a migration, it has multiple statements, right? The down migration assumes that all of them succeeded, but oftentimes, especially if you're using MySQL or another database that doesn't support transactional DDL, you can end up in a state where you're not in version one and not in version two. So you can't apply the pre-planned down migration; you need to know exactly what to roll back. The second thing is that sometimes rollbacks don't happen because of deployment failures. They happen because your product manager tells you, please roll back to the version we had yesterday. Now, the new version was live for a few minutes. If we run the down migration, we're going to drop these columns and lose this data. Maybe we want to lose it and maybe we do not. But the point is that if you are going to roll back an application that involves a stateful component, your database schema, for example, you need to make a decision based on the situation on the ground. What we advocate for in Atlas is something that we call declarative roll forward. You can tell Atlas: get me to this revision. And you let Atlas calculate the diff from the current schema to the desired schema and get you there in the proper way. This works amazingly well in the CLI, and it's coming to the operator very soon.
So we're happy to share progress on that in our newsletter and so on, but this is the short answer. Yes. Okay, hello. Hey, awesome stuff. I was wondering: we use Flyway currently, and a problem that we often have with Flyway is version issues, right? Like, if we're in dev and the devs are doing weird stuff, we end up with a fake version, and then we have this weird forking of versions, and applying different things gets really annoying. How do you deal with that, and how important are those database versions that you have there? Okay. Atlas has a built-in mechanism; you maybe noticed that in my Git repo I had a .sum file. This is a mechanism that we developed especially to let teams enforce a linear history, so you don't have these situations where you're not sure what's going to happen in production. I have a whole talk about that, and I'm happy to explain it outside. Thank you very much. We're out of time. I'm also at the conference booth if you want to ask more. Thanks a lot.