Hey, everybody. Oh, hi. It's me again, Michael Crenshaw. Thank you for sticking around. I think this is going to be a really exciting talk. So again, I'm Michael Crenshaw. This is Zach Aller. We both work for Intuit. We are on the team that runs Argo CD and Argo Rollouts for Intuit. And this talk is about Space Age GitOps and how pull requests can help us think of the next generation, the next evolution, of GitOps tooling and practice. The way I want you all to approach the talk is: this is very much a proposal. We are presenting a thought experiment. We're going to have some mockups about what we think GitOps could look like in the future. And we really need the expertise that you all bring to this room to tell us: are these ideas good? Are these ideas going to help you solve the problems that you have? So come to it with that mindset. I think it's exciting stuff, but we definitely need your help to make it as exciting as it possibly can be. So first, a little bit about Intuit. We are a financial software company in the US. We have products like TurboTax and QuickBooks that help individuals and businesses do their financial work. Mine and Zach's role at Intuit is helping manage our internal developer platform, of which Argo CD is a massive part. The three numbers I want to focus on are: more than 1,000 developer teams, more than 2,000 services, and, across all of those services, about 44,000 Argo CD syncs a day. Between rolling out new changes and promoting them up through environments, that is a massive amount of Argo CD activity. So any change that we make to how that system works could have a huge impact on developer experience and velocity, which is our entire focus. So in order to improve how continuous deployment works, we need to understand the basic parts of it. And I think it roughly falls into these four categories.
First, either a human or a system proposes a change. And for all of us, that change is a Kubernetes manifest change. Second, we deploy that change, using a GitOps operator like Argo CD to synchronize the change from a Git repository out to your live Kubernetes cluster. Third, we typically promote those changes through a series of environments. And in between the environments, you're going to have either some humans who check and approve the move to the next level, or, ideally, some automation that checks for you: is this change safe? Can we move it along to production? Finally, if something goes wrong, you need a system to revert that change. And again, ideally you want some automation that's watching metrics, watching logs, and automatically moving you back to a previous known good state. Today, Argo CD and GitOps controllers in general focus on this deployment stage. And they're very, very good at it: continuously reconciling your Git repository state out to your cluster. But they tend not to focus as much on these other areas. Now, as we heard in the keynote, there are spot solutions for some of these problems. But there's no unified common practice for how GitOps incorporates all these steps. So today, what Zach and I are going to present is what we think will be an awesome way, using pull requests, to unify this full experience under Argo CD. And we're going to start by talking about how GitOps works today. Again, Intuit does a massive number of deployments using GitOps. And we are going to walk you through a day in the life of an Intuit developer using our GitOps system. And we're going to highlight pain points that we think a lot of people in this room are going to recognize from their environments. Starting with proposing a change, it looks like this. In our GitHub Enterprise, suppose I'm a new Intuit developer.
And my first task is: make a change to an application, make a change to some configuration for that app, and deploy it. I'm going to start here. I've made my changes to the app code. I've built a new image and pushed a new image tag. And I'm changing something in my configuration, changing from user group one to user group two. At this point, I would go to another developer on my team, and that would be Zach, a more senior developer, to review this change and help me get it out to production. So from here, Zach's going to describe what that process would look like. Yep. So as the PR reviewer for this, I tend to have a lot more questions that need to be asked and answered. One of the bigger issues with config management within Kubernetes is this idea of expansion. A single line change in a Helm values file could end up being hundreds of lines of YAML output. Kustomize has variable expansion, and a lot of the tools around config management within Kubernetes have this expansion process. So when I look at this PR, I don't have a whole view of what's going on. I don't know what environments it's affected. We see that this is a Kustomize app base, so this could affect multiple environments that aren't really clear from just looking at this PR. So generally what I would do is go back to Mike and be like, hey, can you render this output for me? Can you show me what this looks like when it's installed? For Michael to go and do that is a pretty time consuming process. He has to check out the main branch that's currently running and render it with kustomize build or whatever tool he's using. You see here, he's doing it for a very specific environment, which is dev. Then he does the same thing for the proposed change, manually generates a diff, and attaches that diff to the PR. A pretty time consuming and error prone process. But at the end of the day, I get a diff in the PR that shows me the final output.
And one of the things we can see is that we have the original change at line one, the single ConfigMap change. That expanded to also changing service selectors in the labels, which you can see at line two. As a PR reviewer, that changes my mindset when I'm viewing the PR: I spend more time on that particular piece because it has larger possible effects, like causing outages. So I want to do a more thorough job of reviewing that. Because that is such a time consuming process, most people don't do this in their environments. What they do instead is turn off auto sync on Argo CD, which is... you laugh, so lots of people do that. It's very true. I was going to ask for a raise of hands to see how many people run auto sync. So that is one of the big downfalls of what happens. You also get this idea of drift when you turn off auto sync. With auto sync off, you're deploying your changes all kind of manually. And you forget about it. You've committed your code change, Argo CD is out of sync now, and you get interrupted to go get coffee, so you don't hit sync. Then you come back later, and now all of a sudden there are like three extra changes in there, and you have no idea what they are, right? There's this big drift between what's been committed to Git and what Argo CD is wanting to deploy. So, the diff view. Argo CD has a wonderful UI. You get a diff. You can see the exact same change that's been rendered. It's nice, but it's got its sharp edges all over the place. So anyways, I've reviewed the PR now. It looks pretty good. The next phase of this is promotion, which I'm going to pass off to Michael, and he's going to walk through promoting this PR through to production. Yep. So the diff process wasn't awesome. Maybe we've turned off Argo CD auto sync. Not great. But we've gotten past it, and now we need to promote the change through the various environments: dev, test, prod.
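The manual render-and-diff step described above can be sketched in a few lines. Here is a minimal Python sketch using `difflib`, with hardcoded strings standing in for real `kustomize build` output captured from the main and PR checkouts (the manifest contents and file labels are hypothetical):

```python
import difflib

# Stand-ins for `kustomize build overlays/dev` output on the main branch
# and on the PR branch. In practice you would capture the real command
# output from both checkouts; these manifests are illustrative only.
rendered_main = """\
apiVersion: v1
kind: Service
metadata:
  name: demo
spec:
  selector:
    group: usergroup1
"""

rendered_pr = """\
apiVersion: v1
kind: Service
metadata:
  name: demo
spec:
  selector:
    group: usergroup2
"""

# Generate the unified diff a reviewer would want attached to the PR,
# showing the fully expanded manifests rather than the one-line source edit.
diff = "".join(difflib.unified_diff(
    rendered_main.splitlines(keepends=True),
    rendered_pr.splitlines(keepends=True),
    fromfile="rendered/dev (main)",
    tofile="rendered/dev (PR)",
))
print(diff)
```

The output makes the expansion visible: the single values change shows up as a selector change in the rendered Service, which is exactly the kind of effect a reviewer would otherwise miss.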
So we merge the PR, and at Intuit, that promotion process would look like this. You have a Jenkins pipeline. You're using your CI system to do a bunch of things that, if you look at the view, I don't necessarily understand or care about as a developer. Platform specific details have leaked through this interface, and it leaves me somewhat confused. I'm seeing things like transitioning tickets, code analysis. I have logs down at the bottom that are talking about archiving artifacts. All I really care about is this one node up here, which says we're pushing this change to dev. And then I'm going to want to know when it goes to test and when it goes to prod. What I'd love to have is an interface that lets me focus on just those things. When something goes wrong, this is the interface I get. Again, a bunch of nodes that I don't necessarily understand, leaking platform details. And the logs down at the bottom are my only way of diagnosing, okay, what actually went wrong, and how do I get my changes to prod? I'm a new Intuit developer. I don't want to get fired. I want to get my changes to prod. So I go to the console output, and as you can see, 345 kilobytes of logs. That's 5,000 lines of logs for this change. And it's just full of stuff that I don't necessarily understand. I see folks laughing. I know you've seen this in your CI systems. You're doing this with your CI systems, just hacking these promotions together. And it's not a great user experience. So that's my experience of promoting. When something goes wrong, like we've seen in this slide, then we get to the stage of reverting the change. And Zach's going to walk you through that process. So that PR that we rolled out did have a problem. We had some service selector changes, right? Which caused our traffic to black hole, because there was no matching label change. So now what we want to do is get back to a healthy state as fast as possible.
And to do that, you generally have three-ish options. If you're using Argo Rollouts, that's one option, and we'll focus on that a little bit. Argo Rollouts today only understands the pod spec. So if you were deploying your changes and an analysis run failed, it would roll back, but it would roll back just the pod spec. That doesn't actually solve the problem, because the real error here was the service selector changes. So that experience could be improved. The other thing that people tend to do is use Argo CD's UI once again: go to the deployment history and use Argo CD to do the revert. It solves the problem of reverting only part of the application. But this UI is lacking in user experience. You don't really know when the last healthy commit was. You don't know what was changed in that commit. It's a little bit harder to decipher. You're going to have to spend time making sure that you're going back to the version that you actually intend and want to go back to. So the third option ends up looking somewhat like the proposal step: basically, do it in Git, right? We always do GitOps. So let's go to Git and actually commit the reversion there and see what happens. So this is the revert PR for the broken changes. To review this, I have the exact same problems that I had during the proposal. I don't actually know that this is going to undo the service selector changes or any other configuration that might have been expanded. And so it's error prone and time consuming: when you're in a broken state, you want to get back to healthy as soon as possible, and having to go generate rendered outputs for all the environments is not a pleasant experience. So that's the experience today. Proposing is tedious and error prone. People have to generate these diffs if they want to see what's changed. It just doesn't work very well.
Promoting is a little bit more of a wild west. There isn't really a clear pattern, and people do Jenkins pipelines, bash scripts, manual steps. There's no clearly defined way for how you should promote in GitOps. And reverting suffers a lot of the same problems that proposing does. There are just a lot of unknowns. You as the end user are doing the process, and a lot of it is kind of in the dark. So let's take a look at mine and Michael's proposal for a GitOps workflow of the future. We will set up the exact same situation. Michael has asked me to review this PR. It's a simple PR: there's a label change in a ConfigMap and an image bump. However, in this experience, as the reviewer, I come back to the PR, and the big difference this time is that some Argo controller came along and, you'll see, gave me a comment on my PR that has all the environments that were affected by this particular change, along with diffs that it can do the auto rendering for. So I will come in and click on one of the diffs for one of the environments. I get the exact same diff that Michael had to generate by hand, in a much more repeatable, concise, fast way. I can see that the same changes happen. There are the service selectors; all that stuff is great. I fully understand the PR now. So the next step is we'll have Michael promote it. Yeah. And I've highlighted this hash that was generated when we merged the pull request, because that's the unit that I'm going to be promoting. It's GitOps. I'm promoting a Git hash. And this is what the user interface is going to look like for it. Not a Jenkins pipeline, just the GitHub pull request interface. And what's happened automatically, the moment we merged that pull request, is three PRs were opened. One has already been merged: it is the PR which is promoting the change to development. There aren't any prerequisites in the dev environment. And now we have test and prod waiting.
Those little yellow circles you may recognize in the GitHub interface; they just mean we still have checks running. So something needs to happen before these PRs get merged and before we move to test and prod. So let's look at the test PR. Along the top line, you're going to see that this is a PR against the test branch from the test-next branch. The contents of that branch are just flat manifest files. They've already been rendered by Kustomize. And if I hit the diff button on this PR, I'm going to see the exact diff that Zach just looked at. I know what I'm actually promoting now, I can see it, and I have confidence as a developer about what's going to happen when this gets merged. Second, as you see, Argo has given me a nice little description of what this is. And third, and most exciting, you've got these checks. First, we ran a check that said: is there a deployment freeze in effect? If there had been one in effect, then this check would have failed and we wouldn't be allowed to auto merge this PR. That's standing in for whatever checks you want. It could be a check you write yourself that says you have to have a ServiceNow ticket and it has to be approved by a manager. It could be security checks running security scans on the manifests, whatever. But that's the first check that we're running. Two that we get for free from Argo CD are: the prerequisite environments are synced, and the prerequisite environments are healthy. This is what's preventing this PR from being merged and going to test. The dev environment has been synced, but our commit hash isn't healthy there yet. The moment it becomes healthy, we'll merge. If I hit the details button on this check, to see why the environment isn't healthy yet or to investigate further, something like this is what we'll get. And this is a very rough mock; imagine this changing significantly. But the basic ideas are here. We have the dev environment that is still progressing.
We're trying to get to the bf1414b commit hash there. Test and prod are healthy, but they're on a different commit, the previous commit. Between each environment, we've got a pull request representing the promotion: we are going to merge that, and that's going to cause the promotion to occur. And for each PR, we've got a list of the different checks that prevent us from moving on to the next stage. Before we go to prod, I've got a CR approval; I'm just showing that you can have a variety of checks. Every single thing in this interface is CD related. We're not mixing the concerns of CI and CD. And every single thing ties directly to a Git concept that your developers already know, love, and understand. If I hit the dev environment button, it takes me to GitHub, and I see my branch. If I hit the PR button, I go to a PR. If I hit one of these checks, I go to the PR and see the check in my list. All Git concepts that people are already familiar with. But even in this bright new future where everything is Git concepts, things go wrong. And this is what it would look like if the dev environment had an issue and we needed to revert. Again, it's just a PR, just standard Git stuff, that gets opened automatically. And we get this button that we can click to go look at that experience. So Zach will walk us through what the rest of the revert experience now looks like. Yeah. So instead of having to choose between the three versions of how I want to roll back, I automatically just got a PR opened up for me. Which means I don't have to spend time regenerating rendered manifests. If I were to view the diff here, I would instantly get to see what I'm undoing, right? And when you're under time pressure, and things are broken, and you're trying to get back, having a clear idea of what's going to happen when you smash that merge button is good.
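The promotion gating described a moment ago (deployment freeze, prerequisite environments synced, prerequisite environments healthy) boils down to a simple predicate. Here is a hedged Python sketch of how a promotion controller might evaluate it; the type and function names are illustrative, not the proposal's actual CRDs or APIs:

```python
from dataclasses import dataclass

# Hypothetical snapshot of one environment's state, as a promotion
# controller might observe it via the GitOps operator.
@dataclass
class EnvState:
    synced_sha: str  # commit the operator last synced to this environment
    healthy: bool    # whether the app is healthy after that sync

def can_promote(sha: str, prereqs: list[EnvState], freeze: bool) -> list[str]:
    """Return the list of failing checks; an empty list means the
    promotion PR is allowed to auto-merge."""
    failures = []
    if freeze:
        failures.append("deployment freeze in effect")
    for i, env in enumerate(prereqs):
        if env.synced_sha != sha:
            failures.append(f"prerequisite env {i} not synced to {sha}")
        elif not env.healthy:
            failures.append(f"prerequisite env {i} not healthy yet")
    return failures

# Dev has synced the commit but is still progressing, so the test PR waits.
dev = EnvState(synced_sha="bf1414b", healthy=False)
print(can_promote("bf1414b", [dev], freeze=False))
# prints: ['prerequisite env 0 not healthy yet']
```

The controller would re-evaluate this predicate as environment state changes, merging the PR the moment the failure list becomes empty, which is exactly the behavior the checks UI surfaces.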
One of the only real differences between this and the promotion process is that generally there are no checks enabled on it, because it was driven by the health of your environment. So reverting: fast, easy, painless, and I was clear on what was happening. It's kind of nice. So in the new system that we're proposing here, proposing was generally really clear, right? During that whole process, I knew what was happening. I didn't have to go back and forth with Michael while I was reviewing the PRs. I had a hydrated version right in the PR that I'm familiar with. Promoting: everything was in Git, right? The state of all my environments. I just had to go look at the environment branches and see what version each was running, what the configuration was, et cetera. And when problems do arise, I'm still in the know, and I have an automatic PR generated. That's just quite a nice user experience. And if I were to boil down the thesis of this talk, it is: in a GitOps environment, Git should be your interface and not just your database. Your developers are already familiar with Git. That's where they write their code. They're already familiar with your SCM. That's where they get code reviews and collaborate with their coworkers. And there's also just this massive ecosystem of things that integrate with Git and with your SCMs. So you can take advantage of that knowledge that already exists in your developers, the massive Git ecosystem, and use Git primitives to do GitOps, and avoid introducing new and confusing concepts or mixing concerns between CI and CD. So that's the thesis. Again, this is just a proposal. This is a set of concepts. If you go to that QR code, you're going to get three links. One is to review this talk. So do that first. Be sure to give us an awesome rating and tell us what you liked about the talk. Second is a link to a proposal in GitHub. It's still in a very early stage.
We've written out the CRD manifests that we're proposing and the new controllers; the basic architecture is there. But what we need from you all is to read the proposal and make sure that this would actually solve the problems that you have. Let us know of any concerns you have about the basic ideas. And then finally, we have a link to a Slack channel, just about Space Age GitOps, where we can discuss these concepts and you can give us feedback. And so with that, we're ready for questions. Thank you so much. Yes. There are mics up, like over there in the hallway. So if you want to queue up for questions, there's one there. Yeah. Two mics on the aisle, and then one here. So folks, find these two mics so that other people can hear your question if you have questions. I know you'll have to kind of shuffle out the end of your row. But yeah, there you go. Hey. Hey. Han from AT&T. Got a question about Kargo and how it interacts with your vision. Sure. Yeah. So Kargo is tackling a lot of similar problems. I think the fundamental difference between what we've discussed here and how Kargo approaches the problem is that Kargo is approaching a world where folks are already mixing these concerns. They're using kustomize edit set image to bump an image tag in a CI pipeline. They're using helm template to inject a new image tag. I think that by returning to the primitives of Git, we can supplant a lot of the mess that we're dealing with right now in these CI pipelines, and avoid the need to even use complex pipelines, a new user interface, new CRDs to represent these concepts. Instead, we can use existing Git concepts to think about our promotion and reversion strategies. So I don't think that they're necessarily opposed to each other inherently. I think that they could even coexist and really complement each other in certain environments.
And there are some environments where you need Kargo, where you're going to need the ability to do the kustomize edit set image. But that won't be every environment. And I think that this set of tooling has some advantages as well. It's a good question. Thank you. Yes. Oh, hi. You're proposing branch per environment? Not branch per environment. This is going to be the main question we get, so I want to clarify the heck out of this right now. It's still directory per environment, but that is in the DRY manifests, the don't-repeat-yourself manifests: your Helm template with your values file, your Kustomize overlays. In there, each environment gets a directory. So that's the human interface, and the problems with branch per environment are a human problem. People get frustrated dealing with those branches. It's a difficult mental model. So you're going to have your humans interacting with the directories, and there will be automation that pushes to branches. And there are reasons why that's preferable for the humans as well. If I want to know what happened in dev, I look at the commit list for the dev rendered manifest branch. I can see diffs between whichever commits I want. So yeah, it's an excellent question. We're still sticking with directories for environments. Good. That's good. But are those in the same repository then? Or a separate repository? I don't think we've nailed down whether you might have a case where you want to render out to a separate repository. I'm not sure yet. I think for now, you keep them in the same repo. All right. And I think that means we have to be done with Q&A. Is that what that means? Okay. We have to be done. Intuit has an open source happy hour tonight. Come talk to us, and we can talk with you some more. Also see us at the booths. Thank you so much.