I was sitting in front of floor-to-ceiling windows on the top story of the building where I worked. It wasn't a very tall building. The view was nice enough, though: a freeway that I was going to sit in traffic on in a few hours. And I felt like I was succeeding at a lot of things. I had deployed Kubernetes for my organization. It was automated. I had set up all of these things that let developers use templates and deploy applications quickly. I was a good sysadmin, a good systems engineer, because I had automated those things. If you're familiar with CoreOS, which is what I was using at the time, it had the ability to do automatic updates. Most people turned those off because they were worried things would break. I left them on, and I had this bare metal cluster set up so that automatic updates would roll out every weekend. I set a window when it could do those updates, and I would come in on a Monday and the cluster was upgraded. And I was just like, wow, I have made it to this peak of Kubernetes, this peak of sysadmin, where I had automated part of my job away and didn't have to do it anymore. I was sitting in front of those windows because I was writing a book about cloud native infrastructure with my co-author, Kris, and I was waiting for her call. We had one last session to brainstorm, because the book was focused on things we had learned in Kubernetes and in the cloud. We got on the call and went over a couple of sections of the chapters where we just didn't know how to put what we had learned into words. And she said something that I'll never forget. She said: if all of our infrastructure is APIs, then our infrastructure management should be applications. It shouldn't be a repository of automation and a repository of code. It has to be software. It has to be something that is running and managing those APIs.
And everything that I thought I had done right about Kubernetes? I had just built automation. I had built the same thing I had done over and over again, and automated pieces of it. But she was right that we had to write software, software that controlled APIs. And software is all about taking data in, changing that data, and calling APIs. Pretty much every application, every piece of code, is doing something along those lines: it takes something in, changes it for whatever your business needs, and then calls APIs. And if all of our infrastructure is APIs, then the software that should be running to manage our infrastructure has to do the same thing. We couldn't rely on infrastructure as code anymore. Infrastructure as code was just the automation piece, and it didn't scale as well, because you had to trigger it at certain times, and you had to make sure it worked. From that point, I was asking: what already exists that does infrastructure as software? What is running software that takes in data, calls APIs, and applies the result? The very first thing that was obvious to us was a Kubernetes controller. A Kubernetes controller just calls APIs over and over again. It looks at the state that it wants and the state that it has, and it makes them match. That was the very core of infrastructure as software, this idea that we should actually be managing our infrastructure with control loops. And once we started looking around, this pattern of infrastructure as software showed up in other places. If you're familiar with Netflix's Chaos Monkey: Chaos Monkey is like the opposite of infrastructure as software, but it does the same thing. It's software that runs, takes data in, and calls APIs, but it breaks things on purpose. Chaos Monkey would deliberately degrade the state of your infrastructure, with software.
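A control loop like that can be sketched in a few lines of bash. This is a toy, not a real controller: the `desired_state` function and the `current` counter are stand-ins for API reads, and the arithmetic stands in for the API calls a real controller would make.

```shell
#!/usr/bin/env bash
# Toy reconcile loop: look at the state we want, look at the state we
# have, and keep calling the "API" until they match.

desired_state() { echo 5; }   # the state we declared (e.g. replicas)
current=0                     # the state the world is actually in

reconcile() {
  local want
  want=$(desired_state)
  while [ "$current" -lt "$want" ]; do
    current=$((current + 1))            # stand-in for a real API call
    echo "scaled to $current/$want"
  done
}

reconcile
```

The point is that nothing here is a one-shot script: run `reconcile` again after something drifts and it converges again, which is the loop a Kubernetes controller runs forever.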
And it wasn't a one-time git push, it wasn't a repository; Chaos Monkey had to constantly look at the state of the world and ask: is this right? Is that right? Neither of those things was GitOps, because this was before GitOps was a term. But it fit really well. Once GitOps was announced and solidified into these four principles that we've been hearing about all day, I heard about it and thought: that's infrastructure as software. It's exactly the thing we learned from Kubernetes controllers, that we saw over and over again at large scale, in high-impact, high-velocity environments: you had to manage it as software. And GitOps is a core definition of that. It's an implementation of infrastructure as software. It runs that control loop, takes in data, and calls APIs. And when I visualize things, if anyone's ever seen me talk somewhere, I like to use props. I like things that are visual; I'm a very visual person. So if you've never seen these, these are AWS Snowballs, Snowball Edge devices. They have 52 cores in them and 256 gigs of RAM. I'm using them today to represent standing up our infrastructure. If you're using AWS, one of these boxes is essentially a 16xlarge. AWS will ship them to you, and you can run edge compute on them, and put storage on them; I think they had 42 terabytes of storage. They're pretty nice to have at the edge. But this is what I was picturing when I had on-prem servers. I would stand these up. And what do you call two Snowballs on top of each other? A snowman. But when I was running infrastructure, I would deploy it by running scripts that stood up my infrastructure. And it would all get set up so that every time I wanted to run it, I would trigger something, Jenkins, whatever, and it would actually deploy.
And it was great, because while I had automated that piece of it, I couldn't touch it. I couldn't do other things to it. And I realized time and time again that infrastructure as code wasn't enough. Because someone would come along and do something to the infrastructure, something would break, and nothing brought it back. Nothing stood those servers back up. Nothing brought things back to the desired state until we ran that infrastructure as code again. That repo of code that stood these up is what would actually bring it back to a state where we could say: okay, now we're good. And the main thing about those controllers, about infrastructure as software, is that you always have something watching that state, so you don't have to worry. Software is constantly looking at the desired state and catching drift before things fall over. And if something does fall over, it will bring it back to that desired state for us. We don't have to trigger our CI/CD. We don't have to do any of that. I just like knocking these over now. No one tell my coworkers. They're rugged. It's fine. But I wanted to come and talk about why GitOps is really powerful, and the core principles of GitOps illustrate this really well. And yes, you're going to have more than one controller managing infrastructure. You're going to have Kubernetes controllers, you'll probably have cloud controllers that do different things. There are all these controllers that are implementations of this software. And the first thing I wanted to touch on is: what is the code piece? The thing we always have is infrastructure as code, everywhere. Okay, that's fine. But we know that running a command by hand isn't infrastructure as code, right? That's not it, because you don't want people manually doing this at a command line. What you really want is something like this.
Let's echo it so it doesn't actually do anything. Right? Is that infrastructure as code now? I mean, that was essentially it: I have infrastructure as code, I've made it, right? But that's just the automation of the thing. It's still not any better. And GitOps, again, is very specific about how we implement this. And if you didn't notice, at this point I have code. When did it become software? Where was the point where we said: oh, now it's running; oh, it's done, it's no longer software? Software is code while it runs. It's only in that running state, only code with electricity applied to it, going through the processor, that it's software. If you have a repo of code, of scripts, that's not software. It's static, and that's fine; those are your automation pieces. But how do we break this up so that it's applyable, always running, actually software? And again, as software, we want to separate out the data we bring in, the variables we take in, from the APIs we use to change things. And for a lot of people this is where it becomes kind of software: Terraform is a controller. Terraform looks at a desired state, looks at the current state, and reconciles them. It's just a command line version that runs locally. It's exactly what a controller inside of Kubernetes is doing. It's exactly what Flux and GitOps are doing, the same sort of reconciliation. And that's fine, but again, it's not what GitOps is proposing. Because in this case you could say: that's continuously reconciled at some level, Terraform is just going to always do this in a loop. But that is really bad. That's going to hit limits, hit all sorts of things, and it doesn't scale. GitOps is a very specific way of doing something like this so that it can scale.
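The naive "just keep applying" loop being warned about here would look something like this. It's bounded to three iterations and echoed rather than executed, so nothing real happens; the point is that it technically reconciles while hammering provider APIs and state locks.

```shell
#!/usr/bin/env bash
# The "while true" anti-pattern: this does continuously reconcile, but
# it slams provider APIs, fights over state locks, and doesn't scale.
for run in 1 2 3; do   # a real version would be `while true`
  echo "run $run: terraform apply -auto-approve"   # echoed, not executed
  runs=$run
done
```

GitOps keeps the loop but adds structure around it: versioned immutable state, pull-based delivery, and change-driven triggers, which is what the rest of this talk builds up.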
And I really like looking at the principles, because that informs us about what other things are doing similar things. They all go along the same route; they have similar principles for how they're going to scale and what they're going to do. So we looked at software as code that only counts when it has electricity applied to it. And here are these principles. Again, you've been hearing them all day, and I'm not going to repeat every single one of them. But this declarative piece is all about separating out the step-by-step from the data, from the data that feeds a controller. We're declaring a state. For a long time, I falsely believed that declarative meant nothing was imperative. And that is absolutely not true. When I run terraform apply, I declare my state in my tfvars, my Terraform manifests. But underneath, Terraform does everything imperatively. It builds this DAG that says: okay, I need to do this first, then this next. That is exactly imperative. I just didn't have to define it. So for the longest time I thought, oh, controllers are declarative, and then I realized: wait a minute, that doesn't make sense. Something has to say this goes first, then that, then this. And the important thing is to have computers do that for us, and not in a script. Because humans bring a lot more assumptions, with a much narrower view of the state of the world. Controllers and Flux and Terraform can get a more holistic view of what's going to be applied, and that's how they can see what needs to happen in what order. And again, it's not that declarative means nothing is imperative. It means that you're separating out your end state, so that your interface to the thing says, I want this to happen, while something behind it goes step by step by step. So what we do is, oops. So there's my declarative state. Did I put that in here? Yeah. I have my tfvars in there. That's my declarative state. But what is immutable? Okay. That's it, right?
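That "let the computer figure out the order" idea is easy to see with coreutils' `tsort`: you declare dependency pairs (this resource must exist before that one) and the tool derives a valid step-by-step ordering, which is roughly what Terraform's DAG walk does at much larger scale. The resource names here are made up for illustration.

```shell
# Declare edges ("vpc before subnet", etc.) and let tsort compute a
# valid imperative order, the way Terraform walks its resource graph.
printf '%s\n' \
  'vpc subnet' \
  'subnet instance' \
  'vpc gateway' | tsort
# vpc always comes out first; everything else follows its dependencies
```

We declared relationships, not steps, and the ordering fell out of the graph. That's the sense in which a declarative tool is still imperative underneath.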
It's immutable now. I can't write to it anymore. That's all we did; that's all we had to do, and now we're immutable. Congratulations, you're at step two. Well, kind of. This is the fundamental piece of what we want: we want it to be immutable, and we want it to be versioned. And the main thing we want to do is actually, oops. See, I got permission denied. I want to version this in order. I want to go to the next version. And so in here, what do we need to do? We've got to, let's see, sort them first. Find all of them, and we only want to apply the latest, right? Then we're good. I think that'll do it, something like that. Let's see. Version two. Right, there we go. We're now only pulling in the latest version. So every time we run our infrastructure as code, and this is still just code that becomes software, it pulls in the latest version of this immutable state. It's declarative, it's immutable. We're almost there. We're building our own GitOps with this infrastructure as software mindset. The pull piece of GitOps brings in two very important things, and I really like that they made this a core piece of GitOps. There are two big reasons you want to pull your desired state, the data that you have. One is scalability: if you have one or two clusters or one or two servers, you're probably fine, but if you have a thousand, it's going to be a lot harder to push that out everywhere. Whereas with pull-based state, it's a lot easier for us; technology in general has known how to serve files for a very long time. We can serve websites, we can serve static content, we can cache it. But the compute side is intensive, and all of the API calls that result in an action happening take a long time. So we pull that data first.
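A hedged reconstruction of that demo step (the file names are my invention, not the ones on screen): version the variables files, write-protect them so they're immutable-ish, and always select the newest one with a version sort.

```shell
#!/usr/bin/env bash
set -euo pipefail
demo=$(mktemp -d); cd "$demo"

# Versioned state files; chmod -w is the "permission denied" moment
printf 'version = 1\n'  > infra-v1.tfvars
printf 'version = 2\n'  > infra-v2.tfvars
printf 'version = 10\n' > infra-v10.tfvars
chmod -w infra-v*.tfvars

# sort -V understands version numbers, so v10 correctly beats v2
latest=$(ls infra-v*.tfvars | sort -V | tail -n 1)
echo terraform apply -var-file="$latest"   # echoed, not executed
```

Plain lexical `sort` would put `infra-v10` before `infra-v2`, which is exactly the kind of bug you catch in code review; `sort -V` is the fix.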
We want to make sure it's stored somewhere. Yeah, you store it in Git, or you store it on a web server, wherever. That's one important thing. And the security aspect is the other one. Because if you've been paying attention to any of the breaches over the past five years, a big way in the door is CI/CD systems that have global admin access everywhere. That is dangerous, because you can get into one system and then move laterally anywhere, and we want to prevent that. The downside, of course, is that you have to run more controllers, more pieces of software in more places, each with a limited scope of what it can apply to. But the benefit is that if something breaks in that one limited controller, it doesn't have a large blast radius. It has a very limited one, for security, for downtime, for all these other things. So this pull aspect is really good for scalability, and it's just a lesson learned. Weaveworks had been doing this for a little while, and they said: hey, you should probably pull that stuff, and here's why. It's a great principle to have. And so in our infrastructure, we want to do something like this, right? We're going to pull that data down; let's say it stores our variables file. Once we pull it down, we're almost set to go, because then I can run that terraform apply. And I can assume that, bash being bash, if the pull fails, my script will exit, and I don't actually have the problem of half-pulled variable files applying somewhere. And this one, continuous reconciliation: again, it was kind of a trip for me, because I didn't know how continuous it meant. Does that just mean while true? And it doesn't. It's not about being continuous in that sense. On one end, you could just run it in a loop: every time you finish, you try again. On the other end is our traditional infrastructure as code, which only ran when a file changed. I don't know if anyone here did config management for a long time.
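That pull-then-apply script, reconstructed with placeholders: the pull is faked with a local write instead of a real curl or git pull, and the apply is echoed. The key line is `set -e`, which makes bash abort before the apply if the pull fails, so a half-fetched variables file never gets applied.

```shell
#!/usr/bin/env bash
set -euo pipefail   # bash being bash: any failed step aborts the script
demo=$(mktemp -d); cd "$demo"

pull_state() {
  # Stand-in for something like: curl -fsSL "$STATE_URL" -o latest.tfvars
  printf 'replicas = 3\n' > latest.tfvars
}

pull_state                                       # 1. pull the desired state
echo terraform apply -var-file=latest.tfvars     # 2. apply it (echoed here)
```

If `pull_state` exits nonzero, `set -e` stops the script at step 1 and step 2 never runs; that ordering is the whole safety argument.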
Once I changed my Puppet manifest, I applied that manifest to servers. And that was fine as long as I kept a fairly normal cadence of changes. But in infrastructure, you have a low level there, say DNS and network infrastructure, that doesn't change very often. And if you're re-applying those with a while loop, you will likely break something, and that is scary. So you wanted something beyond only-when-a-file-changes. Because if I wait for a file change, and something happens to my network in the meantime, I'm going to be parsing Terraform state files. I'm going to be manually fixing Terraform state files. And if anyone has done that, that is a bad night on call. I am sorry if you've ever had to go through that. But that was a downside of infrastructure as code. I thought, when I had my Terraform manifests, and I was on call and got the page: how could it be down? I have infrastructure as code. Infrastructure as code solved this problem. And then my state was at a point where the imperative steps Terraform was going to take couldn't get me from here to there, because someone had done something in the console that broke everything. And now you have to go fix it. I had to go in and manually adjust Terraform state, or manually go to the console and figure out which knobs to turn to get it back to some place where Terraform could apply again. That was the difference between infrastructure as code and the GitOps-style continuous reconciliation. And that's the last piece we want here: I want to take this script, and instead of running it only when a file changes, I want to run it when the infrastructure changes too. But we're going to simplify it, because this is a crappy demo of what we're doing here. So, if you've never used iwatch... oh, and I've got to find it. There it is. Because I couldn't remember this one off the top of my head.
iwatch will watch for file system changes and run a command. So we're telling iwatch: as soon as a file is written in this folder (close_write is the event it's looking for), run my infrastructure as code script. And so over here I can actually, let's do version 3. And you saw it right there, it immediately applied. This is still, at this point... oh, I got version 2. Oh, I've got a bug. See, that's a problem. Oh, I spelled it wrong. Look at that. Good catch. See, that's what code review is for. There we go. Now we get version 3. There it is. Thank you. We caught that in code review. Now, in this case it's obviously watching for file changes on disk, but GitOps is doing that same reconciliation. And one of the cool things about Flux and Argo and these tools is that they do a two-way sync. That's one step beyond what a typical controller like this does, where I'm only looking at my local files, and traditionally we're only looking for a git push. What you actually want is to look at the full state of the infrastructure and say: hey, I'm just going to check it. Terraform can go out there and run terraform plan every once in a while and say, hey, I think something's different now. And you can hook into these sorts of signals. On Amazon, we have EventBridge. There are all these different ways you can look at what is going on in the infrastructure and trigger things based on any changes. Right? I could look at my CloudWatch logs or CloudTrail and say: hey, whenever something happens within this scope, I want to run that controller. I want to make sure my infrastructure is back to where it was. And that, for me, was how I started growing from: what did infrastructure as code look like, and why was code not enough?
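The watch command from the demo, reconstructed from iwatch's documented flags (the paths and script name are my assumptions, not what was on screen). Since the real command blocks forever waiting for events, it's only printed here rather than executed.

```shell
#!/usr/bin/env bash
# -e close_write fires only once a file has been fully written, so we
# never apply a half-copied variables file; -c runs our script per event.
watch_dir=./desired-state
apply_script=./infra.sh
cmd="iwatch -e close_write -c $apply_script $watch_dir"
echo "$cmd"   # in the live demo this actually runs and blocks
```

Watching for `close_write` rather than every modification is the detail that matters: it triggers once per completed write instead of mid-copy.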
To: we had controllers, Kubernetes controllers, and applying those inside the cluster was a great thing because the cluster knows the entire state of the API server and the data stored in it, and it gives you a ton of annotations and logs and events inside of Kubernetes. Applying that in a more generic way is really what GitOps is all about. You can apply GitOps principles, infrastructure as software principles, to anything. It's not just the Kubernetes pieces, because, again, you're going to have different scopes for these controllers. If you think you're going to have one controller that does everything, that's like writing one Terraform main file that does everything. You don't want that scope, that blast radius, for one file or one controller. So you do want to separate these things out and keep that limited scope. Having pull-based, limited-scope controllers inside environments is a great idea. And then applying this whenever there's a change in the infrastructure or in the files is really that last piece. And that's really all I have on why infrastructure as software and GitOps work so well together. GitOps is an implementation of infrastructure as software, and that is the main direction all of this should be going in. And I think it's been great progress with GitOps in general so far. So thank you. Well done. Thank you so much for that presentation. We do have a few minutes for questions. Are you available to take some questions? For questions, you can raise your hand, or we have a mic in the middle you can jump to. Any questions? I will say, I am showing off running Kubernetes on these at the AWS booth tomorrow with EKS Anywhere, which also implements GitOps. So if you want to see them in action with Kubernetes, come by the booth tomorrow at 10:30. Those aren't hollow? No, they are real. They are heavy. You were really just punching those over that whole time.
I was like, oh, he's just got the empty case up there, it's not a big deal. No, they are real. All right. Well, good luck with the demo tomorrow. Yeah. Any questions? Raise your hand. Don't be shy. And I'll be around all week. So, sorry, while they're thinking of one: you were mentioning, can you all hear me okay? Because I can't hear the mic feedback. But you were mentioning that you felt like on some level it wasn't declarative, because it relied on an imperative system for operation. So would you say, in that case, there is no such thing as declarative? Because everything always relies on an imperative operation. Even if you built a language that was only declarative, at the end of the day it has to be turned into bytecode, which is imperative on the CPU. So I was wondering if you'd maybe just speak on that for another moment or two. Yeah, and there's also no such thing as immutable, which was kind of mind-blowing for me to realize. No, actually, everything changes over time once we run it. Those were two things that, as a sysadmin, I kept thinking: oh, well, this is how it has to be. But I came to realize that the main benefit of all of this is to the humans. The main benefit is that you can sleep at night, that you don't get paged, that you can collect your paycheck and have a good work-life balance. If you need to scale, if you need to automate things: automation only goes so far, because there always has to be this trigger, and GitOps really supplies that trigger. And we're still doing a lot of the same things; we're just pushing that forward.
But the human interface to those imperative systems, and to the things we thought were immutable, that was the point. My view of what's immutable was: oh, I want this application deployed, but then I have this other controller that comes in and says, I need to scale it up. And I'm like, well, I didn't tell it to scale up. So it's not immutable, because I told it five replicas, but something else figured out it needed ten. And at that point, it's not immutable. So is that bad? No, it's wonderful, because I didn't have to do it. It was a huge benefit having these controllers, having something figure out that when I said I wanted five, it needed one first, then two, then three. It had to do those in order, and the scheduler had to bind them, and a kubelet had to pull them down, and I had secrets, and all this stuff had to happen in order. But my view, the human interface to the rest of the system, is very declarative. I like to think of it this way: if I'm going somewhere, I can call an Uber and say, I need to go to the grocery store. My interface to that system is: I made it to the grocery store. I was there. I could also drive myself. I could jump in a car, steer the wheel left and right, and step on the gas and the brakes. That is very imperative; that is me telling it exactly, step by step, every little thing. But I don't have to say piston one fire, piston two fire, piston three; that's a horrible interface, right? But still, there are imperative things behind every declarative thing we do, and what level of declarativeness I want and need depends on how much ownership and how much control I need. If I need to get there in five minutes and it's a thirty-minute drive away, I can't request an Uber, because I'm not going to get picked up in time. I need some control there. I have to get there fast.
I have to get to the hospital. My wife's going into labor. Let's go. I'm not going to sit there and say, well, let's call a cab and hope it gets here. So it's all about the interface with humans. Just because I don't see any other hands up yet: would you say that, well, I don't know if it's necessarily ownership, but would you say it's about how much you can trust your software? Because you're describing infrastructure as software, right? Yeah. There are always bugs in software; I wrote how many bugs today in three lines of bash? There are always bugs in software, and a lot of that trust only comes over time. So I have to trust other people. Thank you for the PR review, by the way. We trust the team, we trust experience. We build that trust over time, and then we can trust the software over time. And some of that trust comes from other people generally using the thing. Flux is amazing, not because I've used it forever, but because a thousand other people use it and they told me some practices for using it. And I trust my car. My wheels aren't going to fall off on the way to the store. I didn't put the bolts on myself, but I trust the mechanic who has done it a thousand times to have done it correctly. There is always that implicit trust: I have to trust someone in this case, and be able to trust other people, trust the ecosystem, trust the tooling. Again, please don't write your own bash controller for GitOps; use one that's well trusted, because you're going to have those bugs. Gaining your own experience adds more trust, but a lot of that comes from knowing how the system breaks in your environment, and knowing when your process didn't align with how the thing was intended to be used.
And so once you have some sort of intention-based setup where you think, oh, I think it's going to work this way, but you have five teams deploying to their own repos that all merge into this other thing, that's where you get into problems, because your ideal of how you were using it didn't align with how the rest of the community was using it, and didn't align with the process inside your company. So you have to build that trust for yourself as well. But yes, gain that trust through common tooling and common practices. And again, GitOps has those four principles for very good reasons, and the tooling implements them for those reasons. Open for audience questions. Yep, got one over here. Hi, great talk. One question; I don't know, you didn't show it, but it's always in my head when I use Argo CD. In the case of GitOps, what is the best practice to revert: over the UI in Argo CD, since you have a history and a revert possibility there? Or is the actual single source of truth Git, so you go to Git and revert it there, and then auto-sync will, in case of issues, in case of bugs, deploy it immediately? It's also in my head why we even have this history and rollback in Argo CD, and whether the developers should use it or not, whether the users should use it or not, or whether we should always go through Git, where the single source of truth lies. I don't know the very specifics of Argo's implementation of that, but in general, you always want to roll forward. You always want time to progress forward. And computers sometimes have more gotchas when all of a sudden time reverses and something says: hey, I already saw that state. I worked at Disney Plus for a while, and we were managing infrastructure, and we built our own controllers for how we were managing our clusters. And we had that same question: hey, how do we revert? We deployed something that was bad. How do we get back to a known good state?
And yeah, we could go to Git; all of our state was stored in Git. We could go to Git and make the new head the old version. But we had so many weird pieces of software that never assumed latest could be anything other than the most recent, that assumed latest is the newest as far as a timestamp goes. And a lot of systems still deal with timestamps. So what we decided for our system was that we never went back a head version. We always did a git revert, which pushes one more version ahead. It says: I'm going to take that thing, and I'm going to undo this commit. But it always creates a new commit, with a new timestamp. It always moves forward. And so in software and infrastructure, a lot of those systems are just easier to reason about when the current state is always the newest-timestamped state. Keeping the newest timestamp and the newest commit at the head of history, just because we live in a time-synchronized world, is easier to reason about in a lot of ways for all those controllers. So I don't know if Argo has a specific way to do those reverts, but I know that in other cases, it's often just safer to say: I need to revert, meaning I go forward in time with a new commit. The Argo maintainer in me says: let's talk after this. I think we have time for one more question while they're setting up for the next talk. One more question from somebody? Big pressure; you've got to ask a good question. No? Okay. Well, let's give one more round of applause. Thank you so much. Thank you.
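The roll-forward pattern described above, sketched with a throwaway repo (the file name and commit messages are invented): `git revert` undoes the bad commit by adding a new one, so "latest" still means "newest timestamp" and no controller ever sees history rewind.

```shell
#!/usr/bin/env bash
set -euo pipefail
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo

echo 'replicas = 5'  > state.tfvars; git add .; git commit -qm 'good state'
echo 'replicas = 50' > state.tfvars; git add .; git commit -qm 'bad deploy'

# Roll forward: a new commit that undoes the bad one, never a reset back
git revert --no-edit HEAD >/dev/null
cat state.tfvars              # back to the good state's content
git rev-list --count HEAD     # three commits: history only moved forward
```

Contrast with `git reset --hard HEAD~1`, which would restore the same file content but rewrite history backwards, exactly the time reversal that confuses timestamp-keyed systems.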