My name is Hilary Lipzig. I am a principal site reliability engineer and team lead at Red Hat. So what does that mean? It means I use a lot of GitOps, but from a very different side of things than Christian.

Yes, so, Christian Hernandez, senior principal technical marketing manager. I think we have the longest titles at Red Hat. I deal with the other side of GitOps: I'm in the BU, and I receive all the feedback from our customers. Hilary deals with actually implementing a lot of these things as an SRE, so she has a different view of GitOps than I do. So yeah, we're both cartoons there.

So as we see it, it's not just the way we implement it, but also supporting the infrastructure that has the implementations on it. So, operating GitOps and using GitOps as a consumer. It's good times, folks.

All right, so, a brief agenda today, which is a lot of white space on this big screen, I'm not gonna lie. We're just gonna keep this all pretty high level and conceptual. We're gonna talk about CI/CD. We're gonna talk about pipelines. We're gonna talk about non-linear pipelines, and then we're gonna tie all of that back into GitOps.

So, how it started and how it's going. This is a real Makefile from 10 years ago. I picked this one because it's nice and clean and you can see everything. And I'm borrowing this meme format, right? Usually when we use "how it started, how it's going," we're saying things have gone terribly. The good news is that with GitOps and modern CI/CD, things are actually going pretty great.

So who here has ever manually deployed code to production? Right, yeah. What did you do? FTP, right? Something like that? I've done it with FTP. I have. And then somebody went and logged in and ran a Makefile like this. Assuming it was this nice. I've seen some pretty gnarly Makefiles. And then what did we do? We basically prayed, we hoped.
And as an SRE, I'm gonna say the thing all SREs say: hope is not a strategy. No, no it's not. So what we basically did is figure out ways to avoid hope. And now we have observable, provable methods of delivering production code, delivering production applications, delivering stuff that is good, tried, true, and consistent.

A lot of the driver behind this was this methodology, CI/CD: continuous integration, continuous delivery. The idea of being continuous means we're going to do it faster. That also means we're gonna fail a lot harder. So one of the things that immediately got codified into this was that testing needed to be part of the way you delivered code. CI/CD and test-driven development came up around the same time. I spent 11 years in quality engineering; I promise you, that's how it went. So we had a lot of those things happening concurrently, and in a lot of the original CI/CD pipelines that I worked on, the question was not just how are we going to deliver our code consistently, but how are we going to guarantee our code? Now infrastructure as code is one way of doing that. We have, of course, the unit tests and the integration tests; you have various layers of things to deliver code.

So I've logically divided this by color. You have your CI pieces, your continuous integration pieces, in the light blue, and your continuous delivery pieces in the dark blue. And these two things must be separated. A lot of CI is actually about method, not just the technical implementation, because there are a lot of different ways to approach CI and a lot of different reasons why you would approach them differently.

So your actual reality is gonna look more like that, right? These are actually two different examples of potential CI/CD pipelines. In the first one, you have some concurrent processes, right?
You might do a build, and then, especially in cloud native, where you can spin up multiple instances of things with pods, you might be doing some tests, you might be doing some security verifications. All of these things, by the way, apply to non-cloud-native deployments too, because that is really where a lot of our reasons for best practices came from: when services were monolithic, weren't broken up into microservices, and weren't necessarily as easily observable. I love observability. You're gonna hear me say it a lot, just get with it.

So here are two examples of real CI/CD pipelines that I have built in the past. We're doing some concurrent things: we're building, we've got some security checks that run in a separate workflow, and then we've got our test, release, and deploy to stage. Now, all of those things must be true and correct in order for you to progress past stage. That's why that ends there. I'm not the best visual thinker, so my diagrams suck.

And then once you go to stage, you test again. This is an interesting thing, because with infrastructure as code, with the pretty-much guarantees we have that things will be the way we expect them, that last test step is starting to get dropped a little bit, and it's just kind of the one CD. But there really is CD to stage and then CD for real, delivery versus deployment. Atlassian had a great article on that a few years ago; I don't know if it's still up. So you're going to have those steps, and sometimes you're going to see it with or without the second deployment model. I always have that last set of tests. That's your just-in-case. That's your oh-shoot button, if you will. That's the fail-safe. And then you're gonna have something like a notification: we've passed our last round of checks and things are gonna go off. So that's an example.
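The first pipeline described above (build, then concurrent checks, then a gated deploy to stage, a second round of tests, and a notification) can be sketched roughly as follows. This is a minimal illustration only, not the speakers' actual pipeline: `run_pipeline`, `run_checks_concurrently`, and the step callables are hypothetical names, and each step is assumed to be a function that returns `True` when it passes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Step = Callable[[], bool]  # assumption: a step returns True when it passes


def run_checks_concurrently(checks: Dict[str, Step]) -> List[str]:
    """Run independent checks in parallel and return the names that failed."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(check) for name, check in checks.items()}
        return sorted(name for name, fut in futures.items() if not fut.result())


def run_pipeline(build: Step, checks: Dict[str, Step], deploy_to_stage: Step,
                 stage_tests: Step, notify: Callable[[str], None]) -> str:
    """Everything before stage must pass; stage is then tested again."""
    if not build():
        return "stopped: build failed"
    failed = run_checks_concurrently(checks)  # e.g. unit tests, security scans
    if failed:
        return "stopped before stage: " + ", ".join(failed)
    if not deploy_to_stage():
        return "stopped: deploy to stage failed"
    if not stage_tests():  # the "just in case" tests after deploying to stage
        return "stopped at stage: tests failed"
    notify("stage checks passed")
    return "ready for production"
```

Calling `run_pipeline` with real build, scan, and deploy functions would return a short status string; in a real CI tool the same gating is usually expressed as stage dependencies rather than Python calls.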
The next one is a bit more of a microservice-designed kind of pipeline. This is, again, another real pipeline. Imagine something like a React stack: React Native, and React just for web. It's all API driven, so you can use the same backend API to power both UIs. Realistically, you're probably just gonna build that once. And then, depending on what you need, you're either going to build both your UI stacks or maybe just one. The reason for this logical split is so that if you only need to rebuild your web, you can just rebuild your web. If you only need to rebuild your mobile, you can just rebuild your mobile. And if you want to rebuild your backend, realistically you're probably rebuilding everything, but not necessarily. So, moving on.

In these models I discussed subtasks, and you see these a lot. What is a subtask? It's a thing you're doing that's part of your task and your pipeline, and it may or may not be blocking. A really good example is a notification subtask. If I am notifying you that something has happened or something has started, and let's say I'm doing it through a Slack integration: if your Slack webhooks are failing because Slack has gone down, you probably don't wanna stop your entire pipeline. Realistically, you're not gonna stop your entire pipeline. You can add some other fail-safes, or even attach PagerDuty alerts to some parts of your pipeline failing if you want, but I don't want my notifications failing to block the continuous part of continuous.

Then there are other things that would be very blocking. Let's say your unit tests fail. Well, yes, you're gonna stop there. Realistically, that's a good place to stop if your unit tests are failing, you know.
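To make the blocking versus non-blocking distinction concrete, here is a small sketch. It assumes each subtask is a plain function that raises an exception on failure; `SubTask` and `run_subtasks` are hypothetical names made up for this illustration, not part of any CI tool.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SubTask:
    name: str
    run: Callable[[], None]  # assumption: raises an exception on failure
    blocking: bool = True    # non-blocking failures are recorded, not fatal


def run_subtasks(tasks: List[SubTask]) -> Tuple[bool, List[str]]:
    """Return (completed, messages). A blocking failure stops the run."""
    messages: List[str] = []
    for task in tasks:
        try:
            task.run()
        except Exception as exc:
            if task.blocking:
                messages.append(f"{task.name} failed (blocking): {exc}")
                return False, messages
            # e.g. a Slack notification failing should not stop the pipeline
            messages.append(f"{task.name} failed (non-blocking): {exc}")
    return True, messages
```

A failed notification marked `blocking=False` is logged and the run continues, while a failed unit-test subtask stops everything after it. Flipping the `blocking` flag as the project matures is one way to revisit these decisions over time.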
Ideally. I mean, I've seen it done the other way, just not to much success. And like the security checks we showed on the last slide: those are the types of things you could run as subtasks or as a pipeline of subtasks, but realistically, if that stuff fails, you're not gonna progress your code. You're not gonna progress your deployment.

So there are lots of different reasons why something may or may not be blocking. It also might be about your maturity model. If you're not actually going to production, you're new, you're just starting up something brand new and it's very in-the-moment, you can go ahead and make things that should be blocking non-blocking, and then update it as you go. Get more mature, get ready. But the point of continuous integration, again, is that it is fast. So whether you want something to be blocking or non-blocking, the maturity of your application and the maturity of what you're doing can be factors as well, and you basically should be continuously evaluating it. You're going to want to manage your CI/CD with a life cycle really similar to the life cycle of an application. The software development life cycle applies everywhere, I promise. Really everywhere. Just apply it everywhere. You'll be fine.

Am I doing good? Am I sticking to like two minutes a slide? Jump out, yeah. Perfect, great. Trying to run through this as fast as possible, right?

And so the last part is keeping things maintainable. I have like five stories I could tell about all this stuff, but I'm trying to keep to our time here. I'll go back and tell them if we get that space. So again, don't keep it too fancy. I showed you that slide with two different pipeline models: one of them had a lot more to it, and the other one was broken out more nicely.
And basically the advice is: find logical breakout spaces where you can do that, because it makes your overall CI/CD process easier to debug when something goes wrong. It's easier to follow, it's easier to debug. You're going to be able to observe it a little bit better if you break it out into more logical spaces. And feel free to experiment with that, and try and fail. At the end of the day, there's copy-paste for a reason. You can just move stuff around. So strive for DRY code, but don't kill yourself with that.

And really, at the end of the day, what are we doing when we're doing CI/CD? Where did we start? We started with Bash. It was Bash. And now we've basically created tools to orchestrate Bash. I was explaining this to one of the junior engineers on our team and I was like, look, everything we are doing in Python, you could do in Bash. It just wouldn't be object-oriented, it wouldn't be as DRY. But at the end of the day, we're still just running Bash commands. We've made them faster, we've made them more reliable, we've made them more observable. And we've kind of regulated that environment; the infrastructure-as-code piece really does that as well. So DRY code where possible, but don't kill yourself. Crawl, walk, run.

Don't be afraid to mix and match the right tools. So we say, oh, CI/CD, what am I going to use? What am I gonna do? My team uses a mix of GitLab and Jenkins. And you're like, whoa, why? Well, there's reasons why. Basically, I said this to Chris earlier: choose your technical debt wisely. Everything has some level of technical debt. And what did he say? He said, "and Conway's law always applies." Yeah. So choose your technical debt wisely. You're going to get some. Guaranteed, everything you pick has technical debt. You have to decide what is the most sustainable technical debt for you, and you really need to be mindful of that. Really, I'm a pessimist, honestly.
I've seen everything that can go wrong go wrong. So I'm never like, oh yeah, choose the advantages. No, choose your debt, because you will pay it every single time. And again, iterate. Start with good. Get your MVP, work your way up to better.

Wow, we're doing great on time. I'm gonna tell one story then. Oh, there you go. I'm gonna tell one story. I earned it because I kept so good on time.

By the way, just since we have time, that quote: don't choose the best, choose your debt. I think I'm just gonna put that on the next get-up. Yeah, on the next t-shirt. That's just gonna be on the one in Detroit. I'm just gonna have that on the back. I love that quote. Yeah, you make that shirt, you better save me one. Yeah, yeah, I'm gonna. All right.

So this is a real story from about 10 years ago, before all of this had really matured. This has come so far. CI/CD has come so far. We have tools to choose from. We did not used to have tools to choose from. We were daisy-chaining Jenkins jobs and praying. So, anybody daisy-chained Jenkins jobs? Yeah, yeah, okay. So a lot of you make me feel very old and I'm not okay with that. You may leave.

So way back in days of yore, we did not have some of this maturity, and we had to learn very painful lessons, which is why we have today's maturity. One example of this, and of why infrastructure as code and GitOps and so forth are so important, is a real story. We, my company, had this big monolithic application. We added this new feature. I was the head of quality assurance. I had spent tons of time testing this six ways from Sunday. It worked. It did. Except.

It was a very memory-intensive thing. And somebody, not me, because I was not the sysadmin. SRE is not your sysadmin, by the way. Just in case that wasn't clear. Throwing that out there. I say that every time I get a chance. So not me, because I was not the sysadmin.
Somebody had decided that the staging cluster (not cluster, jeez, you can see how long I've been in here), the staging virtual machine, could have double the amount of memory of production. Now, does anybody know what the most important thing about staging is? Consistent with production. That part. Yeah, it's consistent with production. Staging should closely mirror production. It should be the same. So much so that when Christian and I worked together 10 years ago, the way we promoted staging to production was by swapping DNS on the servers. That is how it was done. That's how old school we were. That was so old school. And daisy-chained Jenkins jobs. Yeah, daisy-chained jobs, yeah.

So somebody had made the decision that it was okay for the environments not to match. And there was no RBAC. VMs did not have RBAC: no role-based access control, no identity and access management, none of that existed. It was just one person who had the password. So we had to wait for this one person to go in and figure out why this much-awaited, very highly lauded feature didn't work. He does a bunch of digging and finally finds the mismatch. And he's like, oh, I guess I should maybe scale staging down. And I was like, no, no, you can't do that. You must turn production up. Now keep in mind this was 10 years ago. Does anybody know what happened when you needed to increase memory on a VM roughly 10 years ago? Right. Shut it down. Yeah. We had to turn off production. That was not a good day, folks. It was not a good day.

So these lessons that we've learned, these stories: a lot of this applies to bare metal, a lot of it applies to VMs, and now, of course, it applies in the cloud-native world. And so we talk about these best practices, and Christian's about to go tie this into GitOps and bore you, I mean, regale you with more.
Keep in mind that these are really hard lessons that we have had to learn. So, moving on.

Yes, so thank you, by the way. You're welcome. So now, how do we tie all this to GitOps? I'm gonna take this lanyard off because I'm proud of this shirt. Oh no, now you're all tangled. Look what you've done. So, how does all this tie to GitOps? Yeah, now I messed myself up.

Now we have cloud-native architecture, and we have a way to manage cloud-native architecture. And now that's evolving even further with GitOps. And the fact that you're all here, and the fact that that room was so filled, makes me happy. I'm part of the planning committee, by the way, which is why I'm saying it makes me happy. That all of you are here, I really appreciate it.

But a lot of the time the question is, how do we tie this into GitOps? Just like Hilary was saying, a lot of this is gonna be conceptual, and since I'm on the business unit side, it's gonna be a lot of the things that I've seen out there. So it's not gonna be as technical, maybe, as you guys would like. If you wanna chat with me after or ask questions after, I can get deeper technically. But for the purposes of this conversation, I'm gonna keep it really high level and cover the three ways I've actually seen GitOps tie into CI/CD.

But first, what does GitOps look like in your pipelines, from a high level? Traditionally, a long time ago, CI/CD was considered one thing. A lot of that has to do with Jenkins. It's the tool we had, right? We had the hammer and everything looked like a nail. So Jenkins did everything with CI/CD: Jenkins did the CI part, and it also did the delivery part. But with GitOps, CI and CD are really kind of decoupled, right?
Because they each have their individual, I guess what I would like to say is, focus. And you mentioned the Atlassian article about what CI is and what the two CDs are. They're kind of different. So think of CI as a synchronous process, whereas GitOps is really asynchronous. If you think about Flux, or if you're using Argo, there's a reconciliation loop. Yeah, you can use webhooks to run the reconciliation at commit time, maybe. But it's really asynchronous, especially with drift detection and things like that. You have to wait for that reconciliation loop to happen. So now you have this idea that the GitOps controller doesn't necessarily know anything about your CI process, because what you're trying to do is integrate a synchronous process with an asynchronous process and make them work together. And that's kind of difficult, really, from what I've seen.

So I've seen three solutions out there. And if my, did I press it, maybe? Uh-oh. It only liked me. That's it, it's done. Oh, there we go.

Yeah, so, the first is what I like to call the CI-managed GitOps pipeline, meaning the CI basically owns the entire process. This is more of the traditional CI/CD, like with Jenkins or Bamboo or whatever, one of those technologies. Yeah, all of them. Yeah, like all of them, right? This fits more into that model where the CI tool basically owns the whole thing and GitOps is kind of an afterthought. It's the last thing, and they don't really know about each other. And this is where I see people start, because they use the floating tag approach. CI does the whole thing, and when it's ready to deploy, it just tags that image to dev. Wait. Stage. Production. Use floating tags.
Yeah, so, floating tags: not recommended. I put that there. And also, if you guys were in the morning session with Dan, it doesn't really fit with the GitOps principles as we've written them. Or security principles. Well, yeah, there are also the security implications. With GitOps, you need to have your source of truth, and a floating tag isn't really a source of truth, because someone can force push an image; a bad actor can change an image. If you need to roll back, there's no really good way of rolling back without force pushing a change. So the CI tags an image and just assumes the CD took place. But this is a good start. It's where people start because it feels familiar. It's like, okay, I can still use my CI process the way it is and just put that GitOps thing at the end.

And then this next one is the mid tier, and it's probably what I see most people use, and how we use Argo CD. So if you're Flux people and you see something different, I'd love to talk to you guys. Essentially, the CI does the build and the GitOps controller does the CD part, whether it's continuous delivery or deployment. The CI builds the application just like it normally would, the GitOps controller actually does the deployment, and then it hands the process back to CI to do whatever testing it needs to do. Essentially, we're talking about multiple pipelines and the multiverse of madness. This is when you start chaining things together: hand off, hand back to the CI process, and continue running however many pipelines you need to run.
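On the floating-tag point above: one common alternative is to reference images by content digest, so the reference in Git cannot silently change underneath you. The sketch below is only an illustration of telling the two kinds of reference apart; `is_pinned_by_digest` and `floating_refs` are hypothetical helper names, and the check assumes sha256 digests written as 64 lowercase hex characters.

```python
import re

# Assumption: a pinned reference ends in "@sha256:" plus 64 lowercase hex chars.
_DIGEST_SUFFIX = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned_by_digest(image_ref: str) -> bool:
    """True if the image reference names immutable content, not a movable tag."""
    return bool(_DIGEST_SUFFIX.search(image_ref))


def floating_refs(image_refs):
    """Return the references that rely on a tag such as dev, stage, or latest."""
    return [ref for ref in image_refs if not is_pinned_by_digest(ref)]
```

A check like this could run in CI against the image references found in the manifests, flagging anything like `shop/api:prod` (a made-up example name) before it is merged.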
So the CI process is more hands-on: the CI process is kind of managing the GitOps tool itself. And although GitOps is involved, this is functioning as a synchronous process, because it's handing back and forth. The GitOps controller is just waiting for that image change to happen, then it does something and hands it back off, so it's a little bit more linear. Even though GitOps is there asynchronously, it functions as a synchronous process.

And this third one is something I've seen which is pretty cool: CI triggered, but GitOps completely owns the process. The CI only builds the image, and GitOps does all the rest, essentially. CI creates PRs, so it's more intimately involved with the GitOps process. It'll create a PR, and it'll do gating that way, with PRs. Since GitOps controllers act on a branch or a tag or whatever, that's how the CI process interacts: with Git, essentially, with tags, sorry, with PRs. And someone can either manually approve it, or it's automatically approved after a test run or something. We have bots in our group, and the bots do things unless you say slash cancel, and then the bot will cancel it for you. Yeah, so, slash retest, slash cancel. Slash retest, yeah. SlackOps, your favorite. Slash cancel is my favorite.

The CI and CD in this design are literally decoupled. And this is kind of what they were saying: you're using the best tool for the job. You're building things with the best CI tool and you're deploying things with the best deployment tool, and since they're decoupled, you can pick and choose the best tool for each. Not necessarily the best tool for the job, I guess; the best tech debt. The best technical debt, yeah. And the drawback of this is that it's no longer a simple linear process, right?
This is the multiverse of madness sort of idea: you have different pipelines doing different things, and it's not necessarily linear anymore. So you need good management of that. So this is kind of the ideal, the nirvana, but there are some drawbacks to it.

When you're integrating GitOps into your CI/CD functions, you really need to make a paradigm shift. First of all, CI and CD should probably be decoupled in GitOps. Remember, in GitOps, code is not promoted. You're not promoting code anymore. What you're promoting at that point is manifests; you're promoting YAML. You're just building an image, and then you're promoting the YAML that then deploys it. So a code update is just a trigger for a pipeline, and you have to keep that in mind. That's one of these paradigm shifts. We come from a background where you are promoting code: you're doing a git clone onto a production VM, or a git clone and then FTPing that, which is old school. But now you're just promoting YAML through the process. So you have to change your mindset there and make that paradigm shift.

A lot of the time we try to fit a square peg into a round hole. We have these old processes and we're trying to see how they fit in this new world. But now it's cloud-native architecture, immutable infrastructure. You're no longer building VMs to run code. You're building VMs to run containers, and at that point all the VM is doing is running containers. So you have to take care of less and have Kubernetes take care of more.

So anyway, that's it. You can find us, I think we messed that up. Somewhere, I guess. You can't find us. We will disappear forever now.
We just won't tell you where. So yeah, if there are any questions. Oh, we got a question over here, Chris. There we go.

How do you trigger the second pipeline, or how's that triggered?

Oh, so the second pipeline is triggered, at least with Argo, with something like a PostSync hook. You can do things with hooks, like a PostSync: hey, when I sync, go hit this endpoint. In the Argo world, there are other tools you can use, like Workflows. You can have it run a workflow or emit an event, or whatever, and you have an event listener somewhere. Usually that's how I've seen it done, with an event listener.

Basically that, yeah. I mean, it depends on your tool a little bit. When we daisy-chained Jenkins, and it's still a valid model, you could just be like, okay, when I'm done here, then I'm gonna go hit this, and that does the next thing: the next pipeline, the next job, whatever it is that your tool has. There's gonna be some sort of "now that I've done all of my things, there's just one last little step." Most modern CI/CD tools have something like that in there.

What's your experience in introducing manual approvals on promotion to prod, versus automatic ones?

Okay. So this is a really big topic, okay? I'm an SRE, so promotions to prod are still part of my life, just like they were when I was in quality. And so there are, what was the word? I just used it earlier. Models, something, maturity models. Yes, maturity models. There are maturity models. You're gonna wanna have minimum quality standards: a certain number of tests are definitely passing before certain things can promote this far. You should always automate as much as possible. And then from that last point, it's really gonna be based on your organization and, basically, your faith in your own stuff. So there's a little bit of hope, trust and pixie dust.
I guess you would say. I'm all pop culture, all the time, but like 90% of it is for kids, because I have children. So that's where we are today.

So let me give you a really good example. We have some services at Red Hat that are what's called a managed service. What does a managed service mean? It means you buy it, you use it, and we take care of all of the infrastructure, running it, and making sure it's up. I'm an SRE for that. That's what I do: I manage other people's stuff, and I don't know what they're doing with it, I just know how it works on OpenShift.

So with our very initial versions of things, we would basically say: all right, everything else is done, all these checks have passed and been met, and we've still decided we're gonna have this one last review, which basically just says that the SRE team is ready for it to go. So that means that if the team in EMEA, which is where the engineering team is, is done, but it's the end of the EMEA day, then the NASA (North America, South America) SRE team will do the LGTM, looks good to me, to kick off the deployment, because we're fresh, bright-eyed and bushy-tailed. We could also just wait and decide on that. We could also say, okay, we're gonna have this ready to go, but we're gonna go do all these additional burn-in tests.

Are you familiar with burn-in tests, everybody? Yes, no? I mean, I get it, I don't know, do you wanna? No, okay, geez. This is not a short topic, right? Okay, so a burn-in is really your longest-lived test. You can decide it's a period of 24 hours, you can decide it's whatever. This is a test that will actually send off production-like alerts to your operations team. This is the operations team saying, hey, guess what, your stuff operates, or not, depending.
So there are lots of different little pieces like that that we have. And then once we've had enough experience with how it fails, not how it runs, how it fails, you can say: okay, we know how this fails, we've fixed as much as we can, we've pushed everything that we can push, and we have developed, over time and experience, a level of confidence in our ability to handle anything at any time of day. That's when you go fully automated.

So as the operations team, that's kind of it. But there are some things that we might never fully automate. Not at Red Hat, but in general, a lot of companies, maybe smaller companies without 24/7 SRE, are probably not going to wanna fully get rid of that last manual check if they don't want deployments happening on the weekends. If you don't want anything to happen on the weekend, you're always going to leave full automation off, because you're not going to try to remember to turn it off for the weekend. That is too many layers of human error to go through.

So it really is about the maturity model of the product that you're supporting or releasing, and the maturity model of the team and the company, and what you can realistically handle going wrong, and when you can handle it going wrong. It's a personal decision. So that's the short answer.

Any more questions? We do have to go, all of us need to go back downstairs. Well, that's right, you have a talk downstairs. So I've gotta get out of here. Yeah. All right, thank you very much, Hilary and Christian. Thank you.