Awesome. All right, sorry for the wait, everybody. How's everyone doing? Yeah, that's the most energy I've seen in a room all day, and I hope we keep it up. Whoo!

All right, so my name is Katie Lampkin. I am a PM at Intuit in platform and open source. I focus on our CI/CD platform, so that spans from Jenkins all the way through Argo CD and Argo Rollouts. Today our presentation will be on click-free environment promotion with application sets and progressive syncs. Now I'll allow Michael to introduce himself.

And I'm Michael Crenshaw. I am a back-end software engineer at Intuit on our Argo CD team. That means I help keep our 50 instances of Argo CD running and about 20,000 applications syncing. I also contribute heavily to Argo CD in the open source world.

Great, so let's go over our agenda really quickly. We're going to cover today's problem statement, which happens to be our promotion strategy. We're going to talk about the goals that we want to accomplish in fixing our promotion strategy, what potential solutions are out there today, and the solution that we are going to be discussing today, which is application sets and progressive syncs.

So just a little bit about Intuit. Michael alluded a little bit to the scale at Intuit, but Intuit is a very large company that powers very large platforms. As you can see, we have over 2,000 services and over a thousand teams that run and power those services. So that means developer innovation and developer productivity are extremely important to Intuit, and you can see that since 2019 we've had a 9x increase in developer productivity, which is great, but we don't want to stop there. We want to continue finding what those pain points are for our developers, continue making sure that we can reduce that toil and increase developer delight, and see what we can do about continuing to improve our metrics.

So what problem are we going to focus on today? Let's take a look at today's promotion strategy, right?
Like I alluded to previously, our CI platform is Jenkins with a series of Groovy scripts, and we combine that with Argo CD. That is our promotion strategy today. So what does that mean when an application developer wants to deploy their service to Kubernetes? They get a Jenkins pipeline that contains not only their build but every single step that they need to deploy to their individual environments: dev, test, stage, prod. Sometimes there's a perf environment. Development teams also have the ability to create custom environments if that fits their use case. So as you can see, there are going to be many steps in this Jenkins pipeline, and when they're ready to deploy to those environments, we call an Argo CD sync via the CLI and invoke Argo CD to make those changes against those environments.

Now what are the downsides that are causing pain points for our developers? No auto-sync: this process involves manual effort from our developers, which is not something that they want to do. No GitOps: we have a situation where a developer can't necessarily check Git and see exactly what's running in these individual environments. And we have these long, complex pipelines, right? I talked about that in the beginning, so what are the pain points behind these long pipelines? Well, developers often want to go and introduce a change to these pipelines, for example adding in a series of integration tests.
Maybe they want to execute those in E2E, or add performance tests in perf. Maybe it's adding security scanning at a certain level because this application is really important to our company and we need to make sure it's secure. Well, when you have these long pipelines, it can be very difficult for developers to go in and make those changes. And in return, when we are executing these pipelines and actually deploying to these environments, these build logs can be thousands of lines long, so troubleshooting failed builds and failed deploys can take developers hours. Ultimately, they may not be able to do this by themselves, and they'll have to reach out to a platform support team to assist them. For our CI team, 60% of our support requests today are for troubleshooting failed builds. So that's an undesirable, non-self-service experience for our application developers.

So what are our goals? What do we believe we need to accomplish in order to help relieve these developer pain points? Less manual toil, right? We want to increase developer productivity, and that means removing that manual toil from the workflow. Synchronization across all environments, which goes hand in hand with a declarative GitOps strategy. We want to be able to say, hey, what's in Git is what is across all of our environments, and all of our environments should be the same and on the same version. And finally, we want automatic failure detection, and we want an easy process for developers to troubleshoot these failures.

So what are some potential solutions? App of apps with sync waves: great potential solution. But unfortunately, in industries with heavy compliance requirements, you require an option for human approval, and app of apps with sync waves does not allow for that option. There's also no single manifest to define that strategy.
So you're still in a position where you need to manage multiple manifests.

Kargo: as I'm sure you guys have heard, there is a great new open source project called Kargo. It has an awesome feature set, but it also could be more than you need.

And finally, the topic that we are going to talk about today: app sets and progressive syncs. It's easy to adopt if you're already using app sets, which I know a lot of you are, as I know from already talking to some of you. It has advanced promotion strategies all located within a single manifest, and it's already built into Argo CD. So when you're using Argo CD, you get these features right out of the box. And I'll hand it over to Michael to talk a little bit more about application sets.

So as Katie said, a lot of y'all know a lot about application sets already, but for those who don't, I'll take a minute just to give a refresher. Application sets are a CR, a custom resource, just like applications, and they're all about DRYing out application manifest management. In other words, you don't want to have a bunch of application manifests when you could express things much more simply. So an application set basically defines a source of truth, which can be either static or dynamic. That source of truth produces a list, and that list is templated over what basically just looks like an application manifest.

So the sorts of things you can do with that are have an application automatically generated for every cluster that you have configured in Argo CD. That's useful for things like deploying cluster add-ons. At Intuit, we deploy Argo Rollouts and Numaflow to every single cluster, and we'd rather not create a new application manifest for every single one of those 200-ish clusters.
We can write one application set and it can just create all those for us.

Something else you can do is make your developers' lives easier. One of the generators available for application sets is called the Git directory generator. So if you have a mono repo with all of the manifests for a certain team's, or even the whole organization's, applications, you just have one directory for each application, and each directory contains your manifests. The application set generator can loop over those directories, create all your applications, and point them to the appropriate directory, and now your applications just exist automatically. That lets developers self-serve: they can create and delete those directories at will, and it just happens for them, like magic. No manual creation and deletion of application manifests. I see people nodding and laughing like this is an excellent idea. I love to see that. It really does make life so much easier.

So I'll give a quick demo. It's so much easier to tell what a feature is when you actually get to see it in action. So here I have an application set. I want to make it full screen, but I can't. Oh, this thing's huge. Anyway, let me pause. I've got a generator, which is just a static list. That could be something much more dynamic, like a Git generator, cluster generator, etc. But here I've just statically defined my dev and E2E environments. So that's what I loop over, and it's templating that information into an application manifest. I'm GitOps-ing my app set manifest, so I sync it, and you see two applications were created as child resources. Now they're created in the main interface where you'd expect to see them, and I just go and manually sync them, since I didn't define any particular sync strategy for this. And I'll also click in and show you what the application looks like.
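For readers without the slides, an application set like the one in this demo might look roughly like the sketch below. It is not the manifest shown on screen: the name `demo-app`, the repository URL, the paths, and the destination are all placeholder assumptions. Only the overall shape (a static list generator templated into an application) comes from the talk.

```yaml
# Hypothetical sketch; every name, URL, and path here is an illustrative assumption.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: demo-app
  namespace: argocd
spec:
  goTemplate: true
  goTemplateOptions: ["missingkey=error"]
  generators:
    # A static list, as in the demo. A git or clusters generator could replace it.
    - list:
        elements:
          - env: dev
          - env: e2e
          # Adding "- env: prod" here would be the one-line diff that creates a third app.
  template:
    metadata:
      name: 'demo-app-{{.env}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/example-org/demo-app.git  # placeholder repo
        targetRevision: HEAD
        path: 'manifests/{{.env}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: 'demo-app-{{.env}}'
      # No syncPolicy is set, so each generated application waits for a manual sync.
```

For the mono repo case described earlier, the list generator would be swapped for a `git` generator with a `directories` entry such as `path: apps/*`, and the template would use `{{.path.basename}}` for the application name and `{{.path.path}}` for the source path.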
It's just a service and a deployment, so a really simple API. If I want to make a change, say I want to add a new application or a new environment, I do that just by going to Git and doing everything GitOps style. I add my new environment in that list and check that in. Again, in a real environment these clusters would probably be defined as Argo CD cluster secrets, so you wouldn't be doing this in Git; it would just happen when you register a new cluster in Argo CD. But for the sake of showing how it works, you see that we've added a new environment. That's the only line in the diff. I hit sync, and now we've got our three applications. Super easy, and mainly it's just easy to understand: you can look at that app set manifest and know precisely what's going to happen. There are no secrets and no surprises. So that's application sets.

So everyone who's already used application sets over and over again and knows what they're about, you can wake up now, because this is the new goodness. Progressive syncs are a way to orchestrate the applications that are generated by application sets. You saw a second ago that I had to manually click sync to synchronize those apps. With progressive syncs, that just happens for you, and it lets you gate each stage of synchronization, each application or group of applications, by health status. This means that any application you currently deploy that has the ability to go into a progressing or degraded state is supported out of the box by progressive syncs. So things like Deployments, StatefulSets, Ingresses, all of the things that Argo CD supports out of the box with health checks, and anything that you can create a health check for, which is any custom resource, progressive syncs can support. The way it works is we sync a particular step, we wait until that app or group of apps becomes healthy, and then we move on to the next step and sync those, and so on and so forth. So at each stage you're sure that what is actually being synced
is safe to sync because the previous environment has confirmed that for you.

So again, it's much, much better to see it in action than just talk. I've got another GitOps application set manifest, and this time I've defined this new field called strategy, and then under that, rollingSync. You can see I've defined three steps for the three different environments that I'm going to deploy to, and each of these steps targets a particular label that I've defined on the application. That's kind of a typical way people organize applications: put a label on it. We're taking advantage of that pattern here. So once I've synced it, you see that all three apps were created, and now the magic is happening. That first one is being synced. It became healthy, so now E2E is going to start syncing, magically, and that's going to become healthy as well. And then the third application is going to start syncing and become healthy, and I didn't click anything. It just happened for me. Well, I clicked sync on the application set, but you could still automate that and make that update through some automation.

So now I'm going to show you making a quick change, probably the most common type of change that you're ever going to make in your applications: bumping an image tag. In this case, I've just defined it as a value in the Helm values block. Something like Argo CD Image Updater edits that field, so it would automatically do that and allow you to use this progressive sync feature. You can see the only change is bumping that image tag. I hit synchronize again. That could be automated, but now it starts syncing the new image tag, goes to progressing, and then eventually to healthy. This image tag, 0.2, has no bugs and no issues, so it is going to progress all the way through all of the steps and synchronize all my applications all the way through prod. Again, safely at every stage.
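A minimal sketch of the kind of manifest described in this demo follows. It is an approximation, not the manifest from the slides: the label key `envLabel`, the application names, the repository URL, and the Helm value layout are assumptions made for illustration. Progressive syncs are an alpha feature and, to the best of my knowledge, have to be explicitly enabled on the ApplicationSet controller before the `strategy` field has any effect.

```yaml
# Hypothetical sketch; label key, names, URL, and values layout are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: demo-app
  namespace: argocd
spec:
  goTemplate: true
  goTemplateOptions: ["missingkey=error"]
  generators:
    - list:
        elements:
          - env: dev
          - env: e2e
          - env: prod
  strategy:
    type: RollingSync
    rollingSync:
      steps:
        # Each step selects generated applications by label.
        # A step starts only after the previous step's applications are healthy.
        - matchExpressions:
            - key: envLabel
              operator: In
              values: [dev]
        - matchExpressions:
            - key: envLabel
              operator: In
              values: [e2e]
        - matchExpressions:
            - key: envLabel
              operator: In
              values: [prod]
  template:
    metadata:
      name: 'demo-app-{{.env}}'
      labels:
        envLabel: '{{.env}}'   # the label the steps above match on
    spec:
      project: default
      source:
        repoURL: https://github.com/example-org/demo-app.git  # placeholder repo
        targetRevision: HEAD
        path: chart
        helm:
          values: |
            image:
              tag: "0.2"   # bumping this value is the change being promoted
      destination:
        server: https://kubernetes.default.svc
        namespace: 'demo-app-{{.env}}'
```

With a layout like this, a promotion is a single commit that changes the image tag; the controller then syncs dev, waits for it to report healthy, and only then moves on to e2e and prod.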
I know that the previous stage became healthy before I synced. And so I'll just pop into the app to confirm and convince everyone the image tag has been updated. So our progressive delivery of that change is complete.

That's environment promotion. That's maybe the simplest version of progressive syncs. This next one is a bit more like what we do at Intuit. We actually use this to deploy the metrics extension server that Leo demoed a couple of talks ago. We use progressive syncs to deploy the same server to every single cluster at Intuit, and I'm going to simulate that here. You won't see our actual production setup because it's like hundreds of apps, but here I've just created eight items in this list. I've put two apps in the development environment, three apps in E2E, and three in prod.

And I wanted to pause here and point out one more field that I set, which is this maxUpdate field. Sometimes when you're syncing a bunch of applications at once, maybe you have a cluster that is sensitive to lots of churn, and you want to slow things down for a particular step. This lets you say only sync one app at a time, or you can use percentages too and say only sync 10% of my apps at a time. So now that I've defined the strategy, I've defined three waves.
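As a rough illustration of the field just described, the `spec.strategy` portion of an application set with three waves might look like the fragment below. The label key `envLabel` and its values are assumptions; the placement of `maxUpdate` on the middle wave mirrors what the demo describes, not the actual slide.

```yaml
# Fragment of an ApplicationSet spec; label key and values are illustrative assumptions.
strategy:
  type: RollingSync
  rollingSync:
    steps:
      - matchExpressions:
          - key: envLabel
            operator: In
            values: [dev]
        # No maxUpdate: every matching application in this wave syncs at once.
      - matchExpressions:
          - key: envLabel
            operator: In
            values: [e2e]
        maxUpdate: 1        # sync one application at a time within this wave
      - matchExpressions:
          - key: envLabel
            operator: In
            values: [prod]
        # A percentage string such as "10%" is also accepted for maxUpdate.
```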
I'll go ahead and sync this app set. You can see eight apps were created. The two on the top are dev, and then we've got a row for E2E and a row for prod, which you'll see when I scroll down. Those first two apps synced simultaneously, so that's already a difference from the previous demo. This next wave has been configured with maxUpdate of one, so each of these apps is going to synchronize one at a time. And then finally we go to prod, which is going to synchronize simultaneously. In this case, I've deployed a change which is safe, so everything's just going to be deployed out with no problems. And this is just super pretty to watch. I've been given permission to quote Leo as saying that he just loves this feature for deploying the metrics server. It's just satisfying to watch it go through the waves and see everything deploy safely and correctly.

So I'm going to bump an image tag. This time I'm going to cheat a little bit and break one of the tags, I think the tag for either team four's or team five's application, just so you can see what happens when stuff breaks in a progressive sync. The other ones are still the same; I'm just bumping the tag. If you notice, there was some templating going on in there. That's a feature of application sets. I won't detail that too much, but you can template stuff with Go templates in app sets.

So again, everything goes out of sync. The first wave starts syncing simultaneously, and it's going to become healthy because the 0.2 tag, as we saw earlier, is fine and safe. In this wave we go one by one again. The first one goes healthy. This next one is the one I broke, and I set a progressing deadline of five seconds.
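The "progressing deadline" mentioned here most likely refers to the Deployment field `spec.progressDeadlineSeconds`; that reading is an assumption, as are the names and the deliberately broken image reference in the sketch below. A Deployment that cannot make progress within the deadline gets a failed progress condition, which Argo CD's built-in Deployment health check surfaces as a degraded application, and that is what halts the progressive sync.

```yaml
# Hypothetical sketch; names and image reference are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
spec:
  # Kubernetes defaults to 600 seconds; a short value makes a bad rollout fail fast.
  progressDeadlineSeconds: 5
  replicas: 1
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: demo-app
          # A tag that does not exist, so the pod can never become ready.
          image: registry.example.com/demo-app:broken-tag
```

A five-second deadline suits a demo; for real workloads a value that short would likely flag healthy but slow rollouts as failures.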
So very quickly it's degraded, and now we stop. In this case, it's a bit like if you had an application that went to a degraded state. You want to have an alert, something to let you know something went wrong, and then you go back and make your change. It's going to work its way all the way through those waves again, and eventually you're going to deploy a safe and successful change. So that's what all that looks like. I'm going to pass it back to Katie to sort of recap what this actually solves for us.

So, wow, right? I mean, within 20 minutes we solved all of our problems. It's huge. Michael's a genius. But really, this solution has solved a lot of the pain points that we had previously, and just to cover a few: you saw that we press the sync button. We can automate that, but besides that sync, there is no manual toil that the developers have to deal with, right? And that increases developer delight tenfold. That increases our delight tenfold. So that's huge.

The other big pain point was automatic failure detection and the ability to troubleshoot failures throughout this process, which, as you could see using the Argo CD UI, is extremely self-service. It's very straightforward to see that broken heart, which we all fear but are all able to look into and actually determine what the failure was. We can recover, and we can do it seamlessly while reducing our support requests and making it more self-service.

And finally, one of the biggest things of all: one app set manifest, right?
We all deal with thousands and thousands of lines of YAML. It is a problem for us, and it is a problem for application developers. So the fact that we can reduce that and manage everything with one application set manifest is a huge win.

Maybe we should have titled the talk "Don't Fear the Broken Heart," like Blue Öyster Cult. Katie mentioned I'm a genius. I should point out quickly that the person who wrote this feature is Matt Groot from LinkedIn, and this whole thing came out of a discussion we had at the first in-person ArgoCon in Mountain View. And then people have built on it from there, so a lot of hands have gone into building this.

Things you need to know about where the feature is: as far as application sets go, they're GA. They've been around a long time. They're in good shape. Progressive syncs is in alpha. I chatted with someone in the hall who's given it a try, and they encountered a bug. You're going to encounter bugs, but the fact that we've had a few reported means that people want this and they're adopting it. And I'm really excited to get more bugs, so I hope you all will give it a try and submit tons of issues.

As far as future enhancements go, UI support could be better. I sort of contrived an example where we can have these nice three rows of applications. We need something where the UI just groups those for you, no matter what collection of applications you have, and tells you: here are your steps, here's your maxUpdate setting. Regina Voloshine is already working on application set support for the UI. There are open PRs. It's a big task, but it's underway, and I think that progressive sync support would be the next step.

Something else we could do is provide reverse-order app deletion. People use app set progressive syncs a lot because they have dependencies, and sometimes you want to unwind things in the opposite direction so you don't have dependency conflicts.
I'd like rollout analysis that goes beyond just what you can do with health checks. It would be really nice to do arbitrary analyses to make sure your application is functioning well and potentially stop a progressive sync. Technically you could write a custom CR and put a health check on it to do that for you. We might want something a little more powerful than that, or a little more built-in, maybe.

And then finally, shared rollout strategies for multiple app sets. This is another one where there's already an open PR, in order to DRY things out. Maybe you have one strategy that you love and you want to make it available to other teams. This would give you the ability to create a CR, build one update strategy, and let people use it across however many application sets you want.

And as Michael stated, these are the enhancements that we're currently thinking about, and that have been brought up in the open source community, for this alpha feature. We would love to continue to hear feedback from you guys. So please participate in issues on GitHub, and participate in CNCF Slack. Reach out to us if you want to have conversations with us. We are very approachable people. We don't look very intimidating. Whether you want to see us in person in the halls here at ArgoCon or KubeCon, or reach out to us over CNCF Slack, please do.

As you can see, Intuit is also really big in open source. So if you want to keep up to date with what Intuit does in open source, whether with the Argo project or other open source projects, please scan the QR code and follow us on the Intuit Open Source LinkedIn. And finally, if you thought this was just the greatest talk of all time, please give us feedback via this QR code. Otherwise, thank you so much. No, just kidding. We would love to hear your feedback, all the good and all the opportunities for improvement. So please submit your feedback via this QR code here on the right.
Thank you so much, everyone.

Good. Canaries can kind of be embedded in this. So if you're using an Argo Rollout and you've got a canary strategy set up, it already knows how to mark itself as degraded if something goes wrong. Progressive syncs is going to notice that it's degraded, and it'll behave exactly how you saw when I intentionally broke one of the apps. So it sort of layers on top of it, and that's the power of health checks for progressive syncs.

Good question. So the question is, is it possible to have a soaking period? For example, I have something run for a week and then it's promoted after a week. I'm not sure that I'd recommend progressive syncs for something that's going to be that long-lived, because I want the state in Git to be very close to the live state. I would move that back to CI if you're going to have a whole week. For a shorter period of time, there's currently no feature to add a soaking period. I could imagine hacking on something that has a health check that doesn't go from progressing back to healthy for a certain period of time. You could hack in a CR that does that. So it's possible, but not a built-in feature.

You had mentioned air gaps being a deficiency in Argo CD's app of apps. Do you see air gaps in this solution? And I guess I was wondering, where do you see that kind of fitting in, if it does?

I missed the first part. You mentioned app of apps? Oh, air gaps, so meaning like an approval process, like maybe everything progresses except production? Yes, yes. I didn't demo that, but basically you can define a step that doesn't automatically sync. It just says wait here, and then someone has to hit the sync button in order to proceed. So you could definitely auto-sync it all the way up until prod and then have a human click that.

Hi, thank you for the talk. Does this work with app sets in any namespace?

App sets in any namespace.
Yeah, it should just work. Yeah, try it out. It's a brand new feature: 2.9 was released this morning or last night, depending on exactly when it hit. That has app sets in any namespace, and this should work for that.

You showed the rollout hitting the failure point and stopping. So when you fix that, does it automatically continue, or do you need to start over from the beginning?

It depends on how you fix it. I think the way you'd probably want to do it is go back and fix the image, because this is sort of a bug-in-the-image issue. I think you'd want to roll it all the way through. Yeah, if you're going to bump an image tag, you have to update it all the way through, and it is going to take some time.

Yeah, so if it was just something like, hey, the image wasn't available, after a while it may fix itself and then continue rolling forward without intervention?

Yeah, if there's some manual intervention you can do to unblock that one app, yes, it will just continue progressing the moment it hits healthy. It'll just go on like it would have from the beginning.

And then one other thing. I think from the demo I've got this, though: you were showing this as multi-environment with the implication of multi-cluster. So the implication here is that this would need to be run from a single Argo control plane to achieve that, right? You wouldn't be able to do a single one per cluster?

Correct. App sets are scoped to an Argo CD instance, so this has to be per Argo CD instance.

Thanks.

You could probably do some kind of magic to deploy apps across instances, but you know.

You mentioned progressive syncs is in an alpha state. Would you recommend it for a production instance of Argo CD, or would this be experimentation for now?

Well, it depends on how production your production is. Our team, we kind of know what app set progressive syncs is, so we're willing to use it for production things for us.
I would not pass this to other Intuit teams yet. I would want to see it a little bit more stable.

Thank you.

The question is, can you gate it with outside approvers, and do you want to do that outside of Argo CD? In my opinion, you always gate approval in Git. It always starts in Git. I dislike solutions where you have to use some external system and hack some communication between it and Argo CD.

Yeah, I think you have to have that approval all the way back in Git. You know, someone has to approve the PR.

Argo CD syncs what's in Git. It has no concept of "is this a person who is allowed to synchronize this part of the application set?" Technically it has RBAC. You can have some person who's allowed to hit the sync button to initiate a progressive sync, maybe. But in my opinion it all starts in Git, if you have to gate who is allowed to merge that PR. I mean, Git is your state. You've got to make sure the right people are merging.

Application sets currently can't have different types of sources, due to the limitations of the templating system. There's an open PR. Jeffrey Muceli has put in a lot of work to provide a way to have both Helm and Kustomize in a single app set. That's 2.10 territory.

We might be about to have to jump off stage. Yeah, please find us. We're happy to continue answering other questions after the talk. Thank you. Thanks so much.
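The approval gate described in the Q&A (a step that waits until a person hits sync) was not demoed. One way it might be expressed, based on my reading of the Argo CD progressive syncs documentation rather than anything shown in the talk, is a `maxUpdate` of zero on the gated step. The label key `envLabel` and its values are assumptions.

```yaml
# Fragment of an ApplicationSet spec; an untested sketch, not from the talk.
strategy:
  type: RollingSync
  rollingSync:
    steps:
      - matchExpressions:
          - key: envLabel
            operator: In
            values: [dev]
      - matchExpressions:
          - key: envLabel
            operator: In
            values: [e2e]
      - matchExpressions:
          - key: envLabel
            operator: In
            values: [prod]
        # Assumption: with maxUpdate set to 0 the controller does not sync the
        # matching applications itself, so prod waits for a manual sync.
        maxUpdate: 0
```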