So, my name is Zach Galler. I am a software engineer at Intuit and a maintainer on the Argo Rollouts project. I'm here to talk about a new feature that I added to Argo Rollouts 1.5 recently, which is plugin support. I'm pretty excited about this support, and I'm hoping to get from this talk some feedback on both of the two plugin systems that exist in 1.5, as well as a third idea that I'm throwing around that you'll hear about near the end.

But first I'll give you a little bit of insight into Argo Rollouts' usage within Intuit. Currently we have around 1,800 total services at Intuit in production, and of those, 542 are currently using Rollouts with analysis templates and a canary strategy. We have what we call tier-one and tier-two services; those are our highly visible services, and a total of 30% of those have been converted over to Argo Rollouts as well. And if you were in Titi's last talk, we're basically striving to get that as close to 100% as we can.

As an Argo Rollouts maintainer, there was a handful of things that I noticed about the project itself. Our number-one feature request always tended to be to add support for various traffic routers, whether it was Gateway API, Contour, HAProxy, Google's load balancer; there are just a thousand different ways to control traffic. And as a maintainer on the project, it became an impossible burden for me to become an expert in every single traffic router that existed. For the implementation phase, it's just undoable. Even as a reviewer, if someone contributes a particular traffic router plugin, reviewing that source code carries a burden of knowledge that I would have to go learn, to make sure they're doing the right things, that it's implemented in the most efficient way for that particular traffic router, and so on. That just doesn't really scale.
The other thing that I have seen happen: a traffic router will get contributed by an end user, and then for whatever reason, maybe they move jobs, all of a sudden you have a traffic router that has no support, and it falls back on the original maintainers to provide that support. So those are, from the maintainers' and the project's perspective, some of the issues that I noticed happening.

On the flip side of this, in order to have a project that gets used: if someone's using a traffic router that you don't support, they can't use the product. We do have basic canary support that doesn't necessarily require a traffic router, but you lose features, the granularity with which you can route traffic, and so on. Traffic routers are also extremely complex. If we only support a small set of traffic routers, forcing an end user to, say, use Istio is an almost unrealistic ask just to use Argo Rollouts, because Istio is a really complex piece of software. A lot of companies, especially smaller startups, won't necessarily go down that route because the complexity is just really high for a lot of traffic routers. The other thing is that there are companies that have custom internal tooling, a traffic router that doesn't exist in open source. At Intuit, we use a fork of Admiral to do multi-region traffic routing and things like that. So more fully supporting larger corporations that have internal tooling around traffic routing is another problem that having a fixed set of traffic routers creates. And then there are upgrade cycles: a particular traffic router will have a bug or something in it, and it can't really get fixed until Argo Rollouts makes a release. So those are all issues that I noticed come up within the project. My solution to this was: why not have plugins? Let's distribute the workload.
So just a few generalizations around plugins. They let the people that specialize in a particular traffic router create the plugin; they are the experts. That in turn is good for the project because it gives the project a larger user base, makes the project healthier, and so on. One of the selfish reasons, I guess, that I also want plugins is that there are larger experiments I'd like to do with traffic routers that currently exist, some slight refactoring, some broader feature additions, that are maybe a little bit risky to change in core, but would be easier to do in a plugin that you can mark as alpha and experiment with, something you can play around with more freely without having to worry about breaking a core service. So those are some of the benefits of abstracting traffic routers out as plugins.

I'll go into a little bit of how, as an Argo Rollouts administrator, you're able to configure plugins. We basically added a new config map to Argo Rollouts; you can see it up there on the top right. We support metric providers as a plugin and traffic routers as a plugin. The Argo Rollouts controller supports downloading from an HTTPS source, or loading a file: if you can somehow get the plugin into a container, you can give it a file path and it will load that up. If you do decide to use web download, I highly suggest also using the SHA256 hash check, just to make sure that nothing's messing with the plugin you're about to execute in your clusters.

And then the general gist, at a high level, of what plugins are: we use a library from HashiCorp called go-plugin. It basically creates a remote-procedure-call type of setup where the controller acts as the client and the plugins act as servers.
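To make that concrete, here is a sketch of what that config map can look like. The plugin names, URLs, and hash are placeholders, and the exact field layout may vary between Rollouts versions, so treat this as illustrative rather than authoritative:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argo-rollouts-config    # well-known name the controller reads
  namespace: argo-rollouts
data:
  trafficRouterPlugins: |-
    - name: "argoproj-labs/sample-router"             # placeholder plugin name
      location: "https://example.com/router-plugin"   # HTTPS web download
      sha256: "<hash of the plugin binary>"           # recommended with web download
  metricProviderPlugins: |-
    - name: "argoproj-labs/sample-metrics"
      location: "file:///plugins/sample-metrics"      # file path inside the container
```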
The plugins get spun up as child processes of the controller, and when those processes spin up, they output a handshake that informs the controller how to connect to the plugin. The plugin library itself supports gRPC and net/rpc, which is a native Golang RPC framework; as a first pass, I kept it as a net/rpc service. One of the pros of that is that it gives Golang developers really easy access to creating plugins: you basically just have to implement an interface. There are no proto files to write, no code generation; it's a pretty clean, simple process. The downside is that we only support Golang as a plugin language. So there are some trade-offs there. Because these are real programs, right, they're not a shared library, they're an executable, you can also have two running modes with them: server mode or CLI mode. That gives you some flexibility for easier debugging or other types of out-of-band workflows, which is kind of interesting.

As an end user of these plugins, in your Rollout manifest, under the traffic routing section, you now have a plugins field. Under that, there's the ability to have a name, and that name correlates to the config map; it maps to which configured plugin to run. And as a plugin author, you can put any structure that your plugin might need under that name. The same thing goes for analysis templates: you get your own spot to drop your configuration, the plugin can find it, and it can do whatever tasks it needs to do.

Let's take a quick high-level overview of what a plugin consists of. They're pretty simple. You have your main executable program.
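A sketch of what that looks like from the end-user side. The plugin name here must match an entry in the config map; the fields nested under it are whatever structure the plugin author chose, so `gatewayName` below is purely a hypothetical example:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
spec:
  strategy:
    canary:
      trafficRouting:
        plugins:
          argoproj-labs/sample-router:   # matches a name in the config map
            gatewayName: my-gateway      # hypothetical plugin-specific config
      steps:
        - setWeight: 10
        - pause: {}
```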
And all that's going to do is call a function from the HashiCorp go-plugin library that starts it up as a server. Then you'll see that there are two interfaces here, one for metric providers and one for traffic routers. These interfaces are almost identical to the interfaces that the native traffic routers and native metric providers implement, the minor difference being that there's an init-plugin function that gets called when the plugin itself starts up. That gives you, as a plugin author, the ability to create your Kubernetes clients, or any kind of metrics-provider client, Prometheus, etc. That runs once at plugin startup.

A little high level here, and I think somewhat self-explanatory, but in the traffic router we also have set weight. So when you go through a step and you have a set-weight command, that function will be called, and then it's the plugin's job to go configure your virtual service, or whatever traffic router you use, to control the weight. And then there's a verify-weight function. This function is a little interesting. It generally gets used in the internal routers with Amazon's ALB. What happens there is that when the Argo Rollouts controller gets a set-weight call for an ALB, it adds an annotation to an Ingress object. The Ingress object gets picked up by Amazon's ALB controller, which then goes and configures the ALB to set the particular weight. Argo Rollouts itself needs to not continue to the next step until we've verified that the weight on the ALB has actually been set. That function will have some interesting out-of-band uses that I'll talk about a little later. And metric providers are pretty simple interfaces; the important function is run. That's generally what's going to run your query or your metric collection and return some metric results.
So you can do a few interesting things with that as well. These plugins are fairly easy to develop and work with. You have the same tools that you're normally used to as a developer: even though they run as a separate child process of the Rollouts controller, you can still, in most IDEs, attach a debugger to the process, set breakpoints, step through the code, etc. When they run within the controller, the HashiCorp go-plugin library also redirects all the standard-error and standard-out logs from the child process back up to the controller. So your logging looks pretty consistent and still feels like a native implementation, which is nice from a plugin author's perspective; you don't really have to change your workflows much going from a core service to a plugin.

I went through some general rough ideas, because one of my main goals for this talk is to get feedback on plugins and an upcoming feature. So I want to go through some interesting use cases, some of them maybe non-traditional or hacky. One of the ideas is a gated rollout. With the plugin systems that exist today, Argo Rollouts lets you configure multiple traffic routers at the same time. So you can have your real traffic router, like Istio, and then you can have your plugin one, and it will actually run them both at the exact same time. That allows you to use the verify-weight function of the plugin to, say, gate the rollout: have it call some third-party API that looks for the OK to continue to the next step. That's one idea. One of the things that we plan on looking into at Intuit is an end-to-end test to verify traffic, because we have a kind of complex networking setup where we use a virtual service, but it does a lot more because of Admiral, where we would use log-based verification.
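A sketch of that gated-rollout trick, running a real router and a plugin side by side; the plugin name and its `approvalUrl` field are hypothetical:

```yaml
spec:
  strategy:
    canary:
      trafficRouting:
        istio:                          # the "real" traffic router
          virtualService:
            name: my-vsvc
        plugins:
          argoproj-labs/gate-check:     # hypothetical gating plugin
            approvalUrl: https://example.com/ok-to-proceed
```

Both routers receive every set-weight call; the plugin simply withholds a successful verify-weight until the third-party API approves, which holds the rollout at the current step.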
So we're playing around with the idea of having a plugin traffic router that looks at log lines from the actual applications. The plugin will let you provide two queries, one that queries for canary-traffic log lines and one for stable log lines, and figure out the real percentage of traffic that's being routed through, to validate that we properly configured the traffic router at the right percentages. So that's one alternative use of a traffic router.

And the same goes for metric providers; there are some interesting use cases there. You could think of a plugin that controls when Argo Rollouts will roll back, right? You have a plugin that runs business-level logic, makes API calls to complex systems, to determine "should I roll back or not," and you'd have that control there. Log-based rollbacks are another idea: instead of using a metrics provider, you could have a plugin that uses log queries, querying Splunk or Elasticsearch for particular errors or anything along those lines, that would then cause Rollouts to roll back. There's a whole bunch of kind of random use cases that are powerful to play with, and I'm always looking for more ideas.

What I really want feedback on is step plugins. These don't exist today, but I think they are really fascinating and have a ton of potential within Argo Rollouts. Today in Argo Rollouts, within your steps, you generally are able to control your weight; there's set header, there's set mirror, and that's about it. I want to toy around with the idea, and get feedback on, what being able to add a plugin at a particular step buys you. These are some of my back-of-the-hand, five-minute thought ideas that I think could be expanded upon. First, gated rollouts.
This is the more official way I would probably gate a rollout, versus piggybacking off of verify weight: I would put it as the first step in the rollout. First step, go call some API, make sure that the rollout is okay to continue. Some companies have validation processes that need to happen before something deploys, or block periods during a high-peak season, et cetera. So all kinds of ideas there.

The other is what I would call a sync rollout. You can think of it like this: you have a handful of microservices, you have 10 different Rollout objects, and you want to make sure you roll them all out to the same version continuously, right? The idea here is that the first step, a set-canary-scale at weight 10, spins up 10% of the new pods, the canary, but doesn't route any traffic to them yet. Your second step is a plugin that looks at a list of Rollout resources, given by rollout name, namespace, and the container name within that particular Rollout, and makes sure that all the Rollout resources in that list match the shared version before going on to the next step. I think that's a pretty powerful idea, at least as a start for experimenting with microservice sync rollouts.

My last example is a chained rollout, where there could be a plugin at the end of the rollout steps: you might be interested in promoting another environment, or deploying another dependent service. In this particular setup, the plugin would support two modes of operation. One is a direct Kubernetes call, which would make calls to the Kubernetes API and submit the new image to that particular list of Rollouts, updating the containers there. The other would be to do the more GitOps-y thing and have the plugin actually commit the update to the Git repo to cause the next train to happen, basically.
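Since step plugins don't exist yet, the syntax below is purely hypothetical; it's the kind of sample YAML being asked for as feedback, here sketching the sync-rollout idea:

```yaml
steps:
  - setCanaryScale:
      weight: 10                 # spin up canary pods, no traffic routed yet
  - plugin:                      # hypothetical step-plugin syntax
      name: argoproj-labs/sync-check
      config:
        rollouts:                # wait until all listed Rollouts share the version
          - name: service-a
            namespace: team-a
            container: app
          - name: service-b
            namespace: team-b
            container: app
  - setWeight: 10                # only now start shifting traffic
```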
I think there's a handful of different options there. There are other things like ChatOps: you could have a step that pings in Slack and prompts your developers to say, hey, we're at 20%, do you want to continue to the next percentage, et cetera? There are all kinds of interesting ideas: database migrations, where your first step does some database migration work; infrastructure validation, checking that certain things are in place before continuing your rollout. All kinds of possibilities exist with step plugins that I think are extremely fascinating.

So my ask, for people that have ideas: I do have a GitHub issue open, issue 2685, and a QR code here for those that want to scan it. Provide some feedback; give a little sample YAML if you have an idea, with a small description. That way, when I do design the feature, I can try to accommodate all the use cases. That will really help the design and make sure everything works for all the edge cases. I'm one person, I can't think of everything. And if anyone does have ideas for plugins, reach out on the Argo Rollouts repo, create an issue. We're planning on putting all the Rollouts plugins under the argoproj-labs organization, with some standard naming conventions so that they're discoverable and can be in documentation, et cetera.

Audience: Hi. So I want to make my own plugin. How do I do that?

Like I was saying, you saw that there are two options. I'm in CNCF Slack; reach out in the Argo Rollouts channel and we can talk, I can help guide you through things, et cetera. As I mentioned, if you do want to do an open-source plugin, I plan on putting them in the argoproj-labs org if they want to be listed on the Argo Rollouts main pages; there's going to be a list of plugins there, right?
Yeah, or if you're doing something internal and you want help, definitely feel free to reach out: create an issue, ping me in Slack, et cetera. I'm definitely wanting to grow the plugin ecosystem, so I'm willing to offer support wherever needed.

Audience: A couple of use cases which you mentioned require some secret data. So what's the situation right now? Can you inject it securely via secrets? Or you just mentioned a config map there in your examples, for the step plugin stuff?

Yeah. So one of the ideas is to give the plugins a place to store stuff within Kubernetes. Most resources have a status field, and I was basically going to apply the same idea of keying on the plugin name: each plugin would be able to define a key in the status, so it can use the status of the Rollout object to store some small state that it needs, to determine whatever it needs to determine, right? That's one possible spot. The other is that the plugin itself could create its own CRDs and manage state there as well. Okay, thanks. Thank you.