I'm going to talk about the life of a CVE and understanding the project's release cycle. They're two very intertwined things. I'm James Strong from Chainguard. That's me: Solutions Architect at Chainguard, which just means I help customers secure their software supply chains using Sigstore, Policy Controller, and a bunch of other really cool tools, SLSA attestations, you know, reports, things like that. I'm also a maintainer. I probably just copied and pasted this slide from other talks, so obviously I'm a maintainer, I'm talking in the maintainer track. I'm also an author of Networking and Kubernetes, an A Cloud Guru instructor on the same topic, and a Gimli cosplay enthusiast.

I'm on the Community Alliances team, and the mandate of that team is basically to work with folks like James and the rest of the community. All of my job focus is on open source projects and open source contributors. My primary partnership on the architect side is the Ranger project. I also co-host the Engine Room with Alessandro here; that might be on later today. I do a couple of other things: game development, and I also run cameras for an esports organization called Admiral Esports, where we fly spaceships. And I am very curious to see what James is doing with the cosplay. I should have worn it, I know. Sorry.

Agenda. We're going to do a CVE review: we'll go through an actual CVE, the timeline, everything that happens behind the scenes. One of the things I wanted to show you with this is that there are lots of other things we're doing besides not approving your PRs. Then NGINX core CVE remediation: we're actually going to see what happens when NGINX itself gets a CVE, which is probably a little more impactful, I would say, than ingress getting one. And then we're going to go through the release process, in as much depth as time allows, so you can see what happens when I break your clusters.

But first, a little bit about how ingress-nginx is configured. We've got about 118 annotations that we support, and we've been working through them to make sure they all have documentation and end-to-end tests. A new thing you're going to see coming out very soon is validations. From an annotation perspective, one example is the configuration snippet: being able to update the NGINX config and control it through the annotation. So there are two ways to configure things: an annotation, and then a configuration option. If you want to make changes globally to NGINX, you can do that with a ConfigMap, for things like the proxy connect timeout. If something isn't covered by an annotation, you can make sure it's covered by a configuration option, and that's how we get the best of both worlds. Annotations are per Ingress; configuration options are global for all of them. There's a minimal example of both styles right after this.

That sets us up for the CVE we're going to talk about. First, I want to make sure that even if you're not using these features, you know how they're used. So let's talk about what happens when we get a CVE and walk through that whole process. We do this in concert with the Kubernetes Security Response Committee; I think that's what SRC stands for. Thank you, Carlos, he's nodding his head. The CVE we're going to talk about: we got the report last year in July. It's a HackerOne report.
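To make that concrete, here is a minimal sketch of the two styles. The annotation key and the ConfigMap key are real ingress-nginx options; the Ingress name, host, backend Service, and the ConfigMap name and namespace are placeholders that depend on how you installed the controller.

```yaml
# Per-Ingress tuning: an annotation applies only to this Ingress.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo                      # placeholder
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "30"
spec:
  ingressClassName: nginx
  rules:
    - host: demo.example.com      # placeholder
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: demo        # placeholder
                port:
                  number: 80
---
# Global default: the same setting in the controller ConfigMap applies
# to every Ingress this controller serves.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller  # name/namespace depend on your install
  namespace: ingress-nginx
data:
  proxy-connect-timeout: "30"
```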
Somebody submitted it to the Kubernetes HackerOne program, and this is what the report looked like. There's a lot on this page, but the piece that was important, and that we weren't aware could happen, is right here: somebody could mount the service account token and then request it. That's the token that has access to all of the TLS secrets for all of your Ingresses on your cluster. So that was a problem. Somebody reported it, and we walked through how we fix that.

It takes some time, right? The SRC is also volunteer-based. They got it, they verified it, they knew it was a reproducible vulnerability in August, and they let us know around September. They probably emailed us sooner and I just didn't see it, because I get 200 GitHub notifications and everything goes to a label called "ingress". We got a nice email from CJ. He's great, he's with us on the SRC team. There are about four or five members; I've got a link in here that shows who is on the SRC and their whole process.

Ricardo put together a remediation. Ricardo is one of the other maintainers; he's not able to be here with us, but he's here with us in spirit. We put the mitigation together and got it into the release, and there's a nice little picture of his face. A lot of the time, when we put PRs in and accidentally break things, or we're trying to push new things out, it may be because we're trying to fix a CVE that hasn't been disclosed yet. On October 29th the CVE was published. We had the fix in place, we had notified the distributors, and then we let you all know.

From a CVE reporting perspective, again, the SRC team are the folks working behind the scenes when a report gets emailed in. These are all links so you can understand in a bit more depth how that works; that was a very quick overview. And we have all of that in the ingress-nginx security documentation, so you know how to report a vulnerability. There's a lot of back-and-forth conversation about whether something is an actual vulnerability, even after they've validated it, and about whether the fix actually fixes what was reported.

With that, I want to mention something I saw the other day when I was reading, I think about securing software builds: if you let people run arbitrary code, arbitrary code is going to get run. And that snippet was a perfectly valid configuration for a snippet; there's absolutely nothing syntactically wrong with it. So one of the things from the validation perspective is that we're going to try to understand what folks are trying to serve up. Obviously nobody should be serving up the service account token. There are other things we're trying to remediate too; like I said, there are 168 configuration options and all of that. Trying to make sure a user is doing the right thing is very difficult because we allow folks to run arbitrary code, you can inject Lua code in there, so it's just really hard. With arbitrary code, part of the responsibility is on the ingress administrator and part of it is on us, and it's something we have to keep working on. When I saw that report, I thought: the snippet is set up properly to do what it does, but there are some protections we need to put in place to help work through that, and one protection you can already turn on yourself is sketched below.
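One administrator-side protection already exists: if you don't use the snippet annotations at all, you can turn them off globally in the controller ConfigMap. This is a minimal sketch; the keys come from the ingress-nginx ConfigMap options, but check the exact names and defaults for the controller version you run, and the blocklist values here are only illustrative.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # name/namespace depend on your install
  namespace: ingress-nginx
data:
  # Disable the *-snippet annotations entirely if you don't need them.
  allow-snippet-annotations: "false"
  # Reject annotation values containing risky tokens (illustrative list).
  annotation-value-word-blocklist: "load_module,lua_package,serviceaccount"
```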
And then the other thing, right: one of the things we don't do, and one of the reasons we're putting the validators in place, is that we don't validate user input. How many people know about the admission webhook that we also ship? It validates the Ingress configuration to make sure it's a valid Ingress object before it gets applied, because applying means a reload and lots of other things. So we make sure it's a valid Ingress object, we're checking "does this actually work?", but we're not looking at what's inside it. We're not validating what the user is trying to do: that they're not trying to mount the token, or doing other nefarious things like talking across namespaces or looking at things they don't have access to. So we're trying to validate more of that user input, for both annotations and the ConfigMap, and that's why we're putting the validators in place. That should drop pretty soon; I think the PRs are out there and we're just working through all of the end-to-end tests, because we're doing this from scratch for 168 things. We've already seen that there are end-to-end tests and documentation missing for some of them, so it's actually going to be really helpful, I think, from that perspective. And with that, we'll go ahead and talk about the NGINX core CVE remediation.

All right, so I'm going to cover NGINX core remediation at a really high level. I sat down with the core PM beforehand. He said it was fine to give you all his email address, but instead I'm just going to recommend you come to the GitHub repo, hit up James or me, or get on the Slack channel; you can raise things through me and I'll get you whatever you need, any questions you might have for NGINX in this regard.

Basically, the way our process works is we'll get a submission from somebody like a researcher. They'll come to us with the CVE, we take the information they have, and we validate it by putting it through a set of tests. We all get in a room and kind of argue over what is and isn't correct and what the threat vectors might be. One of the things that happens during this process is that first we're trying to decide: is this or is this not a vulnerability? It almost always is a vulnerability, but is it mainstream or not mainstream? The big differentiator is that sometimes we'll get vulnerabilities where the use case is very small. They're edge cases, very limited. Some of those we just say, okay, this is more academic in nature; it goes in the "nice to know" pile and we're not going to do anything to fix it. In other instances, the ones that make it to public, we'll find out those are legit vulnerabilities. Those get published, and we work with the researcher to establish how we want to do this.

The next step is that the information about the vulnerability goes into what's called an embargo. The embargo basically means we work out with the researcher and the reporter what types of information we can relay, what we can send out to the public, and what we need to keep under wraps, so that ultimately, when this does hit prime time, the researcher, who makes their bread and butter off finding and writing about these things, gets the credit for the discovery.
There's no money that exchanges hands; it's all basically a credibility-based industry. The next part of the process: we've qualified it for mainstream, its feasibility is deemed an actual threat, and then we go through the process of looking at further edge cases. In some cases a vulnerability will come in as one vulnerability, and after we've done a bunch of research on it we'll figure out it's actually three vulnerabilities. So we'll publish additional CVEs to cover the additional threat vectors that occur there. This is also the point at which we decide how we're going to fix it, what the release process is going to look like when we do fix it, and how we want to go about the next steps.

That process in total generally takes roughly two weeks. It can be anywhere from three to seven days of just debating the stuff we just talked about, then part of that week figuring out how we want to do it, and then there's roughly two weeks of actual code before we generally get something out. That's when we start working on the bits to publish the remediation, which quite often comes in the form of a patch.

When I say "core", I probably should have qualified what that means: it means NGINX proper. You go to nginx.org, you've got the list of builds up there. That's NGINX core. While it is the upstream, the progenitor of what happens in the ingress SIG, it's also used by all of the distros for building distro packages and a whole bunch of other things. So when we get a CVE, it not only affects ingress, it affects literally the 1.7 million use cases out there that use NGINX. So we're incredibly tight-lipped and careful about how we fix things, because one minor mistake and you have 20 million websites that are suddenly losing money, and we really don't want to do that.

Once the fix is decided, it's around two weeks of actual code, and by the 30th day we're usually looking at cutting a patch. That goes out, and during that time we'll notify the reporter and lift the embargo. Folks like James get the information right around that time; that's when anybody who is an interested party is able to get it. Now, when I say "get the information": we don't have a proactive notification that goes out, with the exception of the list of PGP-encrypted email members who are part of the original submission. That's something we've been looking at: how can we shortcut that process to make things faster for James and the ingress SIG, as well as the distros and their packaging, and so on. Obviously, the quicker we can get it out to the universe, the better; anybody who's done their fair share of coding understands that busy-polling is a wholly inefficient way to do anything. So proactively notifying people when a fix is available is something we're working out the details on, and something you can look forward to in the future.

And then 90 days is kind of the industry standard. We almost never go beyond 90 days, and that's what's considered the standardized patch cycle. A lot of that is around enterprise-grade products, because most enterprises will only roll patches on a 90-day cycle.
Some of that pertains to the Plus product and products that are enterprise-grade, but most often we'll hit a release window of about 30 days. And that's pretty much the entire process. The next step is that it's handed off to the community, and then James and the ingress SIG, as well as other groups that are interested parties, take it from there.

I'm doing really well on time, so if you have questions while we go through this release process, just ask. We've still got 15 minutes, so I know we have time for questions and things like that. As I talk about this, if you don't know what part of the release process is, or there's something where you're thinking "what is he talking about?", just throw it out there.

The ingress-nginx release process has four stages, and each one of them is a different PR. I've been thinking about how we can change some of these, because as I break things I learn new things: for example, there are a lot of GitOps folks out there who will automatically roll out a patch release. That's fun to know, because if I accidentally release a breaking change in a patch, because I didn't know a breaking change was in there, I'll break things. So I'm learning, and I need to be very aware of things that could go wrong. Maybe not as huge an impact as NGINX core, but you can still break a lot of clusters at once. You like breaking things. Yeah, well, you know, that's how I learn who's using it; nobody tells me they're using ingress-nginx, so I only find out people are using it when it's broken. The best way to figure out who's using whatever rack thing you've got going on is to pull the network cord and wait for the emails to start rolling in. Exactly.

One of the first steps is that we manage this with a TAG file. The TAG file gets updated, that's the 1.7.0 there, and that kicks off a Google Cloud Build. As a Kubernetes subproject, we build everything in the Kubernetes infrastructure. When you hear about that Google Cloud Build, I don't know what percentage of it ingress-nginx is responsible for, but I know it takes a while to build. You'll see it in one of the slides: if there's an NGINX change, or a CVE remediation we've got to put out, it takes about four hours because we recompile NGINX from source. So that build can take four hours, if it succeeds. Is anybody using s390x, by the way? No? That's going on the deprecation list; I don't see any hands. We build across five or six different architectures and four or five Kubernetes versions, plus a lot of other things that I'll talk about in a later slide, but that all gets built in this first PR. That gives us the container image, either the NGINX base image or the controller. All of our end-to-end tests, everything we run, is built on top of an NGINX base image. So if there's a change or vulnerability in NGINX or any of the pieces we use, that's two container builds: I have to build the NGINX container, update all of our testing framework to use that new NGINX build, and then I can release a new controller. I didn't even talk about that piece; that's a pre-PR before all of this. So that's how things get built. Once I've got that new container build's SHA, then, since we're a Kubernetes subproject, we also have to do image promotion.
The image promotion takes the image from staging into production. That's another PR, in the Kubernetes org. How many people knew about the staging registry and the promotions? Carlos, don't raise your hand, you know about that, you write the code for it. So in order to get that image out to you all, we have to do what's called the promotion. It's another PR, another piece of YAML; there's a rough sketch of what that YAML looks like below. That takes anywhere from one minute to three or four hours, depending on whether I have one of my maintainers available to approve the PR. That does the push, does the prod build, signs the container, and gets everything ready. And that's the point where the folks who automatically roll out patch releases, the GitOps folks, will pick it up and run it, and I haven't even done a release yet. That's the part of the release process we're thinking about maybe changing: doing the GitHub release first and then the staging promotion, so the image isn't available for folks to try to use early. But I'm wondering how fast some of you would yell at me that the image isn't available when I do the GitHub release, or how fast your automation is at picking it up. So we're working on figuring out how to optimize the build, how not to break things, and how not to make things available until we've actually officially announced them.

This next one has recently been automated. All of these pieces are documentation. Updating the Helm chart: we have to bump the Helm chart version. The Helm chart has its own versioning strategy, because a configuration option for a Helm value could change while the controller isn't being updated, so we can release the Helm chart separately and independently, but with a new controller version we have to rev the Helm chart version too. It also requires updating the changelog. The chart has its own release cycle and its own changelog, and we have to update those as well. The controller version bump and the changelog update are things we've been working really hard on automating; the changelog used to be an artisanal note that took me three or four hours to handcraft and understand, but that's mostly automated now. Then there are the documentation updates, although I'm actually thinking about getting rid of that last piece, because really all it does is change the version string in the examples to the current version. I'm thinking about just pointing it at main so I don't have to update it. So, working through some of those thoughts, but that part is completely automated now. And again, that's the third PR in this whole process.

Then, once we've got that PR in and everything merged, I tag the release and make an actual, official GitHub release. That's the end of the cycle: saying there is a new release, here it is. But like I said, the image has possibly been available for a day already, with no release notes. So when things break or something happens, it's like: folks, we haven't even released this yet, can we wait? We've got to work on that timing and the communication. Part of that is utilizing the Kubernetes dev mailing list more. We actually have a very old Twitter handle that I'm getting the username and password for, because KubeCon is great for tracking down in person the people who aren't answering your Slack DMs. I know that because some of you do that to me, so I do it to others.
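Coming back to the promotion PR: it's a small change to the image-promoter manifests in the kubernetes/k8s.io repo, mapping a staging image digest to the tags it should get in the production registry. Roughly like this; the digest and tag are placeholders, not real values, and the exact file layout may differ from this sketch.

```yaml
# Sketch of a promoter manifest entry (placeholders, not a real promotion)
- name: controller
  dmap:
    "sha256:0000000000000000000000000000000000000000000000000000000000000000":
      - "v1.x.y"
```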
So yeah, it's great to talk to people in person. Incidentally, that's one more person than we currently have working on the SIG right now. And I don't know who in this photo is who. Do I get to be Bombur? Sure. Okay, thanks. I don't know who I'd be; I'm just going to be awkwardly standing up there. I used all my energy at kube-karaoke, so I apologize to all of you.

But that is the high level of the release process. Like I said, it's three GitHub PRs and a GitHub release, and only the one piece is automated right now. So it can take anywhere from 30 minutes, if we're on a Zoom call together and want to get it out very quickly, to maybe a week, because people aren't available or people are on vacation. We only have three maintainers who can approve that. One of them is in China, and the other one also has a day job as well. So we're working with SIG Contributor Experience, and I did a couple of workshops this week, trying to understand how we can help build a community so we can take people who are one-time contributors, get them to reviewer, and ultimately get them to approver, so we can grow and make this a little bit faster.

We know there is a backlog of PRs and issues. One of the ways you can make sure your PR gets pushed through, or your issue gets talked about, is coming to the community meeting. Come to the community meeting: things get talked about, things get worked on, and they get pushed through. That's how we got the OpenTelemetry module updated. Is anybody using the new OpenTelemetry module? Did anybody know that you can use OpenTelemetry now with ingress-nginx? Thank you, one person, for validating me. How many people didn't know you could? Don't do that to me. Part of that is on the maintainers as well, getting that information out there, so we're trying to work out how to get more information to the community. We have the two Slack channels, and we're just working on the best way to get this information out to folks.

I already talked about this, but I just want to click through it so folks understand. I presented this at the last KubeCon, and I just want to keep reiterating it: there's a lot that goes into ingress-nginx that isn't just NGINX or the configuration options. There's Golang, Alpine, NGINX, and Lua. If a patch is needed in any one of those, say when we do an Alpine update or a Golang update, it's the same story. For a Golang update we have to update all of our test runners so we can actually compile against that Golang version. That's two more PRs: one to update the image and one to update the SHA for all of the testing. Same thing with Alpine, NGINX, and Lua.

Did anybody know that we have a kubectl plugin? Yeah, two people? Okay, two, better than none, so get in there. The kubectl plugin is also something we have to maintain. We've worked on automating that as well, and we fixed it. The whole Mac switch in architectures helped fix that and shed light on the fact that we weren't actually releasing new kubectl plugin builds. So again, just learning new things. I've been helping maintain the project for, wow, two, three years now, and I'm still learning new things about it, so we're trying to lower that learning curve for new folks. Two monitoring frameworks: that's going to go down to one once we deprecate the old ones and push everyone to the OpenTelemetry module. There's a quick sketch of enabling it below.
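If you want to try the OpenTelemetry module, enabling it is a ConfigMap change (or a per-Ingress annotation). A rough sketch, assuming a recent controller version and an OTLP collector reachable in-cluster; the key names are taken from the ingress-nginx OpenTelemetry docs, and the collector host and port are placeholders:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller     # name/namespace depend on your install
  namespace: ingress-nginx
data:
  enable-opentelemetry: "true"
  otlp-collector-host: "otel-collector.observability.svc"  # placeholder
  otlp-collector-port: "4317"                               # placeholder
```

There's also a per-Ingress annotation, nginx.ingress.kubernetes.io/enable-opentelemetry, if you only want tracing on some Ingresses.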
But I'm not going to do that until everyone knows the OpenTelemetry module is available, so thank you for that data point. Third-party plugins: from a Lua perspective, if you need a Lua module, and I don't know all the plugins off the top of my head, we help maintain three of those. These are the four-hour build times; if there's an NGINX change, there's just a lot that goes in, any changes in the NGINX modules we use, or someone requesting a new one. And I'm not even going to talk about that number, the deps, the command-line flags. There's just a lot, along with the 400 end-to-end tests. We run our end-to-end tests across those Kubernetes versions, and now with LTS we're going to have to do N minus five. It takes about 50 minutes to run the end-to-end tests, and that's on every PR. So it's not a small process.

If you have any questions, please let us know. This is the QR code for the feedback; let us know how it went. I haven't uploaded the slides yet, but I'll tweet them out and put them in the Slack user channel and the Slack dev channel. We have a dev channel where we talk about the release process, PRs we want to work on, and things the maintainers need to work through, and then we have the user support channel. How many people ask questions in user support, or know that it exists? Okay, I'm not getting a lot of hands up. How many of you use ingress-nginx? Okay. So I'm sad, but I'm glad we're having this conversation about how all of this stuff works, because I figured that was the case: a lot of people use it, but they don't understand how the sausage is made. When you go looking for new features and updates to ingress-nginx, where do you look? Just in the release notes, okay. Well, that's good to know. We have a new contributor doc, but I think it needs work, and it always needs work; our docs always need work. We meet every Thursday at 11 Eastern. We've been thinking about moving the time, but it's hard because it's the middle of the day and everyone's working. And if you want to see what we talk about in the meetings, the community meeting playlist is out there as well.

Right on time, I guess. Any questions? If you have a question about a PR, just no. I'm kidding, I'm kidding. Yes. So the question was: I mentioned the four-hour build time, does that include tests as well? No, that's compiling, so I'll go back. When I say four hours: because we use Lua modules and a bunch of NGINX modules, we have to compile NGINX from source; we can't just use the NGINX binary. So that's the four-hour build time, and that's across, how many, the four architectures. And that's why I asked the question, because usually when s390x fails, it fails the whole build, and that's really annoying; that's why I asked the s390x question. So no, it's not the testing. We build that, and then we do our integration testing with the new image. That's why we build a new NGINX image, then do a pull request to update our end-to-end testing to use that new NGINX version, and then we can do the testing. So it's three PRs before we can even start testing it. We're working on optimizing that, but again, it's optimizing the release process, the communication or lack thereof, and also working through the PRs and managing all the other things. So yeah, it's fun.
So that's why I made the joke before we started recording: we need contributors to help, and we as the maintainers need to do a better job of lowering that bar so we can eventually get people to approver. And Dylan, this one's for you. So, yeah. Thank you, there you go. But, anyway. By the way, that's a real axe, 100%. Yeah, I couldn't bring it. And wouldn't you say you could kill people with it, or something like that? No. No, I would not. Anyway, thank you all for coming. I don't, were we at 35? I can't remember. Anyway, any more questions? Let me know. We're hanging out at the Chainguard booth and at the NGINX booth.