So I'm Michael Crenshaw. I'm a software engineer at Intuit on the Argo CD team.
My name is Henrik Blixt. I'm a product manager, also at Intuit, and one of the Argo maintainers. So just quickly, since we're both from Intuit, I want to give you a quick introduction to who we are and a little bit of understanding of where we're coming from. We're a financial technology software company based in the US, and we have most of our business in the US. We spent the last two years building a platform that comprises five pillars to modernize our infrastructure, and we have now moved all our services to this platform. This platform now serves about 58 billion machine learning predictions per day. And during peak tax season, which is our main time of year, we push about 3.6 billion requests through this platform. If you look at the dev environment that we have, when our teams go in and create a new application to build a service, we automatically vend a namespace and a bunch of other things for that application. So we have about 16,000 namespaces in our environment for the 3,000-ish services that we have running. And these are developed by the 6,000-plus developers that work on all the services for our products. So this platform, and this journey that we've been on for the last few years, has given us a six-fold increase in developer productivity. That's a huge lift and a huge benefit to all our developers, and Argo has been one of the key parts of that. We're also strong believers in open source. As some of you probably know, the Argo project came from Intuit originally. But we're also heavily involved in the community in other areas, like Kubernetes and Istio. We have a new project called Numaproj that we announced very recently. And we try to contribute and work with the open source communities as much as we can.
And this was also recognized in 2019 when we won the end-user award. And for those of you that were in the keynote on Wednesday, we actually won it again this year. So a big thank you to the end-user community for the recognition. We're really happy and proud of that. But this talk is about Argo, not Intuit. So just a little quick background for those of you that don't know Argo, which I'm guessing is not too many of you. One thing you probably don't know is that Argo actually turned five years old last week. One of our maintainers, Jesse Suen, did the first PR on October 17th five years ago, which was the birth of Argo. So happy birthday, Argo. Argo has four sub-projects: Argo Workflows, Argo CD, Argo Rollouts, and Argo Events. Most of what we're going to talk about today is Argo CD, but a lot of it covers the other projects as well. We have hundreds of companies using Argo in production today. I'm sure some of you here use it in production as well. Over the years, we've had over 7,000 people contribute to the Argo project, and we're growing by somewhere between 30 and 50 new contributors every week, so it's pretty amazing growth. All these contributors and all these companies have also propelled Argo to be one of the fastest-growing, highest-velocity projects within CNCF.
But how did it start? Where did our security journey start, and why did we get interested in security? Argo started as an incubating project about two and a half years ago now. And in February of 2021, roughly a year and a half ago, we applied for graduation. As part of the incubation and graduation process, there's a focus on security. You have to go through a security assessment. The TOC is involved. You have some sponsors that help guide you through what's needed to be a mature project. So that helped us increase the security focus on the project. And at that point, we didn't really know what we didn't know.
And that's kind of what this talk is going to highlight: all the things that we learned along the way. Because back then we kind of thought we were really good at security. We hadn't really had any CVEs. And if you don't have any CVEs, you must have good security, right? We also thought there were a lot of maintainers in Argo that really knew security. So we thought, you know, we know security. We're good at security. But it was very siloed. We didn't really share that knowledge. There were a few people that knew some things very well, and there were others that didn't really know anything. We didn't really know who knew what. And also, we looked at it like, oh, we've done A, we've done B, we've done C, our security is good. We didn't really realize that security is something that evolves. As the attack vectors change, as new threats come out, as you add new code, the security needs to change as well. And new technologies and new tools come out. So it's something that has to be a process. As we were going through this, we also had a new experience with our first zero-day CVE. We had a couple of CVEs before this one, but this was the first one where a security vendor reported a CVE. I just want to stress that we worked really well with this vendor, an awesome vendor. We worked together with them hand in hand, with the embargo for the issue. We wrote a patch. We talked to them about when they were going to release this information. And it was a 7.7 CVE. So it wasn't super high. It was still rated as high, but it's nothing crazy, right? So we figured we're doing good, right? We talked to the vendor. We had a patch. The embargo lifted. And this is kind of what I woke up to the next morning. Because what we didn't realize, and what we hadn't really talked about, was that this vendor didn't just report the CVE or help us with this to be nice to us, right?
They had a vested interest in this. They wanted to make sure that they got some benefit out of it. So they went massive media-wise: every single media outlet you can think of had an article about Argo and this security issue. The good news is that CNCF measures the number of media mentions, and Argo had more than the next three projects together. So, you know, all PR is good PR, right? This was also an interesting learning experience, because we figured out that we need to communicate better. And not just within the group of maintainers, but also with the people outside of the group of maintainers that you work with. Had we talked more to the vendor, we could have figured out what their intentions were. I mean, they were good. They were doing this as part of their business, but we just saw it as a CVE for the project: we need to fix the code. We didn't really realize that there are wider implications of working with a security vendor on this. And we didn't really have the processes in place either. We knew how to fix the code, but this whole part of PR, marketing collateral, working with external vendors and all that just wasn't really there yet. We had it for the code, but not for anything else. And then the last insight, and it's not as cynical as it sounds: trust no one, but figure out what their intentions are. Why are you working with them? What's their end goal? What benefits are they looking to get out of it? Had we spent a bit more time on that and talked a bit more to the vendor, we could have figured out that they were going to do this massive media campaign. So we should probably have had some response to that as well, instead of being bombarded with Slack messages and emails the next morning saying, what the heck is going on here, right?
So I think, I mean, all of us engineers are really focused on just fixing that code issue, but a project is bigger than that, right? We need to figure out how to do these other things. So one of the things we started doing with this newfound knowledge was replacing some of the siloed knowledge that we had with more structured information. We started documenting these processes and documenting what should happen when there is a CVE. Michael will go into more detail about that later, but it's figuring out who knows what and in what order we do things. This also helped us figure out what processes should be in place and what steps need to be taken. Working together with the TOC and the Security TAG, we also looked at some other tools that could be incorporated into this process. And then lastly, this is security, so we can't really talk about everything right as it shows up, because you need to make sure there's some embargo. But you also need to make sure that there's enough transparency that it doesn't look like we're hiding something, right? So it's about figuring out that balance between transparency and embargo, and folding that back into the processes and documentation that we were doing.
Yeah, so the first thing that happened when we got the notification about the new CVE, the new zero-day, was that it came via email, and then immediately a few of the core maintainers opened up DMs in CNCF Slack and were talking about, well, how severe is this? Who knows how to fix it? And that slowly turned into a temporary private Slack channel, where just the people who were necessary to the resolution process were there to talk about it. The first thing they needed to do was discuss how we were going to respond to this vendor, because that was on an email thread.
So they coordinated what information we needed from them and what information they needed from us so that they would get proper credit for this vulnerability, et cetera. That was all done on this private Slack channel. The next thing that happened was they workshopped the fix. We found the person who was the most familiar with the code that was vulnerable and got them into a discussion about how we were going to put together the patch. And then finally, that channel became the place where we coordinated releasing the patches across all of our currently supported versions. Argo CD supports the three most recent minor versions. That Slack channel, we quickly realized, wasn't simply going to be a temporary place to talk. This wasn't going to be the last CVE that we faced, and we needed to make it permanent. So we created a new permanent private Slack channel, and that became the basis for our new special interest group for security. Having this special interest group gave us a few advantages. The first was that we established a group of people who had a need to know about new vulnerabilities as they came in: people who had some trust relationship with the project, mostly by having been a maintainer for a while, and typically some interest in security. It also gave us a place to discuss when and how we could expand the need to know, because there have been vulnerabilities where the people in that group don't necessarily have all the expertise they need to evaluate it and then communicate to our users about it. So that channel gave us a place to say, okay, I want to bring in this other expert for an individual vulnerability. Are you all cool with that? Do we have a good trust relationship with them? The second thing this gave us was the ability to handle the embargo process.
So a lot of companies use Argo products as the basis for their paid products, and those folks don't want their customers waking up and reading about a vulnerability and saying, oh, no, we're vulnerable to this as well. They want to be able to say, we've already patched that and you're safe. So this gave us a place to make sure that all the vendors got the patches in time and had them released before the open-source project released the vulnerabilities to the public. That embargo process for the moment has just sort of de facto been limited to the folks who are in that security interest group initially, so early maintainers, et cetera. And we realize that over time, there are going to be vendors who aren't part of special interest group security, but they do have products that they want early notification for bugs with. So we're going to have to set up a process where we can establish a trust relationship with vendors and make sure they get patches ahead of time in order to get their products patched. That's still to be done, but we have good examples from other projects that we can follow. So this special interest group also gave us the opportunity to reflect on things that were missed. In the four months after that first zero day dropped, we handled a total of 15 CVEs across the four Argo projects. And a lot of that was because we just started looking. The maintainers thought, okay, well, if this was vulnerable, maybe there's something similar to that that's vulnerable. And in parallel to that, while we were going through the graduation process, an organization was sponsored by the CNCF to audit all of the Argo projects, and they identified, I think, between six and seven CVEs all about the same time. So the workload became very heavy very quickly, and at some points we had as many as six vulnerabilities that we were actively trying to understand, patch, embargo, release all at the same time. And that's a lot of work, and it makes it really easy to miss steps. 
One of the things that is easy to miss: we have an email channel. That's how people tell us about vulnerabilities. When our security special interest group wants to communicate with a reporter, there are several people who see that email, and they might all at once say, okay, we're going to respond to the security vendor and discuss the vulnerability. If people don't communicate with each other, the vendor can get frustrated when they start getting mixed messages or inconsistent communication from our project. So over time we've learned to use the Slack channel in particular to coordinate that messaging. Second, reviews of security advisories and patches get delayed. This isn't just one patch per CVE. When a CVE happens for Argo CD, for example, we don't just patch the most recent version, we patch the most recent three versions. And sometimes that cherry-pick isn't clean, so the patch is a little bit different. So with six CVEs in flight, that's 18 patches that need to be reviewed. And then you have a security advisory for each one, which is just text, but that's sometimes really in-depth, and you need someone to review that and iterate on it. When you have this many things being reviewed and iterated on all at once, it can be really difficult for people to remember what work is still in flight. So things get dropped and things get delayed. And finally, and this one was on me: we had a vendor who needed to know about a patch and get their product patched for their customers. I was so busy pushing out the patches as quickly as I could that a patch went out and a GitHub security advisory was released before they were ready. And it caused problems for them internally. So we needed to find ways to avoid that happening in the future. The way we did that was to start formalizing and documenting the process that we had learned sort of in a trial by fire. First, we created a GitHub repository. It's private, and it contains two files.
A readme, briefly saying what it is, and a GitHub issue template. That GitHub issue template is an extremely detailed list of all the steps that need to be taken for every CVE that gets reported to us, from the moment we hear about it to when the embargo is finally lifted and all the patches are released. One of my favorite parts of this list is very near the top: we have to write a security advisory draft. It's extremely important for this to be early, because when you write a security advisory, you start investigating the standard things that go into one, such as what versions were affected. So you go back to the code, read it, and understand why this feature or this bug was introduced in the first place, because your patch is going to be informed by why this code even exists. Another thing it does is cause you to think from the attacker's perspective, because when you write a security advisory, you're describing how to perform the attack so that your users know what we're defending against. When you write it that way, you come to understand potentially different ways to perform an attack that weren't immediately apparent. It became obvious that this step needed to be early one time when we had a CVE and we quickly wrote a patch, like within hours of hearing about the issue. But when we went and wrote the security advisory, we realized there was a different avenue to perform the same attack that we hadn't addressed. So if you end up writing a checklist like this for your own project or organization, I recommend that the security advisory draft goes really early. And finally, part of the process that we introduced was just having meetings with this new special interest group. When you have that much work in flight, even if you have lists in GitHub, people sometimes lose track of what they're actively working on.
So at first, when we were dealing with the four months of 15 CVEs, we would meet once a week just to touch base and make sure everyone was on track. Over time, we've been able to stretch that to every two weeks, and that's worked very well for us. Something else that came out of this whole process of dealing with these CVEs was that we found some patterns. One of the issues that we encountered again and again, for five CVEs, was that our repo server component was vulnerable to directory traversal and symlink-following attacks. Without going into unnecessary detail about what the repo server is, basically what it does is go to Git or Helm, get manifests that are to be deployed to Kubernetes, and then compile them using Helm, Kustomize, et cetera. Then the application controller picks up those manifests and puts them onto Kubernetes as resources. If you have multiple tenants using a single Argo CD instance, or even multiple users, and you don't have a full trust relationship between them, you want to make sure that one repository owner can't write their code in a way that traverses out and reaches someone else's manifests, because those might have secrets in them or just sensitive information that you don't want those folks to have. So that's the repo server. With five CVEs related to directory traversal and symlink following, we started to lock things down very heavily. First, permissions go to zero when the on-disk cache of the repository is not actively being used. Second, we use ephemeral copies of manifests for user-contributed or user-written plugins. The plugin gets a copy of the manifests, does everything that it needs to do, and then we delete that copy. There's no way to traverse to it because it doesn't exist anymore. Then we started using cryptographically secure UIDs for the cache paths, because previously they had been deterministic. They were based on the repository URL. So we randomized them.
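To make the difference between deterministic and randomized cache paths concrete, here is a minimal sketch in Python. Argo CD itself is written in Go, and nothing here is its actual code: the cache root, the function `deterministic_cache_path`, and the class `RandomizedCachePaths` are hypothetical names chosen for illustration, and hashing the URL is only an assumed stand-in for "based on the repository URL".

```python
import hashlib
import secrets
from pathlib import Path

# Hypothetical cache location, used only for this illustration.
CACHE_ROOT = Path("/tmp/_repo-cache")


def deterministic_cache_path(repo_url):
    """Path derived from the URL: anyone who knows the URL can compute it."""
    digest = hashlib.sha256(repo_url.encode("utf-8")).hexdigest()
    return CACHE_ROOT / digest


class RandomizedCachePaths:
    """Maps each repository URL to a directory name drawn from the OS CSPRNG."""

    def __init__(self):
        # The URL-to-path mapping lives only in this process's memory.
        self._paths = {}

    def path_for(self, repo_url):
        # The first request for a URL picks a random name; later requests reuse it.
        if repo_url not in self._paths:
            self._paths[repo_url] = CACHE_ROOT / secrets.token_hex(16)
        return self._paths[repo_url]
```

With the deterministic version, a repository owner who knows another tenant's repository URL can work out where that tenant's checkout lives on disk. With the randomized version, the directory name carries 128 bits from `secrets`, so it cannot be derived from the URL and has to be looked up in the server's own mapping.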
If you can't guess the path, then you can't traverse to it, and you can't write a symlink that goes to it. And then finally, symlinks gave us such fits that we decided just to kill them: if a symlink reaches outside of the repository at any point in the chain of symlinks, we just reject that repository and say, sorry, you've got to change your repository to get rid of that. The second secure coding practice we picked up came out of the security audit. There were some issues in, I believe, Workflows, where the cryptography was not secure enough. There wasn't sufficient entropy for the purpose it was being used for. So we did a full audit of all the cryptography we were using in Argo CD and discovered some more places where we had insufficient entropy or were not using a cryptographically secure random number generator in places where we should. We issued CVEs for those where it was appropriate, and in other places where it wasn't as sensitive, we just fixed the issue. Finally, and I think this is pretty huge for our users, we started improving our logging. A lot of events that happen in Argo CD are related to security, but they don't necessarily mean that someone's attacking you. There are places where Go can return an error when you try to close a file. By itself, if you fail to close a file, that's not an issue. If it happens hundreds or thousands of times, it can be an issue and cause a denial of service. So we have introduced a new field in our structured logging called security. If that field is present, it has a number designating the severity from one to five, and it means this event needs an extra look, because it may indicate an attack underway or a potential issue in the future. So that'll help our users monitor Argo CD for upcoming issues. We've also tightened up our supply chain security.
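As a rough illustration of the security log field described above, here is a small Python sketch of structured logging with an optional `security` field and a filter a monitoring job might apply. This is not Argo CD's logging code (Argo CD is written in Go); the JSON layout, the `JsonFormatter` class, and the `needs_security_review` helper are assumptions made for the example, as is the convention that a higher number means a more severe event.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Formats each record as one JSON object per line."""

    def format(self, record):
        entry = {"level": record.levelname.lower(), "msg": record.getMessage()}
        # The field is only emitted when the caller passed extra={"security": n}.
        security = getattr(record, "security", None)
        if security is not None:
            entry["security"] = security
        return json.dumps(entry)


def needs_security_review(line, min_level=1):
    """Returns True if a JSON log line carries a security level >= min_level."""
    entry = json.loads(line)
    return entry.get("security", 0) >= min_level
```

A caller would tag an event with something like `log.warning("failed to close file", extra={"security": 2})`, and an operator could then count or alert on tagged lines over time, which is how a harmless one-off event becomes visible when it starts happening thousands of times.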
So we've introduced SBOMs to all of the projects. An SBOM is just an inventory of all the dependencies that are included in our CLIs and our images. It also provides a snapshot of releases. We use a bunch of base images, and dependencies aren't pinned, so each release may get a new version of a package. This gives us the ability to go back to a release and see precisely what was installed at the time of that release. And it also gives people the ability to scan a release and say, does this fit our threat profile for our environment? Can we install this software? Something that's new as of a couple of weeks ago, as part of the Security Slam: we started signing both our images and our CLI binaries. If you go to the GitHub repo and look at the latest Argo CD release, you'll see a .sig file, and we sign the binaries. So if you use one of our binaries or one of our images, you can verify on your side: is the thing I'm about to run the thing that the Argo CD team says they built? And a final engineering best practice we introduced, and this was really thanks to CNCF: we started doing some fuzzing. The same folks who did the audit of all the Argo projects wrote about 40 fuzzers for us. Those have already shown a lot of promise and paid dividends, because they discovered a CVE in Argo CD, sorry, Argo Events is where they discovered the CVE with a fuzzer, and that was resolved. They also used a fuzzer to demonstrate a bug in Argo CD. They wrote the code and said, we ran this, and this shows you how you can attack Argo CD. They also used it to demonstrate that we had resolved the problem afterwards. So those are the engineering practices that we've put in place, and there's more in progress.
Thank you, Michael. So, show of hands, how many in here are contributors to one or more projects? A few hands in the air.
So both CNCF and the open source community as a whole actually have a lot of resources and give us a lot of opportunities to get help and funding to improve our security. One of the organizations that helped us a lot was OSTIF, the Open Source Technology Improvement Fund. They're a security-focused fund that helps open source projects improve their security. They help with funding, they help go through the RFPs, and they basically helped us get funding and set up the latest security assessment we did with one of the external auditors. So a big thank you to OSTIF for helping out with that. Another big help was the CNCF Security TAG. There are a number of security-focused volunteers that help CNCF projects improve their security as well, and they have a couple of different programs. There's a security pal, or security buddy, that can step in and help your CNCF project and give you some ideas on areas to look at. They also have a longer process with a security assessment. We're right now wrapping up our joint assessment with the Security TAG. So that's another avenue for any of the CNCF projects, whether you're incubating or graduated: you have the opportunity to go through and do this joint assessment with the Security TAG. We have OSS-Fuzz, which Michael already mentioned. If you're an open source project, you can get your fuzzers written and they can run in this cloud environment. If you're a vendor with a paid-for product, you can still use the libraries that they provide, but you would have to run the fuzzers yourself. Last but not least is the IBB, the Internet Bug Bounty program. It's similar to the other organizations in that it's a grouping of donors, generous companies that pool money, and IBB helps to pool and distribute that money. Basically, what they do is pay out to people that find CVEs and security issues in the projects that are signed up for it.
So going through all this that Michael and I have talked about now, we feel more secure in where we are with Argo, and we have signed up for this. So anyone who now finds a CVE in Argo can actually submit that to us, and we'll help you work through actually getting some good money from the IBB as a thank you for finding that CVE. So just a few quick words to wrap things up. I know it's been talked about earlier this week that open source projects are not products, and that's a distinction I want to make, but you should treat your project as a product. Like I mentioned earlier, we're really good at writing code, but a lot of open source projects don't really take into account all the other things and all the processes that go into building a good product. I think most of us have seen one or more open source projects that might be a little bit lacking in documentation, for example, things like that. Marketing and PR are generally non-existent. That's usually even worse than documentation. So there are a lot of these things that go into building a great product that are missing from a lot of open source projects, and that's something that you should really take to heart. There are a lot of roles and responsibilities that all product teams have that are simply not there in most open source projects. The same goes for processes. Like I said when we talked about how we handled the CVE, most companies that have been around for a while have those ingrained, and they've been through this before. A lot of open source projects start from scratch. They might not have thought about this. It might not be seen as an important thing to do, right? So that's something to think about as well. And then lastly, an open source project is used as a product by many vendors. A lot of companies use open source projects as critical parts of their infrastructure.
So security for an open source project should not be treated any differently than for any other product or project that is used in your environment. Definitely take advantage of the community resources that are available. You know, we all pay membership fees to CNCF, so take advantage of getting some of that money back. There are a lot of really good resources you can use, both with CNCF and the Security TAG. I mentioned OSTIF and IBB. If you're part of an open source project, reach out to these organizations. They're super helpful, and they have a lot of really good resources to help projects increase their security. And then lastly, and I think most importantly: help us help you. Help us make Argo better. There's really no better way of doing that than by testing and breaking things. So if there's a release candidate that comes out, you can open a sandbox and try to break it. And now you can try to hack it, and we will actually pay you for that. You can sign up and participate in the bug bounty program, either as a project or as an individual, and submit the CVE. I can't remember the exact amount, but there's actually a fair amount of money that will get paid out to you if you find a high or critical severity CVE. And for those of you that are part of projects, there's something in it for you as a project as well, because part of that bounty goes back to the project. So the project actually gets a little bit of money to help fund additional security work. And then on Friday, which is tomorrow afternoon at 2 p.m., Argo participates in the ContribFest. We'll have maintainers there. So for those of you that are now interested in breaking Argo, we will help you get started and get Argo installed, and you can start learning more about Argo and hacking away at it. And I think with that, please remember to submit your feedback after the session.
And I think we also have a few minutes left for questions, if you have any. We'll also be at the event later tonight, happy to talk over a beer or two. Just find me if you have any questions.
Is there going to be an accelerated way, because CVEs can take a while to be established and even be created by the CVE board? Is there a way to accelerate the process when we find vulnerabilities on a platform like Argo?
The process in terms of getting the patch out, or what would it be?
Well, a vulnerability doesn't have to be an established CVE, right?
Sure.
It's been there long before someone gives it a number. So I guess that was my question. Is there a way to directly interact with Argo so that we can find vulnerabilities and get them to Argo and the developers without having to wait for a CVE to be established?
Sure. So there are a few ways to at least discuss Argo CD security. We've got the Argo CD Slack channel, which I'm always monitoring. Our security-interested folks are monitoring it. So if you just have questions and you're trying to poke holes in Argo, and please do, then people will answer there. We also have the bi-weekly SIG Security meeting, which is typically open. We only have closed sessions if we need to talk about a CVE. You can just show up, and we're happy to chat about it.
And if you think there is a security vulnerability, then maybe don't blast it out on Slack. There is an email address that we've talked about, and there is a process for reporting CVEs. If you think it might be a security vulnerability, you can use that email address, and it'll go to our SIG Security group, and we can deal with it the way it should be dealt with. If it's not a CVE, we'll still discuss it with you. But if it is a CVE, then try to keep it to the process.
Thanks.
We may have time for one more question if anyone has any. If not, thank you very much for your time. Thank you.