Hi, my name is Dagen. I'm a developer and occasionally a security researcher. For the last couple of years I've been working primarily as a Kubernetes platform engineer, and I really enjoy working with Kubernetes in compliant and restricted environments, so things like SOC 2 and FedRAMP are sort of my happy place. My name is Will. I'm also a platform/security engineer. It's been really exciting for me to watch Kubernetes grow over the last couple of years. I started out running containers in environments similar to what Dagen described, orchestrating them with SaltStack (I don't know if you remember SaltStack back in the day), and seeing this space grow over time has been incredible. I'm also working with Dagen on different projects to help modernize deployments for the customers we work with together. Today we're going to be talking about third-party applications: why we love them, and some of the dangers they pose. So what do we mean when we talk about a third-party application? Ever since subroutines were invented in the 1940s, people have been running other people's code. Third-party means code that you didn't write. The question is not "do I use third-party applications?"; it is an absolute certainty that you are going to use third-party applications. And I think one of the best things about Kubernetes, in my experience, has been moving away from the world where a vendor would deliver you a random RPM file, or a tarball, or eventually a whole VM, and you'd have to figure out: what's in this VM? What does it need to run? How do I network it together? What is it bringing into my network? When I install it, how do I meet my security baselines? Now, with Kubernetes, we're getting applications from third-party vendors that come with a full description: what containers does it need? What are its networking requirements? What service account access does it need? It brings all of that together in a really beautiful API that I can learn, introspect, and use to understand what I'm about to ingest into my cluster, whether I'm deploying an Ingress on ALBs in AWS or using MetalLB on my bare-metal servers. So, in addition to Kubernetes itself, what do we generally add on top of it? We're going to add observability. We're potentially going to add a GitOps agent of some sort (if you're not using one yet, you should be). You're going to have your database engines, which might be running in the cluster or might not; it sort of depends. What about your base containers? If you're running a language like Java, you're bringing in an OpenJDK base container, or maybe one direct from Oracle. What about all of the packages running on that base container? Node.js also ships a lot of extra packages in its base container. If any of that is exposed externally, say Grafana, and Grafana has a vulnerability in its web UI, your cluster now has a vulnerability. If there is a vulnerability in a cluster-internal, service-level component, your cluster has a vulnerability that might be used either as a pivot or for escalation of privilege (sorry, spaced for a second there). So we need to be aware of all of those, and we need to think about those possibilities as we deploy our applications. Essentially, when code is vulnerable, your cluster is vulnerable. I like this little picture, because I picture the third-party application as the ladder that you're walking across, right?
You can sit there and say, "well, I didn't write this ladder," but if it fails, it's a long way down for you. Back in August, Will and I gave a presentation at DEF CON, and we're going to show you a short snippet of that presentation in just a minute. It's sped up, because the full attack isn't what this conversation is about, but I think it sets the tone for why today's conversation is so important. What we did at DEF CON was take three vulnerabilities that we had found in common Kubernetes third-party applications: specifically Kiali; Fleet, which is a GitOps agent from the folks at Rancher; and Longhorn, which is a distributed storage system for Kubernetes, also from the folks at Rancher. We had found these vulnerabilities over the years. They're all patched, and they were all responsibly disclosed well before we started talking about them and demonstrating how to exploit them. What we realized is that you could chain them together to go from completely outside the cluster to running as root on a cluster node, which is as bad as it can possibly be. I'm going to play the video, but real quick, so I don't have to race through and narrate, here's how the attacks work for anybody who's interested. It starts with Kiali, which is exposed through a web UI. Back in 2019 I identified that there was a hard-coded secret in Kiali, and that secret was being used to sign the JWTs used for authentication to the web UI. So all I needed to do was mint my own JWT using that hard-coded secret, and I would be admin on Kiali, which was fantastic. From there, we realized from looking at some logs that when Fleet has an error trying to reach a repository, it logs the URI that it passed to a third-party library, and that URI includes, embedded in it, base64-encoded, the private SSH key that was used to access the Git repository. Which means that if you can get access to that log, you can then access that repository as that GitOps user. And then finally, Longhorn. In Longhorn, back in December 2021, we identified an API exposed inside the cluster that allows the Longhorn manager to specify a binary and a string of arguments that it wants the agent to run in the context of that container. It was a completely unauthenticated API. So if you could discover it running in the cluster (it was effectively running as a DaemonSet, so any node listening on port 8500), you could say, "hey, I want you to run echo out to a file," and it just does it. You can also use it to pop a reverse shell, which is what we did. Additionally, because of the privileges Longhorn needs to do its work, it's running as root, and it had several host paths mounted that we were then able to use for a full container escape. So, let me show you what that looked like. Again, it was a chain of three different vulnerabilities. We're starting with Kiali, just using jwt.io to mint our own token. That token gets embedded in a cookie value, so I just use developer tools in Safari to drop it in there. That's all it took for me to be admin in Kiali. One of the things Kiali does is let you access logs of running workloads, which is super useful when you're trying to diagnose what's going on within your service mesh. That includes logs from Fleet, which may include an SSH private key and the URL of the repository that private key can be used with. So here, what we're going to do is set up my terminal to use that SSH key to log in to Git and see what's in this repository.
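The Kiali weakness described above comes down to HS256 JWTs signed with a secret everyone could read out of the source. Here is a minimal sketch of why a hard-coded signing key is game over: anyone can forge an admin token offline. The secret string and claim names below are made up for illustration; they are not the actual values from the advisory.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT spec requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def mint_jwt(secret: str, claims: dict) -> str:
    """Sign an HS256 JWT. Anyone who knows the signing secret can do this."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(sig)}"

# A hard-coded secret in the codebase means every deployment shares it,
# so an attacker can forge a token the server will accept as valid.
token = mint_jwt("hypothetical-hardcoded-secret",
                 {"sub": "admin", "iss": "kiali-login"})
print(token)
```

This is also why the fix for that class of bug is a per-install random secret: the token only proves you know the key, so the key must actually be private.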
All right, so it's a pretty bare Helm chart. All I need to do is add a Job that delivers whatever my attack payload is, push that back up, and then GitOps is going to do GitOps: it's going to deploy my Job. So I started listening on port 12345 for my reverse shell and went ahead and kicked that off. This part is fast-forwarded quite a bit; it took about a minute and a half for Fleet to do its magic, but eventually my Job was running and I had my reverse shell. There we go. So now I'm running in my payload, which has a little bit of tooling in it, and in this next segment I'm going to use nmap to see what's running. Specifically, again, I was looking for port 8500, and it looks like there were two nodes running it. I'm going to use the exploit that we developed, called Rustler, and say, "hey, go launch this attack payload on the Longhorn pod and connect back to me on port 12346, I think." There we go. Now I'm connected, and I'm running as root. Then a quick container escape; I'm going to run crictl here in a second and show you that I have basically completely taken over the cluster. There we go. So now I'm running as root on the node that's running Longhorn, which is pretty devastating. So with all of that, how can we prevent this? I mean, this was a complex attack, right?
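The nmap step in the demo is just looking for TCP listeners on the Longhorn agent port. A rough stdlib equivalent of that sweep is sketched below; 8500 is the port from the demo, and the host list is whatever node or pod IPs the compromised workload can reach (the addresses shown are placeholders).

```python
import socket

def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connect to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def sweep(hosts: list[str], port: int = 8500) -> list[str]:
    """Check each candidate host for a listener on the target port."""
    return [h for h in hosts if port_open(h, port)]

# From inside a compromised pod you'd feed this the addresses you can reach;
# anything listening on 8500 is a candidate Longhorn agent.
print(sweep(["10.42.0.5", "10.42.1.7"]))
```

The defensive takeaway is the same one the talk draws: a flat pod network makes this sweep trivial, which is why NetworkPolicies around privileged add-ons matter.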
We chained together three different vulnerabilities, but it really didn't need to be that complex. If you look at just Kiali: if Kiali is not running in read-only mode, you can do devastating attacks with just that very first phase; complete traffic management, fun things like that. With Fleet, it's running as your GitOps agent, and there are probably not a lot of restrictions on what you're allowing your GitOps service account to install into your cluster, so I didn't need to bother pivoting to Longhorn; I could have just installed my own DaemonSet from the get-go. And finally, with Longhorn, let your imagination run wild. If you're running a multi-tenant environment, something like Longhorn's vulnerability is truly devastating, because anybody running a workload in your cluster can take advantage of it. One thing I do want to point out about these three: these are not poorly written applications. SUSE and Red Hat know how to develop software, and they worked really well with us when we reached out to disclose the vulnerabilities. So this isn't about saying there's bad software out there. It's about saying that almost all software has a bug in it at some point, whether it comes from a reputable shop like SUSE or Red Hat, or from some rando on GitHub. So basically, while we were putting that DEF CON talk together, and immediately after these vulnerabilities were disclosed, we were reevaluating things. Dagen and I started having a conversation about how we ended up here, because obviously third-party apps are necessary to run your cluster. Who among us wrote Kubernetes, right? It is all third-party software. Okay, yes, some of us contributed, but no one person wrote it; we're all using it together. So the question we had was: as an organization, how can we be evaluating third-party applications as we bring them into our cluster, and how do we make sure we find things that align with our internal development practices? At the job we were working on together, there were all these CI/CD requirements: SAST, DAST, fuzzing in place, a lot of merge request controls. And one of the questions we had was how to make sure we were looking at the whole application lifecycle, because when it came to third-party applications, our shop, and I'm sure many other places, was looking at a CVE scan and concluding "no CVEs, no problems," right? So we started talking about how to look at applications to really figure out what needs to be done to put ourselves in a good position the next time this happens. This is an opinionated approach, but our approach starts with an initial review. The goal is to answer a pretty simple question: do I want to commit to running this application in my cluster? Here's how we go about answering that question. The first thing that's important to understand is that this is not about a single point in time. Pretty much any application that's been around long enough and is used widely enough is going to have a CVE at some point; eventually it will probably have a devastating CVE. So it's not about assessing the state of the application right now and making a permanent decision, for all time, that it is safe and can move forward. It's more about: what does the project look like? How is it managed? How many contributors are there?
What sort of thought are the contributors putting into the dependencies they rely on, or choose not to rely on? Security policies as well: is there a security policy? Do they publish security advisories? Do they even publish CVEs? If an application is not using GitHub security advisories or CVEs, your scanners are not going to find anything, because they're pulling from a vulnerability database that has no entries for that application. That doesn't mean the application has no vulnerabilities; it just means they're not telling us about them. So it's really important that we look for projects that do publish security advisories. Anyway, we were working on these projects together, and one of the examples that came up in our development cycle, immediately after the Longhorn incident, was Bitnami Sealed Secrets. We were looking at it at the time, and really, it didn't fit. It's a fine project; it just didn't fit into the security model we were internally developing, where Vault was the single source of truth. So we started looking for alternatives to Bitnami Sealed Secrets, and as we assembled a list of candidates, we decided we should apply these criteria to the search, so that we were evaluating not just the point-in-time questions (does this have CVEs? is it publishing those CVEs? does this specific container have CVEs?) but looking more holistically at the whole project. We worked together on that list of candidates and then started analyzing them, really trawling through their GitHub projects to see what they were doing internally and whether it aligned with our values. While we were looking through this, we came across the OpenSSF Scorecard. The OpenSSF Scorecard is an application that you can run as part of your CI/CD pipeline, or just from your desktop. It goes to the project on GitHub, PyPI, or npm, and tries to analyze it for a lot of the things we've been talking about: making sure the project follows best practices and discloses vulnerabilities, that it does some sort of automated vulnerability scanning in its CI/CD systems, and that it has good merge hygiene, in terms of who reviews merge requests before they land on the main branch. From our perspective, these are all things we were requiring the developers within our program to do before they could run their code in our production cluster, and it made sense to apply those same requirements to the third-party applications that we, as the platform maintainers, were bringing into the platform. So, this might go a little quick; hopefully everyone can read this. Here's the first half of the scorecard; we broke it up. It's looking at whether binary artifacts are being published.
That's pretty self-explanatory. It's looking at the settings within GitHub to make sure branch protection is enabled, so the main production branch isn't just being committed to and pushed to by anyone. It's looking at CI/CD tests, and it's looking for CI best practices. The projects themselves have actually begun embedding this, and once you know to look for it, you'll see that a lot of projects include it. I know Cilium has it, and I know Argo has the badge on their GitHub page showing their OpenSSF Scorecard score. What we take that to mean is that the application developers themselves are now conscious that we are all looking for these best practices. I've been sitting in this room all week listening to a lot of really good talks about different tools to enable SBOM management, but at the end of the day, as a maintainer, someone who isn't sharing my tools outside, I'm looking for development processes that I can trust and believe in, and that OpenSSF Scorecard badge represents that to me. It lets me know that the application developers are aware of security in the same way that I'm aware of security. It's also looking at contributors. One of the things we've run into (I think we've got an example of this later on) is a project with only one company contributing to it or maintaining it, or even just one contributor. Maybe that person gets sick, maybe they change jobs, maybe they lose interest, right? If there's only one company maintaining the future of the project, it's a lot harder to rely on it long-term, because you don't know what their interests are going to be; maybe they start making decisions that are orthogonal to your security plans. Seeing that a project has lots of contributors and maintainers helps you understand that there are a lot of stakeholders who will try to keep it on track, so you can believe in the vision long-term. It's also looking for dangerous workflow patterns, making sure the project uses some sort of Dependabot-style tool that watches its dependencies, making sure it uses fuzzing, checking licenses; I'm not going to go through all of it. One of the interesting things about the OpenSSF Scorecard is that, because it's a relatively new tool, a lot of applications out there do not have stellar scores. Actually, can you go back? At the top there: this is External Secrets. You can see they have a 6.0 out of 10, which is actually a relatively good score. As teams start adopting the Scorecard model and hosting the badge on their GitHub pages, they'll become aware of it, and once they're aware of it, we can trust that they'll improve those scores over time. We thought this was a really cool tool. It answered a lot of our questions, and it was really gratifying, from our perspective, to talk through all the things we wanted done, then look around and find a tool that does exactly the thing we had been discussing internally. Once you have that scorecard in hand, that's not everything. There are a couple of things that I think are harder to assess automatically. Are deprecation policies in place? You never want to end up in a situation where
something happens upstream and you're suddenly deprecated, but you don't have time to integrate the change into your project. And if the application says it's in beta, trust that it's in beta; only the developers can make that call. I've seen too many cases where people deploy the jazzy cool new beta application into their cluster, then something about it changes wildly and breaks all their stuff, and they're stuck, because they can't upgrade it anymore. All right, so at this point I'd venture to guess that, between the OpenSSF Scorecard and all the added research we'd done on External Secrets, that's far more work than most cluster administrators put into a third-party application. I'd love to be proven wrong on that; I'd love for the industry to move toward something much more diligent about the review process. So, are we done? Did we high five, install the External Secrets Operator, and call it a day? Well, not quite, because even when an application passes all of these security reviews, there's more to it than that. Once we settled on the External Secrets Operator as the way we wanted to go forward, because we liked it as a project, at some point you do actually have to run those scans and gather the point-in-time analysis: what is the exact security posture of the container and the Helm chart I'm about to bring in? We're not going to talk too much about hardening; there have been a lot of talks today that cover it far more in depth, and I definitely recommend you look at our GitHub or our DEF CON talk for the specifics of how we were hardening the Longhorn containers.
That's where we uncovered all these vulnerabilities. But basically, when you start bringing something in, here's something I'd chalk up to hubris on our part: when we started bringing in things like Longhorn, we looked at the project and said, "oh, we'll just harden it one time, bring it into the cluster, and then everyone will clap their hands and start using it." What we realized is that you really do need ownership and a plan: not only am I going to harden the container today, I'm going to have to harden it tomorrow, and next week, and next month, and a year from now. That job never ends. So finding projects that align with the security model you internally want to push makes that delta a lot smaller, so you're not applying a lot of big changes. This was one of the things we ran into with Longhorn, which we'll talk about in a bit: we were applying a lot of hardening changes that became very fragile long-term, because they didn't match the guidance and direction the upstream project was heading in, and as that delta grew, so did our work. So, you go through whatever your process is for hardening the application so that it meets at least the guidelines you're putting in front of your own developers and application teams on your cluster. Now we want to talk about what happens when the next version gets released. A week from now, a new version drops, a new container. What we were doing internally at that point was building Dockerfiles with a FROM line pointing at the upstream image and layering on our changes: changing it so it doesn't run as root, removing unnecessary packages one by one, building that process out so it was relatively automated, and then using the upstream project's automated tests to know whether or not our hardening was breaking it. We don't want to run through the manual tests every time we apply our changes; we just want to apply our hardening changes to their container and then run their automated test suite to confirm it still works the way the maintainers intended. One thing I'd add here is that how difficult it becomes to harden an application is, I think, itself a bit of a smell about the overall security of the application. Less mature applications are, on average, far less secure. We do have some very security-conscious open source creators out there who start strong, but not everybody does; a lot of the time these things accrete over time. There was a great talk here last night from the folks maintaining Argo CD about what happened when they went through the step up from incubation to a fully graduated CNCF project, and the security team they worked with to understand some of these things. So as you're going through this, if you're fighting the application a lot, it truly is something to consider: is this worth it? Do I need to find a different solution? One of the nice things we found is that the folks at Aqua Security have been absolutely knocking it out of the park with Trivy. It has become such an invaluable tool that we use it for almost everything it will do. It will scan your images and tell you not only that there are vulnerable packages installed by the OS, but also, say, that it found vulnerable static libraries embedded in a Go application. That's really useful, and it's a starting point. Will had some other good tips as far as not running as root, things like that. But really: yeah, use Trivy.
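Trivy's JSON output (`trivy image --format json …`) makes the pinned-upgrade step from the next section scriptable. Below is a sketch that turns every finding with a fix available into an apt pin spec; the field names match Trivy's report format as we understand it, but treat them as an assumption and verify against the Trivy version you run. The CVE IDs and versions in the sample are invented.

```python
import json

# Abbreviated, invented example of a `trivy image --format json` report.
REPORT = json.loads("""
{
  "Results": [
    {
      "Target": "example-jre-base (ubuntu 22.04)",
      "Vulnerabilities": [
        {"VulnerabilityID": "CVE-2023-0001", "PkgName": "libssl3",
         "InstalledVersion": "3.0.2-0ubuntu1.8", "FixedVersion": "3.0.2-0ubuntu1.9"},
        {"VulnerabilityID": "CVE-2023-0002", "PkgName": "zlib1g",
         "InstalledVersion": "1:1.2.11", "FixedVersion": ""}
      ]
    }
  ]
}
""")

def pinned_upgrades(report: dict) -> list[str]:
    """Collect apt pin specs for every finding that has a fixed version."""
    pins = []
    for result in report.get("Results", []):
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("FixedVersion"):
                pins.append(f"{vuln['PkgName']}={vuln['FixedVersion']}")
    return pins

# Feed these into the hardening layer of the Dockerfile, e.g.:
#   apt-get install -y --no-install-recommends libssl3=3.0.2-0ubuntu1.9
print(pinned_upgrades(REPORT))
```

Findings without a `FixedVersion` fall through to the go/no-go decision discussed below, which is exactly where you want a human rather than a script.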
There's a QR code there. Here's a quick walkthrough of using Trivy. I like to run it in Docker myself, just so I don't have all these different versions of tools lying around on my local machine. It's that fast; I didn't speed that up at all, that's regular playback. And you can see from this that there are some issues: it looks like External Secrets needs to update the AWS SDK library it's using, assuming a newer version is available. So in that particular case there weren't any vulnerabilities we really needed to worry about. But what about when there are? Here's a scan of the Temurin base image for running JRE applications. When I scanned it, it did have a couple of vulnerabilities, and Trivy's output included the fixed version, so I'm easily able to run apt and do a pinned upgrade to the fixed version. If it's a source-code dependency, though, a language-level dependency, I'm going to tell you: you do not want to get into the business of forking every third-party application you run. In that case I really recommend you just make a go/no-go decision. Either it's not going to be something super traumatic and it will eventually get patched upstream, or you yank the application out of your cluster. Don't try to fix the application unless you are actually a maintainer or contributor to that application. Yeah, so a lot of places I've worked have this mentality where we do that scan at the beginning and conclude: no CVEs, no problems. Meanwhile, the application itself is surrounded by this nebulous thing, the Helm chart. We run into this all the time, and we're not here to bash companies, it's just stuff we see: I have run into literal applications that have a flag in the Helm chart for runAsNonRoot, and then they call init containers that run as root and ignore the flag, and you're just like, come on, guys. Especially when I'm supporting developers who are Go or Java developers, not Kubernetes-centric yet, and I'm providing the ops side of this DevOps arrangement, I've got to tell them: hey, just because you found a Helm chart that deploys the thing you want and runs really well locally doesn't mean we can use it on our platform. And the nice thing is, again, Trivy does everything: Trivy will scan a Helm chart, Trivy will scan a workload running in a Kubernetes cluster, and it'll give you feedback about the security posture of that application as it's deployed. So here's the same thing as before, walking through what it takes to scan a Helm chart. I did speed this one up a little bit; it took about three or four seconds once it started running. And you get some very helpful output. Again, it's a starting point. You have to ask yourself: is it running as root because it needs to, or is it running as root when it should not be? In this case, it didn't need to be. So, as you can see, with the Trivy output in hand we now have the point-in-time snapshot of the application and what's running inside the container. It seems to have some things in it, but we're willing to accept some risk on this. It's never all or nothing, right?
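The run-as-root question the chart scan raises can also be checked directly on rendered manifests. Below is a toy version of what a config scanner does, operating on an already-parsed pod spec (plain dicts here, so the sketch stays dependency-free); for real scans, use Trivy itself. It deliberately includes init containers, since a chart flag that skips them is exactly the half-enforcement described above.

```python
def runs_as_nonroot(pod_spec: dict) -> list[str]:
    """Return the names of containers not guaranteed to run as non-root.

    A container passes if it, or the pod, sets runAsNonRoot: true; an
    explicit container-level false overrides the pod-level setting.
    """
    pod_default = pod_spec.get("securityContext", {}).get("runAsNonRoot")
    offenders = []
    for c in pod_spec.get("containers", []) + pod_spec.get("initContainers", []):
        own = c.get("securityContext", {}).get("runAsNonRoot")
        effective = own if own is not None else pod_default
        if effective is not True:
            offenders.append(c["name"])
    return offenders

# The chart-flag-ignored-by-init-container case from the talk: the pod asks
# for non-root, but the init container opts back out.
spec = {
    "securityContext": {"runAsNonRoot": True},
    "containers": [{"name": "app"}],
    "initContainers": [{"name": "setup",
                        "securityContext": {"runAsNonRoot": False}}],
}
print(runs_as_nonroot(spec))
```

Checks like this only answer "is it root?"; the judgment call in the next sentence, whether it actually needs to be, still belongs to a human.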
You've got to balance the value something brings to your organization against the potential risk of bringing it into your cluster. Now we have the Helm chart scan, and again we can decide whether this is something we want to do or not. And this goes back to my point about why Kubernetes is so powerful, and how I sell it to all the security folks out there who want to say no to everything: Kubernetes gives us the opportunity to run standardized tools like this to understand exactly how an application is going to be deployed into our environment. We never really had that sort of visibility with VMs and the like coming into our environment. All right, so we've decided that we like the External Secrets Operator, based on the Helm chart, the containers, the development practices, all of that. We felt really good about it. So the next step is forking the upstream chart, and Dagen is going to talk a little bit about that process and why we do it. Yes. As much as I say "don't fork the source code and build it yourself," in my experience, while there are exceptions, you are generally going to end up having to fork the upstream Helm chart. Helm charts tend to be community-developed, and they tend to be designed so that simply running helm install pretty much guarantees it's going to work, with usually very helpful configuration options you can add on that will ultimately harden the application. But in a secure and hardened environment, the places we work, you have to be secure by default. You need every one of those security constraints enabled from the beginning, so that a plain helm install gives you the most secure option, and you actually have to take action to make it less secure, for a development environment, for example. So what I find is that you can use their Helm chart as a starting point, but you tend to end up having to maintain it yourself or write the Helm chart yourself. All right, so once you've got a hardened image and a hardened or rewritten Helm chart, what do you do? Well, that's the easy part: all you have to do is run helm install, or go through your GitOps process, whatever your organization's methodology is. Again, we definitely want to make sure that our clusters themselves are hardened and that our practices meet, or work alongside, those hardening requirements. For example, we don't run our Kubernetes clusters with the API exposed to the public internet, which means that if a developer is working remotely, unless they're coming in through a VPN, they're not going to be able to just run kubectl apply or helm install, and nor should they in most environments. That means you need a GitOps workflow. Then you want to continue to monitor those applications. You should know what the meaningful metrics are, and you should have alerting thresholds that let you know if an application has gone down, because remember, availability is a security control; a denial of service is a threat. So you want to be aware when that application goes down. You also want to constantly monitor it as it sits there deployed, and reassess the application on a regular basis. You should be running vulnerability scans on a daily basis; ideally you should have some sort of security operator running in your cluster that alerts you when it finds a vulnerability in an application you're running. You need to be able to patch critical vulnerabilities, typically in less than seven days, which is what the compliance frameworks will tell you. Depending on what it is, sometimes you get the phone call of "you've got to go now." We all remember Log4Shell and Heartbleed, when everything hit the fan and everybody had a terrible Christmas. You also really need good intra-organization communication. If you're using something like a shared base container, where your team says, "hey, we'll harden this base container for Java and then everybody else can use it," so you can really scale out your efforts, you need to make sure those teams understand: hey, we've updated this image, you now need to rebuild and redeploy your application on top of it, because the way containerization works, until they rebuild, their image is still not patched. All right, so here are the questions we want you to answer at the end of the day, on a regular basis. Is the project still actively maintained? We were working with kiam to get AWS credentials into applications, and the maintainers had pivoted away from it, because, as they said, the OIDC provider for EKS works wonders. We weren't running EKS, so we couldn't use that. It was a deprecated project, and there was nothing we could do about it. So: is the project still actively maintained? If any security incidents have happened, how were they addressed? Was there good communication from the maintainers? Was a patch made readily available? Were there security advisories? All of those things. Internally, do the current owners still want to own it? If they've pivoted from Java to Go, then who's going to take over the JRE base image that everybody else is relying on? Overall, is it meeting your organization's needs, and if not, what else is in the ecosystem today?
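The seven-day patch window mentioned above is easy to track mechanically from daily scan output. A sketch: the seven-day number is the compliance figure quoted in the talk, while the finding structure, CVE IDs, and dates are invented for illustration.

```python
from datetime import date, timedelta

# Hypothetical findings from a daily scan: (CVE id, severity, date first seen).
FINDINGS = [
    ("CVE-2023-1111", "CRITICAL", date(2023, 4, 1)),
    ("CVE-2023-2222", "MEDIUM", date(2023, 3, 1)),
]

def overdue(findings, today: date, sla_days: int = 7) -> list[str]:
    """Return critical CVEs that have been open longer than the SLA window."""
    deadline = timedelta(days=sla_days)
    return [cve for cve, severity, first_seen in findings
            if severity == "CRITICAL" and today - first_seen > deadline]

# On 2023-04-10 the critical finding from 2023-04-01 is nine days old,
# so it shows up as an SLA breach; the medium finding is out of scope here.
print(overdue(FINDINGS, date(2023, 4, 10)))
```

Wiring a check like this into the same daily scan job gives you the "you've got to go now" phone call automatically, before an auditor makes it for you.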
So, as a final recap, because we're right on time here: initial review, then harden, then deploy and monitor, and then ongoing reviews. And I think it's really important to make sure we're also reviewing that upstream, as he said. We've run into multiple examples where the upstream started diverging from what we intended to use the project for, at least from a security-baseline perspective. So understand that upstream process and be very active there. We've worked in spaces where we've had trouble getting patches upstreamed, but that's always the dream: if you're making something more secure and you can get it upstream, that's the value proposition to the customer group who's saying no. Tell them: if I can get this upstreamed, the labor costs and savings down the line are going to be so much cleaner. That is our time. We'll be hanging out if anybody has questions; feel free to come up and say hi, and we'll do our best to answer them. We do love feedback as presenters; it's invaluable to us. There's a QR code in the bottom right corner there; if you can scan that and offer any feedback, positive or negative, it all helps. Thank you both. Thank you all so much for your time. I hope you enjoy the rest of KubeCon.