I am Alexander Lawrence, and I'm here from Sysdig today. I'm a principal solutions engineer covering the West, helping people do all sorts of fun and dirty things with containers and Kubernetes and all that jazz. So really what we wanted to talk about today was making my slides advance. That was the goal. Let's try it again. Talk about technical difficulties, huh? Let's reset this and come back here. All right, now the slides advance. We're good to go.

Sysdig's whole point in life is to make the journey to Kubernetes and containers easier, and a lot of that we focus on building in the open source community. Obviously we have a commercial product that we sell and ways to make money. But we fundamentally believe that the technology around the K8s stack should be open source. What I mean by that is, as Kubernetes evolves and new things come into play, it moves at an incredibly fast rate by the very nature of being open source. That's the whole point of integrating quickly in the OSS world and building newer technology stacks. If you're a traditional company trying to keep up with the community all by yourself, you're not going to, right? Look at the evolution of PSPs. They came out quickly, then nothing happened for a while, they got super popular for about a month, and then they died off. Now we have OPA replacing everything, right? As these things evolve, it's really difficult to keep up. So one of our fundamental philosophies is that, seeing as K8s is open source, the best way to secure your instances and monitor things in a way that makes sense is to embrace the community itself.
Sysdig has written a number of things ourselves, like the, quote unquote, open source Sysdig, and Falco, which we gave to the CNCF. We've embraced things like Anchore, Prometheus, Cloud Custodian, Trivy, and a number of the other open source projects that surround the K8s community. It's something we firmly believe in: finding ways to help customers and the community embrace those tools and secure their stuff in a way that just makes sense.

Along with that, how do you take all of the tools in the world and make them secure your environment? Thankfully we have councils and groups like NIST or CIS that help define what these standards look like and what you should do to run this stuff in production. In the Cybersecurity Framework, for example, they talk a lot about desired outcomes, and they try to make sure the language is understandable by everybody. They're not using vernacular that's confusing or difficult to get through. They look at how to apply this to risk management, how you define what that looks like in your cycle, and then they look across the entire span of the security lifecycle, from prevention through to how you react to events when they do occur.

One of the things we talk about a lot in the community and at Sysdig is that it's great to know when security events happen, but if you don't know the context of where they happened, they're not actionable. That's especially true in the world of Kubernetes. When you have containers spinning up everywhere under the sun and you don't know where an event comes from, it's basically useless. You can't know where you need to react or where you need to go find stuff.
Having that level of detail integrated into your security solution is huge, and it's something these frameworks often point out. NIST also has SP 800-190, the Application Container Security Guide. It's a fantastic guide to read through if you're looking at going into production with this stuff. It'll help you look at the various risks, the countermeasures you can take, and the things you can identify. It's a really strong resource for deciding how your posture looks, how your responses should look, and which controls you need to take into consideration.

A lot of companies, Sysdig included, try to make these things a little bit easier, so that when you get into the bread and butter of it you have predefined, canned rules for the things you need to be looking for. We have the Cloud Native Security Hub that Sysdig helps run. It's a community-curated site with a bunch of different rules for runtime security, and a lot of those harken back to things like 800-190, HIPAA rules, and other types of compliance structures. Definitely something to check out if you aren't using it today. These are really great frameworks to help inform where you should take your instances and your configuration.

Compliance obviously is huge, right? These standards are typically informational guides that explain what you can do and why things are important, but they don't really talk about how to implement it. They just say you should do runtime security, you should do EDR-type work, you should be able to inspect what's happening on your file systems. They don't say you do it by instrumenting this or by implementing XYZ. That's where the other big compliance worlds come into play.
So if you think about NIST 800-53, that is a guideline that talks about how to run stuff in a highly secure way, and it's what the FedRAMP spec is built on top of. 800-53 tells you which controls are important. FedRAMP tells you how you should implement those controls. It gets extremely confusing and cumbersome, but ultimately it's all about making sure people are protected and secure in the way they run their infrastructure. And all of these evolutions are really about how to do that in the containerized world, because it's fundamentally different from traditional VMs or big iron.

So why would you make all of this happen? Why introduce the complexity of containers and Kubernetes and all that junk when it all worked fine on VMs and on big iron? Really, it's these points here. It's all immutable, right? You don't have to worry about stuff being persistent, at least if you're doing it right. There's quite the debate on that topic in the community today, but realistically, if you make things immutable, you've built applications that can go down and come back up very dynamically, and that's pretty attractive for folks who want to scale or push things out in different ways.

They're purpose-built and self-contained. I cannot tell you how many times in my career the wrong Python library or the wrong libc dependency was installed and I couldn't make my thing work because that one little component broke. Containers let you control your world a little bit better. It's really not a new idea, right? We've had BSD jails, we have cgroups, we have all these things to implement it. But this world of containerization has given us a standard set of APIs and a standard way of doing things that makes it more uniform across the globe.
I often tell people there really isn't anything ridiculously fancy about Kubernetes, other than the fact that it's given us an operating system for the cloud. I can build an application one way and deploy that same application anywhere I want, and as long as it's on Kubernetes, it's going to work. That's a hugely powerful thing when you think about people who are across different clouds or different environments, on-prem and in the cloud. It gives you specs to work off of.

Being able to limit and isolate resources is also a huge deal. If you think about things like Styra or OPA and the Rego language, and then about network policies in Kubernetes, you get a lot of different tools in your toolbox to eliminate connections or access you don't want. It makes that a lot easier to handle and gives you more flexibility in how you approach it. Obviously that flexibility comes with complexity on top of it, but it does give you more options.

And then, realistically, your developers love it. The single biggest thing I see with a lot of my customers today is that it enables their developers to do their job. They're not being hampered by all the other things we're used to. The security professional isn't slowing them down. The infrastructure team isn't slowing them down. They don't have to request new servers or resources. They can make their container with their application in it, spin it up wherever they want, and go about their job. It's a wonderful thing, but you have to do security in a way that enables the developer and doesn't block them. That's a huge aspect we have to consider as well, and this whole cloud-native, open-source ecosystem has really adopted good ways to do that.
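
To make the point about using Kubernetes network policies to eliminate connections you don't want a little more concrete, here is a minimal sketch in Python that builds a NetworkPolicy manifest as a plain dictionary and prints it as JSON. The helper name `allow_only_from`, the `shop` namespace, and the `app` label values are made up for illustration and do not come from the talk. The field layout follows the `networking.k8s.io/v1` NetworkPolicy resource, and enforcement still depends on the cluster's network plugin supporting policies.

```python
import json


def allow_only_from(namespace, target_app, allowed_app, port):
    """Build a NetworkPolicy that lets only pods labelled app=<allowed_app>
    reach pods labelled app=<target_app> on a single TCP port.
    (Hypothetical helper, for illustration only.)"""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": f"allow-{allowed_app}-to-{target_app}",
            "namespace": namespace,
        },
        "spec": {
            # Pods this policy applies to.
            "podSelector": {"matchLabels": {"app": target_app}},
            "policyTypes": ["Ingress"],
            # Once a pod is selected by an Ingress policy, only traffic
            # matching a listed rule is allowed in.
            "ingress": [
                {
                    "from": [
                        {"podSelector": {"matchLabels": {"app": allowed_app}}}
                    ],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


# Assumed example: only the "api" pods may talk to the "db" pods on 5432.
policy = allow_only_from("shop", "db", "api", 5432)
print(json.dumps(policy, indent=2))
```

The printed JSON could be saved to a file and applied with the usual tooling. The design point is that the rule is written in terms of labels, not IP addresses, which is what makes it workable when pods come and go.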
I didn't build out the slide, but these are some of the projects out there that help enable this. I mentioned a few of them, like OPA, Anchore, or Falco, and there are other tools that integrate directly into the SDLC so that the DevSecOps world works in a way that makes sense and you're not blocking things off so much.

So we've got an idea of the frameworks, and we've talked a little about the different compliance guides. Getting to the point, what makes up all the components you're trying to cover? Really it's these areas right here: CSPM for ensuring cloud security, image scanning, the different compliance frameworks, runtime security, forensics, and benchmarking. Realistically, to run your environment in a secure way, you need solutions across that entire stack. And it's hard, because there are lots and lots of layers of security you have to take into consideration. How can one small, agile team do all these different things? It's difficult, right? A lot of folks end up focusing on just one or two of them and not looking at them holistically.

If we think about CSPM, this is where tools like Cloud Custodian come into play. They help you figure out what your base cloud configuration should be, so that you're not exposing things you shouldn't, like unencrypted S3 buckets. Most people, when they start thinking about container security, aren't going to cover all of these areas. They mostly do image scanning.
That's the place most people go, because that's where the vulnerabilities exist, right? The thinking is: if I plug all my vulnerabilities, I don't have to worry about things coming into the environment. Everything else is taken care of and nothing bad can happen. The reality is that it's a huge part of the picture. It's important to remediate your software, find the vulnerabilities, and plug those holes. But the other components are equally important, and they pretty much get neglected.

What I mean by that is, when people build out new applications today, most will say, oh, we've got this covered. I'm using something like Artifactory, I scanned it with Xray, or I used Clair to make sure there are no vulnerabilities in the thing, so I can push it to production. And then they just let it run. But things show up over time, right? Images age, applications age, packages age, new vulnerabilities appear, and you have to remediate those in some way, shape, or form. And if you don't, you really need to know what's going on in the runtime context.

That's runtime security, and it's where Falco comes into play. Say I have an exposed vulnerability, or there's an overflow attack against a container, and someone gets access to a shell. They can start doing things. If I'm not monitoring the runtime, I have no idea whether it ever happened, right? So there are multiple layers you have to go through to secure all of this stuff, and it's difficult to figure out what to do. What tools are out there? Which should you be using, and how? That's what I'll get into next. When you think about the environment and the tooling that goes along with it, you have these disparate areas that surround the infrastructure.
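
The runtime scenario described above, where someone gets a shell inside a container and nobody notices, can be illustrated with a toy detector in Python. This is only a sketch of the idea of matching process events against a rule. It is not how Falco is implemented and it does not use Falco's rule syntax. The `ProcessEvent` type, the container names, and the sample events are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class ProcessEvent:
    container: str  # container name; empty string means the process ran on the host
    process: str    # name of the process that was started
    parent: str     # name of the parent process


# Process names treated as interactive shells in this toy rule.
SHELLS = {"sh", "bash", "zsh", "dash"}


def shell_in_container(event):
    """Toy rule: flag any shell process started inside a container."""
    return bool(event.container) and event.process in SHELLS


# Invented sample stream: a normal app start, a shell spawned by the app
# (suspicious), and a shell on the host via SSH (not covered by this rule).
events = [
    ProcessEvent("payments-api", "python", "containerd-shim"),
    ProcessEvent("payments-api", "bash", "python"),
    ProcessEvent("", "bash", "sshd"),
]

alerts = [e for e in events if shell_in_container(e)]
for e in alerts:
    print(f"ALERT shell in container={e.container} process={e.process} parent={e.parent}")
```

Note that the alert carries the container name and parent process with it. That context is what makes the alert something you can act on after an image scan has long since passed.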
There's the CI/CD pipeline, there's cloud security, there's networking with service meshes like Istio or Consul, there's runtime security, and there's identity and access management. There's just a bunch of different things you have to deploy and take care of. How do you make it simple? There are so many things you can do. So I made up a quick list of a few components to consider. What should I be looking at? What should I pick off first as I start deploying this stuff? Excuse me.

Really the first spot, especially in the current world, is FIPS-compliant SSL libraries. Whenever I talk to anybody who's doing compliance or remediation, or trying to deploy in a compliant way, it's the number one ask I get: what distros, what Kubernetes releases, what libraries should I be using that can be cryptographically secured? The reason it comes up so much is that it's part of the FedRAMP and 800-53 specifications. It's an area where a lot of people struggle to figure out how to deploy the libraries correctly. Think about it from the context of containers: even if your Kubernetes cluster is secured, that doesn't mean the container is using the same library the host is using. A container can carry its own SSL library inside it. So this becomes a very interesting and somewhat difficult thing to handle. Not all distros make it easy. I've thrown up here how Red Hat does it specifically, but there are certainly ways to go about it that make it a little easier.

Then there's RBAC, dictating who can get access to what and who can do what. There are various ways to do that natively via the Kubernetes API.
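
As a rough illustration of what native Kubernetes RBAC looks like, the following Python sketch builds a namespaced Role that can only read pods, plus a RoleBinding that grants it to one user, and prints both as JSON. The helper name `read_only_pod_role`, the `dev` namespace, and the user `jane` are assumptions made for the example. The field names follow the `rbac.authorization.k8s.io/v1` API.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"


def read_only_pod_role(namespace, user):
    """Return a (Role, RoleBinding) pair granting `user` read-only access
    to pods in `namespace`. (Hypothetical helper, for illustration only.)"""
    role = {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": "pod-reader", "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group, where pods live
                "resources": ["pods"],
                "verbs": ["get", "list", "watch"],  # read-only verbs
            }
        ],
    }
    binding = {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "read-pods", "namespace": namespace},
        "subjects": [{"kind": "User", "name": user, "apiGroup": RBAC_GROUP}],
        "roleRef": {"kind": "Role", "name": "pod-reader", "apiGroup": RBAC_GROUP},
    }
    return role, binding


# Assumed example values.
role, binding = read_only_pod_role("dev", "jane")
print(json.dumps([role, binding], indent=2))
```

Native RBAC like this answers who may call which API verbs on which resources. It does not express conditions on the content of what is being deployed, which is the kind of gap policy engines are meant to fill.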
I think the community generally agrees that's not quite enough, and that's why we have things like OPA coming out. With it you can dictate what access a container has, what entitlements it gets, and what VNets, NICs, or IPs it can reach. There are certainly a lot of ways to handle who can run what against what in Kubernetes, and OPA has done a good job of extending that out to make the RBAC world a little easier. There are other projects out there as well, like RBAC Manager. I've never actually used it, but supposedly it can do some of the same things. I think you'll find OPA taking over that world more and more. It has a fair amount of flexibility, even though it comes with a bit of complexity. Let's see here. Rancher and OpenShift also have commercial offerings in that regard. They each have solutions out of the box that try to make things like cryptographic libraries and RBAC a little easier, but your mileage may vary based on your configuration and how you're trying to run it.

The other spot to consider is hardening your nodes and your runtime. There's a wonderful tool out there, I think by Aqua, called kube-bench, that runs a benchmark against your nodes and your configuration to make sure you're meeting the base CIS guidelines: that you've encrypted the API and the communications back and forth, that you've locked down who can access which portions of the file system, all those basic things you should do but that don't happen out of the box, right? When you deploy a fresh Kubernetes instance, it doesn't have any of that configuration in place, so I'd highly recommend you check out that project. It's open source again, and a lot of commercial tools wrap it, so you can leverage them as well to figure out what your infrastructure should look like.
It's a really easy way to do evaluations and see what's going on across those nodes for hardening.

The last component to consider isn't talked about a ton, but it's using hardened images. Anybody can pull anything down from GitHub or from the public repositories out there, so being deliberate about the distro you use in those images helps a lot. If you use things like distroless images, you limit your risk profile for those images. So if you're building your own stuff, start with an image that is already secured, or at least has a minimal profile. It helps eliminate vulnerabilities and other stuff that could show up down the way.

I had mentioned there are a few easy buttons for doing Kubernetes in a secure way. I think all of these are actually paid solutions. Rancher has a new one out. It has FIPS cryptographic libraries by default, a really good RBAC story, and built-in benchmarking for Kubernetes. You can manage the K8s clusters you've deployed with Rancher, or other ones as well, so it gives you that reporting plane.

OpenShift also does a pretty good job at this. As we mentioned earlier with the Red Hat configuration, you can just set FIPS to true in the configuration and it enables FIPS cryptographic libraries by default. It has a pretty mature operator ecosystem, so if you're looking for other security tools, like kube-bench or the Sysdigs of the world, or Aqua or Twistlock, you can install those natively inside the interface to build that ecosystem out pretty quickly. It also has an encrypted data plane by default, which is pretty nice. That's one of the items in the Kubernetes benchmark, and you don't have to deal with it yourself with that solution.

Then there's everybody's favorite, EKS. It also has a
GovCloud solution these days, and they're going for a FedRAMP Moderate certification across that. If it's FedRAMP Moderate, it has FIPS cryptographic libraries, it has all the APIs your teams love and already work with, and it handles all of that base stuff you have to do for FedRAMP-based environments. So it's also a very nice solution. I know they have aspirations to go further up the FedRAMP chain, but it's a pretty quick and easy way to spin up a cluster that's going to meet a lot of base compliance needs.

Talking about tooling, this is a catch-all slide to hit the different areas. At Sysdig, when we roll out our solution, we wrap a lot of Cloud Custodian for the benchmarking. It lets you do that base configuration in the cloud to make sure stuff isn't going wrong when you start building. If you think about more recent attacks, like the SolarWinds attack, they really focused on a thing called cloud lateral movement. Somebody got into a region of the infrastructure that wasn't in use, got access to an account that also existed somewhere else, moved crosswise to a different region, and then did malicious activity. If you set up things like Cloud Custodian, you can start seeing accounts enabled inside your AWS that maybe you're not touching today and that could be a potential vector for attack. So just doing some of the more simplistic stuff there is super useful.

In the CI/CD pipeline, when you're building your application, this is the area that gets the most attention out of anything on the list, I would say. There are tools like Clair, Trivy, and Jenkins to ensure you're at least doing the bare minimum of scanning your images at build time, so that they don't have a bunch of vulnerabilities or misconfigurations
as they come in.

In the networking world, you're really talking about things like Istio or Envoy to handle where and when services connect and communicate. In the threat detection world, that's where Falco comes in, another CNCF project. And then there's a wonderful little tool called Falcosidekick, the little flying hamster guy. I think it's technically a gerbil. It's a great way to detect stuff in real time and push data out when things are going on that you don't want, like shells being opened up or UDP traffic that's not on port 53, the low-hanging fruit of the world, plus customizations that are specific to your environment.

Expanding on those a little bit: Cloud Custodian is all about posture management and vulnerability management across your cloud. It's built on open standards, it's easy to reconfigure and add rule sets to, and it can do remediation as well. Let's say you have a policy that requires all S3 buckets to be encrypted, and you probably should have that policy. It can go in, see that a bucket isn't encrypted when it should be, and fix that for you automatically. I should warn you to be extremely careful with remediation, because it can sometimes be the bane of your existence as an admin. You might kill off configuration or do things you didn't intend. But for some of the bare-minimum things, it's very, very useful to have.

Getting into posture for the image analysis itself, Jenkins is out there, and it's still ridiculously popular for application lifecycle work, and plugging it into scanning options is always very effective. I mentioned Clair earlier, and there's Trivy, there's Anchore Engine, and a number of other image analysis solutions. They're all a little different in the way they approach it, but the end result is that you're looking at the configuration
of your containers, you're looking at the packages in those things, and you're securing that lifecycle, so that if there is a problem you can report back to your developers and they can fix it before it ever gets to production. You're working in the workflow they're used to. They try to push, there's a scan on the registry, and it says, nope, that one's bad, go back and fix it. If they can get that feedback live while they're working, I think everybody appreciates it a little bit more.

Then there's this gap in visibility. We've talked a lot about security, and that's critically important, but I tend to feel that you can't really secure what you're not monitoring or tracking. A security issue often shows up in a visibility context: your memory usage has changed, different processes are running so there's a different footprint of CPU cycles, or files are being written to or read from that you're not used to seeing. A lot of that activity shows up in visibility. It doesn't necessarily show up in security. What should you be looking for? There are a bunch of things showing up and going away, so it's really hard to tell when something comes online, tries to read something you weren't expecting, and then goes away. How do you deal with visibility in a world that's extremely dynamic? The pods are short-lived. You can't just say nothing can run unless I know what it is, because maybe there are services coming online that are important to the devs or that they were expecting. So it's really hard to implement this notion of zero trust when your visibility is all over the place. We'll talk about that in a little bit.

Along with the notion of how the networking works, it really comes down to a couple of tools out there in the community: again Envoy,
Istio, and then this new world of network segmentation that's come out in Kubernetes, which is extremely useful for this. Basically you can say: I can't tell you what IP can talk to what IP, but I can tell you, from a service or application perspective, what deployment can talk to what deployment, or what service can talk to what service. Kubernetes network policies let you say: I've got this ephemeral world, and I can't tell you exactly who's coming up where or what's running in what location, but I can tell you that communication within this box is okay. It doesn't matter what it is. As long as it's only communicating inside my little world, I don't mind. That lets you create policy that's somewhat dynamic. I'm okay with that type of communication. I'm just not okay if it talks to someone outside of that box.

Implementing network segmentation has been a really cool thing. It's somewhat difficult to do, again, because it's a somewhat abstract concept. There are more tools coming out now that do what you see in this picture, where you can see the secret squirrel application. It maps out where the communication is going to and from, and then you can use that to generate policies. This one specifically is a Sysdig interface, but there are other things out there. It lets you create configuration that allows the more dynamic nature your devs are expecting and your infrastructure depends on.

With that, how do you handle threat detection? This is where we get into visibility as well as runtime security, so we'll talk a little bit about Prometheus, Falco, and Sysdig. With Prometheus, I think everybody knows this project, right? It's the bread and butter of monitoring your Kubernetes state in this day and age. I was using Prometheus way back when I was at
Pacific Lutheran University, doing a bunch of monitoring there for our Linux instances, and it's matured a lot in the last, I don't know, 8 or 10 years, something like that. It can do a lot for you in this cloud-native world. It's wonderful because it's highly extensible and highly flexible, and it's terrible because it's highly extensible and highly flexible. I encourage you all to use it if you're not already. Things like Grafana help you visualize things, and there are different data stores you can keep the data in.

Probably the single hardest thing about visibility with Prometheus is handling the data scale. When you've got one application in one place, it's not too bad. When you've got 2, 3, 50, or 100 different applications across hundreds or thousands of nodes, that's a difficult problem to solve. So we're seeing things like Thanos to handle how you display this stuff at scale and across different locations. It's an extremely powerful tool, but when you're looking at observability, think about how you store the data that's relevant. A lot of my customers were having issues with this. They came online, their application got really popular and got pushed out to the masses, and then they suddenly realized the monitoring tool that used to store a week of data now stores 12 hours. When you only store 12 hours, that's great during business hours.

Finding ways to scale it and do it well is critically important, because again, you can't really secure what you can't see. If you're not doing this effectively, it's basically, I don't want to say worthless, but it doesn't give you the context you need to take action on events or figure out what happened after the fact. Say you had a security event on Friday, and on Tuesday of the next week you realize it was something and you need to go get process data, but your Prometheus has already shifted that data
out of storage. So how do you get that information back? Having these things happen natively, in a way that scales, is particularly important.

Falco is the other side of the coin. Where Prometheus looks at metrics about the health of an application, what things were accessed, all the different metrics of the world, Falco looks at the data from a security perspective. Really it's about what process did what, and when. It's another great project I encourage folks to look into. It can detect rogue activity on effectively anything running Linux, and you can write whatever type of rule you want for it. There are hundreds of rules out of the box, but ultimately the goal is to look for anomalous activity across your container or Linux infrastructure: somebody opened a shell I didn't want, or somebody tried to read a protected file. Maybe I've got a location for PCI-regulated data. Falco can distinguish read actions from write actions, so you get that delineation over what happened to something when it shouldn't have, and you can alert on those conditions. I should just move to the next slide, which lists those conditions: looking for privilege escalation and all that type of stuff.

The big point is that Falco does a good job of detecting everything, but how do I get the data out? How do I turn it into something I can act on? That's where Falcosidekick comes into play. Falco outputs the data, Sidekick takes it in, and then you can fan out to all sorts of things from Sidekick. That could be a Lambda that takes action on something, an email that goes out, something that goes into a web UI, a Slack message, whatever your workflow is, so that you can remediate when something actually happens. About, I don't know, 20 minutes ago I said that when you get alerts it's great, but if
they don't have metadata or actionable data that goes along with them, they're meaningless. Being able to output that stuff in a way that fits your workflow, so that something relevant actually happens, is utterly critical.

The last thing I'll talk about is the open source variant of Sysdig. This is really the other side of the coin from Falco. It's the metrics side of what Sysdig has, and it works in conjunction with things like Prometheus. It doesn't replace it, but it lets you inspect fun things happening in the kernel and on the actual nodes themselves. The first time I ever got exposed to Sysdig was actually via the OSS curses UI. You can go into a terminal, pop open Sysdig, and start doing analysis across all the system calls coming through, the network ports being opened, basically instrumentation for the entire node. It does a lot of really cool, fun stuff.

One of the areas that's pretty fun is that we can have it run a command to dump system calls. If you think about Wireshark generating pcap files, Sysdig does the same kind of thing, but it captures system calls and generates scap files. It uses a Wireshark-style filter syntax to sort through all that stuff. And nowadays it can use eBPF for instrumentation, so you can grab data without having to use a traditional kernel module. There are lots of other interesting ways to get data with this. It's the core of the enterprise products and other stuff that we do, but the open-source variant is certainly fun, especially for getting data off various nodes and seeing what's going on with them. The curses UI is super fun. I don't know if anybody else likes those or not. Let's see here.

At the end of the day, there's a lot that goes into this. When you ask the question, are we safe, are we compliant, the answer is probably not. We're maybe not hitting all the marks. I kind of look at
this world as: when an auditor comes to me, it's really about stacking up all the things I'm doing to make sure I'm meeting audit or compliance needs. It's not saying that something's never going to happen. It's saying these are the tools I put in place so that when something does happen, I know what it was, where it occurred, and who did it. I can actually come back and say, these are the things that happened, and these are the resources that were touched. Say I had a breach in one of my environments and someone got to a DB and ran a SQL query. If I have the right tooling in place, I can go back and say: yes, someone broke into the infrastructure, they used this account, it was through a modified IAM policy or whatever it might be, and they ran this SQL query. But I can tell you the entire syntax of the query, and it returned null, so the risk is pretty low because they didn't get any data out. As long as I have a way to recreate those stories, I'm in a good position to defend myself when a bad actor does come in and try something. So again, ultimately I hope we're secure. We're probably not, but we're doing the best we can, and here are the functions and tools we've put in place.

From there, the call to action is this: security in Kubernetes is extremely hard. It's very complex, and that's driven by the fact that things change really, really quickly. We need to find ways to accommodate that and keep up with the pace of change. The stance that I've had, and the stance that Sysdig has, is that doing it with an open source mindset is probably the right way to do it for Kubernetes. It's about the only way we're able to keep up with what changes. There are a ton of tools out there that do it. Obviously the Sysdig solution integrates a lot of these things together, but you can use whatever you want to handle it, and I would love feedback on what you guys have used
yourselves or other things that you see out there in the world. If we don't have time today, hit me up at Alex at Sysdig.com. I'd love to keep the conversation going, and if there are any questions I'd be happy to do what I can to answer them. Thanks. All right, well, I'll just let you all go then. Thank you very much.