 So, thank you so much for coming. I really appreciate it. I know the end of the first day is pretty tough. Probably a lot of folks in this room are still jet lagged, so I really, really appreciate you making it to be here. So, today we're going to use the canals and bridges of Amsterdam to map out Kubernetes networks and use it to form a security strategy. And I have a clicker. So, who am I? You just heard a little bit about it, but I'm Kailin. I've been at Shopify for almost five years. I contribute to SIG-CLI and SIG Security. I'm also a CNCF ambassador. This is my first year. And when I'm not doing this, I used to be a farmer. I'm still a squash player. I sew. I make a lot of my clothing, not today because I wanted to look nice, but sometimes. And like a whole bunch of people, thanks to the pandemic, I am a reluctant runner. Oh, gosh. One second. This is the, my speaker note for this is easy peasy. Okay. Sweet. Oh, gosh. This is my really great diagram to show why this talk is super awesome. Obviously, you can see that Kubernetes was meant to be, their KubeCon was meant to be in Amsterdam. I chose this analogy because I think it's really fun. I've always enjoyed talks that focus on the host cities. I think it's a great way to learn about the city because often this conference takes all the time and you have all the fun parties in the evening, so you don't really get to explore and enjoy the city. So hopefully this talk, you could leave knowing some things about Amsterdam that maybe you wouldn't have known otherwise. And then the technical topic is as Kubernetes security professionals or new to Kubernetes folks or people who are just interested in learning about Kubernetes, it's very likely that many times in our careers we're going to be presented with a new network and we're going to have to get to know this network and get up to speed with it so that we can see if there are any security vulnerabilities or concerns. 
I think that what we're going to do in this talk, which is going back to basics, is really useful and it's really important so that we can get knowledge about a system and often we're so excited to learn the really deep technical stuff that we skip out on some of those basics. So let's walk through the process and hopefully each of you will learn something along the way. Sweet, okay, so here's our agenda. We're going to start off by working that metaphor pretty close to death and then we're going to get to know Kubernetes network. So I'm going to go over, I was not brave enough to do a live demo but I'm going to do some recorded demos of just what I like to do when I'm presented with some new Kubernetes infrastructure I want to get familiar with. Next, we're going to talk about threat modeling. We'll go over a popular framework and look into how it applies to Kubernetes. Finally, we're going to look at how we can use the things we've done so far to form a security strategy and we'll finish by highlighting some of the key takeaways for that process. All right. And before we get into the agenda, I just wanted to start by going over some Kubernetes basics to make sure that throughout the talk we're all on the same page in terms of language and terminology. So I'm sure that we all are at least a little familiar with the main Kubernetes components and some of us have a bit more experience with certain areas over others. But when approaching any new system, it's really important that we get a holistic understanding of the entire system so that we can understand how the different parts interact. The very basics are the cluster, the node and the pod. This is what makes up Kubernetes when you deploy a cluster. This is what you get. The cluster is the wrapper around the nodes which are just our machines and those machines host our pods. When we're looking at networking and security, we really want to care about the control plane. 
This is basically the brain of our Kubernetes infrastructure. We all know Kubernetes is declarative so we declare what we want our infrastructure to look like and the control plane is what's in charge of making it look that way and keeping it in that desired state. And because this talk is focused on network security, what we really, really care about is our CNI plug-in or container network interface plug-in. I hate acronyms. It's one of the most important considerations and there are a whole bunch of different offerings. So throughout this talk, I'm going to mention Cilium because I'm quite familiar and a big fan of Cilium but there are a whole bunch of different options and I'm sure the folks would love to talk to you about why a different CNI plug-in might fit for your infrastructure. Lastly, we want to know what the edge of our network looks like. We want to know what traffic we expect, what traffic we don't want to expect or don't want to have. And hopefully from a public internet standpoint, this is a pretty small surface area. For this talk, just again to keep it basic and foundational, we're going to focus on cluster communication so we're going to stay within the cluster. But there are a whole bunch of other principles that come into play when we're doing cluster-to-cluster or multi-VPC networking which maybe will be a future talk. We'll see. And then I wanted to highlight the Kubernetes network model. These are kind of principles of Kubernetes. So each pod in the cluster gets its own cluster-wide IP address so we don't need to manually link pods. Basically what this does is makes the Kubernetes network model in the same shape as previous network models using VMs that most of us who came into the industry have experience with. Secondly, agents on the node, so this can be like a daemon set or the kubelet, they can communicate with all pods on that node. 
And now it's really important to note at the bottom I have this comment barring intentional segmentation via network policies and it kind of sounds like that's a bad thing. So pardon me, but Kubernetes is not secure by default, which is the intention. Kubernetes sets you up and then you need to use the specific knowledge about your system to apply the security principles that make sense. So you should always have your network segmented, but initially these rules should be true. So here's just a quick overview of where we're going to go with this talk today. We'll come back to this diagram at the end, but this is the rough steps of evaluating a Kubernetes network. So let's get into it. The metaphor that inspired the talk canals and bridges. If you look up interesting facts about Amsterdam, you'll quickly find yourself reading about a city that stands on hundreds of thousands of poles and has a rich cycling culture and is connected by canals and bridges. Right away, this made me think of a giant Kubernetes network, particularly the super giant one that I deal with day to day. But I think it can, as you'll see in this demonstration, work for any size of network that you're dealing with. The metaphor is not always a perfect fit, but I think it does pretty well and I hope that you enjoy it. So I did, it did work. Don't forget that last slide. Here's some fun facts about Amsterdam. Lots of kilometers of canals, lots of individual canals. The city ends up being many small islands and there's over 800,000 bicycles in the city. I heard a lot of those are in the canals. Amsterdam is really an interconnected group of small islands. These islands are the result of the canal system that was built in the 15th and 17th centuries, that has been used for trade commerce and even defense. The city has grown over the years and so have the canal systems. 
These days, they're primarily used for the movement of people and goods, and they also draw a whole bunch of tourists to the city, maybe not all of you, but definitely me. And you can see how this can make one think of a complex network. So this is not exactly the network I work with, but we do have hundreds of clusters, and you can imagine having these spread across multiple regions. Maybe for each region you have a VPC, and there are a whole bunch of services that exist on this infrastructure. The pathways that are used to move information and allow communication also open us up to the risk of infiltration. So we need to design them mindfully and come up with ways to use them to protect our services and their data. So these are two canals that I want to highlight: the Singel (I super apologize for any pronunciation issues; please feel free to come up and tell me at the end how I should be saying them) and the Singelgracht. Singel comes from the Latin word cingulum, meaning belt, and it's related to the Dutch word omsingelen, meaning to surround, and you can see that's exactly what both of these canals do and did at various times in Amsterdam's history. The Singel, which is the smaller one, was built in 1428, and it was the city's perimeter defense until the city expanded beyond the canal. If the city was being attacked, the water level in the Singel could be raised, which would stop troops from being able to move across the canal. Then, within the canal, there was a high wall with various towers that could be patrolled by soldiers to stop and monitor enemy forces trying to enter the city. The Singelgracht was dug at the end of the 16th century to accommodate the expanding city. This canal's primary purpose was trade and commerce, not defense, but fortunately they were able to use it as defense in World War II to stop German forces from reaching the center of the city. Once again they used the canal to limit entry points and then guard those entry points.
So to me this immediately made me think of a firewall, as well as any kind of observability mechanisms you have at the edge of your network. While the canals are primarily a transport mechanism, they can also be a defensive barrier, in particular the ones that surround our sensitive workloads. There are many ways that we protect our Kubernetes networks from attacks from the outside, but for this comparison, as I said, I want to focus on firewall rules and then observability. Firewall rules can be applied at the network level through your cloud provider, using iptables or nftables, or even through your CNI plugin. We can create firewall rules that only allow traffic from a list of trusted IPs, ensuring that we're only letting in the traffic that we want and only letting traffic out to the places that we want. So you can imagine the firewall as the canal, and your network admins or your cluster admins are the generals on those towers, restricting the traffic to a minimum number of entry points and ensuring that all approved traffic is allowed and any unknown traffic is stopped and also investigated further. The soldiers then would be representative of your observability tooling. They're telling us who attempted to enter and how they were blocked by our perimeter defense or allowed through. There are many tools in the industry that we can use for this, like Hubble, Grafana, Datadog, and a whole bunch more, and this kind of observability is really useful, particularly when forming a network strategy or understanding how your network works. If you're new to a network or company and you're like, "Well, okay, Kaelin, how am I going to write a list or get an idea of which traffic to allow?", there are tools. One that I really like is Inspektor Gadget, Inspektor with a K. It's an eBPF tool, everybody's favorite packet filter, and we're going to go through it later, but it'll allow you to get a snapshot of traffic, and you can form policies or rules based on that traffic.
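As a rough illustration of the trusted-IP allowlist idea described above, here is a minimal Python sketch. It is not from the talk: the CIDR ranges, the function names, and the "deny-and-investigate" label are all assumptions made for the example, and a real setup would enforce this in the cloud firewall, iptables or nftables, or the CNI plugin rather than in application code.

```python
import ipaddress

# Hypothetical trusted ranges for illustration only; real values would come
# from your own network inventory.
TRUSTED_CIDRS = ["10.0.0.0/8", "192.0.2.0/24"]


def is_allowed(source_ip, trusted_cidrs=TRUSTED_CIDRS):
    """Return True if source_ip falls inside any of the trusted CIDR ranges."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in trusted_cidrs)


def triage_traffic(source_ip):
    """Allow approved traffic; stop unknown traffic and flag it for a closer look."""
    return "allow" if is_allowed(source_ip) else "deny-and-investigate"


if __name__ == "__main__":
    for ip in ["10.1.2.3", "203.0.113.7"]:
        print(ip, triage_traffic(ip))
```

The second return value mirrors the point made above that unknown traffic should be stopped and also investigated, which is where the observability tooling comes in.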
Another great perimeter defense tool is Cilium's clusterwide network policy or the Kubernetes AdminNetworkPolicy. Both of these allow you to make a high-level or cluster-level network policy, ideally as close to default deny as possible, and then your service owners can additively create policies allowing the specific traffic that they need. So the Grachtengordel, also known as the Canal District, contains the concentric rings of the city's main canals, which you can see here. We already talked about the first one, the Singel, which is there. The next one is the Herengracht, the next one is the Keizersgracht, and the last one is the Prinsengracht. So you can see that those canals create really nice rings around the city center, and in the city center would be the most vulnerable places, like Amsterdam's Royal Palace. While only the Singel was constructed with the explicit intention to defend the city, the other canals became extra layers of protection. So within the innermost canal we have these sensitive buildings, and during times of war defenses could be set up at the various entry points of each canal, meaning if someone got through one they would also have to get through the next and the next, and hopefully not make it to the center. Instead of nested canals, we use defense in depth and zero trust networking. Both of these concepts suggest that we should not rely on a single layer of security, and we cannot depend on only trusted entities making it past our perimeter defense. So even if our network is protected by really strong firewall rules and we have a proxy managing all incoming and outgoing traffic, we still need to ensure that there are additional security measures in place within. Some examples of this might be access management through IAM and RBAC, and having a really strong culture of just-in-time access. I know that this isn't doable for all companies, but we've been able to do this at Shopify and it's a security developer's dream.
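To make the "default deny first, then additive allows" idea from the start of this passage concrete, here is a small Python sketch that builds two manifests and prints them as JSON. It is an assumption-laden illustration, not the speaker's setup: it uses the standard namespaced `networking.k8s.io/v1` NetworkPolicy rather than a Cilium clusterwide policy or an AdminNetworkPolicy, and the namespace names and helper function names are hypothetical.

```python
import json


def default_deny(namespace):
    """Select every pod in the namespace and allow no ingress or egress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        # An empty podSelector matches all pods; listing both policy types
        # with no rules denies traffic in both directions.
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


def allow_ingress_from(namespace, source_namespace):
    """Additively allow ingress from every pod in one other namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": f"allow-from-{source_namespace}",
            "namespace": namespace,
        },
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [
                        {
                            "namespaceSelector": {
                                "matchLabels": {
                                    "kubernetes.io/metadata.name": source_namespace
                                }
                            }
                        }
                    ]
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical namespaces chosen for the example.
    manifests = [default_deny("zuid"), allow_ingress_from("zuid", "centrum")]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

The printed list can be piped into `kubectl apply -f -`. One thing to keep in mind with this sketch is that denying all egress also blocks DNS lookups until an explicit allow rule is added for them.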
I know Permit.io is here and they have just-in-time access. I highly recommend investing some time there if you haven't. Network policies: they're my favorite, a really great baked-in Kubernetes feature. Well, not baked in. Then observability, and we've talked a whole bunch about that. It's really important that you are able to detect anomalous activity; if something does breach your perimeter defense, you want to know about it as soon as possible. And then we want to follow our general security practices as well, which is data encryption, secret management, and keeping our software up to date. Much like the canals of Amsterdam segment the city and create managed routes for traffic, we must ensure that we're properly isolating workloads in our multi-tenant infrastructure and providing each segment with its own protection. The other value of having layers of security is that you're protecting against a malicious insider or, sorry, user error. So, bridges. I've talked a whole bunch about canals; let's talk a little bit about bridges. The many bridges in Amsterdam are generally used to allow people to cross the canals and reach different areas of the city, but another thing that they do is allow boats to pass through, and they allow monitoring and control of which boats can access which areas at which time. They can also check licensing to make sure the correct boats are doing the correct thing. In times of war they allowed the city to concentrate forces at the access points of the various areas of the city and made it easy to defend against intruders. For me this made me think of, as always, access control, but also load balancing. We can use access control mechanisms to ensure that only the correct entities can access specific resources and, if we have just-in-time access, only at the right times. Load balancers ensure that the traffic and load are distributed across the back-end resources in the cluster.
So with those bridges, if the canal was chock-full of boats, they could just close the bridge and wait until the traffic spread out, so I think that is a good mapping and a very fun meme. I always use too many memes, so I apologize that I don't have very serious slides. "Okay, Kaelin, we get it, the metaphor is great." I did a super great job, very interesting, but probably you would like to get to some clearer Kubernetes content, and I will, but there are so many good ideas that I just want to go through a few more before moving on. So next, I touched on this a little bit, but having different licenses allowing different access at different times is very similar to having access control or network policies controlling the traffic. Flooding the water line: outside of the city of Amsterdam there used to be a kind of marshy field area called the water line, and if the city was being attacked they could flood that area, which would make it too deep and mucky for an incoming foot army to get to the city and too shallow for any kind of boats to go through. This is like your break-glass default deny policy. You're going to isolate the city, in this case isolating it from the outside, you're going to wait until the threat has passed, and then you can allow the water to drain off and resume regular traffic. And lastly, to keep all of the canals and bridges running smoothly, they have to have regular maintenance, and this is true of any Kubernetes network. We need to be updating our software regularly and performing these security assessments regularly, even if the network is seemingly not changing. So the metaphor is great, I've done a lovely job. Really, security engineers are the modern-day version of knights in shining armor protecting our cities from the bad guys. To be clear, a much more diverse set of armies than history and fantasy books would lead me to believe we had before.
Also, I tried to get the AI image generator to do the Spider-Man meme with Kubernetes characters and it was horrific. So now we're into stage two, and we're going to get to know a network. When I started on the infrastructure security team, I had never even heard of Kubernetes. I was only halfway through my computer science degree, I had very little experience in distributed systems, and no experience in security. Once I started learning about the various areas of responsibility on my team, I became fascinated by Kubernetes and by network security in general, largely thanks to my mentor, who is here watching. The field is overwhelming and opinionated, and I really, really struggled to learn the basics. For me, what I felt that I needed to do was spend some time getting to know the network that my team was responsible for, and I didn't really know how to do that. All of the tutorials jumped in pretty quick and assumed prior knowledge. As security professionals, it's often the case that we need information about an entire system, which is going to include a whole bunch of areas that we're not directly responsible for, and we might not have the domain-specific context. Of course, the best thing to do is to go and chat with the domain experts or service owners, but what we're going to go over is a really great starting point if you want to do some exploration on your own. We'll stick with the basics. As I've said, I think that they're the foundation of this kind of work and I want to highlight them. If you're interested in going in depth on any of this, I'd be happy to talk to you outside of this talk if you see me around the conference, maybe in an unconference session downstairs, and I can point you in the direction of a whole bunch of really top-notch content that has been produced at events like this before. Okay, so, some demos. I'm going to take a big sip of water. They're not really demos; I recorded them because I was afraid. Before the first one, I just want to go
over (yeah, those are pretty readable) some kubectl commands to get started. As I said, we're dealing with a single cluster for today's examples, but it's very likely that in real life you're dealing with multiple clusters, so you can use kubectl config view to see which clusters or contexts you have access to. You can use kubectl get all --all-namespaces, but get all doesn't get all, which is an important thing to know. It actually gets a specified subset of resources, which won't include any custom resources. So what is preferable is to use kubectl get with a comma-separated list of the kinds that we care about as security professionals, which I've listed down the side. Really what we want to look at is: what are the nodes, what are the namespaces, what are the pods, what network policies already exist, and what services, deployments, and ingress resources do we have? Okay, so, weird, it's not there. This first one is just me kubectl-ing, just getting to know a cluster. So I'm going to start off by getting the nodes. Also, I made lots of mistakes recording my demos, so I don't know why I was afraid, because I wasn't rigorous enough to redo them. But I got them wide, with the IPs, because it's likely that we would want to do some connectivity testing and we would want access to that information later. After I got the nodes, I grabbed the namespaces to see where things are contained, and we can see a whole bunch of system and tool namespaces and then two that pertain to regions of Amsterdam. So we're going to look at the pods in the regions of Amsterdam, which we see here, which are sub-regions of those regions, again getting them wide so we have those IP addresses. And lastly we're going to look at network policies to see what sorts of communication are allowed, and you can see that Centrum is allowed to egress to Zuid and Zuid is allowed to receive ingress from Centrum. So that's a very simple, quick thing that I will do pretty much any time I'm dealing with a new
infrastructure: just kubectl my way through and see what's happening. Next is me using Inspektor Gadget to get an idea of what sort of network policies and seccomp policies I could create for this infrastructure. I did it a little bit concurrently. Maybe I can do it from here. Oh, there we go; maybe I'm just not pressing hard enough. So in the bottom I start the advise call for network policies, so this is creating a trace of any network calls that are happening during this time. Then up top I'm going to do the same thing for seccomp in the background, and then I'm going to port-forward to generate some traffic for that, and you can see I'm going to now curl it. There we go: "Hey KubeCon." Awesome. And now I probably have pretty close to enough information to look at the result of these scans, so we'll look at, I think, the seccomp one first. Stop the process. I found out that there's a demo thing you can use where you aren't actually doing it, which maybe I will do next time so it's not as embarrassing. So here you can see all of the system calls that were made during that trace, and with this you can generate a seccomp policy. You can do it using Inspektor Gadget, and you can also do it using the Kubernetes Security Profiles Operator. I didn't go into that, but I've included the documentation in the resources slide at the end, and they have a really great demo to walk through. Okay, 10 minutes left, I gotta go demo faster. The next one: we're going to look at the network policy. It's important to note that Inspektor Gadget creates the network policy based on the calls, so Inspektor Gadget doesn't have any opinions about whether or not those calls were expected. So we're going to see a network policy that has a whole bunch of things in it, and you might not want to just apply that network policy. Here I did an example of a very obvious evil one called (I think I highlight it in a second) sneaky boat attack network. And so if you saw something that was really suspicious or outside what
you would expect, you would want to remove that from the network policy and maybe highlight it elsewhere. What I really like about Inspektor Gadget is that not only can you get this network policy, but you also get the trace, which is really interesting if you just want an idea of the traffic or if you are trying to see if something suspicious is happening. I think I look quickly at that in a second as well. Here I go, and there you go. So you can see you have a whole bunch of information here that could help you get some more knowledge about the kind of action on the network. With that, you might start off with a really basic diagram just showing what the network looks like. This is the very simple one that we've been looking at. Here you can see the namespaces and the nodes, the pods, and a little bit of the traffic indicated, as well as the cluster. You would then want to turn this into a really simple data flow diagram. This is, yeah, just a very, very basic data flow diagram. This is going to start to give you an idea of what paths traffic is taking, and then you can start to know where some of those threats might manifest. I also included this example from the SIG Security self-assessments team. Grace is here, she does this, and it's really a fantastic program that SIG Security is running where they're helping Kubernetes projects to level up their security by doing basically this process that I'm talking about today. So once you have all of this, you will be ready to (turn off the projector) threat model. Now that we know what we're dealing with, let's talk about specific threats so we can make a security plan. We're going to go over using a framework, and it's worthwhile knowing that there are some other tools you can use for pen testing, but this is just going to be a really quick overview of threat modeling. A very popular threat modeling framework is STRIDE. I'm a fan. You can see here what each letter stands for. We're going to go through an example for each that's a bit
Kubernetes. So, spoofing: pretending to be someone you're not. That could be through a compromised token. Tampering: if you're overprivileged, if you have API server access, then you can do things you ought not do; you can make malicious changes. Repudiation is basically like software gaslighting. You do a thing, but then you make it so that no one can know that you did the thing, so it's basically a combination of the other two things: you spoof and then you tamper to hide your tracks. Information disclosure is just our classic data breach. This could be done in Kubernetes through compromised secrets, lack of encryption, or unisolated workloads. Denial of service is also a pretty typical security concern. Oh, yeah. You could do this through overloading the API server or flooding the cluster network with traffic. And lastly, elevation of privilege. This can be through having highly privileged containers, root users, overprivileged roles, and not scoping people down. So now, this is really why we do this. The security strategy is what we do when we want to do a planning cycle, when we want to start a new security team, join an existing team, or create a security roadmap. So we have a bunch of issues. Now what? We need to triage the issues and label them with priority. Most of this is standard issue intake and triage practice, but with potential security issues that cover the entire state of the network at a given point in time, we want to make sure that they're information-complete, so if someone picks them up months or even years down the line they understand the full issue and we're not losing discovered security threats. At the end of the day we all report to some pointy-haired bosses, and we still need to justify why we're prioritizing certain issues, so I'll go over some of my considerations when I'm triaging security issues. I'm going to go fast. I'm sorry if you didn't have time to read the comic, because I'm almost out of
time. So, considerations. These are the three categories that I like to look at when I'm working on an issue. I think they're pretty self-explanatory, but: complexity, how complex is the issue and how many people do we need to do it? Threat, which is how likely it is to be exploited. And then that combined with impact, which is, if it is exploited, how likely is it that it will be critical? So if you have something that's not going to have a major impact but is very likely to be exploited, versus something that's in the middle of all those layers of security but could be devastating, that's going to impact how you triage them. On our team we use a risk register. This is not our template, but it is roughly in the shape of our template, and these are some questions that you might answer. Basically, when you find these issues you can have them on your backlog board, or you can have them for a project. You want to make sure you hand off ones that are out of your responsibility and maybe relate to specific surfaces. But as a security team, if you've identified an area, even if you don't have time to work on it, you're going to want to have a log of it, and a risk register is a common way to do that. It's a list of security issues that can be re-triaged and picked up at any time, and on our team, when we plan for the next cycle, we consult our risk register and pick up the projects that are at the highest priority. So this is a great example, very easy. Describe the risk: the cat can reach the plant shelf. What's the worst case? Cat kills all your plants. Would it be detectable? Depends. Probably the mess, yes, but then it's too late. Has it happened before? If you own a cat and you have a plant, yep. How likely? Well, cat's gonna cat. And who should fix it? The owner of both the cat and the plants is likely responsible. How difficult would it be to fix? Pretty tough; cats can get through anything. And are there existing controls? Maybe some
aluminum foil on the edge, but if it falls they're still gonna chuck your plant. And this part is the most important part, which is: what sentence, if true, would suggest we are managing this risk? So I put here: there's a cat-proof plant shelf; we can leave an army of cats in the plant room for many hours and zero pots will be smashed. This allows, you know, Kaelin Edwards 10 years later to look back and check in with that sentence to see if in fact this is still a concern. I think that that's a really magical part to include. So now we're at the key takeaways. That was a whole lot of content about a whole bunch of different things, but the main focus, which I hope got across, was how to get started securing a Kubernetes network and how to form that plan. So again, I'll just go through what we talked about. Initially, we want to look at the components: what's the network made up of? We want to look at the boundaries: what's at the edge, what traffic do we want, what traffic do we not want? We want to look at the threats: what are the risks, and how are other people with similar infrastructures getting pwned? Triage: where should we focus our efforts, what do we have time for, and what are our pointy-haired bosses going to endorse? Mitigate: finally, how can we stop those threats, how can we build a security roadmap and start crushing issues? And what is not contained on here is regular maintenance: doing this over and over again every time there's a new system, and running security game days so that you can find out whether we're prepared if one of these threats we listed happens. That's super, super valuable. There are a whole bunch of useful tools that I did not have time to dig into today, but most of them are here and they would love to talk to you about their tech. They're really, really great and they want to help you have a secure Kubernetes network. These ones are my favorite. Just kidding, I don't have any favorites as an ambassador, but I do like these ones. And here are some resources;
these are mostly just the resources that I used to make my talk, so if you want to go deeper on anything there are some really great links here. And finally, if you're like, "Kaelin, this is awesome, I want to do more of this, I want to help you," please, please come work with us. Both SIG Security and TAG Security run a self-assessment program. We need help, or we need you to tell us about your subproject that needs help. You can scan these QR codes; they'll bring you to the Slack channels for both TAG Security and SIG Security. I will be at the SIG meet and greet, representing SIG Security, on Friday. It's from 12:30 to 2:30, but I have no idea where it is. It is in Sched, though, so please sign up and I'll talk to you more about what it's like to work with the SIG. And finally, please give me some feedback. This is my second live talk, and I would love to hear mostly the things that you would like me to do in the future.
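For readers who want to try the triage step described earlier in the talk, here is a minimal Python sketch of a risk register. It is only one possible reading of the three considerations (complexity, likelihood, impact): the 1 to 5 scales, the likelihood-times-impact score, the field names, and the second register entry are all assumptions made for this example and are not the speaker's template.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    description: str
    worst_case: str
    owner: str
    likelihood: int   # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int       # assumed scale: 1 (minor) to 5 (devastating)
    complexity: int   # assumed scale: 1 (easy fix) to 5 (very hard fix)
    managed_when: str  # the sentence that, if true, shows the risk is managed

    @property
    def score(self):
        """Assumed scoring rule: likelihood multiplied by impact."""
        return self.likelihood * self.impact


def triage(register):
    """Order risks by score (highest first), then by complexity (easiest first)."""
    return sorted(register, key=lambda risk: (-risk.score, risk.complexity))


REGISTER = [
    Risk(
        description="Overprivileged service account in a tenant namespace",  # hypothetical entry
        worst_case="Elevation of privilege across workloads",
        owner="Platform team",
        likelihood=2,
        impact=5,
        complexity=3,
        managed_when="No service account holds permissions beyond its own namespace.",
    ),
    Risk(
        description="The cat can reach the plant shelf",
        worst_case="Cat kills all your plants",
        owner="Owner of both the cat and the plants",
        likelihood=5,
        impact=3,
        complexity=4,
        managed_when="An army of cats can be left in the plant room for hours and zero pots are smashed.",
    ),
]

if __name__ == "__main__":
    for risk in triage(REGISTER):
        print(risk.score, risk.description)
```

Keeping the `managed_when` sentence on every entry is the part the talk calls out as most important, since it lets someone re-triage the issue years later without the original context.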