I'm Bryan Boreham. As the picture says, I work for a company called Weaveworks, which is an open source software company. Before we get started, can we get a show of hands? Who already knows Kubernetes? Almost, well, let's say two thirds. OK. Docker? More. Linux? Just checking. Who knows iptables? Oh, no. So I put in this piece of ancient wisdom: after the zombie apocalypse, you're going to need someone in your party who can heal and someone who knows iptables.

What I'm here to talk about is a way of blocking unwanted traffic. So the basic idea (I'm going to wave my hands around; it's kind of dark, isn't it? Can you tell I have hands? Yes, there are people waving back at me) is that you have group A things, and they're allowed to talk to group B things, and they're not allowed to talk to group C things. I'm going to talk about that in the context of Kubernetes, which is an orchestrator, a thing that runs software for you on many, many machines, and I'm going to tell you how we implemented it in an open source program which is part of Weave Net.

OK, so this is some kind of security talk, so we have to have a threat model. This is my threat model: the big guy is very big and very angry and is coming after you. So what do we do? We have our threat on the left-hand side, we have our network on the right-hand side, and we put a firewall in between them. That's how we keep them out. The problem is this doesn't always work. If the attacker gets inside your network, he is now free to hit any node in the system, because they're all connected. Worse than that, it may not even be an attacker. This may actually just be the dev version of your software that you managed to release to production by mistake. It might not even be malicious; it could just be randomly attacking your system. So what do we want to do? We want to put a bunch more firewalls in there and stop all that unwanted traffic. Now imagine doing that in an environment which is very, very dynamic, where things can come and go all the time, where things can autoscale up and down, more machines, fewer machines, running containers under the control of an orchestrator which is just firing them up wherever it feels like. It's a very, very dynamic environment. That's the problem we have to solve.

So I'm going to take an example just to show you a little bit more about how this works in Kubernetes, how you specify this. A fairly typical example: a three-tier system where the presentation tier is supposed to talk to the middle tier, but it's not supposed to talk to the data tier. This, if you can read it, is how you would write a network policy in Kubernetes. It's a YAML file; this is actually what it would look like, pasted in. It has some metadata at the top saying it's a policy, and then we say the presentation tier will only accept ingress on port 80. So it's pretty simple. That's how you set up those rules in Kubernetes. What do we mean by "tier: presentation"? In Kubernetes, you can label anything you like. In particular, you label pods. A pod is the abstraction of your running software; really it's a collection of containers, one container perhaps, or more than one, which go together. So we label our containers. Here's a slightly more complicated rule. This one says anything in the middle tier will only accept input from anything in the presentation tier.
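(The manifests on the slides aren't reproduced in this transcript, so here is a rough sketch of what the two policies just described might look like. The policy names, the label values such as tier: presentation and tier: middle, and the networking.k8s.io/v1 API version are illustrative assumptions, not the exact slide content.)

    # Sketch only: presentation tier accepts ingress on port 80.
    apiVersion: networking.k8s.io/v1   # early clusters used extensions/v1beta1
    kind: NetworkPolicy
    metadata:
      name: presentation-port-80       # hypothetical name
    spec:
      podSelector:
        matchLabels:
          tier: presentation           # assumed label
      ingress:
      - ports:
        - protocol: TCP
          port: 80
    ---
    # Sketch only: middle tier accepts traffic only from the presentation tier.
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: middle-from-presentation   # hypothetical name
    spec:
      podSelector:
        matchLabels:
          tier: middle                 # assumed label
      ingress:
      - from:
        - podSelector:
            matchLabels:
              tier: presentation

You would apply these with kubectl apply -f, and the selectors pick up whichever pods carry the matching labels.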
So this time we said where it's coming from, and we did not say anything about ports. And we can kind of mix and match that: we can restrict the protocols, the ports, where it's coming from, where it's going to. So that is Kubernetes network policy. That was added as a specification about a year ago, and there are three or four implementations of that specification.

So how do we implement it? Well, this is an open source conference: get over to GitHub and read the source. No? OK, you expected me to do more work. OK. So this is a kind of high-level architecture diagram of how this works. We run a daemon process on every host; in this picture I have two hosts. And Kubernetes has this master service which knows everything about what's going on. So we set up a watch. This is within the Kubernetes API: we say I want to be notified of all changes, in particular to network policies and pods. So I'm going to get calls over that API when any of those things change. Then we drive iptables. I told you that was important. How does that work? Well, we inject into the top-level forwarding chain a rule that says we're going to check some rules under the heading of Weave NPC, the network policy controller. If we do not pass those rules, then drop the packet. That's the most important thing: we fail safe. The next thing we do is we say if this is an established connection, then accept the packet. This is a performance hack. We don't want to check every packet; we only want to check the ones that open a new connection. So the first thing we do is say if it's an already-established connection, then accept it. Otherwise, we check a couple of other chains.

What are we going to do in those chains? So this is the kind of flow of the system. We start with the source address on the network. That's going to go over a Linux bridge, and in the course of traversing that bridge, it's going to get run through the iptables rules. And we make use of this other thing called an IP set. Who knows about IP sets? Oh, far fewer. OK, so today you learned something. An IP set is a pretty useful thing, because in a large Kubernetes system there could be thousands of pods. There could be thousands of combinations of source and destination IP addresses that we want to either accept or reject, and if we wrote a rule for each one of those, it would start to get slower and slower and slower, because it's kind of a linear search. An IP set is a hash table into which we can put the same information. We can put the source, the destination, the port, and we can match against that in an approximately constant-time operation. And if it passes, we send the packet to the destination.

So, just because you're enjoying this so much, we'll take a look at the exact syntax of those rules. We say we're using the set module. It's like an add-on module for iptables, but it ships in every distro. And we're going to match a set with a very, very funny name. There's two kinds. There's the kind where we just match on the destination; that's the kind of default set. And then there's what we call the ingress chain, where we match on both the source and the destination. Which one it is depends on whether, back in those policies, the admin specified anything about the source or just about the destination.
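(The slide with the exact rules isn't reproduced in this transcript, so the following is a minimal sketch of the kind of plumbing being described. The chain names WEAVE-NPC, WEAVE-NPC-DEFAULT and WEAVE-NPC-INGRESS, the bridge name weave, and the readable set names are assumptions for illustration; the real set names are the crunched, encoded ones discussed next.)

    # Create the policy chains and hook them into the top-level FORWARD chain:
    # traffic crossing the bridge gets checked against the policy rules, and
    # anything that is not accepted there is dropped (fail safe).
    iptables -N WEAVE-NPC
    iptables -N WEAVE-NPC-DEFAULT
    iptables -N WEAVE-NPC-INGRESS
    iptables -A FORWARD -o weave -j WEAVE-NPC
    iptables -A FORWARD -o weave -j DROP

    # Performance hack: packets on already-established connections are accepted
    # straight away; only connection-opening packets go through the other chains.
    iptables -A WEAVE-NPC -m state --state RELATED,ESTABLISHED -j ACCEPT
    iptables -A WEAVE-NPC -m state --state NEW -j WEAVE-NPC-DEFAULT
    iptables -A WEAVE-NPC -m state --state NEW -j WEAVE-NPC-INGRESS

    # IP sets are hash tables of pod addresses, so lookups stay roughly
    # constant-time no matter how many pods are in them.
    ipset create weave-presentation-pods hash:ip
    ipset create weave-middle-pods hash:ip
    ipset add weave-presentation-pods 10.32.0.7

    # "Default" kind of rule: match on the destination set only.
    iptables -A WEAVE-NPC-DEFAULT -m set --match-set weave-middle-pods dst -j ACCEPT

    # "Ingress" kind of rule: match on both source and destination
    # (roughly one rule like this per policy rule).
    iptables -A WEAVE-NPC-INGRESS -m set --match-set weave-presentation-pods src \
      -m set --match-set weave-middle-pods dst -j ACCEPT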
Why the funny names? Each one of these corresponds to a network policy in Kubernetes, and the length of name you can have in Kubernetes is longer than the length of name you can have in iptables. So we crunch them down and make use of some extra characters. It's a little bit like base64 encoding, except it's base-86 encoding, or something like that. Anyway.

OK, so those are the chains. That works. That's what we do. It is incredibly easy to get these things wrong. When you are specifying the rules, if you get it wrong, it will drop your traffic when you didn't want it to. So we add one more rule at the top level. We use NFLOG, the netfilter logging target, or rule destination, whatever you call that thing, and then we pick that up in our program. We use ulogd to subscribe to that. We use group 86, because that's kind of slang: 86, get rid of it. So we log connections that get dropped by our daemon, and we also export that as a metric that can be picked up by something like Prometheus. So you can alert: if you suddenly get a lot of rejected connections, that means either you have some kind of attacker in your system or you misconfigured it, and both of those things are interesting things to be monitoring. So we export those.

OK. The blog post is the first link, if you're interested to read more about how it works; it has a walkthrough of how you can set it up and try it out. The code is on GitHub. It is all written in Go. And we also have a cloud service where you can run all this stuff hosted. So that's pretty much what I came to say. Thank you.

So, do you have any questions? We have another maybe five minutes for questions. One over there. I'm going to get some exercise. Sorry, what was it? Back, about six more rows. Sorry.

Thanks for the talk. I actually have two questions. The first one is: it doesn't seem like you need to use Weave as an SDN provider, but maybe we do, because I didn't see any connection between Weave as the SDN provider and Weave as the network policy controller. Yeah. And the second question... Shall I answer that one and then come back to your other one? Is that OK? So you're absolutely right. The program that we implemented is fully generic. It will work in any situation where you can persuade iptables to check the source and destination and port of your packet. We only ship it as part of Weave Net because you do need to somehow start the process. We don't want to just tell iptables to run our rules on every packet on the machine, because we only want to run the rules for the packets that belong to Kubernetes. So there's a few lines of config to send; basically you need to know the name of the bridge, or something like that, in order to start the process off. But your observation is absolutely correct: Weave NPC is a standalone program, and it just so happens that we ship it alongside our SDN. The two things do not need each other. And thank you. And the second question?

What happens if we apply a new network policy that blocks a connection that was already established? Does it do anything? Right. It does not, because of that performance hack where we... if I can find that rule... yeah, this thing where we don't want to impose the lookup overhead on every packet. We only do it on connection establishment. So if you managed to establish a connection, that will stay active. We did discuss this. I'm a member of the Kubernetes SIG Network committee, and we did discuss this point, and we felt like if the bad guy got in already, then that's too bad.
Reboot the machine or something like that. If you want to drop that connection, that's not within the scope of Kubernetes network policy; it doesn't drop already-active connections. Thank you.

We have time for another two minutes, for another question or two. Somebody is interested. Quick, think of something. Oh, one in the middle there. That's like the worst possible place. Can you just do sign language?

Do you have any numbers in terms of performance and scaling? At the point that you start having more containers, you basically start having more iptables rules. Does this affect the performance, and how?

Right, you're asking about performance. So the design of the system is that there's one rule per policy. The number of containers should not impact the performance, because each source and destination check is a hash-table lookup. So it's designed to remain the same speed as the number of containers or pods in Kubernetes grows. It will slow down very slightly as you add more policies, because each policy has to be checked to see if it applies.
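(To make that last answer concrete, here is a tiny illustration building on the earlier sketch, with the same hypothetical chain and set names: a new pod only adds an entry to an existing hash set, while a new policy appends one more rule to the chain.)

    # A new pod in the presentation tier: one more entry in a hash set,
    # so the per-connection lookup cost stays roughly the same.
    ipset add weave-presentation-pods 10.32.0.42

    # A new policy: one more --match-set rule appended to the chain,
    # which adds a small, linear amount of work per new connection.
    ipset create weave-admin-pods hash:ip
    iptables -A WEAVE-NPC-INGRESS -m set --match-set weave-admin-pods src \
      -m set --match-set weave-middle-pods dst -j ACCEPT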