Welcome to our talk, where we try to answer a very popular question that many of you have probably heard: how many network policies can I create? My name is Nadia Pinaeva. I'm a senior software engineer at Red Hat. Hi, and I'm Shaun Crampton, a distinguished engineer at Tigera, the company behind Calico, in case you haven't heard of us. So today we're going to try to answer this question: how many network policies can I create? Nadia has the first half of this talk. She's going to take you through why that's a difficult question, and she's going to introduce her new model and framework for scale testing network policies and why it was needed. It comes down to the problem of measuring confusingly multi-dimensional objects, these network policies we're dealing with. She's going to introduce you to kube-burner and the convergence-tracking approach that she's using, and then show you some sample results from OpenShift. Then it'll be my part of the talk. I'll introduce Tigera's part in all this and give a demo of how you run one of these tests. Then we'll circle back to wrap up and try to answer the question. OK, Nadia, over to you. Thank you. All right, let's start from the beginning: what is a network policy? For those of you who may not know, NetworkPolicy is a Kubernetes API for network security. It allows you to specify which connections should be allowed for a specific set of pods, and then everything else is denied. Here is a simple example. This network policy is created in the default namespace and it isolates all the pods in that namespace. It is very simple: it has just one ingress rule that allows connections from a specific namespace labeled with the project name MyProject. If you look at the right side, you'll see an example configuration with two namespaces. The default namespace has just one pod, which is the pod that is isolated, and the MyProject namespace also has just one pod. So in this case, the network policy allows just one connection. Now we create one extra pod, B, in the default namespace. The network policy is still exactly the same, but now it allows two connections. You can imagine that as we create a couple more pods, the number of connections that the exact same network policy allows keeps growing. What that means for us is that network policies with the exact same spec may have very different scale impact on the cluster. In other words, the scale impact of a network policy depends not just on the spec of the object itself, but also on the other objects that exist in the cluster, especially namespaces and pods and their labels, which, as you can imagine, makes scale testing quite tricky. And there is more. The spec of a network policy itself is also pretty tricky. It has lots of lists of different rules, as you can see here, and these lists may have any number of elements. For example, there is a list of ingress or egress rules; every rule may have a list of peers, a list of IP blocks, exceptions for CIDR rules, lists of ports, and more. So the network policy we saw on the previous slide, which had just one rule, and this network policy, which is a bit more complicated, will have very different scale impact, and the number of rules can potentially grow into the hundreds and thousands. Both of these objects are just one network policy, but they are very different.
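For reference, the simple policy described here could look roughly like this; the project label key is an assumption based on the description of the slide, so the actual YAML shown in the talk may differ a bit.

```yaml
# Sketch of the example: isolate every pod in "default" and allow ingress
# only from namespaces labeled project=MyProject (label key is illustrative).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-myproject
  namespace: default
spec:
  podSelector: {}          # empty selector: all pods in "default" are isolated
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              project: MyProject
```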
So saying "I can create one network policy" without adding more detail doesn't really clarify what its scale impact is. We tried to solve this problem by defining a network policy scale profile, which is a set of parameters that you set up. It simplifies the space of possible network policy configurations, while still being expressive enough to cover different kinds of network policies. Let me try to explain a couple of the parameters we have here. The first one is the number of local pods, that is, the number of pods that are isolated, which, as we figured out, is important. We also simplify all the generated network policy configs to have just one peer in every rule, as you can see here, and there are two main types of peers in a network policy: they are either CIDR based or pod-selector based. So CIDRs is the parameter that specifies the number of IP block peers; you can see there are three on this slide. The other one is pod selectors; in this example there are two pod-selector rules. Pod selectors are a bit more complicated than CIDRs, because you also need to say how many pods are selected by each peer, and we do that with two extra parameters called peer namespaces and peer pods. Again for simplification, every namespace peer selects the exact same number of pods; the pods are different, but the number is the same, and it is controlled by these two parameters. If that wasn't enough, we have a couple more. As mentioned, there are also port specifications that may be attached to every rule in a network policy. There are two main types: a single port, which is just a number, and a port range, which specifies the beginning and the end of the range. When you set this parameter in our scale profile, it generates a port config, as you can see here, for both CIDR and pod-selector rules. Okay, let's take a look at a simple example. Right, just to sum this up: the network policy scale profile in the end has seven parameters, which you can see on the slide. That is what defines how the network policy YAML will look, and it is good enough to understand the scale impact of a given network policy. I hope at this point it is obvious that there is no single, simple answer to "how many network policies can I create", but we can still add some certainty by using this network policy scale profile, which hopefully allows us to answer the question in more detail. All right, let's take a quick look at the example network policy that is generated for a very simple profile. It has just one local pod and one CIDR. You can see it here: there is one ingress rule with a CIDR-based peer, and there is a pod selector that selects one pod. Believe me, it selects exactly one pod; there is a tricky labeling scheme that makes that work.
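A generated policy for that minimal profile (one local pod, one CIDR peer) might look something like the sketch below. The names, labels, and CIDR value are illustrative; the real k8s-netpol-scale templates use their own naming and labeling scheme.

```yaml
# Illustrative output for the minimal profile: podSelector matches exactly
# one local pod, and the single ingress rule has one CIDR (ipBlock) peer.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ingress-1
  namespace: network-policy-scale-1
spec:
  podSelector:
    matchLabels:
      num: "1"             # label scheme chosen so exactly one pod matches
  policyTypes:
    - Ingress
  ingress:
    - from:
        - ipBlock:
            cidr: 10.128.0.0/26
```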
So we've figured out the network policy specification, more or less. There are a couple more things that we need to set for every namespace: the number of pods per namespace, the number of network policies per namespace, and ingress and egress policies can also be created separately, in case that matters. That set of parameters specifies just one namespace; to scale test the cluster, we create a number of copies of these namespaces, and that goes into the namespaces variable here. All right, that more or less defines the workload in the cluster from the network policy point of view. There are, though, potentially many more parameters that may affect the test results, like the number of nodes, the resources requested by pods or allocated to nodes, kube-apiserver performance, and so on. Fortunately, that applies to more or less every scale test; it's a problem that exists not just for network policies but for all other scale testing, so we won't pay too much attention to it here. But it is important to remember that there are lots of different things that can affect the scale results of a specific run. Okay, so we have defined what kind of workload we want to create. How do we create it? There is a really nice CNCF sandbox project called kube-burner that can generate the workload for you based on given YAML templates. It can create, delete, and patch Kubernetes objects, it has some nice features to report test results, and it collects metrics during the scale run so that you can see how your system is doing in the middle of the scale test. And there is an extra repository, called k8s-netpol-scale, that contains all the YAMLs for network policy scale testing with the parameters I mentioned. It generates the workload for a given scale profile using kube-burner, it can visualize scale testing results with the open-source Prometheus and Grafana stack, and it can do some static network policy analysis, which we'll get back to at the end of this talk. I want to say a special thanks to the OpenShift scale test team, who really helped me with this, and kube-burner is a really nice project that makes it easy to introduce new parameters in case you need to. Okay, it looks like we are almost ready to run our scale test, but not just yet. There is one more "nice" feature of network policy that we need to deal with: network policy doesn't have a status. That means we have no way to know when all the network policies have actually been applied. All the objects will be created, but that doesn't mean they have been applied for all the selected pods at that point. So what do we do? There are potentially many ways to handle this, but here is what we went with. Considering that we apply the whole workload at once and there are no further changes in the cluster, the cluster should come to a stable, converged state at some point, hopefully. So we can try to track the networking state on every node, and that's what we call a convergence tracker job. It's a customizable job, in our case a simple Python script, that tracks the networking setup progress on every node. Different networking plugins have different definitions of what the networking state on a node is, but that is also what's good about this approach: each plugin can implement its own convergence tracker to report what that state looks like. I am a contributor to the OVN-Kubernetes project, which is a plugin that implements network policies using OVS flows, and here you can see an example metric showing the number of OVS flows after a set of network policies is applied. You can see it growing, growing, growing, and at some point it stops. That point is where all the network policies are applied and where we can say: now we are done, the scale test is finished. So that's what it's going to look like.
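As a rough illustration, a convergence tracker in this spirit can be a small polling loop. The sketch below watches the OVS flow count on a node and declares convergence once it stops changing for a while; the command, bridge name, and thresholds are assumptions for illustration, not the actual k8s-netpol-scale tracker.

```python
#!/usr/bin/env python3
"""Sketch of a node-local convergence tracker: poll a measure of networking
state (here, the OVS flow count on br-int) until it stays stable."""
import subprocess
import time

POLL_SECONDS = 10
STABLE_POLLS = 6  # treat the node as converged after ~1 minute without change


def ovs_flow_count(bridge: str = "br-int") -> int:
    # `ovs-ofctl dump-aggregate <bridge>` prints a line containing "flow_count=N".
    out = subprocess.run(
        ["ovs-ofctl", "dump-aggregate", bridge],
        capture_output=True, text=True, check=True,
    ).stdout
    for token in out.replace(",", " ").split():
        if token.startswith("flow_count="):
            return int(token.split("=", 1)[1])
    raise RuntimeError(f"could not parse flow_count from: {out!r}")


def wait_for_convergence() -> None:
    last, stable = -1, 0
    while stable < STABLE_POLLS:
        current = ovs_flow_count()
        stable = stable + 1 if current == last else 0
        print(f"flows={current} stable_polls={stable}", flush=True)
        last = current
        time.sleep(POLL_SECONDS)
    print(f"converged with {last} flows at {time.strftime('%H:%M:%S')}", flush=True)


if __name__ == "__main__":
    wait_for_convergence()
```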
All right, so far so good. Now we're ready to run the test and get some nice pictures. kube-burner, as I said, allows you to collect metrics and see some nice dashboards. Here are a couple of snippets of what you get from every scale test run. These dashboards can also be stored, so you can get back to your scale test results later and see how they looked some time ago. You'll find here all the different parameters of a scale test run, the number of created objects, and the scale per namespace. You'll find the seven parameters I tried to explain before, which specify the scale profile itself. And there are also details from the convergence tracker on every node, showing at what time each node converged. In addition to that, there are many more things you can report. An important example is resource usage; that's probably something you also want to track, for example CPU and memory usage. Here you can see the ovnkube-node pod's CPU usage, which is the component that actually implements the network policies. You can see that when the network policies are being applied, the CPU usage bumps up, and then it goes down when everything is done. Okay, so let's say I've run this test for a thousand network policies. My dashboards look good, CPU usage is fine. What do I do next? I need to find the scale limit; that is what I care about. To find the scale limit, we need to define the conditions for a test failure, that is, when the test is considered failed. There are multiple ways to do so. Cluster death is an obvious one; that will happen sometimes if you apply too many network policies. The next one is cluster health: you can collect different metrics from your cluster and track how well it's doing, from a resource usage point of view or whatever you actually care about. And the main thing we use here is the convergence time. We say that if applying a given network policy configuration takes longer than N minutes, and you can choose what N is yourself, then the test has failed: it took too long and that's unacceptable, so we say this test doesn't pass. To find the scale limit, you can keep all the parameters fixed, as you can see in the little spreadsheet example here, and just increase the number of network policies per namespace for every run. At some point your test will fail and say, I cannot handle that many; that's what you can see there in red. And then the answer to the question "how many network policies with a given scale profile can I create" is basically the last successful run. Yay.
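The ramp-up from the spreadsheet could be scripted as a simple loop like the sketch below, which re-runs kube-burner with an increasing policies-per-namespace value until a run fails or exceeds the convergence budget. The variable and file names are illustrative and may not match the k8s-netpol-scale repo's exact spelling.

```bash
#!/usr/bin/env bash
# Sketch: keep the scale profile fixed, raise policies-per-namespace each run,
# and stop at the first run that fails or exceeds the N-minute budget.
# PROMETHEUS_URL must point at the Prometheus instance kube-burner will scrape.
set -euo pipefail

BUDGET_MINUTES=30                 # the "N minutes" failure threshold
export NETPOLS_PER_NAMESPACE=100  # starting point for the ramp

while true; do
  echo "Running with ${NETPOLS_PER_NAMESPACE} network policies per namespace..."
  if ! timeout "$((BUDGET_MINUTES * 60))" \
       kube-burner init -c network-policy.yml -m metrics.yml -u "${PROMETHEUS_URL}"; then
    echo "Failed or exceeded ${BUDGET_MINUTES}m at ${NETPOLS_PER_NAMESPACE} per namespace."
    echo "The scale limit is the last successful run."
    break
  fi
  NETPOLS_PER_NAMESPACE=$((NETPOLS_PER_NAMESPACE + 100))   # next ramp step
done
```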
Okay, so how else can we use this? We call this whole system a scale testing framework, and I am part of the OpenShift networking team. We released some performance improvements for network policy handling in OpenShift 4.14, and we used this framework to measure how much better network policies are handled now. I will not overload you with lots of details; you can imagine there are lots of different dashboards and parameters to it. I'll show you two simple examples here. First of all, this is the graph you've already seen, showing the number of OVS flows and the convergence time. On top you can see how this graph looks for 20,000 network policies with a very simple profile: one local pod, one pod selector, and one CIDR. In OpenShift 4.12, before the improvements, it took 24 minutes to converge. At the bottom, you can see that after the performance improvements in 4.14, the exact same configuration was applied in five minutes. That is one way to use this performance and scale framework, right? You can measure and see how much better your system is doing. Now, since the system converges faster, it also means we can create more network policies within a given time, and that's what we have here. On the right side, you can see the scale limit for OpenShift 4.12 was just 15,000 network policies with a given profile, and in OpenShift 4.14 it was already 60,000, which is the sum of ingress and egress network policies, in case the numbers don't make sense immediately. Okay, so, really good. We used this for OpenShift to track how well our network policy implementation works, but then we thought, oh, maybe it can be useful for someone else too. And the rest of this story Shaun will tell you. Thank you, Nadia. So, I work for Tigera, the company behind Calico. Sorry, just getting a bit of feedback. Where did we come into this story? Nadia invited us, as she said, and the problem really resonated with us as well. It's a question we get asked a lot. We get it from customers: how far does this scale? And it always depends on so many different factors, so it tends to be costly for us to answer, and we don't have a good way of doing self-service tests for our solutions team, or for customers themselves if they wanted to run it. We do have some scale testing infrastructure, but it's a little bit long in the tooth. It only gets necessary maintenance, because it's not our main focus, but it is very capable when we need to use it. We're talking Grafana 4.x; I checked the version of the dashboards we had there. And it's not really suitable for self-service, because the inputs are not as neatly specified as the ones Nadia took you through; they're not very intuitive, and it would need a lot of polish to open source it. So perhaps kube-burner is the answer. We were just investing in scale and scale testing infrastructure again, so we were also looking around to see if the landscape had changed, and Nadia came along and asked if we wanted to contribute to the new tool. It dovetailed really nicely for us. So what did we do? My colleague Mazdeck, mainly, who couldn't be here due to some fun with his passport, built a convergence tracker for Calico. We tackled our mainstream iptables and IP sets data planes, so we're monitoring the Calico equivalent of OVS flows: iptables and IP set convergence. We'll likely extend it to cover BPF and other things later. It all weighs in at about 200 lines of Python, so it's an approach that's really easy to get started with: you get something that works and you can start testing, and then you can add more things into it later to monitor the health of your components and make sure things are staying up, but you can do that manually initially. So it's quite a nice framework to get started with. We added a kube-burner YAML to scrape our Calico-specific metrics from our Prometheus instance, and those show up in the dashboard; that was straightforward again. And we made a new Grafana dashboard, this time for Grafana 10.x, so things have moved on a little bit. It all went really smoothly, with a lot of help from Nadia. She helped land some patches to kube-burner through her contacts there, which made our convergence tracker easier to write. And we ran some tests too. We also had some performance improvements, and that seemed like an obvious thing to show, but I wanted to show two sides of the coin.
So, our performance improvements in the 3.27 release apply mostly to selectors, not to CIDR-based rules, and I've got two profiles to show you later in the talk where we can compare how the 3.27 results change for each. But before we do that, let's do a little demo and show how you run this tool. Before the talk, I set up a cluster in GCP; kube-burner now supports arbitrary Kubernetes clusters, it's not in any way tied to OpenShift. I've got Prometheus metrics enabled in Calico; actually, I'm going to do that as part of the demo. And I've got a persistent results server with Grafana and, I think, OpenSearch for this. I've set up kube-burner, the CLI tool, which is just a download, on a jump box in GCP, and I've checked out Nadia's repo with the YAMLs that we contributed to, so Calico's contribution is in there. So, I'm going to switch over. Can we switch to this mic? Yeah. Okay. I've got my jump box, and I'm going to set up my kubeconfig. Then I have a script to configure the cluster, so I'll run that. It uses kubectl to patch Calico's configuration to enable Prometheus, and it also installs a manifest from Nadia's repo, which is what's needed to monitor Calico; it's just a basic Prometheus setup that will work for this. If I run that, everything gets patched and created. Then I've got a little function I can run, which again came from Nadia, that just checks that the Prometheus service has come up. So, while that comes up, what do I have? I've got the k8s-netpol-scale repo checked out, and most of the meat of it is inside a subfolder. The most important thing to look at is the env file; that's the set of environment variables that control the framework, and you will recognize these from Nadia's part of the talk. I've also got my kubeconfig in there. You need to set the platform appropriately; there's a subdirectory for each platform that's supported, so this is set to Calico, obviously. And I've enabled the convergence tracker. That's most of what's in there. I think the only other thing is that I've reduced the end-of-job pause; normally it waits at the end of the job for a little while to collect CPU metrics and so on, and I've shortened that for the demo. And if I show the metrics YAML, I think this is a good one to look at: these are Calico's metrics that I've added at the end. Basically, you just give kube-burner a Prometheus query, and it will scrape all of these at the end of the test and put them into Elasticsearch for the dashboard to show. So they go into the persistent store, and your Prometheus can be discarded at the end of the test. The actual test itself is in the network-policy YAML, but I won't dig into that; it's a lot of templating. So, I think we're ready to run the test. Oh, did I source my env? I'll source the env file. And this is the sort of command you need to run to execute kube-burner: dash m for the metrics, dash c for the config, where you pass the network policy test you want to run, and dash u to point it at the Prometheus instance it's going to scrape. And I've set the log level to debug just so we get more output.
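Roughly, the env file and the final invocation look like the sketch below. The variable names mirror the scale-profile parameters from earlier in the talk but are illustrative, they may not match the repo's exact spelling, and the Prometheus URL is whatever your monitoring setup exposes.

```bash
# env: illustrative excerpt of the framework's configuration
export PLATFORM=calico               # one subdirectory per supported platform
export ENABLE_CONVERGENCE_TRACKER=true
export PODS_PER_NAMESPACE=5
export NETPOLS_PER_NAMESPACE=10
# the seven scale-profile parameters
export LOCAL_PODS=1
export SINGLE_PORTS=1
export PORT_RANGES=0
export CIDRS=1
export POD_SELECTORS=1
export PEER_NAMESPACES=1
export PEER_PODS=1

# after `source env`, kick off the run
kube-burner init -c network-policy.yml -m metrics.yml \
  -u "http://prometheus.example:9090" --log-level debug
```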
If I run that, it kicks off, and it goes through multiple phases. The way you configure kube-burner is with folders of YAMLs and chunks of YAMLs, so the first phase is creating the convergence trackers, and then it goes through and creates the network policies. While that's running, and it takes about five minutes to run this demo, normally about ten minutes for a proper run, I'll switch to something else and we can go back and have a look at the metrics. Switch back to that mic, I think. So, here are our results with a CIDR-heavy profile, and these are the results from 3.26. I wasn't really expecting any big change in 3.27 on this. It converges fairly quickly, and it doesn't use a lot of CPU, which is the graph on the left there. The two graphs at the top are from the convergence tracker, so we get metrics from the convergence tracker showing the iptables rules climbing and the IP sets staying steady in this test. There are also more results that would be off-screen here showing Kubernetes things. Moving to 3.27, it did improve, which was a bit of a surprise, but it didn't improve by a whole lot on this test, and the number of iptables rules and so on and the CPU usage were all fairly consistent; it just converged a little bit quicker. But if we flip over to the selector-heavy profile, where we have a lot of peer selectors in the rules instead of just simple CIDRs, we actually used the procedure Nadia talked about to ramp up the number of network policies until it failed. And then we got this graph with very spiky CPU, probably a lot of garbage collection and nasty things happening, loads of iptables rules, loads of IP sets. But if we switch to 3.27, we were able to confirm that it did converge at that scale, the CPU was a lot lower, and everything was as we were expecting. We also doubly confirmed that things were working as expected by adding some extra metrics in 3.27, so we were able to put those into our dashboard and show this graph at the bottom left, optimized selectors: it counts up when we know we've optimized something as we were intending to. So it's really useful to be able to add those Calico-specific things for our own testing. And yeah, it's nice. So, those are my results. The demo is probably still running; let's get back to it. Yeah, let's see how we're doing. Yeah, pausing for one minute before finishing, so it's almost done. Let's just carry on. So, the Tigera perspective; I think I'm on these mics now. I'm really glad Nadia reached out; it was a really good time for us. It is a shared pain point, and hopefully we all benefit from the shared project. I know we'll be using kube-burner in future, because it's a lot nicer than what we have. We were able to map some real policy questions and some real perf questions that we had onto the available parameters, and it showed roughly what we expected, but with some surprises, which is nice in a way, because it shows you that you really are testing something. And I'm interested to see where it goes next. Maybe this is a little bit out of scope, but we'd quite like to figure out a way to do higher-scale testing with mock nodes and things like that; we've been experimenting with that internally. And it'd be quite nice to have some real connectivity checking layered on top of the convergence stuff, just to stoke my paranoia and make sure it's really, really doing what I want it to do. But yeah, we've enjoyed contributing. So, Nadia, how many network policies can I create? All right, getting back to the original question, and to what's next.
So, we have an answer to this question for a predefined scale profile, which boils down to the seven parameters I've been talking about. Now, you could say that people don't usually run thousands of network policies with the exact same scale profile in their cluster; they probably have a couple of policies with very different profiles at the same time. So what can we do about that? That's something that's still in progress, but we are working on it. First of all, we can try to define some meaningful scale profiles to be used as an upper bound. Say I know that a 100 pod-selector profile allows 10,000 network policies, and I know my network policies use just 10 pod selectors and I need just 2,000 of them; then it will probably work, right? Because it's much less than what is already supported, so I can use this information already. Now, I may have lots of network policies in my cluster but not really know what their scale profile is, how many pod selectors and CIDRs they're using. So there is a tool now that, given a dump of all the YAMLs it needs, which is pods, namespaces, and network policies, can give you an analysis, a scale report, for your existing workload. It shows the scale parameters of the different network policies that already exist in your cluster, based on those YAMLs; it just provides some statistics and data for you. And the most ambitious item is approximation for any kind of profile using existing scale test results. The simple version is: I know a one-CIDR profile allows 10,000 network policies and a ten-CIDR profile allows 5,000 network policies; can I approximately say how many network policies can be created for a five-CIDR profile without actually running the whole test? That's the simple idea; it extends to all the different parameters. Of course, that's a pretty complicated thing, but we are working on it and hope to get some nice results there. And the last thing is adding more coverage. As you may have noticed, we don't cover the ipBlock except field or named ports; they are not part of the profile just yet, but they can be added anytime. We also hope to use the same scale test profile for other APIs, like admin network policies, in the future. Now, let's see if the demo finished. It did, and it spat out a UUID; this is how kube-burner records the tests in Elasticsearch. If we switch over to this tab here, we should be able to show the last 30 minutes, and if there was more than one test in the last 30 minutes, it would appear in this dropdown here. But I think we got the right UUID. 8652, yeah. So we'll show that one, which was already visible. If you home in on the time, this was just a rerun of one of the tests I already showed, but you can see you get a live dashboard where you can poke around. Scrolling down, we have a few more Calico metrics and some metrics from the Kubernetes API server; you can add whatever you want when you're building these dashboards. And there's the profile at the top. Oh, the nodes should say five; that's a mistake. So, it works. It worked. I think we have a couple of minutes for questions, so maybe we can just have a discussion. We'll be here for any questions and discussion, considering a lot of people are probably leaving and moving on. Great. Thanks, everyone.