Hello everyone, thanks for coming and joining us today. We have some really exciting stuff to talk about. Today we're going to be discussing the intro and project update for the Network Policy API subgroup of SIG Network. This is an exciting time for us; we've never really had a main stage at KubeCon, so we're really excited to tell everyone what we're doing.

My name is Andrew Stoycos. I'm one of the maintainers of the network-policy-api repo, and I work at Red Hat as a senior software engineer.

My name is Dan Winship. I also work at Red Hat, on various OpenShift networking stuff.

I'm Surya, and I'm also an engineer working at Red Hat. I contribute a lot to the SIG Network Policy API working group. Happy to be here at KubeCon.

My name is Yang, and I work at VMware on container networking and security stuff, and I'm also part of the same working group. I'll hand it back to Andrew.

Sweet, thanks everybody. Okay, so we just learned a little bit about who we are, at least the four of us here; there are more. But what do we do, and where did we come from? For a long time, network policy was just maintained by the core SIG Network group, and it's been a stable API for a long time. I'm sure many of you know it. Who here knows about network policy? Who here uses network policy on a regular basis? That's a lot of people. Awesome, I'm stoked about that. Cool.
So yeah, we basically spawned out of SIG Network as a group that wanted to focus specifically on network policy and its future, because it quickly became apparent that doing the future in core was going to be very difficult, and projects like Gateway API have really led the way in delivering stuff with CRDs. So you're going to see a lot of threads today that are similar to that project. We have a really good group of contributors; you're going to see some actual pictures of them later in the slideshow, but you can check out all our contributors at this link.

So what do we do? We help maintain the legacy NetworkPolicy resource, which includes documentation work. Additionally, we completely reworked the upstream network policy test suite; that work is almost a year and a half old now, but it's what got us started. We converted it to a fully table-driven test design, and I think that helped the community. More recently we worked on the AdminNetworkPolicy KEP. We actually had a talk yesterday on that, and we're going to learn more about it today, and we're going to tell you what we're doing in the upcoming year as well. Over to Dan.

Okay, so a long time ago, in a galaxy right here, we created network policy version one. The idea was that developers using Kubernetes had ideas about who they did and didn't want to be talking to their pods. So we have one use case here: as an application developer, I want to control which other users within the cluster can access my application. You might have two different groups of users, and you don't want the people in the other group connecting to your service, so you create a network policy saying only the people in my group can talk to this server.

Another possibility: as an application developer, I want to have a multi-tiered architecture where I can control access between pods in each tier. So this is like you have your front-end namespace, your back-end namespace, and your database namespace, and the front end can talk to
the back end, and the back end can talk to the database, but the front end can't talk to the database. In fact, nobody other than the back end can talk to the database. Then if a hacker breaks into the cluster, what they can do is limited because of the policies you created.

Now, one thing that is not a use case for network policy is: as a cluster administrator, I want to impose my policies on end users in my cluster. This was designed in 2016. At the time, with most Kubernetes clusters, people were scaling up from running Docker and had a DevOps model where it's just one person deploying their cluster, doing their own stuff. So the administrators and the users were really the same thing. There definitely were people who were starting to play around with clusters with more roles; RBAC was actually added in the same Kubernetes release as network policy. But of course, that meant that when we were designing network policy, RBAC didn't exist. There really wasn't a strong concept of administrators versus users in a Kubernetes cluster, and so we couldn't really design network policy to differentiate administrators from users and provide use cases for the administrators specifically. So we just said, well, we'll do that later.

So here you can see a sample network policy. It sounds like all of you know this part already. The example is what I said before: the back end can only get traffic from the front end, and the database can only get traffic from the back end. Network policies are namespace-scoped, so you create the policy and it sets policy only within that namespace, possibly affecting traffic coming from or going to other namespaces, but always anchored in one specific namespace. The peers can be pods or namespaces; we added CIDR blocks later on, a few releases after the original network policy.

And one of the decisions is that the API design is implicit in nature: you say what you want to allow, and then everything
else gets denied. That worked okay for the original use case of "I want to stop people in this other namespace from getting to my service": you know everybody who should be able to access your service, so you can just put all of them in the policy, and then everybody else gets denied, and that works. For other use cases, that started to become more of a problem.

So it's been a stable API for six years, and there are a lot of implementations; we have some pretty icons here. But it has problems, and it has had problems. One is what I was saying about the implicit isolation: the policy specifies what you want to allow, and then everything else gets denied. There's no finer-grained control, and that makes it hard to layer policies. You can't even say "I'm going to create a service, and I'll just throw in a policy that says these pods are allowed to connect to my service," because if you do that in a namespace where there weren't any other network policies already, you've just denied everybody else the ability to talk to your service, when you might not have meant that; you might have just meant "I want to guarantee that these pods can talk, and I don't care about other pods."

The fact that there are no explicit deny rules, no priority between multiple policies, and no policies that span multiple namespaces means that it doesn't solve a lot of the use cases administrators had for the policies they wanted in a cluster.

And finally, while it works pretty well for pod-to-pod traffic, we never fully nailed down exactly how it works for traffic coming into or going out of the cluster. For ingress traffic, if you try to apply a network policy to it and say "I only accept traffic from 10.0.0.0/8," that CIDR block might get matched against the source of the traffic, or it might get matched against the load balancer that redirected the traffic into your cluster. There's really no way to know for sure, and it's different between
different clouds and different CNI plugins. Likewise for traffic exiting the cluster: because there are so many different other networking things involved, it's sometimes possible to bypass network policy when exiting the cluster, by, say, creating a service and then having the service redirect through a load balancer, and things like that. So it's a little bit vague there. These are all issues; some of them we realized right away, some of them took us a while, and we realized we needed to do something about this.

So, time for a new API to solve all the problems Dan just mentioned. The working group decided to open up a KEP, and look at the numbers: we had ten-plus contributors putting up more than a hundred individual commits and resolving over 600 comments. By the end of the day, if you look at the GitHub PR, there are just too many comments to unfold, so the GitHub page will actually freeze sometimes. But that's how much effort we put in to get this KEP merged, one year after it was opened. That's why I want to give a huge shout-out to all the contributors to the KEP who made this happen.

Okay, so why was this so hard to merge? We started out just wanting a cluster-scoped, admin-facing, CNI-agnostic policy API, just to solve the problems Dan mentioned. We soon realized we had more work to do. First of all, we needed to consolidate the use cases. There were a lot of people coming into the community talking about their use cases, where we realized it was really a single use case. People wanted to say, "I want to segregate my clusters."
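To recap Dan's section in code form: the multi-tier example and its implicit deny look like this as a standard v1 NetworkPolicy. This is a minimal sketch; the namespace and label names are invented for illustration.

```yaml
# Applied in the "database" namespace: only pods in the "backend"
# namespace may connect to the database pods. Because this policy
# selects the pods, everything else is implicitly denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-to-database
  namespace: database
spec:
  podSelector: {}          # selects every pod in the "database" namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: backend
```

Note how the "deny" is nowhere in the text of the policy; it is a side effect of the pods being selected at all, which is exactly the implicit behavior Dan described.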
So there are different networks; I want some namespaces to be able to talk to some namespaces but not others. And some people came to us and said, "I want each namespace to be isolated from every other." We soon realized these are really the same use case, because you basically want to support tenancy on the cluster, in some sense.

The other thing is, we wanted to be explicit in the API design, meaning we didn't want to make the same mistake the NetworkPolicy API made. Each rule that AdminNetworkPolicy has should be read as-is, should not have any implicit meanings tied to it, and should be unambiguous. Further, because we wanted to support both allow and deny actions, priority modeling was inevitable. We also wanted rule actions on top of allow and deny that actually make sense to everyone; we were proposing a word, "delegate," but it didn't make sense to a lot of people. That's why there were a lot of iterations on the design of the API. We had to go through and make sure the API is actually minimal, but solves all the use cases we came up with, and is also future-extensible, which we'll show you, because we have all these cool features we want to add to AdminNetworkPolicy in later releases.

So, starting from the KEP, here we are: we have merged two objects, AdminNetworkPolicy and, based on it, BaselineAdminNetworkPolicy, which live in the network-policy-api repo. It's out of tree; we want to follow the success story of the Gateway API. Right now the two objects are v1alpha1, and they support intra-cluster controls. In the future we're also thinking about north-south, and we'll talk about the enhancements later in this talk.

So, the use cases that the AdminNetworkPolicy API is focused on: you can see there are some major use cases that I think will resonate with a lot
of you. First of all, isolating tenants in the cluster: there are a couple of namespaces that belong to one tenant and other namespaces that belong to another tenant, and you want them to not talk to each other, while intra-tenant connections are fine.

There's also "always allow ingress or egress to DNS": you don't want developers to write a network policy that accidentally blocks access to DNS and then not know why. And you always want monitoring to be able to egress to the namespaces you care about; developers should not be able to block the monitoring namespace from collecting telemetry.

Finally, there's a use case where the cluster admin, I shouldn't say doesn't care, but wants to delegate the power to the namespace owners to decide what kind of policy they want for their namespace, and we wanted to make that explicit. So the admin delegates the power of writing the policy to the namespace owner. Those are the things AdminNetworkPolicy can do.

There's also BaselineAdminNetworkPolicy, which does the following. In a Kubernetes cluster, by default, every pod can talk to every other pod.
That's the default security posture of a Kubernetes cluster, and today you don't have a way to flip it. What BaselineAdminNetworkPolicy does is give the cluster admin the ability to flip the default network security posture for a cluster, so you can implement something like zero trust.

Another use case: if you're delegating policy-writing to a namespace owner, what if the namespace owner doesn't actually have any network policy in the namespace? You want something default, something to catch that, so that if you delegate power to your namespace owners and they just don't use it, everything doesn't simply stay allowed by default. Maybe you don't want that. So there are different intents of the cluster admin that we were able to capture with these two CRDs.

We have a QR code here which links to our talk yesterday, where we showcase the AdminNetworkPolicy and BaselineAdminNetworkPolicy objects in a little more detail. A subtle plug on that: if you're a Harry Potter fan, go check out that talk. You'll really love it, I promise.

So the API has been v1alpha1 for over a year, and we're on the journey towards beta, which, within our working group, we think is soon. So we definitely want to call out for contributions.
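To make those two objects concrete, here is a minimal sketch of what they can look like. The field names follow the v1alpha1 shape of the network-policy-api repo, but check the repo for the exact current schema; the tenant labels are invented for illustration.

```yaml
# Cluster-scoped AdminNetworkPolicy: explicit actions, explicit priority.
apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  name: tenant-isolation
spec:
  priority: 10             # lower value = higher precedence
  subject:
    namespaces:
      matchLabels:
        tenant: a          # applies to tenant A's namespaces
  ingress:
    - name: deny-from-tenant-b
      action: Deny         # explicit deny -- not expressible in v1 NetworkPolicy
      from:
        - namespaces:
            matchLabels:
              tenant: b
---
# BaselineAdminNetworkPolicy: the cluster-wide default that applies only
# where no NetworkPolicy in the namespace has an opinion. It is a
# singleton named "default".
apiVersion: policy.networking.k8s.io/v1alpha1
kind: BaselineAdminNetworkPolicy
metadata:
  name: default
spec:
  subject:
    namespaces: {}         # all namespaces
  ingress:
    - name: default-deny
      action: Deny
      from:
        - namespaces: {}
```

The baseline object is what flips the "everything allowed" posture: namespace owners can still write their own NetworkPolicies, which take effect over the baseline.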
We should probably say that again in later slides, but we already have two implementations of the API, Antrea and OVN-Kubernetes, which are both open source CNI projects. The AdminNetworkPolicy API is also on the roadmap of Cilium and Kube-OVN, and their issues tracking AdminNetworkPolicy support are down there.

Okay, so I'm going to hand it over to Surya for the cool stuff.

So Dan talked about our distant past, which was network policies, and the issues we've had with that API. Yang did a good job talking about our recent past, which was admin network policies. So like he mentioned, I get to do the cool stuff, which is all the forward-looking features this group has been working on. We have a lot of cool features lined up, which I'll talk about in a moment.

But before that: Yang talked about KEPs, and how painful the Kubernetes Enhancement Proposal process was. How many of you have done KEPs? Not that many; good for you, but I do see a few hands. So yes, the core SIG Network does KEPs, but the subgroups do their own: Gateway API does GEPs, the Gateway Enhancement Proposals, and as the Network Policy subgroup we do NPEPs, the Network Policy Enhancement Proposals. So if you want to do a cool feature, want to contribute, and are unsure what to do: open an issue, open an NPEP. That's how you can get started. We also have features that don't require NPEPs, so just come check out our GitHub repo and open NPEPs or issues.

So, the first NPEP, which we've been working on for the past few months, is conformance profiles. Dan talked about all the cool network policy implementations in his slide; it's an extensive list, and even we don't know the full list of implementations of the network policy core API. Yang also talked about the two fresh implementations
We have for our AdminNetworkPolicy API, and we're expecting more and more implementations in the future. One of the main issues for us as API and project maintainers is that we've never really had a proper implementation-tracking mechanism, and that's what this network policy enhancement proposal is trying to solve.

It's about coming up with API conformance tests, where the end users and the implementers work hand-in-hand with the API maintainers and are heavily involved in coming up with a set of conformance tests, grouping them into core and extended sets of features depending on the fields in the API. Then we come up with conformance profiles, which help us report the conformance test results from each individual implementation back to our project. What this essentially gives us is a way of telling which implementation is using which fields in the API and how it's useful, and we get a consistent feedback loop going, which actually helps us make our APIs better. That's the end goal of this NPEP. It's already merged into our project, so please check it out.

But we also need more help in this area. You can scan the golden QR code right there, the "help wanted" one. We've tagged the issues with the conformance area, and there are a lot of good first issues you can check out. If you're interested in contributing, we do need help, and reach out to me after the talk if you want to know more about this.

A sample conformance profile test report is shown right over there, with the AdminNetworkPolicy profile and the BaselineAdminNetworkPolicy profile. It shows you a sample of how the tests are evaluated, how many passed, how many failed, and then as an implementation you can basically get a conformance badge if you're conforming with our API.

The next NPEP is about egress.
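To preview the kind of rule this NPEP is after, here is a purely hypothetical sketch. The `nodes` and `networks` peer names come from the proposal under discussion and may change before the design settles.

```yaml
# Hypothetical AdminNetworkPolicy egress rules based on the egress NPEP;
# the "nodes" and "networks" peer types were still being discussed.
apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  name: restrict-egress
spec:
  priority: 20
  subject:
    namespaces:
      matchLabels:
        restricted: "true"
  egress:
    - name: allow-internal-range
      action: Allow
      to:
        - networks:
            - 10.0.0.0/8              # allow one CIDR range...
    - name: deny-to-control-plane-nodes
      action: Deny
      to:
        - nodes:
            matchLabels:
              node-role.kubernetes.io/control-plane: ""
    - name: deny-everything-else
      action: Deny
      to:
        - networks:
            - 0.0.0.0/0               # ...and block all other egress
```

Rules are evaluated in order within a policy, so the broad final deny only catches traffic the earlier allow didn't match.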
So like Yang mentioned, currently our AdminNetworkPolicy API only supports intra-cluster traffic controls, so east-west, pod-to-pod traffic. There have been feature requests for supporting egress too, so northbound. This NPEP is mainly focused on northbound, that is, egress controls from pods in your cluster to destinations outside the cluster. Today the API has two kinds of egress peers, pods and namespaces. Out of all these discussions came use cases where you don't want pods to talk to 0.0.0.0/0, so block everything; or you don't want pods to talk to nodes in your cluster, one node or a set of nodes, so you can use node selectors; or you want to block one specific CIDR range and allow another. Basically, egress traffic controls. This NPEP proposes two new peers for egress: nodes and CIDRs. We're discussing how the CIDR design should look, whether CIDRs should be egress-only, external to the cluster, or should also include pod CIDRs or service CIDRs. There's a lot of discussion happening in the NPEP, so if you have interesting use cases for egress, please leave comments on our issue right there on GitHub.

The next NPEP is similar to the previous one, but maybe CIDRs are not what you want: you want fully qualified domain names as your peers, with wildcard selector patterns, and so on. That's an extension of the previous NPEP. The credit here goes to Rahul Joshi; he's been contributing heavily to this NPEP, so shout out to him. He's not here today, but please do check out this NPEP and scan the QR code if you want to know more.

Over to Andrew, who'll take us through the rest of the NPEPs.

Awesome, thank you, Surya. Cool. So this NPEP involves cluster ingress.
We talked about cluster egress a little bit. We obviously want to highlight our contributor who owns this, Nadia, who's sitting right here. She's awesome, so let's give her a round of applause. Thank you. Obviously this seems pretty simple, right? It's just the flip of egress. But we as a working group have really struggled to find concrete customer use cases for this NPEP, so this is kind of a call for help. We have an NPEP open and we're trying to get the user stories and use cases together. If you are a user or a customer who can think of reasons we need ingress control in an admin-network-policy sort of subject, please let us know: give us feedback on the NPEP, come to our meetings. We are, as always, very open.

Okay, and again from Nadia, who's doing a ton of work: the new NPEP for tenancy. When we wrote the KEP for AdminNetworkPolicy, we had built in the desire to support certain tenancy-related use cases. When we got around to actually implementing v1alpha1 of that API, to be honest, it had been a year of KEP work, and we ended up with a design we really didn't love. This is the best part about our NPEP process: we now have a new enhancement proposal to make that design better. The problem, specifically, was that the original API allowed way more expressiveness than was needed, so we're locking it down and making it more explicit, with the ultimate goal of making it easier to read for users like you. That's our ultimate goal, so we're pretty excited about that one. Again, Nadia's here; if you have questions you can ask her after, and I'm sure she'll help you out.

Okay, so this next one actually isn't an NPEP; it's just an example of a new feature we have folks working on. Part of the KEP for AdminNetworkPolicy included a focus on developer tooling. With the introduction of priorities and of new API objects, there's a lot of complexity, right?
I'm sure a lot of us struggled even with network policies to understand, at the end of the day, what is going on with my pod? Why is traffic not getting there? Is it a network policy's fault, or is something more sinister going on? So this policy assistant is built on an upstream tool called Cyclonus, and its first goal is to be a tool that lets you understand and iterate on your policies. It gives you a truth-table-like output that describes your policies, so you can understand why traffic is or isn't being allowed to your pods. We have the ultimate goal of turning this into a kubectl plugin. This is really important, and the work is just getting started. Shout out to Hunter Gregory from Microsoft; he's not here today, but he's really dived in and completely gotten us going. We're really excited about it. At the end of the day, it's also going to help us extend our conformance testing framework; it's going to help us integrate truth-table testing for AdminNetworkPolicy, and there are a lot of other exciting use cases for it. So if you're excited about developer tooling, this is where you can hop in.

And for our final topic, which I'm sure a lot of people want to hear about: what about network policy v2? Dan's going to hop in and explain.

So before, we were talking about how we had network policy v1, and there were these problems with it, and then we created the network policy working group to create AdminNetworkPolicy. We kind of skipped ahead a little bit.
Originally, in the network policy working group, people had this idea of "let's create network policy v2 that will solve all of the problems." It turned out we had too many problems and had to scale it down, and that's where AdminNetworkPolicy came from. So AdminNetworkPolicy solves the administrator use cases, but we still have all these other problems with network policy.

The implicit deny is still hard to work with. There are these weird syntactic quirks; I don't know if any of you have ever run into this, but adding or removing a single hyphen from a network policy can completely change the meaning, from "allow packets that match this and this" to "allow packets that match this or this." You don't want to mess those up, and network policy makes it very easy to.

There's this problem that when you have probes on a pod, for liveness or readiness, the kubelet needs to be able to send an HTTP request or whatever to the pod. When we first added network policy, we said, "oh, we don't want to break everybody's probes," so we just added this rule that says the kubelet can talk to any pod. That generally ends up meaning that every host-network process can talk to any pod, and people don't always want that, but we can't change it now because that would break everybody's probes. So a new version of network policy might let us do something about that.

We've been talking about all these extensions we're making to the original admin network policy plan. It's been hard for us to extend the original network policy, because the way it's designed and written means that when you add new things, old implementations might not be able to see them. For instance, original network policy lets you say "only allow connections to port 80," but you couldn't say "allow connections on ports 80 to 90," and people wanted to do that. So people created a KEP to add a port range feature to
network policies. But it turns out that if you just replace "port 80" with "port range 80 to 90," an old implementation would look at that and say, "oh, it doesn't say anything about ports; that means the policy allows connections to all ports," which isn't what you wanted. So we had to come up with this hack where you say port 80 with an endPort of 90, which gets the message across, but it's weird, and it took us a while to come up with something and convince ourselves it was really going to work. So we're trying, with AdminNetworkPolicy, not to fall into that hole again. But we still have that problem with network policy.

People say "network policy v2," but because of the way versioning works in Kubernetes, we can't actually make a networking.k8s.io/v2 NetworkPolicy that solves all of our problems, because it would still have to be compatible with the v1 NetworkPolicy; you'd still have to have the implicit deny and all that. But we came up with a better network policy for administrators with AdminNetworkPolicy, so could we somehow come up with a better network policy for developers?
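As an aside, the port-range workaround Dan described is the `endPort` field on v1 NetworkPolicy (stable since Kubernetes 1.25). A minimal sketch, with invented namespace and label names:

```yaml
# The "port 80 with endPort 90" shape: an old implementation that does
# not understand endPort still sees "port: 80" and allows only that one
# port, instead of falling open to all ports.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-port-range
  namespace: web
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - ports:
        - protocol: TCP
          port: 80
          endPort: 90
```

The design choice here is exactly the compatibility reasoning in the talk: the new field extends the old one rather than replacing it, so unaware implementations fail closed rather than open.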
So this is an idea we've just started thinking about, but the idea is: we could replace NetworkPolicy with a developer network policy, a better implementation of the same ideas that went into network policy originally. We wouldn't let administrator-focused use cases sneak in, which happened repeatedly in network policy, so we could keep it simpler. And we have years of experience now; we know what mistakes we made, and we could just not make them, and ensure we end up with a developer network policy that fits between AdminNetworkPolicy and BaselineAdminNetworkPolicy to allow all the use cases everybody needs. The idea would be that the combination of admin network policy and developer network policy would essentially be network policy v2. And we would have to make sure people could still use old network policies, because there are a ton of them around, but we would define interoperability, and if you wanted the cool new features, you would just have to use the new objects. And I think, oh, is that you, Andrew? Sure.

So, just to close it out: I know a lot of SIGs say "we need help, we need help, we need help," but I've been there; I understand how scary it is to get involved. So I tried to take a step back and start from ground zero. We want everyone and anyone to get involved, whether you're a no-code contributor or a code contributor or a complete beginner. Heck, we need a new logo; if someone wants to design some art, that would be amazing. I can't do it; maybe some of my other maintainers can.

Level one, you're completely new: go check out some of the great Kubernetes contribution guides; they have really awesome information. Level two, you're still kind of unsure of where to hop in: you can hop onto our website at network-policy-api.sigs.k8s.io and read through our docs. Please find more documentation bugs.
We are not docs experts; it's mainly been us writing all the documentation so far, and we are developers. I cannot spell; it's a problem. So check that out, and check out all our issues, of course. And then level three: you're sitting in the audience, you feel pretty confident with network policy, and we've missed something you have not seen us talk about today. Please, please, please open an NPEP. We are making our way towards beta, and the window for adding new features to the running list for beta will be closing in the next month or two, so please be thinking about that.

So yeah, that's how you can get involved. This slide is linked, as well as all of these links. I think that's all we have for today. If you could please leave some feedback on how we did, that would be great, and we'd love to take some questions. Thank you so much. Oh yeah, and there's a microphone right there. Yep.

Hey, I'm Keith Mattix, I'm an Istio maintainer. I love what I've seen today about developer network policy; very exciting. I wanted to ask: is there any openness for something like service accounts to be able to fit into developer network policy?

Open an NPEP, I would say.

Will do.

But that is definitely something that has come up, and we just sort of haven't, like I said. We would appreciate all the help.

Thank you. I think this was really awesome; I really like to see where this is going. One thing that jumps out to my mind, though, with these out-of-tree opportunities: after we potentially build up momentum around all of these out-of-tree things, does that then open the door for a conversation about what Kubernetes v2 looks like?
And those out-of-tree things become baseline in Kubernetes version 2.0? I've not seen any whispering about that, but more maintainers might know; it just comes to my mind.

There are no specific plans for Kubernetes v2, but this is something people were talking about at the contributor summit: what does it mean that we're doing all this stuff out of tree, and where are we going API-wise and evolution-wise? So yeah, conversations are happening.

And I think it's important to think about, but I feel like we're at day one. Today, many CNIs have their own APIs, and it's so easy as a user to get unbelievably confused by policy-related APIs, because they all have "network policy" in their names. So this is day one: let's come together as a community, get it all in one place, and then move forward from there. Yeah.

Awesome, thank you. And having it out of tree helps us iterate faster as a group; we're a small group, and it helps us move faster than the in-tree stuff, like we were mentioning. Yeah, that makes sense. Thank you.

Sweet, I think that's it. Thanks so much, everybody. Thank you.