Welcome to the SIG Network deep dive! This has been a collaborative set of slides from the members of the Kubernetes SIG Network community, including myself, friends, Jay, Ricardo, Rob, and Manuel.

So what is the area covered by SIG Network? It touches on all of the networking aspects of the Kubernetes ecosystem. This includes pod networking within and between nodes, including interfaces such as CNI and IPAM; cluster networking in and out of the cluster network; service abstractions such as L4 and L7 load balancing and service discovery using systems such as DNS; and network policies, i.e. security and access control, basically how you secure your pods and workloads. And of course, the APIs associated with these functions, which include Pod, Node, Endpoints, EndpointSlice, Service, Ingress, Gateway, and NetworkPolicy. For those of you who are interested, we have a well-attended Zoom meeting every other Thursday and a busy Slack channel. And don't worry, we will put this information back up at the end of this presentation. As we only have a 35-minute slot, and there are many excellent intro videos, here are some previous intros and deep dives, with the links shown here. We would like to take our 20 minutes to give an update about exciting things that have been happening in the SIG: things that have landed, things that the SIG is actively working on, and future directions.

Okay, so what has been happening in the SIG? There are many smaller improvements, features that people have requested, and APIs that have gone GA and have been covered in detail before, such as EndpointSlice. There are also a couple of big items: dual stack, which is support for IPv4 and IPv6; the Gateway API, which is the next iteration of the L4/L7 service APIs; rethinking how to support network topology and network routing; and finally the Network Policy working group, which has already landed some features and is also looking at a potential version two of the API.

Let's talk about some of the improvements to Kubernetes networking that have landed. First, we have loosened the restriction that a Service of type LoadBalancer can only use a single protocol. This is the multi-protocol Services KEP, and the link is there in the slide. Basically, before this improvement, you could only have a single protocol as part of your LoadBalancer Service definition. With the new changes, you can have multiple protocols supported on the same LoadBalancer Service. For example, in this case, you see a mixed-protocol Service that uses both UDP and TCP. Since this is new, support actually depends on your particular cloud provider, so please check the port status on your Service object. If your cloud provider does not currently support multiple protocols on a LoadBalancer Service, you will see an error along the lines of "load balancer mixed protocol not supported".
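To make that concrete, here is a rough sketch of what such a mixed-protocol LoadBalancer Service could look like; the name, labels and ports are just placeholders, and whether the load balancer actually gets provisioned still depends on your cloud provider's support:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mixed-protocol-lb      # illustrative name
spec:
  type: LoadBalancer
  selector:
    app: my-dns                # assumes pods labeled app: my-dns
  ports:
    - name: dns-tcp
      protocol: TCP
      port: 53
      targetPort: 53
    - name: dns-udp            # a second port with a different protocol on the same Service
      protocol: UDP
      port: 53
      targetPort: 53
```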
The next improvement that has been added to Kubernetes networking is a way to disable the allocation of node ports for LoadBalancer Services. We have had feedback from users that for certain cloud providers, a LoadBalancer Service does not actually need a node port in order to function. The need for node ports has caused issues where node ports have been allocated and completely used up, even though they are not really part of the functional load balancer, and this effectively restricts the number of LoadBalancer Services that a user can create. With this improvement, we can set the spec.allocateLoadBalancerNodePorts field on the Service, and if it is set to false, then creating a LoadBalancer Service does not also allocate node ports.

Load balancer and pod lifecycle has also seen improvements in this past year. It has always been the case that for a Service with externalTrafficPolicy: Local, when the pods backing the Service on a specific node have been terminated, there may be a period of time when the external load balancer health check has not yet been updated, which causes traffic to be black-holed. During this time interval the pod may still be gracefully shutting down, but it will not receive traffic. We can see this illustrated in the diagram below, where the pods are terminated (the yellow lightning bolt), but in that interim time between the termination and the health check, there is potential for a black hole. By tracking terminating endpoints in a separate state in an EndpointSlice, we are able to model pods in the graceful-shutdown state. While in this state, the traffic-steering behavior of kube-proxy for externalTrafficPolicy: Local is somewhat changed: first, traffic will be sent to ready and terminating pods; if no ready pods exist, then the traffic will be sent to not-ready and terminating pods. In essence, we try to deliver traffic as much as possible to endpoints that exist, as opposed to black-holing the traffic prematurely.

And now Manuel will tell us about EndpointSlice. Another big addition SIG Network has been working on is graduating EndpointSlices to general availability in Kubernetes 1.21. EndpointSlices were introduced as a beta feature in Kubernetes 1.17, and they offer a simpler way to track network endpoints within Kubernetes at scale. Previously, the Endpoints object held all endpoints of the pods correlating to a given Service object and therefore could grow quite big in the case of lots of pods. With EndpointSlices, those gigantic Endpoints resources are split into more granular EndpointSlices, where each EndpointSlice only holds a subset of the addresses for the pods. Now when a pod changes, only the EndpointSlice containing that endpoint needs to be updated, resulting in far less consumption of, for example, memory and network. Big Endpoints objects previously caused issues in different parts of Kubernetes, such as the API server or etcd, as they grew so big that scaling was difficult. With this new feature, this finally becomes easy again. EndpointSlices support IPv4, IPv6 and FQDNs.
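As a rough illustration of what these objects look like — keep in mind that EndpointSlices are normally created and managed for you by the control plane, the names and addresses here are made up, and at the time of this talk the terminating-related conditions were still gated behind a feature gate — a slice in the now-GA discovery.k8s.io/v1 API looks something like this:

```yaml
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: my-service-abc12                      # slices are generated per Service
  labels:
    kubernetes.io/service-name: my-service    # ties the slice back to its Service
addressType: IPv4                             # IPv6 and FQDN slices are also supported
ports:
  - name: http
    protocol: TCP
    port: 8080
endpoints:
  - addresses:
      - "10.0.1.5"
    conditions:
      ready: true          # a normal, ready endpoint
      serving: true
      terminating: false
  - addresses:
      - "10.0.1.6"
    conditions:
      ready: false         # gracefully shutting down: no longer ready...
      serving: true        # ...but still able to serve traffic
      terminating: true    # tracked separately so kube-proxy can fall back to it
```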
Finally, there is an important security-related update around the Service externalIPs feature. The externalIPs feature allows users to specify an IP directly, as a way to integrate with manually configured external load balancers. Unfortunately, it is impossible for Kubernetes to determine whether or not the IP specified actually belongs to the user, or is an internet IP that they might be spoofing. This can cause unwanted traffic capture in the cluster. As externalIPs is a GA feature that is in use, the recommendation from SIG Network is to disable this feature by default using an admission controller. Those who continue to use the feature should think about how they will defend against this attack vector.

Let's now cover some of the big changes that have been going on in SIG Network. The first exciting piece of news is that dual-stack support is progressing to beta. So what is dual-stack support? Dual stack allows all Kubernetes networking components, that is pods, Services and nodes, to have both IPv4 and IPv6 addresses at the same time. Also part of the dual-stack effort is the ability for clusters to migrate from single-stack to being dual-stack enabled. As you can imagine, with all seamless migrations there needs to be careful design to make this smooth for the user. We expect that Services in a dual-stack cluster will operate in single-stack mode or in both, and Services from a migrated cluster will have their previous semantics preserved.

Some details on the dual-stack implementation: it requires Kubernetes 1.20 or higher, and the implementation behavior has changed from the previous alpha semantics. You will need to enable dual stack using the IPv6DualStack=true feature gate, and IPv4 and IPv6 ranges will of course need to be specified at cluster creation time. Note that when converting from a single-stack to a dual-stack cluster, all of the existing Services will automatically remain single-stack with IPv4; this preserves the semantics of the current Services. To enable IPv6 on an existing Service, you will need to recreate your Service in dual-stack mode.

This slide shows a summary of the changes you will see in the Kubernetes API. Services have a new field, ipFamilyPolicy. This allows you to determine what kind of dual-stack Service you want, independent of how the cluster is configured. SingleStack means that you are explicitly requesting a particular stack, and you also set ipFamilies to choose which stack is to be used. PreferDualStack means that you would like to be dual stack if it is available, and the system will choose for you. RequireDualStack means that dual stack must be enabled. Along with the policy, ipFamilies is a list of preferences, and order does matter when looking at the IP families. The other resources, such as Pod and Node, have their IP-related fields pluralized in a straightforward manner.
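To give you a feel for the new fields, a dual-stack Service could be requested with something roughly like this; the names are placeholders, and the exact defaulting depends on how the cluster was configured:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-dual-stack-service      # illustrative name
spec:
  ipFamilyPolicy: PreferDualStack  # SingleStack | PreferDualStack | RequireDualStack
  ipFamilies:                      # order matters; the first family listed is the primary one
    - IPv6
    - IPv4
  selector:
    app: my-app                    # assumes pods labeled app: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
```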
The Gateway API is another major effort in the SIG. The goal of the Gateway API is to create a modern set of APIs for deploying L4 and L7 routing in Kubernetes. The design aims to be generic, expressive, extensible and role-oriented. We have seen the trend for Kubernetes user personas to shift from empowered single developers towards a more role-oriented model, where many different teams may collaborate to deploy an application. Some common roles that we are thinking about include the infrastructure provider, who manages the underlying infrastructure of the cluster; cluster operators, who deal with cluster-global configuration; and application developers, who deploy their apps and workloads.

The Gateway API has three main goals at a high level. One, better model personas and roles, as we described previously. Two, support modern load balancing features with predictability — maybe not perfect portability, but at least predictable portability. Three, create standard mechanisms for extension, growth, and implementation- and vendor-specific behaviors. You will see that to accomplish these things we are focusing on creating a scalable resource model that can work well with RBAC, creating a notion of levels of support and conformance akin to conformance profiles in Kubernetes, and creating a flexible resource model that allows for some degree of polymorphism when dealing with resource relationships.

This slide shows a sketch of how the API objects relate in the Gateway API. The infrastructure provider will deal with creating GatewayClass resources, which determine what kinds of gateways can be provisioned in a given cluster. The cluster operator and/or application developer will create Gateway objects, which represent specific instances of load balancers and the ways that users can access the Services. Finally, application developers will write Routes that model their applications. Note that Routes are not limited to HTTP and L7 but can also be L4. In fact, the type of the Route object is protocol-specific, and this allows a lot of room for extension to custom protocol types. We're really excited to announce that the v1alpha1 version has been cut, and there are now at least six different implementations in the works; please visit the website for more details. v1alpha1 includes features such as L4 and L7 load balancing, advanced TLS configuration beyond what is currently possible with Ingress, traffic splitting, traffic mirroring, header-based routing and modification, and a lot more is in progress.

Support for awareness of network topology is another area where the SIG is focusing its efforts. In Kubernetes 1.21 we deprecated the old alpha topology implementation. After a lot of discussion, it turned out that the approach taken was likely too inflexible and much too prescriptive for the long-term evolution of Kubernetes. We have now moved towards a simpler approach. First, we handled the node-local DaemonSet case as its own special case, rather than bundling it with the generic topology notion — this is internal traffic policy. Second, for topology-aware network routing we use EndpointSlice and a controller to manage the algorithm, which makes the policy more decoupled in the system. For the alpha, we have a controller that allocates endpoints proportionally, with hints that kube-proxy can consume. Each Service can now opt into this algorithm using the annotation service.kubernetes.io/topology-aware-hints. This slide gives an example of what happens when using the topology hints. Given three zones with an equivalent amount of node capacity, pod endpoints should be distributed evenly among them. In this example, zone A has one more endpoint than it needs, and so that extra endpoint is then allocated as a hint to zone C. The original behavior of kube-proxy, without topology-aware routing, is to use the endpoints from all zones, which is shown in the "original" section of the diagram, whereas the smarter zone-aware kube-proxy will use the zonal hints and distribute traffic accordingly.

Network policy is another area that the SIG is working on. One of the most-voted requests has been for a port-range API, similar to what many CNI providers already have. To this effect, we've now added this to the Kubernetes API; you can test it out by adding an endPort field to your NetworkPolicy definition. Also, we have worked with the API machinery team to put default labels on all namespaces. This addresses a common question we have heard, which is how to select a namespace when I don't have the RBAC permissions to set labels on it. Thanks to our friends in API machinery for collaborating on this.
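Here is a rough sketch of how those two additions can be used together; the namespace names and labels are made up, and at the time of this talk the endPort field was still gated behind a feature gate:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-monitoring-port-range   # illustrative name
  namespace: my-app-namespace
spec:
  podSelector:
    matchLabels:
      app: my-app                     # assumes pods labeled app: my-app
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              # the automatically added default label, so a namespace can be
              # selected by name even without permission to label it
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 32000
          endPort: 32768              # the new field: allows the whole range 32000-32768
```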
The SIG has also been collaborating on cluster network policy implementations. This will allow administrators to help define defaults that apply across multiple namespaces. This work is in very early days, but the KEP is now available, and we would love for anyone in the security or administrative space to leave some feedback. We've also worked with SIG Testing to make sure that the network policy API is exercised on all relevant CNIs that support network policies.

These are the early days for the KEP to define cluster network policies. Once this KEP merges, different CNIs with up-to-now bespoke network policy APIs will be able to converge on a common upstream API. Cluster network policy covers the use case of enabling cluster admins to enforce secure-by-default policies on cluster tenants, and hopefully the design strikes a happy medium, providing the needed functionality without adding too much complexity and expressiveness. The network policy group has also started discussing what it would take to create a v2 API, which covers issues such as no more defaults and extensibility to other selectors.

One interesting thing that has been done in network policy is conformance. We now have a suite of table tests for network policies that can be used to test CNI providers, as well as to learn how different network policies affect pods in a cluster. Other work in the community is evolving to generate perhaps hundreds of different network policies automatically. We have also begun testing network policies on Windows nodes, and have confirmed that our existing tests are Windows-compatible.

I would finally like to mention some hot-off-the-presses efforts in the SIG. The SIG has started to take a closer look at kube-proxy maintenance. kube-proxy is a complicated component and has many different implementations and backends. Some of the principles we are looking at are: how do we make kube-proxy better for users, which includes eliminating corner cases and bugs; looking at different ways of factoring the component and having a new architecture, which we call kube-proxy next-gen, to speed up kube-proxy improvements; and finally, as part of this new architecture, making kube-proxy easier to maintain, so having organized maintenance of the component and treating the kube-proxy next-generation architecture as a natural roadmap.

And that concludes our SIG Network deep dive update. I hope you learned about what has been going on in the SIG in the past year and have some understanding of where the SIG is going in the future. Thank you. Now it's time for questions.