Hello everybody, and thank you for joining the SIG Windows Maintainer Track talk for KubeCon and CloudNativeCon EU 2021. Today we're going to give a talk on Windows containers in Kubernetes and do a deep dive into Windows networking. So first we'll introduce ourselves. Hello, I'm Mark Rossetti. I am a SIG Windows co-chair and a software engineer at Microsoft. I work on Azure, and there's some contact information for me on GitHub. Next is Kalya. Hi everyone, I'm Kalya Subramaniyam, and I'm a software engineer at Microsoft. I'm currently a member of the Azure Container Upstream team with Mark, but I was formerly on Windows Container Networking, and my past Windows projects have included Windows overlay networking, Flannel, and kubeadm. You can find me on GitHub at ksubrmnn. Hi everyone, my name is David Schott. I'm a PM on the Windows Core Networking team at Microsoft. I mainly work on container networking and software-defined networking. Next up is Jay. Hey, I'm Jay. I'm at VMware. I'm the tech lead for Windows on K8s. I do a lot of work in SIG Network as well, on things like network policies. I'm jayunit100 on Twitter. I hang out with Mark and James sometimes too. Here's an overview of the agenda. We'll just leave this up for a second and then get started. First of all, we get asked a lot of questions about what Windows support in Kubernetes actually entails, so I'll briefly go over that here. Windows workloads have been stable in Kubernetes since the 1.14 release. Currently only agent nodes are supported, so your clusters will need to have Linux master nodes, and potentially Linux agent nodes to run some Linux-only add-ons. We support all of the releases of Windows Server since Windows Server 2019 and continue to add support for new versions as they come out. There's a lot of information at the link provided here on how to set up Windows nodes, how to make sure your container workloads land on the right nodes (which is very important for these scenarios), an overview of the differences, nuances, and limitations of Windows workloads in Kubernetes, and information on how to actually configure the nodes and get them joined to the cluster. This is a great starting point for anybody who's interested in adding Windows nodes to their clusters for the first time. I'm going to go pretty quickly here so that we can get into our networking deep dive, which is going to be the meat of this talk, as it's something we get a lot of requests for. David, do you want to take it over? Yeah, sure. Thanks, Mark. Many are familiar with container networking on Linux, and all the iptables magic that happens under the hood has been covered in past presentations. However, Windows still remains a mystery, so let's take a look at some of the Windows networking constructs that are relevant to Kubernetes networking. First and foremost, let's take a look at HNS, which stands for Host Networking Service. This is a public-facing API that ships in-box with Windows and is used to network virtual devices. It is also used by projects that are part of Kubernetes, such as kube-proxy and CNI plugins. The first step in transforming a typical Windows Server box into a Kubernetes node is to create a virtual switch. This provides L2 switching capabilities and also enables L3-based network communication for virtual devices that are attached to it.
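As an aside for readers following along on a node, here is a minimal PowerShell sketch of poking at HNS. It assumes the in-box HNS cmdlets on Windows Server 2019+ (or, on older builds, the hns.psm1 helper module from the microsoft/SDN GitHub repo), and the l2bridge network name and subnet values are placeholders, not values from the talk:

# List existing HNS networks and endpoints (property names follow the
# HNS V1 schema; Get-HnsNetwork/Get-HnsEndpoint ship in-box on
# Windows Server 2019+).
Get-HnsNetwork  | Select-Object Name, Type, ID, Subnets
Get-HnsEndpoint | Select-Object Name, IPAddress, GatewayAddress

# Creating an HNS network also creates the backing vSwitch.
# Placeholder name/subnet for an l2bridge network, using the
# New-HnsNetwork helper from github.com/microsoft/SDN (hns.psm1):
Import-Module .\hns.psm1
New-HnsNetwork -Type L2Bridge -Name "cbr0" `
    -AddressPrefix "10.244.1.0/24" -Gateway "10.244.1.1"

In practice it's the CNI plugin, not an operator at a console, that creates this network during node setup, as the next section describes.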
A pre-configured virtual switch ready for container networking is automatically generated by creating an HNS network. Network compartments are a quasi-equivalent of network namespaces on Linux. A network compartment on Windows receives its own port pool, and all network interfaces and IP addresses are unique to the given compartment. Each Kubernetes pod receives its own unique network compartment. Endpoints are essentially a quasi-abstraction for a vNIC. They contain information about IP addresses, the gateway, the DNS server, and such things. Container vNICs are bound to a corresponding port in the vSwitch in order to enable networking. The final missing piece here is HNS policies. These rely on VFP, which stands for Virtual Filtering Platform. This is essentially a virtual switch extension that allows us to define rules to process incoming and outgoing network traffic and apply actions to it. The HNS policies resource is used to program VFP and plumb rules that can apply load balancing, network address translation, and much, much more to network traffic. Without further ado, Jay will take it from here and cover the networking landscape on Windows and the new enhancements there. Yeah. The CNI ecosystem is rapidly evolving for Windows, and a lot of folks are really stepping up and increasing their Windows support. We've got news on the Antrea front, where we support both containerd and Docker at this point, and we're really happy about that. Shout out to Rui Cao for a lot of that work, and Perry. And of course, Calico as well. Now on AKS, you can preview Calico with network policy support for Windows nodes, and you can also do this on EC2 if you manually install Calico. And there are several other CNIs: there's win-CNI with win-bridge and win-overlay, and there's Azure CNI as well, and you can layer different policy providers on top of that. As far as network policy goes, for Antrea you have network policy on Windows for both containerd and Docker, and Calico supports network policies with Docker. Below you can see a screenshot. We've now updated the upstream Kubernetes network policy tests, we have a PR in flight, and we have a demo of this on a recent TGIK episode if you're interested in watching, where we have truth tables that actually validate that the network policies are working correctly. You can see here a screenshot from a running Calico cluster where traffic from namespaces y and z is disallowed but traffic from namespace x is allowed. So a lot of progress from all the CNI providers on that front as well. Great. All right, so let's next take a closer look at the topic of CNI plugins on Windows. CNI plugins on Windows are invoked the same way they are on Linux, with the container runtime calling a CNI plugin. It essentially just reads the CNI configuration file and any additional CNI args, such as what command to execute.
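To make that concrete, here is a hedged sketch of dropping a minimal CNI configuration onto a Windows node for the runtime to read. The plugin type, file name, directory, and DNS values are hypothetical placeholders; a real deployment adds cluster-specific policies (OutBoundNAT, routes, and so on) to this file:

# Write a minimal, hypothetical CNI config for the win-bridge reference
# plugin. The runtime reads configs from whatever CNI conf directory it
# is pointed at; C:\etc\cni\net.d is just a common choice.
$cniConf = @'
{
  "cniVersion": "0.2.0",
  "name": "cbr0",
  "type": "win-bridge",
  "dns": {
    "Nameservers": [ "10.96.0.10" ],
    "Search": [ "svc.cluster.local" ]
  }
}
'@
New-Item -ItemType Directory -Force "C:\etc\cni\net.d" | Out-Null
Set-Content -Path "C:\etc\cni\net.d\10-cbr0.conf" -Value $cniConf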
So with the announced future deprecation of dockershim, the Windows community is actively moving to containerd. Let's take a minute to explain some of the differences with respect to runtime and CNI interaction. The original dockershim workflow on Windows has a drawback in the sense that there are repeated consecutive CNI calls that need to be made for each workload container in a pod. One possible flaw is that in step nine displayed here, there's DNS information that needs to be set in registry keys. In theory, this can become a problem because this information gets populated when the container is already running; there is no guaranteed DNS readiness on pod startup if step nine were to take longer than expected to finalize, so there could be race conditions with this workflow. Next slide. For completeness, here's also a showcase of how there are multiple consecutive CNI calls for pod teardown in the dockershim workflow. This problem was overcome by introducing a new HNS entity called a namespace. The namespace contains a reference to the network compartment unique to a pod, the HNS endpoint, and the containers that will be placed there. This enables us to update the workflow so that there's a single CNI ADD call for pod creation. Some sample plugins that are enlightened to use this workflow can be found at aka.ms/win-cni. So next up is Kalya. She will be talking about kube-proxy and load balancing improvements. Thanks, David. So for the past few releases, we have been really focusing on making load balancing configurable for Windows container networking. We've introduced direct server return, which is known as DSR. That's a new load balancing mode, and the previous default was non-DSR. DSR is a kube-proxy feature, because that is the Kubernetes component responsible for programming the HNS policies which handle load balancing. In the non-DSR flow, the packet is routed to the host vNIC port for load balancing resolution, and the packet then leaves the host with the host IP as the source IP. In DSR, the backend pod IP is selected at the originating container's port, so the packet leaves the host with the originating pod IP as the source IP. DSR ensures lower latency and greater scalability because we don't have to make that extra hop from the originating container port to the host vNIC port. This mode is supported in Kubernetes 1.20 with Windows Server 2019 or higher. To enable and set this feature, you have to set the following flags on kube-proxy: the WinDSR=true feature gate, since DSR is an alpha feature, and also --enable-dsr=true, because non-DSR is still the default mode when kube-proxy programs the HNS policies. These diagrams explain the flow of the packet for non-DSR and DSR in l2bridge mode. The flow is a little different in overlay, but the underlying concept remains the same. In non-DSR, the packet leaves the container port and is encapped so that it's destined to the host vNIC. When it's at the host vNIC, VFP is responsible for selecting the backend IP, and the packet is then hairpinned and sent out of the host. With DSR, when the packet leaves the container port, VFP is responsible for writing the backend IP as the destination, and the packet leaves the host immediately without going to the host vNIC. We also have some additional DSR features, such as preserve-destination. This skips DNAT of the service traffic, so the virtual IP, or the service IP, is preserved instead of being rewritten with the backend IP of the pod. This feature is supported in Kubernetes 1.20 with Windows Server version 1903 or higher. And since this is a DSR feature, that means the client IP is also preserved. To use this feature, you just set preserve-destination to true in the service annotations, and you have to make sure you enable the appropriate DSR flags on kube-proxy as well.
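To ground that, here is a rough sketch of what enabling DSR and preserve-destination might look like. The kube-proxy paths, hostname handling, and the service itself are illustrative placeholders, and since WinDSR is an alpha feature, exact flag shapes may change:

# Launch kube-proxy in kernelspace mode with DSR enabled (alpha).
# Binary and kubeconfig paths are placeholders.
C:\k\kube-proxy.exe --proxy-mode=kernelspace `
    --feature-gates="WinDSR=true" `
    --enable-dsr=true `
    --hostname-override=$env:COMPUTERNAME `
    --kubeconfig=C:\k\config

# Opt a (placeholder) service into destination/VIP preservation via the
# preserve-destination annotation mentioned in the talk:
@'
apiVersion: v1
kind: Service
metadata:
  name: win-webserver
  annotations:
    preserve-destination: "true"
spec:
  selector:
    app: win-webserver
  ports:
  - port: 80
    targetPort: 80
'@ | kubectl apply -f -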
We additionally support externalTrafficPolicy: Local. This disables the routing mesh, meaning that traffic remains local to that particular node, and it also preserves the client IP via DSR. It's supported in Kubernetes 1.20 with Windows Server version 1903 or higher, and to set this feature, you just set service.spec.externalTrafficPolicy to Local. So the advantages of DSR are that it greatly reduces the resource footprint and ephemeral port consumption, it reduces network latency, and it has better data path performance, again because we don't have that extra hop to the host vNIC. It also greatly simplifies the network traffic flows; if you were to actually look at the flow of the packet, I think it's a little more intuitive and closer to what we would expect when you're in DSR mode. DSR is also recommended for Kubernetes network policies, because otherwise they'll only work for pod-to-pod traffic; with DSR we are preserving that client IP. And it's required for advanced network features such as client IP preservation, which is DSR itself, and destination IP preservation, which is the preserve-destination feature. Now, the drawbacks are that it is still an alpha feature, so we're still testing, but this means we would greatly appreciate any feedback that users have when trying this feature out. You also need to configure the loopback policy in the CNI configuration, so this requires a little more planning when you're setting up your Windows nodes. Additionally, if preserve-destination is enabled, the routing mesh is disabled for NodePort; this is needed for health probes to work properly, but it can cause load imbalance problems, so it's something to be aware of. We also have a new feature, session affinity, which Betsy on the Windows Container Networking team at Microsoft worked on. This feature ensures that connections from a particular client are passed to the same pod each time. It's supported in Kubernetes 1.19 with Windows Server vNext Insider Preview build 19551 or higher, and to enable it, just set service.spec.sessionAffinity to ClientIP. Another really big achievement that we've made is IPv4/IPv6 dual-stack. This supports native IPv4-to-IPv4 communication in parallel with IPv6-to-IPv6 communication to, from, and within the cluster. This parallels the Linux work that has been done in this space as well. It's supported in Kubernetes 1.20 with Windows Server version 2004 and higher. It is compatible with l2bridge, but it's not supported for overlay and VXLAN, so keep that in mind. See the Windows IPv4/IPv6 dual-stack documentation for more info on how to set that up. So with all those networking scenarios, sometimes it might be necessary to troubleshoot. Fortunately, we do have a set of tools on GitHub. We have a validation script: a PowerShell script that inspects your cluster by running a test scenario with a Windows web server and ensures that all of the HNS policies are being configured properly. If that script fails, that can give you a clue about any cluster misconfiguration problems. You should also ensure that all of your components are running; that means checking the kubelet, kube-proxy, and CNI logs, and checking those logs for any immediately noticeable errors. You can also analyze the HNS networking state more easily using the collectlogs.ps1 script. This way, you don't have to manually check it in PowerShell yourself; you can just get the state of HNS output to text files.
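Putting those troubleshooting steps together, here is one possible first pass in PowerShell. The script name follows the microsoft/SDN debug collection the talk references, and the C:\k log paths are common-but-not-universal placeholders; adjust to wherever your deployment put things:

# 1. Dump HNS and network state to text files for offline inspection:
.\collectlogs.ps1

# 2. Confirm the node components are actually running:
Get-Process kubelet, kube-proxy -ErrorAction SilentlyContinue

# 3. Skim recent component logs for obvious errors (paths vary by install):
Get-Content C:\k\kubelet.log -Tail 50
Get-Content C:\k\kube-proxy.log -Tail 50

# 4. Spot-check HNS state interactively:
Get-HnsNetwork  | Select-Object Name, Type
Get-HnsEndpoint | Select-Object Name, IPAddress
Get-HnsPolicyList    # load-balancing/NAT policies programmed by kube-proxy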
And we also have a great Kubernetes network troubleshooting guide that David helped put together; you can find the link there. And if all else fails, you can always post in the SIG Windows Slack with your logs and any findings you may have, and one of us will help you out. Thanks, everybody. I think that's a great overview of networking. And I'd also quickly say: maybe even post in SIG Windows first if you want to reach us; we're always happy to see people come in and hang out. That is a good point. Thank you, Jay. All right, now we'll talk a little bit about future plans and what's next for Windows workloads in Kubernetes. The big feature enhancement that we've been working towards is HostProcess containers. HostProcess containers are roughly equivalent to privileged containers on Linux, but due to the nature of how Windows and Windows containers work, there are some differences. The TLDR is that HostProcess containers are built, packaged, and published just like normal containers, but when they're executed, they actually run as processes on the host, not in a separate Windows Server silo. Because of that, they have full access to the host. This will enable almost all of the scenarios that people commonly ask for privileged container support on Windows for: anything from being able to run the Prometheus node exporter (we actually have a screenshot of that running in a container here), to logging in and performing maintenance operations on the node or getting logs, or even just helping to bring in components like your CNI binaries via a pod and a DaemonSet to make them easier to manage. I think this is really going to enable a lot of different scenarios for Windows. Right now, we're targeting an alpha release in v1.22, which should be coming out in the next couple of months. There's a big enhancement proposal there; please read it, comment, and share your thoughts. Just one note: this will only be supported with containerd, as that is the future of the CRI stack moving forward. Next, I'm going to hand it back over to Jay to talk about how you can get involved.
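(For readers who want a feel for the proposed API before the community section, here is a hedged sketch of what a HostProcess pod spec might look like, based on the enhancement proposal discussed above. The fields are pre-alpha and could change, and the pod name and image are placeholders, not anything shown in the talk.)

# Apply a hypothetical HostProcess pod. Per the enhancement proposal,
# HostProcess pods must also use host networking and run as one of the
# permitted Windows user accounts.
@'
apiVersion: v1
kind: Pod
metadata:
  name: hostprocess-example
spec:
  nodeSelector:
    kubernetes.io/os: windows
  hostNetwork: true              # required for HostProcess pods
  securityContext:
    windowsOptions:
      hostProcess: true
      runAsUserName: 'NT AUTHORITY\SYSTEM'
  containers:
  - name: node-exporter
    image: example.com/windows-node-exporter:latest   # placeholder image
'@ | kubectl apply -f -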
Yeah, so there's a lot of support here for folks who want to get involved with Windows. Our tooling is getting better. It's obviously not in the same state as the Linux stuff, but it's getting better every day, and we've got people contributing and getting involved to make it better. There's a SIG Windows community page; you can jump in there. We have a community meeting like every other SIG, at 12:30 PST every Tuesday, but in addition to that, we've got a thing we call SIG Windows pairing, or SIG Windows after dark, or whatever you want to call it. Right after that 12:30 meeting is over, we transition into just hacking around on stuff, people talking about the problems they're having, going through random issues, and trying to build things. It's a really cool way to get in at that kind of visceral, ground level. You don't need any training or anything, so just come hang out with us. If you don't know anything about Windows or Kubernetes, that's fine; we have plenty of work to do for all levels. You can help us write user stories, and reporting bugs is super valuable because it helps us see what tech people are using and what we need to support better. There's a project board, and a big shout out to Claudiu Belu. He is our unsung hero. He's done so much with images and testing, making sure we've got the right tooling in place so that our E2Es work properly, and so many other things, including low-level problems that are very hard to figure out. Perry as well: a lot of deep, deep technology he's built out with Rui and other people around the containerd side and the CNI side, and he's been working with both the Calico and the Antrea folks on that. And the Microsoft Container Networking team: there are so many people there, I don't even know all their names (maybe Kalya does), but they have been working with Perry and with all of us, answering questions and meeting with us. There's a very, very vibrant group of people that will just talk to you anytime if you come on Slack and start hanging out. So yeah. And I wanted to reiterate what Jay said about that SIG Windows pairing meeting: come if you are familiar with Windows but not with Kubernetes and want to learn more; come if you are familiar with Kubernetes and Linux but not with Windows and want to see how things work on Windows; or come just to hang out and see what's going on. I really encourage everybody to join that as well. So yeah, go ahead. Oh, is this me? Okay, yeah. So Mark is the co-chair; James and I are the tech leads. We've got a YouTube playlist. We've got the SIG Windows Slack channel; that's really the ingress point for figuring out what to do, so come join us in the SIG Windows upstream Slack. There is also a Google group for more long-form, formal questions. But yeah: file an issue, get on Slack, come hang out with us. That's how we do things. It's a very informal and very active community, and we're going to jump in on anything. Thank you, Jay. And thank you everybody for watching. We do have a couple of appendix slides we wanted to include in this talk for future reference; if anybody is interested, please download the slides. I believe there's a link to the slides on the sched.com page. There's just more information on how all of the different networking scenarios work, what's supported, what versions of Windows they're supported on, and all that good info. I'm going to just page through these really quickly. And that concludes our talk. Thank you everybody for joining. We're going to open it up for Q&A now. And thank you, David, Kalya, and Jay, especially, for helping give this presentation.