Welcome. Can you hear me? Great. So today we're going to talk about Windows containers and host process containers for configuration and beyond. Before we get too deep into the details of how Windows containers and host process containers work, we're going to show you a quick demo, get right into it, and show you how this works. Who here has used Windows containers before? All right, and who has had to debug and get onto the node, like SSH or RDP onto the node? Okay, so only one or two folks, but you know how painful that is. For everybody else: SSH into the node usually requires doing a proxy jump, or setting up a VM inside the network so you can RDP, and then you need to know the password to that VM. It's difficult, challenging. So what I'm going to show you here today is a kubectl plugin that connects to the node and gives you direct access to it. No SSH, no passwords, none of that. Just easy access. In the demo, things to look for are how fast this container boots up and the tools we get to use, such as Vim and other things.

So I'm going to start the demo here. First thing we're going to do is take a quick look at the nodes, so you're going to see a Linux node and a Windows node. Then we're going to install the plugin. We're going to use Krew to do that: krew install windows-debug. Krew, if you're not familiar with it, installs a plugin so you can start to use that tool directly with kubectl. Next, I'm going to run kubectl windows-debug, and it's so fast that I can barely say it. We just got connected to the Windows node and we have access to the root file system there. We can run programs on the node, we can see all the logs, and there was no SSH or RDP. I've also got access to Vim, which is typically not something you install on these machines, but now I can edit the config files, poke around, do some searching inside the error logs or anything else there. And this is all hosted inside this HPC folder here. Inside there you're going to see we have Vim, but we also have some networking scripts that we ship out of the box so that you can collect traces and packets and all sorts of other things. And the best part is that it all goes away when the container gets killed. So that's a quick demo. Hopefully that gets you excited.

I'm James Sturtevant. I'm a software engineer at Microsoft. I've been working with Windows containers for four or five years now. I'm a tech lead for SIG Windows and a maintainer on Cluster API Provider Azure, mostly doing Windows work there. And I know how to make fire six different ways using only sticks and stones, which is related to my Twitter handle, Aspenwilder, so you can ask me about that later. Mark?

Yeah, hi. I'm Mark Rossetti. I am also a software engineer at Microsoft. I am the co-chair of SIG Windows and I kind of pop in all around different parts of core Kubernetes to make sure things work with Windows. I am not generally on Twitter.

Okay, so here's a little bit about what we're going to talk about today. First, we're going to do an overview of what host process containers are; that was the new feature that powered everything in that demo. Then we're going to go deep into one or two deployments to show you how it works, then we'll probably fill most of the rest of the time with some demos, and then we'll have some additional resources and questions and answers.
So first off, what are host process containers? Host process containers are conceptually equivalent to privileged containers on Linux, for anybody who's familiar with those. Really, they're a way to package, distribute, and deploy your workloads as containers to Windows nodes. These containers run as a process directly on the host, hence the name host process containers, which gives you almost full access to the host's file system, network stack, process space, event logs, all of that. I say almost because we'll demo some of the security considerations for these later on. The other great thing is that we really designed this feature to be Kubernetes-first. You deploy these containers and they run just as a normal pod, so you get the benefits of all the different constructs that you're used to, like volume mounts, resource limits, everything that's listed there.

So what are some of the motivations for this? Really, as James hit on earlier, managing and provisioning Windows nodes was extremely difficult. There also wasn't a good standardized way to deploy many of the essential components that needed to run on the node, your CNI solutions, kube-proxy. Before all of this, it was really just up to whoever was setting up the node to set those up. It often involved a lot of custom PowerShell to parse the environment and relied on third-party service managers, NSSM often being the big one. There was a really poor upgrade story: you needed to orchestrate logging into the node and rolling your own updates each time. And another downside was requiring access to the node in the first place. And then your workloads, once they were running this way, were very difficult to monitor. You really didn't have any visibility into whether they were running or what their error state was; again, you required access to the nodes to even get that information, and you had no easy way to get your logs.

So with host process containers, you can deploy all of these workloads just as DaemonSets. It's a familiar upgrade story, a familiar deployment story, it runs as a container, which everybody here is used to, you have rolling updates, all of that, and actually no access to the node is required, as already demoed. And then once your workloads are deployed, you monitor them just like anything else: just check to see if your pods are running, they'll get restarted if they're not, there are logs to monitor them, all of that.

Here's a little bit about the history of this feature. It went beta in Kubernetes 1.23; that's when it was on by default, so we call that out here. And it is going stable in this next release of Kubernetes. One thing to note, though, is this functionality is only supported if you're running containerd as your container runtime. I believe most Windows users have migrated to that now, so it's not a big issue.

So here, we're going to go into some deployments. James? Cool. So now that you know what host process containers are, we wanted to show you how you'd actually use them. We're going to take a look at a big spec here, and don't worry, we're going to zoom in. The first thing we're going to look at is the pod spec and the security context. This is the standard Windows security context that we're hooking into here. You can see that we set hostProcess to true, and then we specify who we want to run as.
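As a reference, here is a minimal sketch of what that security context looks like in a pod spec; the pod name, image, and command below are placeholders for illustration rather than the exact manifest on the slide:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostprocess-example                   # hypothetical name, not the slide's manifest
spec:
  securityContext:
    windowsOptions:
      hostProcess: true                       # run the container directly on the host
      runAsUserName: "NT AUTHORITY\\SYSTEM"   # who the host process runs as
  hostNetwork: true                           # hostProcess pods must use the host network
  nodeSelector:
    kubernetes.io/os: windows
  containers:
  - name: example
    image: example.com/hpc-demo:0.1.0         # placeholder image
    command:                                  # runs on the host, so host binaries like powershell.exe resolve
    - powershell.exe
    - -Command
    - Get-Process | Select-Object -First 5
```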
And this is powerful because we can use security policies to block people from deploying these, since it's built into the pod spec. We also specify hostNetwork: true. For the CNI specifically, we need to work within the host network compartment to be able to program all the networking rules there, and so we have access to that as well. And Mark's working on a KEP to enable that for regular containers as well, so if you're interested in that, you can talk to him afterwards.

The next part I want to dive in on is the init containers. Mark said we built this with Kubernetes ideas in mind, and so we can use init containers with these host process containers. This enables us to install things like the CNI binaries. This is a very familiar pattern from the Linux side, and so I no longer need to go download these things and prep them on my Windows node ahead of time; I can apply them at the time I'm applying my CNI. That enables upgrades and other things like that. And I can also pass in the configuration at this time, so I no longer need to know up front whether I'm going to be running overlay or SDN bridge or any of those things; I can swap out the configuration using a ConfigMap at that time.

Next up, we have multiple containers here. Calico needs a little bit of prep before Felix runs to be able to talk to the Kubernetes API, and so we're able to run more than one container inside this host process pod. They communicate with each other via the file system in this case, but there are other ways they could do that. And the most exciting part here is that they have full access to the API server using the service account that's applied to the pod. So you can specify that they're able to access different components of the API and communicate with them without having to copy around kubeconfigs and do other things, which is the way we did it on the host node prior to this.

Next up, we're using volume mounts. I alluded to this before, but I can map in exactly where I want to copy those CNI binaries, and then I can also use the ConfigMap to configure the CNI the way that I want to.

The next part I wanted to call out here is kube-proxy. We've also enabled this for kube-proxy, and we're using it basically the same way. You'll see the host process security context at the top there, but we're also using the downward API to pass in information. Because this is built into Kubernetes, we no longer have to use our PowerShell scripts and do some kind of fancy thing to figure out exactly what the pod IP address is; we get that information given to us through the Kubernetes-native constructs. On top of that, we can use taints and tolerations with these, so we can say this is a critical component, because it's kube-proxy, and we need it running at all times. And then upgrading becomes super simple: we're using Kubernetes upgrade strategies here instead of having to build our own through some sort of PowerShell scripts or something along those lines.

And for those that may have noticed, at the top, when we were talking about who this container runs as, Mark is going to talk a little bit about how we can restrict the access of that container user.
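Pulling those pieces together, here is a minimal sketch of that kind of DaemonSet: the hostProcess security context, an init container that copies CNI binaries onto the host, configuration delivered through a ConfigMap, downward API environment variables, and tolerations. The names, images, and paths are placeholders for illustration, not the actual Calico or kube-proxy manifests:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-windows-agent                  # hypothetical name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: example-windows-agent
  template:
    metadata:
      labels:
        app: example-windows-agent
    spec:
      securityContext:
        windowsOptions:
          hostProcess: true
          runAsUserName: "NT AUTHORITY\\SYSTEM"
      hostNetwork: true
      serviceAccountName: example-windows-agent   # in-cluster API access, no copied kubeconfigs
      nodeSelector:
        kubernetes.io/os: windows
      tolerations:                                 # critical component, keep it on tainted nodes
      - operator: Exists
        effect: NoSchedule
      initContainers:
      - name: install-cni
        image: example.com/cni-install:0.1.0       # placeholder image
        command:                                   # image contents are relative to the container's working dir
        - powershell.exe
        - -Command
        - Copy-Item -Force .\cni\* C:\opt\cni\bin\
        volumeMounts:
        - name: cni-config
          mountPath: /cni-config                   # CNI config delivered as a ConfigMap
      containers:
      - name: agent
        image: example.com/windows-agent:0.1.0     # placeholder image
        command:
        - powershell.exe
        - -File
        - .\start-agent.ps1
        env:
        - name: NODE_NAME                          # downward API instead of PowerShell lookups
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
      volumes:
      - name: cni-config
        configMap:
          name: example-cni-config                 # swap overlay vs. SDN bridge config here
```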
Yeah, so you'll notice in the previous demos we were running as the local system account, and that's generally not desirable in a lot of cases. So we do have some mechanisms so that people can configure what access these containers have on each one of the nodes. There's a demo here that I'll narrate as it goes through.

So this deployment has a PowerShell script. What was that? He asked if that would be the equivalent of a "run as" with a service account, and yes, it is. So here you'll see there's a PowerShell script that gets plopped in as a ConfigMap and runs on the nodes. The first thing we do is use net localgroup to create a local group, and then we create some folders and give different access to those. Here you'll see, well, we just missed it, there's an init container and all it does is run that script, and then there are two workload containers in here: one that runs as NT AUTHORITY\SYSTEM, and one where we set the runAsUserName to that new local group name, which is what's highlighted here. This local group did not exist on the machine before this deployment started. So here we'll start the deployment, wait for the containers to run, and then poke around inside the containers.

This is what I call the admin container. Here you can see we're running as NT AUTHORITY\SYSTEM, and we'll just see if we have access to these different shares. In this case, we have access to both of them; here's the other share. Now we'll exec into the other container that was running as that local group. I'll talk more about the output of this whoami in a minute. Here you'll see that we try to access that admin share and get an access denied, and then we try to access the other share and we do have access to it.

So I'll talk about what's going on here. We realized that this is a pretty powerful feature and we wanted to limit access to the node, but we were trying to figure out a way to do that that didn't require people to manage user accounts on the nodes, either using weak passwords or potentially no passwords or other things like that. So what happens here is, if you pass in a local group name as that runAsUserName, when these containers are started we'll actually create a new local user, add it to that group so it inherits all of the security permissions from that group, and then your container will run as that user, and that user gets cleaned up at the end. This is beneficial because it makes it a lot easier to manage these user accounts, which don't need passwords, and there are just a lot of benefits to this. Another nice thing is that it fits completely into the Windows security model: you assign or deny access to all of the different Windows resources just with native Windows constructs like these local security groups. And as James mentioned, we're relying on people using policy engines to restrict who these containers can run as, in what namespaces, and all of that. I need to check... I don't... Well, let's... I'll bring some other people up to help answer that question. Yeah.
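Here is a minimal sketch of that pattern end to end: a ConfigMap carrying the setup script, an init container that runs it, and two workload containers, one running as SYSTEM and one running as the new local group. The group name, script, and images are placeholders for illustration, not the demo's actual manifest:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: hpc-setup-script               # hypothetical name
data:
  setup.ps1: |
    # create a local group and grant it access to only one of the two folders
    net localgroup hpc-demo-group /add
    mkdir C:\admin-share -Force
    mkdir C:\open-share -Force
    icacls C:\admin-share /inheritance:r /grant "SYSTEM:(OI)(CI)F" /grant "Administrators:(OI)(CI)F"
    icacls C:\open-share /grant "hpc-demo-group:(OI)(CI)F"
---
apiVersion: v1
kind: Pod
metadata:
  name: hpc-users-demo
spec:
  securityContext:
    windowsOptions:
      hostProcess: true
      runAsUserName: "NT AUTHORITY\\SYSTEM"    # pod default
  hostNetwork: true
  nodeSelector:
    kubernetes.io/os: windows
  initContainers:
  - name: setup
    image: example.com/hpc-base:0.1.0          # placeholder image
    command: ["powershell.exe", "-File", ".\\scripts\\setup.ps1"]
    volumeMounts:
    - name: scripts
      mountPath: /scripts
  containers:
  - name: admin                                # runs as SYSTEM, can read both folders
    image: example.com/hpc-base:0.1.0
    command: ["powershell.exe", "-Command", "Start-Sleep -Seconds 3600"]
  - name: restricted                           # a temporary user is created in this group
    image: example.com/hpc-base:0.1.0
    command: ["powershell.exe", "-Command", "Start-Sleep -Seconds 3600"]
    securityContext:
      windowsOptions:
        runAsUserName: "hpc-demo-group"        # a local group name rather than a user
  volumes:
  - name: scripts
    configMap:
      name: hpc-setup-script
```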
One other thing that I wanted to call out here is that you may have noticed these containers start pretty fast for Windows containers. As we were developing this feature, we noticed that since it is a process just running directly on the host, you really don't need all of the stuff that comes with the Windows container base images. So we tried to make it so that you don't need it, and we created a new base image. This new base image can only be used with host process containers, but it is very small: it's about 25 kilobytes, which, for people used to working with Windows containers, is tiny; other Windows base images like Nano Server are on the order of 100 megabytes. One other benefit is that the same image will work across all versions of Windows Server, so you can build a single container image and deploy it to your Windows Server 2019 nodes, your Windows Server 2022 nodes, and so on.

One thing I will note is that you'll want to use BuildKit, via docker buildx, to build images on top of this base image. There's a demo here, and for anybody who's interested, there's a GitHub repository that has the source code used to build the base image and instructions on how to use it too. And if anybody is not familiar with using BuildKit to build images, it's pretty easy. So here's a quick demo. First, we have a hello-world file that just says hello world, and a Dockerfile. With buildx, you need to create a buildx builder so you can specify the target operating system, or the platform to use, but then you can just use docker buildx build commands to build it; in this case, building it and pushing it to a registry. And then here I've got a pod spec that's using that image, and then we'll run it. This image was not pre-pulled on this machine. There it is. Super fast. James?

So now that we have an understanding of what host process containers are and how to configure and operate them, we wanted to show off a few tools that the community has already built with host process containers. The first one we're going to show is a tool that I developed to solve a problem I had inside SIG Windows. I'm one of the people who maintain all the SIG Windows tests, and we have occasional flakes that happen every 40th or 50th run. We needed Windows ETL traces to collect and debug these, because we had identified that the bug was in the OS. These end-to-end tests take about an hour or so to run, and running a trace for that long would result in a many-gigabyte file. Massive. So I needed a way to trigger when a specific test was running, kick off the traces, and then get them uploaded somewhere.

So what I'm going to show you here is the tool that I built to do this. I've called it the trigger logger, which is not very creative. The first thing we'll do is install the trigger logger. It's just a YAML file; it is a host process container that runs in the background, and it has a ConfigMap that lets you configure the tool. The thing we're going to look at is that it has a trigger that fires on namespace creation. I know what namespace is going to be created in the test, so every time that test kicks off, the trigger fires. Next, once that trigger fires, I'm going to run WPR, the Windows Performance Recorder, which is the thing that creates that ETL file. And then finally, once I destroy the namespace, it uploads the file to the cloud storage that's available to me. So I've got this tool running. You'll see that it is using the in-cluster config that I mentioned before, so it's communicating with the API server, and it's using the service account that I've wired up, which has access to list namespaces. It's listening to all the namespaces inside the cluster.
Next, we're going to listen to the logs of the trigger logger and create that namespace. On the left-hand side here, I'm just going to tail the logs; you'll see the same logs we just saw. And now I'm going to create the namespace that it's listening for. In this case, I called it wpr-cpu; it could be anything you want. On the left, we'll see that immediately the Windows Performance Recorder started running. It's built into all Windows operating systems, and I'm collecting CPU, disk, and file information. Then I'm going to delete that namespace, and once it's deleted, we'll see we're processing that event and we write the ETL file out. Once this ETL file is processed, we then run off and start uploading it to Azure. So I'm going to skip ahead a little bit here. Skip ahead. That's too bad; I will upload this to YouTube later. But what happens next is it gets uploaded to Azure, and once it's on Azure, I'm able to pull that ETL file down, use Windows Performance Analyzer, and see the individual stack traces, how much CPU, how much memory is used on the node, and identify exactly where we're having a performance problem or something else. I've used this specifically to improve the kubelet, going from about 3% steady-state CPU usage down to 1.5%. So it's a super powerful way to analyze and debug these things and fire them off on demand.

The next one, which I won't skip through, is a networking demo. This one was built by our networking team; they needed to be able to identify packets being dropped or manipulated throughout the VM. What they've developed is this component called WinContainerNetworkInspector, a DaemonSet that runs on the nodes and controls the various processes related to networking. Once that's installed, I can connect to it using a command-line tool called WinInspector, and once I'm connected, this tool lets me run various networking commands. I can do packet captures, I can do counters, and I can query HNS, which holds the networking configuration that's set up. So here I've queried that networking configuration, and I can see all the networks that are created on a node, I can see all the load balancers that have been created, and I can see all the mappings between those. So if something got misconfigured somewhere, it's very easy to inspect, and it's helpful in a large cluster if you have 100 different nodes and you're saying, hey, this node here is acting weird, I'm going to go inspect it.

The next thing I kicked off here is a packet trace. I've got an IIS application running on that node, and with Kubernetes, obviously, most of the packet traces I'm interested in are probably about a specific container. So I can say I'm going to capture the packets for this particular pod within my entire cluster. Again, very powerful, because I no longer need to capture all of the networking traffic across the whole stack happening on a particular node, which could be very noisy. So here I'm going to send the curl request, and then on the right there's a lot of traffic coming in. If you know what you're looking at, this is just showing you the packets going through the various components of the networking stack. This can be converted to a Wireshark capture, so you can load it into Wireshark and analyze it from there.
And we're using Packetmon behind the scenes, which is another tool that ships with Windows Server in the latest releases, so I can filter down to individual components within the networking stack and see just those components. The last thing I'm going to show here is that sometimes you don't know what's going on; you know that maybe packets are dropping, but you're not sure how bad it is. So I can just quickly query a node and it will give me the counters being seen on that node, and from there I can narrow it down to something else that's happening. The networking team actually used this to identify various bugs that some customers had experienced, and it enabled them to point to an individual pod running in a 100-node cluster and say, this is the component that is causing problems. Super powerful tools. These are all open source, and the next part we're going to step into is some additional resources.

Since we released host process containers for Kubernetes in 1.23, we've had huge adoption across the ecosystem, as you can see from all these projects. The WinContainerNetworkInspector is in there, kube-proxy is in there, Calico, and I think one of the other really good ones is the Windows Exporter for Prometheus. So there's a ton out there, and there's even more that we didn't get to list here. I want to call out that if you are new to the Kubernetes ecosystem and you want to contribute, this is a great way to contribute to Windows. There are a lot of examples out there that you can leverage, and there are a lot of projects that either don't know this is available or don't have the Windows expertise, so you can go out there and make a pretty significant contribution that enables the ecosystem.

This feature and all of the things we've demoed here were really developed in conjunction with SIG Windows, so we wanted to take a minute to highlight some other upcoming talks related to SIG Windows. There's a Windows operational readiness talk tomorrow, there's a SIG Windows maintainer track talk tomorrow, and then there's an interesting lessons-learned-from-scheduling-20-million-Windows-containers talk on Friday. So if you're interested, please come support us.

And before we get to the Q&A, I just wanted to put up some resource slides here. Don't worry about taking this down, just download the slides off Sched, but there's a lot more information about the KEP, the discussions that went into eventually bringing this feature to fruition, documentation on how to use it, lots of examples, all of that. If anybody is interested in participating or getting more information, I encourage you to go through the normal SIG Windows communication channels, the biggest one being the SIG Windows Slack channel, but we also have a community guide on how to reach us, and we have a community meeting every Tuesday at 12:30 PST, so we hope to see more people there. Next we'll open it up to Q&A, and at the same time I'll just leave this up if anybody wants to leave some feedback.

One of the differences between a privileged container on Linux and a host process container is that the container is running as a process on the host. I also know that there is no file system isolation, so for cluster administrators that want to enable this feature, what security recommendations do you have so that workloads can't have that same access to the file system?
It's a little bit hard to hear because the speakers are facing all of you, but what I heard was: what kind of security considerations should administrators take when using these, is that correct? Particularly around the file system. So, since so much of this is controlled just through the normal Kubernetes constructs, the built-in Pod Security admission policies will really help cover you a lot. I know if you're in the restricted policy you won't be able to schedule hostProcess pods at all, so we really recommend all the usual best practices about not allowing hostPath volume mounts and that sort of thing, and I think that should help get you started. But the rest is really just knowing what workloads you're deploying, honestly. Yeah, and I would say, with the part where Mark showed how to set up those user groups and restrict access to the file system, from there you could create a group membership that didn't have access to most of the file system and then opt in from there, so that would probably be the best approach. All right, thank you.

In line with that question, are there any future plans you can share about what you might do if you were ever to consider proper isolation on that level? I mean, it's hard because you're effectively giving root access away, so is there a future roadmap that you can share on that? So what I thought I heard is: is there a future for more isolated access to the host? Currently no. We just haven't had it planned, but if there are some interesting use cases, we'd like to hear them. Most of the use cases that we've highlighted either required pretty broad access to the host or let us restrict it with what we've demoed, but if you have some use cases, feel free to bring them to us.

So the container's root file system is the node's root, right? So C: drive is C: drive, is that correct? So there is some virtualization happening there. Your container image will get mounted to a well-known location, C:\hpc, and all of the C: drive will get mapped in there as well, and then your volume mounts will show up just like they do in normal Windows Server containers, so if you mount in \var\log it'll show up under C:\hpc\var\log. Right, but in your example where you're using the init container to install something, if I delete that pod, it's still installed; you installed it physically on the node, and all I did was delete the deployment, which removed the container image from disk and stopped the process. Correct, all of those side effects are going to carry forward once the pod stops. So, specifically in the CNI case there, we copied the CNI binaries into the CNI location, but the Vim binary doesn't live on the host; that was in the container, and when the container went away it was removed as well. So it depends on where you put the binaries, but if I go running MSIs or something like that, I'm installing on the node. Yeah, most likely, and that was one of the use cases we had envisioned for this too: what if you want to apply your security patches in a staged update?

Any other questions? One more? Probably the last one, and then we can continue afterwards if needed. For that particular scenario, is there going to be some sort of shadowing of what's mounted in the container? So if my container defines a directory that also exists on the host, which one is going to be taken?
So there's a KEP, and actually the PRs to do this have a lot of discussion around that too, so I'll defer to those, but the short answer is no. If for whatever reason the C:\hpc folder already exists on the node, we clear it, so you won't get the same contention that you can get with shadowing. But other than that, there's not really much concern there. Right, thanks.

Last one, for real. Fabulous talk. Question for you: I know that there are a lot of really exciting KEPs in flight around Windows stuff as well. If somebody wants to jump in and start participating in some of the actual KEPs and other things like this that you're in the process of working on, yeah, this one is going GA, that's great, but what other ones are an exciting place people might get involved? Oh, that's what our maintainer talk is covering tomorrow, so come to that if you're interested. But as James mentioned, host network access, the host network compartment for regular Windows Server containers, is one, and then really following a whole bunch of the SIG Node KEPs to bring parity to Windows is another. The one that we're working on right now is the CRI API for stats, which is a big change in the way stats are collected from the node in the kubelet, so that's another big one that you could get involved in. Also, as I mentioned, I think James and I are going over to the Azure booth if people are interested, and then there is the booth crawl at 6. We were just asked what time that talk tomorrow is: 5:25 PM. Wonderful. Thank you so much.