Okay, welcome everyone. It's great to be here with all of you this week, and it's pretty exciting to see how far this technology has come. What we're here to do today is talk about how to make the most of it, while also showing off some of the recent advancements we've made and what we've got cooking for the future. So let's get started. My name is Brydon. I'm a product manager at Microsoft working on the Windows Container core technology. There's a good chance you've seen me in the SIG Windows meetings, or in GitHub issues, bugs, or wherever else, so I've been around a lot in the Windows Kubernetes community. Right now my primary focus is advancing the performance of Windows containers and Windows containers on Kubernetes. And here we've got Howard, who is one of our chief performance analysts. Yeah, my name is Howard Hao and I've been working on Windows containers since day one. My passion is digging into tough customer issues and working closely with customers to understand their performance and reliability problems. I'm glad to be here, and thank you all for joining us. But we're really here as representatives of the amazing group of people, both at Microsoft and across the Windows containers on Kubernetes community, who are building and innovating with this technology. We've got a pretty massive effort running around this product, with the primary purpose of empowering you to more effectively scale your business. I know that's the common Microsoft tagline, but it's true. What inspires us to do what we do every day are the amazing stories we hear of what people can do with the technology we create. So today we'll focus on some of the most common stories and struggles we see when it comes to container performance, and how you can mitigate them and more effectively scale your business.
So we'll go into some of the background of why that is, then some history, and then do a demo and walk through troubleshooting container performance. So let's enter a hypothetical scenario where I'm one of the many, many customers looking to bring their legacy applications into the modern era of infrastructure. Most often the general need can be summarized as: I have a legacy application and I need to find a way to bring it into the modern cloud while reducing costs, all without a massive rewrite of a technology that may have poor documentation, compatibility issues, or considerable tech debt. We see a lot of companies just entering this journey, and it can be an extremely daunting task. But the overall business need makes sense, especially lately: I'm more likely to shift my capital expenses to operating expenses to better hedge against poor economic conditions and more effectively adapt to volatile demand. One of the most essential components of running a lean business is minimizing resource waste, specifically ensuring that every single compute cycle contributes directly to the execution of business logic. So over-provisioning, inefficiency, and unnecessary redundancy are all major threats to this objective. And it becomes more expensive to maintain legacy infrastructure over time: as the newer generation of the workforce starts to focus on newer technologies, it becomes harder to find people who can keep these systems running at their best output. Technology also advances over time, so businesses that use modern technology and modern infrastructure are more likely to provide the same, if not better, business value for a reduced cost. So bringing legacy systems into the modern cloud is a major objective for businesses today, and that's where Windows on Kubernetes comes in.
So let's say I'm a business executive and I hear about running Windows on this new Kubernetes technology everyone seems to be talking about, so I bring it to my engineering teams and they take a look into it, and wow, this looks great. I can take my legacy .NET application, which can be quite expensive to run on on-premises infrastructure with hardware, software, and management expenses, and package it up in a more consistent and predictable environment that packs more efficiently into the same resources. And it's true: Windows containers are more efficient than traditional Windows Server VMs. Windows containers all share the same kernel, so you don't have to run kernel setup when starting a container, making it a lot faster. You can optimize and configure Windows like never before, especially as we move closer to the Lego-like construction of Windows images. You can start and stop containers faster to create scaling patterns that more closely match demand, and you can reduce operational costs through auto-scaling, consistency between test and prod, and the use of open-source software over custom internal tooling. So today I'm one of these engineers working on transitioning this old technology into this modern space. I create a Windows container for my application and start to run it, and hit the button there. So I start to build this application and it seems to work. I had to fix some frameworks and what was included by default, maybe I used the wrong image, all sorts of stuff there. But eventually it works: I can run the container, it seems to produce the expected output, and I put it in a test cluster. Everything looks great. So we decide to deploy, we throw a company party, everyone gets bonuses, we're all good. It's finally running in production. And then things start happening. Oh God, why is everything running out of memory?
Pods are restarting, nothing's getting through the network, customers are reporting really slow response times, and I'm asking, what happened here and why is this happening? This brings us back to the reality that software is still a machine, and to make a machine run at its best, you need to purposefully design and tune it to most effectively answer the demands of its purpose. As the creators of this technology, we know Windows is capable of some really cool stuff, but we also know its limitations and areas for improvement. So what we're going to do here today is give you a starting point for learning to effectively tune, monitor, and analyze the performance of your containers, so that you can move forward and bring yourself into the modern cloud. We'll start with a really brief history of Windows containers and a little bit of how the architecture has changed, then get right into a demo of how to go through this. As likely all of you know, containerization started to gain traction in 2008 with Linux containers, and it gained real momentum with Docker in 2013. Then in 2016 Microsoft made the choice to add container support to Windows with Windows Server 2016, and since then we've continuously iterated and added improvements to the point where containerization and Kubernetes have first-class support within Windows, and we're continuing to improve it pretty extensively in Server 2022 and onwards. In tandem throughout that time period, how a Windows Server application perceives its environment changed. Not that long ago things ran on bare metal, the application was at the lowest layer of abstraction, and scaling was literally turning machines on and off. Then virtualization came into play, and it became easier to manage more consistent and predictable environments and to scale things, especially with Hyper-V and VMware.
And eventually we arrive in the space we're in now, where applications are at the highest layer of abstraction and the focus is increasingly on the application itself rather than the environment it's running in. Previously, within Windows, you had to pay attention to a lot of variables in that environment: which systems and services were running, and all those sorts of things. So there have been a lot of paradigm changes over the last years, and that's where Windows containers are right now, shifting in that paradigm. So how do we tune Windows to meet modern demands, and is it really that configurable? We can think of it in three categories. The first is the platform itself. This is a combination of what we're doing with our engineering work, what we can offer in Kubernetes with Windows, and how you're configuring Windows itself to run your application most effectively. This includes things like minimizing system overhead, selectively choosing which services are running, and which image you're using: what size, what type, and all the different ways you can optimize the OS itself. Second, there's application optimization. This is you designing your application to be aware of the environment it's running in and ensuring you're not adding unnecessary system calls or things that might cause performance issues and delays. And finally, there's methodology: watching your Windows workloads, observing what's happening, and going through a testing feedback loop so that you catch a lot of these issues ahead of time, in test. We see a lot of customers just jumping straight into it and seeing what happens, and it's important to run tests in simulation infrastructure to ensure that the things you might expect on Linux also translate to Windows as well.
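As one concrete sketch of that "watch your workloads" piece, a minimal Prometheus scrape config for windows_exporter might look like the following. This is illustrative, not from the talk: the job name and target addresses are placeholders, and 9182 is windows_exporter's default listen port.

```yaml
scrape_configs:
  - job_name: "windows-nodes"            # illustrative job name
    static_configs:
      # Replace with your Windows node addresses; windows_exporter
      # listens on port 9182 by default.
      - targets: ["10.240.0.4:9182", "10.240.0.5:9182"]
```

In a cluster you would more likely use Kubernetes service discovery than static targets, but the static form shows the minimum needed to start scraping a Windows node.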
So what are some of the things we've been doing to help with this? First of all, we've got a whole bunch of optimizations, especially working with the SIG Windows community and the engineering teams here at Microsoft. One of the things we've done recently, which should be releasing soon, is pretty much an entire rewrite of the Windows container file system. We've effectively rebuilt it from the ground up, totally optimized for Windows container images and container operations. With this we're seeing a 30% improvement in container import and start times, and it lays the groundwork for dynamic image streaming to reduce the time of an image pull to ideally under a minute. We also have a dynamic caching solution we're creating for Windows on Kubernetes, which allows us to re-cache Windows images on a node and scale horizontally more efficiently. Secondly, we're adding further Prometheus support with Windows Exporter, continuing to enable you to effectively monitor your Windows workloads, and we're supporting many of the popular CI and monitoring vendors that people like to use in the Kubernetes space. What we'll demo in a second is easy trace collection with host process containers. And finally, we're working on a new solution for automatic analysis, intelligence, and performance auto-tuning for your Windows workloads; more on that soon. And here's Howard to talk about some of the research efforts we've done. Yeah, thank you Brydon. We know how challenging it is for a customer to start using the Windows platform. At the platform level we try our best to get feedback from customers and look into some of the common issues we've been trying to resolve, especially density, networking policy, how to reduce container failures, and some of the memory leaks we've been seeing.
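The host process container approach mentioned above for trace collection boils down to a pod spec along these lines. This is a hedged sketch: the pod name, image tag, sleep keep-alive, and `perf-target` node label are illustrative, not taken from the demo.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: perf-collector                       # illustrative name
spec:
  securityContext:
    windowsOptions:
      hostProcess: true                      # run directly on the host, not isolated
      runAsUserName: "NT AUTHORITY\\SYSTEM"  # full privileges on the node
  hostNetwork: true                          # required for hostProcess pods
  containers:
    - name: collector
      image: mcr.microsoft.com/windows/nanoserver:ltsc2022
      # Keep the container alive so trace commands can be exec'd into it.
      command: ["powershell.exe", "-Command", "Start-Sleep -Seconds 86400"]
  nodeSelector:
    kubernetes.io/os: windows
    perf-target: "true"                      # illustrative label for the target node
```

As described later in the demo, the only real difference from a regular pod spec is the `securityContext.windowsOptions` block selecting the user context, plus the node label selecting where it lands.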
So we're going to continue our efforts to address these issues at the platform level, and regardless of where you are in terms of your deployments, those are fundamental issues we need to address. At the same time, your software is not standalone, of course; it depends on our platform. So there are things you as a customer can work on with us, and there are recommendations you can follow, or adapt, to see whether they help you achieve better performance goals. Before you deploy your application, you want to ensure it doesn't overly consume resources. Regardless of which operating system you're dealing with, the system only has limited resources. Some operations, for example: people coming from a Linux background have a tendency to say, well, let's just spin up threads and processes and query things, and in the Windows world you'll realize those have some overhead. If you're working with Windows, maybe you do it differently, and you also want to prepare your application ahead of time, so you work through the issues before you say, let's deploy it to the cloud. So here are some details regarding applications. On container images: people keep saying Windows container images are pretty big, but we do have smaller images, for example Nano Server. If you can deploy your application on Nano Server, I encourage you to do so, because it has less overhead. If your application has dependencies on certain components that are not built into Nano Server, then Server Core may be another option, to see whether your application can deploy on that image.
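As a sketch of that base-image guidance, a Dockerfile for a self-contained app on Nano Server might look like the following. The image tag and the app name `MyApp.exe` are illustrative assumptions, not from the talk.

```dockerfile
# Prefer the smallest base that satisfies your dependencies:
# Nano Server first, falling back to Server Core only if needed.
FROM mcr.microsoft.com/windows/nanoserver:ltsc2022

WORKDIR /app

# Copy a pre-built, self-contained publish output in a single layer
# rather than many small COPY/RUN steps.
COPY ./publish/ .

ENTRYPOINT ["MyApp.exe"]
```

If the app needs components Nano Server lacks, swapping the `FROM` line to a Server Core tag is the fallback described above; the rest of the file stays the same.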
I've also seen a tendency for people to just build the image; one time I saw an image built with about a hundred layers. Having many layers is actually going to slow down your pod and application start-up time, so I encourage you to reduce the number of layers in your images. Also, we currently support Windows Server 2022; if you're still using an older version of the operating system, I encourage you to move to Windows Server 2022, both for the host OS and for the container images. Another typical problem I'm seeing is, like I mentioned before, people launching command lines to query system information on a timer, polling over and over, and that's something you probably want to avoid if possible. Another thing people do is use ping.exe; we actually identified an issue in a pod that used ping.exe to keep the container alive. This is not really a good way to keep your container alive, and there are better ways to do it, because when you ping you're constantly exercising the network stack, and that interferes when you're trying to create endpoints. Also, you want to pre-allocate and pre-cache; that's a common thing in computer programming. Things you can pre-cache, you want to pre-cache. For example, in the .NET Framework world, you want to pre-JIT your code rather than compiling it on the fly, because compiling on the fly is going to slow things down. I've also watched people porting tools from Linux to Windows who just say, let me compile it, everything works, I'm good to go. Actually, the systems have different behaviors, and looking deeper will really help you improve your overall system programming, in terms of storage, networking, and other components. The last thing is decoupling: if you're doing an open-source framework, you want to provide a way to decouple the components. I'm seeing some frameworks that include everything; if you have an application you try to deploy to Nano Server, GDI comes with it, and it's just so hard to separate out. So if in the future anybody wants to build anything like that, think about how you can separate the components and deploy only the components necessary to have your application running. So here's the typical way of tuning and optimization: you take what you would deploy to production, configure the workloads, and deploy to test. Then you start collecting traces. Windows has ETW built into the kernel, so you can collect your traces and start looking into the behavior of your system, and then you go through this cycle again and again to make sure your system actually performs optimally before you deploy. There are internal system resources, like the resource usage you can see in ETW, to make sure they're being monitored and understood, and there are application logs you can go through, for example IIS server logs, etc., and also the HNS logs. So here are two examples. The first example is related to the IIS server; this is a real customer scenario, in terms of reliability. We were seeing 503, which means Service Unavailable. So how do we solve the problem? Of course you want to get to the root of the problem, which means you need to look at the IIS server logs. And how do you get the logs out? It can be challenging, because the server, the container, may go offline; how do you get the logs out then? That's the issue we need to look into. Here, you may want to map the IIS server log folder to your host; then even when the container goes offline, you can still retrieve the IIS
server log. The second issue is related to IIS server performance. As you can see, we're using a common command line to capture the ETW trace log at the kernel level, and there's a WPRP file which defines the providers needed for troubleshooting Windows container performance. So here's a typical day for me, working in front of a computer; there's the cloud, and I'm just sitting at home looking at a big monitor. The left-side window for me is WSL and the right side is the Windows console, and as you can see, both sides are actually looking at the same folder, so I'm taking advantage of both worlds: the Linux world and the Windows rich-UI world. I use Kubernetes to deploy my nodes and deploy my pods and everything. Now it's time for me to understand what's going on with my node. One of the key technologies that enables this is called host process containers; Mark and James from SIG Windows developed the technology, and they have a talk on YouTube you can go back to for the details. With host process containers we're able to execute commands on the node without logging into the cloud, and this technology is not limited to one cloud; it can be deployed to any cloud you want. So you execute commands on the node, collect the traces, and then run a command like azcopy.exe, which is the Azure solution, and I'm pretty sure other vendors have similar solutions, and bring the file down to your desktop. How do we achieve that? It's actually really straightforward, nothing fancy, and with a minimal amount of code you can achieve this automation. If you look at the YAML file, the only difference here compared to a regular container pod spec is the securityContext options, which allow you to specify which user context you want to execute under. Here I'm just saying, let's execute under the SYSTEM context, which gives you full power; you can do anything you want on your node. Also, underneath that, you need to pick which node you want to deploy to, which is done using a label: you label the nodes you want to target. So it only takes about six steps to go through the data collection process with this automation: you set your account, whichever account you want to run it under; you create the host process container; you start executing the commands to collect the traces using the host process container; you run your scenario and stop collecting the traces; you upload the traces to cloud storage; and then you bring them down. One time a customer gave me a crash dump of about 8 gigabytes, and it took me 8 hours to download that trace file; but if you upload to cloud storage, it takes about 10 minutes to bring it down. That's why I encourage you to take a similar approach. This is nothing fancy, just setting up some parameters to show how you name your clusters and the storage and all that; nothing to worry about. Also, just to let you know, all the scripts are checked into GitHub, and you can just go there, download them, and play with them. So there are two windows here. The one on my left side shows how to collect the traces, and it's pretty straightforward: the only difference is that when you do kubectl exec, you give the host process container name, and then whatever you run, it's just as if you executed it on that node. Then you execute your scenario; you can replace it with whatever scenario you want, density or network policy or whatever you want to execute, and this framework will get you the traces of your system and let you understand what's going on. The right-hand side here looks a little more complicated, but the top part is showing, in the Azure environment, how to get your SAS token; with the SAS token you're able
to copy your traces to your storage and then bring them down to your local machine. It's really straightforward, nothing fancy here. So here are some of the log files from that script execution. First of all, you set the account, and then you can see there are some default system pods running on your system; this is before we actually apply the host process container. Then you select the nodes where you want to do your testing, and now, after applying it, the host process container is running, and we can start executing commands inside that node. In this case I just want to make a directory on the D drive called perf, which is where I can store the trace file locally. Here we actually start executing the command which starts collecting the traces. Even on your local machine you can do the same thing: just drop the kubectl exec part, run the command directly on your local machine, and it performs the same way. Then we apply the scenario, stop executing, generate the tokens, and then azcopy the trace file to the cloud and bring it down to my system. Once it's on your system, there's a tool called WPA, which is publicly available; you can download it and use it. It gives you detailed information about your nodes: what OS is running, how much memory, CPU, all of that is in there as well. So pretty much your system is open for you; as long as you want to dig in, you can understand what it looks like. Here's a perfect example showing a busy system: if you look at idle, it's only about 3.3%, so your system is busy doing things about 96% of the time. This is the way for you to understand which processes are running and why they're running, and you can even use the call stacks to see the functions actually causing the CPU usage. Thank you. Well, thank you Howard. Again, like Howard said, you can download these open-source scripts, and we're also happy to talk about any of this in person after the session. But one thing I want to mention is that we want to work with you and hear feedback from you, to find ways to make Windows better, because helping you helps us make Windows better. A couple of things here: we want to make Windows performance analysis easily available to everyone, here and virtually. Our plan is to continue producing and publishing updated guidance and guides on how to run through all of these processes and analyze your traces. Performance is a huge subject, so there are many ways you can slice it, and we're going to work on making it easily accessible. Secondly, you can just talk to us and work with us, and we'll help you work through your traces. And that auto-analysis tool we talked about before, we plan to make that publicly available as well. On getting in contact with us: I'm at SIG Windows every week, so feel free to jump in there and ask me whatever crazy questions you have. We also have our public Windows containers repo where you can submit issues, bugs, logs, and all sorts of stuff, and we do take a look at that. And share your stories too; we like to see people talk about how they've created their clusters or how they've set up their Windows workloads. There are a lot of good talks; Relativity had a good talk at KubeCon Detroit. It's always good for us to hear how people use our technology and share that with the world, and that helps us get better. So feel free to come up to us, ask questions, and talk with us; we're here to help. Yeah, we'd love to be challenged, so bring your questions to us and we'll try to help you out. Thank you. Hello, Jules. So I would like to ask about some options for minimizing the image itself. First you need to choose a base image as small as possible, but if you're trying to move legacy applications, those usually have a lot of dependencies. So yes, there are tools like Dependency
Walker and such, but maybe you have other suggestions for tools that would collect the needed assemblies? So first, to start: just in the last few months we were able to reduce the size of the Windows images by 40%, and we have more coming in the next few months; a lot of work is being done there. But absolutely, image size is a tough problem, for Windows especially, so there's a lot of work we're doing on enabling better, more selective construction and enablement of services to slim down the image. There are no tools that exist right now to support that, but there are ways you can extend Nano Server to meet your specific needs, and we've also found ways to effectively minimize the size of Server Core too. Plus, with image caching, you know, Teleport, the image-caching technologies will be supporting that as well. Howard? Yeah, so I think it's doable, but like I said, if you're familiar with tracing, you can look in and see what the dependencies are. In the long run we want to automate the whole process, to give you recommendations on what kind of components need to be brought onto Nano Server. I've done some of that analysis myself, and I know it's a challenging task, but in the long run we want to make sure we have tools to fully support you, so you don't have to go through all the trouble, the hard process we had to go through. Thank you. Okay, thank you. Hi, great presentation. Hello. About the tracing files, the files that are used and generated by the tracing: I know about Windows Performance Analyzer, but is there more tooling coming out, or progress being made, when it comes to making it available for other operating systems? In our company we have a lot of engineers who want to do the tracing as well, and they don't necessarily run Windows on their main machines. So we wanted to make everyone able to analyze the traces, or maybe enable a web service where they could put in a trace and see the results on a web platform, or anything like that. Yeah, we have exactly that in the works at the moment; more details to come. Yeah, so even on Windows there are multiple ways of capturing your traces: logman, WPR, PerfView, among many others. But eventually you have to collect the trace and use different tools to do analysis on top of it. One of our goals is to make the process easier, so people are able to just understand it instead of going through all the learning curves. Thank you. Great, thank you.