It's on, the mic is, can you hear me? I'm on the speakers? Okay, hello, hello, can you hear me? Cool, all right, so let's do this. Okay, thank you everybody that came to this talk. For those of you that don't know me, which is probably most of you, my name is Paul Yu. I am a cloud native developer advocate at Microsoft. And one fun fact about me is that although this is SCaLE 21x, this is SCaLE two for me. I was actually here last year as an attendee, and I was like, wow, this is a pretty cool conference, got some good people. And I said to myself, you know what? I'm gonna come back next year and I want to speak. And so here I am. Today I'm gonna talk to you all about strengthening the secure supply chain using open source tools. My goal is basically to give you a tour of some of these tools, because they can be a little daunting to get started with. And so I'll be doing a live demo. I will be testing the demo guides today and also testing the conference wifi. This is the last day of the conference, so hopefully the wifi will be good to me. Let's go ahead and dive right in.

So first and foremost, strengthening the secure supply chain. This is gonna be focused on container security: making sure we're patching our containers and delivering them to container orchestrators like Kubernetes. And whenever you think about Kubernetes, you're always thinking about containers first and foremost. And of course, containers are awesome. They're awesome because they create a consistent and portable environment. I spent a lot of my time in the .NET, Microsoft-stack ecosystem, and back then, ooh, it was really painful to deploy your application onto the servers that you needed it on. Back in those days, I was running MSBuild, packaging applications up into zip files, and then literally copying and pasting them to every single server, unzipping them, and praying and hoping that the server would have the right dependencies that I needed for my application to run. And if you worked in the .NET Framework days back then, if you were missing a DLL or if you overwrote the wrong DLL, man, your application is cooked. So when the concept of containers came out, it was like magic. You layer in the operating system, you layer in the application dependencies, and then boom, you have an application in a nice containerized package that you can take and deploy onto every single server that you need to. And so that was super cool.

What that also gives you is the ability to easily manage and orchestrate these workloads, so you can actually deploy them using things like automation and orchestration. I mean, that's what Kubernetes is all about, right? And so what you get is an application that is highly scalable. You can deploy it out to literally hundreds, thousands of servers with a single command if you really wanted to. And containers can also improve your security. But because your application is highly scalable, the blast radius is also super wide, right? If your application is highly scalable, then so are your vulnerabilities. If the app you're shipping has vulnerabilities in it, then unfortunately so does every copy of it, wherever you deploy it. And so we're living in this world now where there are just security vulnerabilities galore.
I mean, I think we're at a point where we're reporting about 50-plus CVEs on a daily basis. And it's just a never-ending game of whack-a-mole, right? Like this cat here, just constantly batting down the fingers, it's just a never-ending game. Just to illustrate the point, you can see the number of security vulnerabilities that were reported just last year alone. This chart is from a Qualys threat analysis report that was generated about mid-December or so, and you can see that there were about 26,000 CVEs reported by mid-December. I think we ended the year just under 28,000 CVEs, but in any case, you can see that the number has jumped at an alarming rate, starting from around 2017. You could probably attribute that to cloud native workload adoption. It could be due to other reasons, but the point is we're seeing this huge spike in vulnerabilities being reported.

Now, not to alarm you too much, but out of those 28,000 or so CVEs reported last year, not all of them are highly critical vulnerabilities; only a small portion of them are. But even if you start to look at the vulnerability types among the high and critical vulnerabilities, you can see that a large portion of them is due to the operating system. And so that's where we're gonna focus our container security story here: basically looking at ways we can secure that operating system inside of the container and make sure we can deploy it out to our servers in a timely manner, okay?

So where do you start? When it comes to Kubernetes and the CNCF landscape, it can get a little bewildering, but at the same time cloud native is pretty cool, because all the tools that are out there, to me they're like Lego blocks, right? I spent a lot of my time as an Azure architect helping customers onboard to Azure, and it's like, where do I start with these services? I don't know where to start. The same can be said about the cloud native landscape. They're all just different open source projects that you can pick and choose from, but where do you start? It would be nice if all the building blocks just snapped together, all the same shape and size, but it's not like that. If you look at the cloud native landscape diagram, it's literally littered with, maybe not hundreds, I'm exaggerating, but it's littered with a lot of projects, and that diagram starts to feel overwhelming to people, right? And these poor Lego heads that are somewhere in the rubble, that's probably a Kubernetes administrator somewhere, just like, it's just too much to pick and choose from. So where do you start?

We're gonna start here. I took a subset of that CNCF landscape diagram, and I wanna highlight a few projects that you can take a look at to strengthen your secure supply chain, okay? First and foremost, I wanna focus in on Trivy. If you've worked in the cloud native landscape, you've probably heard of Trivy before. It's a great vulnerability scanner that you can leverage across your whole ecosystem. It's not a CNCF project per se, but it's a highly used cloud native security scanning project. The next thing I wanna focus in on is this project called Copa, and I'll get into the project details in a little bit. I do wanna focus on Flux as well, because we're gonna be leveraging that for our automation bit, and this project down here called Eraser.
One special shout out I want to make: I'm not gonna demo it in this talk today, but I wanna specifically call out Notary. You can use it to sign your container images, and then you can use other applications to do the attestation and the signature verification. But all the blocks that you see highlighted in pink there, if you can see that color, that's what I will be demoing today.

One other project that wasn't on that landscape list is GitHub Actions. GitHub probably needs no introduction; you probably already know it for its CI/CD capabilities. Actions is basically its workflow platform, where you can trigger workflows based on certain events, whether it's a push or a PR merge or a schedule. And these runners typically run in GitHub's cloud, but you can actually self-host them if you want to.

Just to quickly touch on Flux as well: this is a pretty old project within the CNCF landscape, donated to the CNCF in about the 2016 timeframe, graduated in 2022. This is basically your continuous delivery solution for Kubernetes. And I think they actually coined the term GitOps, right? Or actually Weaveworks did. Special shout out to all the Weaveworks folks. I know they went through a little turmoil, so hopefully everybody landed on their feet there. What we're gonna do is leverage a feature inside of Flux called image update automation, and we're gonna use it to detect new images and make changes so that we can deliver the new version of the image as soon as possible.

The other project I wanna highlight here is a CNCF Sandbox project donated in 2023: Eraser. I like to call this the Roomba of your Kubernetes cluster. It'll just float around your cluster, pick up any unused, vulnerable container images, and delete them for you, right? Because within Kubernetes, there is an almost garbage-collection-like capability, but that's based on disk pressure: only once your disk usage exceeds a certain threshold will it actually delete container images. But in our case, we wanna delete these things as soon as possible, because we don't want any vulnerable images hanging around in our cluster. Eraser actually supports Trivy scans, so it'll scan the containers, and if it detects vulnerabilities, it'll get rid of them for us.

This other project, and I've actually listed these projects in chronological order of when they were donated to the CNCF, is a project that came out of the Azure, or Microsoft, open source incubations lab. This project is called Copacetic, and it was donated in late 2023. The job of this project is basically to use Trivy vulnerability outputs and patch container images based on OS vulnerability scans, right? The whole idea is that sometimes, as an organization, you can't really wait for upstream to fix a container image for you. You have to go ahead and patch that thing yourself. And so what this project will do is scan for vulnerabilities, and if it's known that there's a fix out there, it'll apply the patch as a layer, and you don't need to rebuild the entire container from scratch. It's a really quick way to get your OS vulnerabilities fixed, and that's the whole point here. Okay, so enough talking, on to the demo. What I will be doing today, I'm gonna do live, and like I said, I'm gonna test the demo guide, so we'll see what happens.
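(For reference, the scan-and-patch flow described above boils down to two commands. This is a minimal sketch: the image name and tags are placeholders, and the flags follow the Trivy and Copa docs, so double-check them against the versions you're running.)

```bash
# Placeholder image; substitute your own
IMAGE=ghcr.io/example/store-front:1.2.0

# Scan for fixable OS-level vulnerabilities and save the report as JSON
trivy image --vuln-type os --ignore-unfixed -f json -o report.json "$IMAGE"

# Apply the available fixes as a new layer and tag the patched image
copa patch -i "$IMAGE" -r report.json -t 1.2.1
```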
What we're gonna do is this: I'm going to deploy a local Kubernetes cluster using kind. I'm gonna bootstrap that cluster using the Flux CLI. Flux will deploy my demo application, and then I'm gonna run Trivy on one of the containers in my demo application, and we're gonna see a vulnerability. We're gonna use Copa to patch that container, then I'm gonna deploy the patched container, and then, yay, we're all safe, right? But that's only half the story, because a lot of that is manual work, and nobody likes manual work. We all want bots and AI and things like that to do the work for us. So I'm gonna configure Flux to use image update automation to listen for new container image tags and then automatically deploy them into the cluster, so you have this nice continuous cycle of constantly patching your container images as vulnerabilities are found. And then finally, we're gonna get that Roomba in there, and it's gonna go around and sweep up any other vulnerable container images.

So let's see, let me close this slideshow real quick and hop over to my demo. All right, the first thing I'll do is create a kind cluster. This is the part that probably takes a little bit of time, so just be patient with me. And by the way, as we're waiting for this stuff, if you have any questions, just feel free to shout them out, and I can kind of answer along the way. So wait for that. Yes? Compliance support, thank you.

One of the things that application teams are telling me is that it's not easy to update modules within the image that they're using for their containers; they have to find a patched-up version of it and redeploy the whole thing. So if I understand correctly, you're saying that these tools you're demonstrating will help patch them in place on an as-needed basis? Right, right, right. But this Copacetic tool will patch OS vulnerabilities only, for now. If you have an application vulnerability, you probably just need to rebuild your container at that point, so yeah.

Right, yeah. So you said that it patches the OS vulnerability by adding a layer on top of the existing one rather than rebuilding it from scratch? That's correct, yes, that's correct. And normally the OS is sometimes referred to as the base layer, but it just overwrites it then, essentially? Yes, yeah, and I'll demonstrate all that for you, and you can actually see the layer there. Okay, thanks.

Yes, we have a question in the back. Hi, so based on what you just said about the patching: given a project with a Dockerfile, for example, for the container, if I have a new deploy of that project, or if I rebuild the image for deploy purposes, it's not going to be patched until this whole pipeline finds it and patches it as a new running container. So does all this include a way to... I imagine that if there is a patch to the operating system, there is also an updated base image from which you can build your container. So is there a tool to also eventually update the Dockerfile or Containerfile to point to an already up-to-date operating system tag? In order to have it already patched when you do a new deploy or a new build of this image in a CI/CD pipeline or whatever. So you're asking how we can update the base images to the appropriate versions? There isn't a CNCF project that I'm aware of that will do that.
I've seen things like, I was at DockerCon, it was here in LA a couple of months ago, and I think they came out with a Docker Scout application. So that might be able to do it, where it says, hey, your base image is out of date, you should bump to this version, and it could probably do that for you. The purpose of the Copa tool is primarily for when you can't wait on the upstream container images to be rebuilt, so you just have to add a patch layer on top. It's more like a stopgap.

Yeah, I'm thinking too about Dependabot on GitHub. Oh yeah. Something like that, for patches: once there is an updated version of the image, you get an alert, hey, change the version to this one so you address those vulnerabilities. So maybe not a major bump, but a minor bump that addresses vulnerabilities. But no, I never saw Dependabot do that for base images, so I don't think it's included in GitHub, but it would probably be useful there. Yeah, no, that's a fantastic idea: have a Dependabot-like capability for container images or base layers. I can take that feedback back to the team that's actually working on it. So yeah, good idea.

Okay, so it looks like our kind cluster is actually running. Let's do some more testing here. I'll go ahead and now bootstrap my cluster with Flux, right? I'm using the Flux CLI. I have a local Kubernetes cluster that doesn't have anything installed in it right now, so I will use flux bootstrap to point at my GitHub repository where my manifests are. This also takes a little bit of time, so we'll just have to wait on that. Sometimes it goes fast, sometimes it goes slow, so we'll be a little patient here.

Fun fact: you probably guessed that I'm not really typing all this stuff, because there's a lot of typing going on. I just want to give a quick shout out to Paxton Hare. He has this repo on GitHub, I think it's github.com/paxtonhare, called demo-magic, and you can literally script out your entire demo. It's actually running the commands for me; I'm just not typing them. Because we all fat-finger stuff every now and then. It's called demo-magic. Demo magic, yes, yes, yes. Yeah. If you want, I can just switch to it real quick. So this is the app that I'm actually deploying, and this is it right here. So a special shout out, thank you, you saved me a ton of time with this stuff.

All right, let's see. Hey, look at that. All the components are healthy now. Let's go ahead and check the status. We'll check the deployment, we'll do a kubectl get pods, and you can see that I have a bunch of pods running in this sample demo. Containers are creating; I'm pretty sure it'll be fine, so let's just go ahead and move on. So now Flux has bootstrapped itself, and it actually wrote something like three commits into my repo, so I'm just pulling down those files so I have the latest. And then what I'll do is scan one of my container images. I know this one has a vulnerability, so I'm going to demo this for you. It's called the store-front container image, and I'm going to use Trivy to scan it. When I scan this, this is another part where I need to wait on the internet to download the vulnerability database, and then it'll do the Trivy scan and we can see the output here. Not that big of a file, so it should come through very soon. OK. So there we go.
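(For anyone following along at home, the setup narrated so far amounts to something like the following. A sketch: the cluster name, owner, repository, and path values are placeholders for your own bootstrap repo.)

```bash
# Create the local cluster
kind create cluster --name scale-demo

# Bootstrap Flux against the GitHub repo that holds the manifests
flux bootstrap github \
  --owner="$GITHUB_USER" \
  --repository="$GITHUB_REPO" \
  --path=clusters/scale-demo \
  --personal

# Confirm the demo workload comes up
kubectl get pods -A
```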
The font is a little wonky, so let me just drop this down a little bit. If you can see here, it detected one high severity, so one high and one medium, but they're all marked as fixed, and there's a fixed version for each. So what I can do is run this command again, but this time I'm going to re-scan and output the Trivy results to a JSON file. Because when Copa does its thing, it's basically going to read that Trivy output file and patch based on what Trivy says the latest patched version is. So now that I have the Trivy output as JSON, I'm going to call the Copa CLI, and it's going to patch my image based on that JSON file. And I'm going to tag it, as you see here, with a new version, 1.2.1, because my original was 1.2.0. So, right here. We'll wait for that. And you can see down here that it ran an apk update and an apk add and upgrade. So that's the patching process. So now, if I can get back to it, there we go. If I view the image history, you'll be able to see all the layers that existed previously, and the latest layers are right here, these two things here.

So what I'll do now, now that I have a patched image, is scan it again using Trivy. When I scan my 1.2.1 image, it should show that we're all in the clear. And now there are no more vulnerabilities, no highs, no mediums. We're good to go.

So where is this patched image living? Is it just in the local Docker registry? Is it possible, if you're using some other external registry, to have it put there for you? Yes. And that's what I have to do as an administrator: the next step I'm doing here is logging into my GitHub Container Registry, and I'm actually going to push that new patched image to the GitHub Container Registry. But check this out: it doesn't need to push the entire container image. It's only going to push the two new layers, yes. So you can see that all the layers already existed, and it only pushed those two new layers, which makes it very nimble in terms of pushing our changes out.

So what I'll do now, now that I have a new version out there, is update my Kubernetes cluster to use that new image. I'm using Kustomize here, because my workload is all backed by Flux, and Flux basically listens for changes in my GitHub repo. I'm basically saying, hey, you know what? Let's use Kustomize to bump my version number up, and let's go ahead and commit and push that change. And as soon as I do that, Flux will kick in and push that new image out to my Kubernetes cluster. So the new manifest change has been pushed up, and if I do a kubectl get pods, what you'll start to see is this pod go away and a new pod come online. So we'll just be a little patient here, and it will eventually come through.

While we wait: there is support for multiple tags, so going back to my previous question, we might tag the patched version also as latest, for example, so that during deploys we always point to latest and we don't care about the current version we declared. Is this something feasible, or suggested, not suggested? It's probably not. Well, it depends. I'll tell you what you don't want to do: you don't want to patch on top of a patch. What you want to do is always start with an image that has proper base image versions, and always work off of that.
And I'll show you the strategy I approached just for this demo. But basically, I don't want my patched versions to be latest, because I want to always be able to find the image that is the appropriate one and patch on top of that. You don't want to get into a scenario where you're constantly patching, patching, patching, and then you lose sight of where your original image was. So it could be latest, it could be something else, but I would recommend that you mark your golden images with a specific tag and always work off of that instead. And I'll demonstrate that in a bit. Thank you.

We can see that the new container image is actually running over here, so I will drop out of this watch, and let's confirm that the new image is running in our Kubernetes cluster. There we go: we can see down here that we have the 1.2.1 image. So there, Flux did its thing. That's cool.

What I want to do next: that was a completely manual process. I'm patching, I'm pushing, I'm updating a manifest, and then Flux does the automation thing. But I don't want to do that manually; I want Flux to do it on its own. So what I can do is leverage a feature of Flux called image update automation, and I'm pretty sure Argo CD has a very similar capability. What I want to do is tell Flux, hey, my image repository is over here, go listen to that. And while it's listening, I need to tell Flux, hey, here is my tagging strategy. I'm using semver, so anything greater than, I know it's kind of cut off, but anything greater than 1.0.0, right? So I'm putting together my tagging strategy, and this is how Flux is going to know: oh, you started from 1.2.0, now there's 1.2.1, that's the latest, right? So that's the image policy. The image policy knows how to evaluate tags and determine what the latest should be.

The last little bit here is the Flux image update automation itself. Because once Flux knows what that latest tag is, it also needs to know where within my repository it should be making manifest changes. So here I'm creating an image update automation, and I'm basically saying: take a look at this folder within my repo, check out the main branch, and you can point this at whatever branch you want. And here is my author and commit template, because it's going to make a change to my manifest files, and so it's going to need to write a commit into my repo. So that's what that's all doing. We'll give this a little bit of time, waiting for the image update automation reconciliation to happen. This is usually quick. But there we go.

The final thing we need to take care of, to tie all this automation together: Flux now knows where to make the edit, like which file. Now it needs to know, OK, I don't want to update the entire file, I only want to update certain pieces of that file. And so what we're going to do is add markers to our kustomization YAML. This is a Flux thing. So what we do is, and I'll show you a diff here so you can see what I just pulled in, you can see that we're adding this tag, or this comment: image policy, pointing to the image policy that we just created. So that will update the name for me. And then this is important right here: we're going to add this marker that says, hey, any time you find a new tag, go ahead and update this. So it's 1.2.1 right now.
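(Pieced together, the Flux objects and the kustomization marker he's describing look roughly like this. A sketch: the names, namespace, registry path, and API versions are best-effort placeholders, so check them against your Flux version's documentation.)

```bash
cat <<'EOF' > clusters/scale-demo/image-automation.yaml
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
metadata:
  name: store-front
  namespace: flux-system
spec:
  image: ghcr.io/example/store-front
  interval: 1m
---
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: store-front
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: store-front
  policy:
    semver:
      range: ">=1.0.0"
---
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageUpdateAutomation
metadata:
  name: store-front
  namespace: flux-system
spec:
  interval: 1m
  sourceRef:
    kind: GitRepository
    name: flux-system
  git:
    checkout:
      ref:
        branch: main
    commit:
      author:
        name: fluxcdbot
        email: fluxcdbot@example.com
    push:
      branch: main
  update:
    path: ./apps/store-front
    strategy: Setters
EOF

# And the marker comments go in the app's kustomization.yaml:
cat <<'EOF' >> apps/store-front/kustomization.yaml
images:
  - name: store-front
    newName: ghcr.io/example/store-front # {"$imagepolicy": "flux-system:store-front:name"}
    newTag: "1.2.1" # {"$imagepolicy": "flux-system:store-front:tag"}
EOF
```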
And we're going to make sure it constantly updates that piece of the manifest. So if I commit and push this to the repo, I'm now at a point where I have the full end-to-end automation in place, but I don't have the patching component yet. So let's go ahead and add that.

To add the patching, and to your question about how we can get this patched container up into a remote container registry: we can do that using GitHub Actions. So I'll pull in this sample workflow that I had, and let's take a look at it. If you can see here in the workflow, what I'm doing is basically scheduling it. Who remembers Patch Tuesdays? It was a terrible day. But yeah, Patch Tuesday is when we're going to run this thing. And we're going to leverage a couple of GitHub actions that are available in the GitHub Marketplace. So number one, I'm finding the latest tag. To your question about using latest: you just need a tag that you can find, that you can determine, that's where you want to patch from, right? And so what I'm doing here is finding the latest container image regardless of whether it was tagged as latest or whatever; I'm finding the latest semver, and I'm going to bump that, okay? After I bump that tag number, I'm going to use it for my patching.

When I run the Trivy scan, I'm not going to scan off of the latest semver number; I'm going to scan off of latest. This is my unpatched container image, and I'm always going to patch based on that. Then I output the Trivy report and check the vulnerability counts. And down here, if the vulnerability count is not equal to zero, I'm going to run the Copa action to patch based on my latest, right? That goes again to the point of not patching on top of patches; we always patch off the latest. From there, I'll use GitHub to log in to the GitHub Container Registry, re-tag using my bumped semver tag, and push it back out to the GitHub Container Registry. And now remember, this gets the new container into the GitHub Container Registry. Flux is listening to that container registry and says, hey, I have a new tag. This tag is going to be 1.2.2. And that's when Flux says, I have a new tag, let's go ahead and automate the whole deployment rollout.

So with that, the workflow is in place. You can kind of see it doing its thing right now. And just to show you what's going on, we can also look at it over here in the Actions tab, and we can see that there's a patch going on. If I click into that, yeah, you can see it's running the Copa action. Actually, that finished, that was pretty quick, and it just pushed the patched image. So at this point, it found a vulnerability again, because remember, I'm always patching based on latest. And I'm running this on a cron schedule, so it will run once a week, and it'll always find something to patch until I update the base layer in the Dockerfile; that's when I'll really fix the issue for good, okay?

From there, I can force the reconciliation of my image repository, and to be honest, I think Flux already took care of this by now. You can see that it scanned and found six tags in my container registry. Then I can check the image policy, so I'll do a kubectl describe, and you can see here that it updated from 1.2.1 to 1.2.2. So Flux took care of that. And if I force the image update reconciliation, again, I'm guessing this happened already.
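(A sketch of what such a scheduled patch workflow can look like, using the marketplace actions mentioned above. The image path is a placeholder, the patched tag is hard-coded where the real workflow would compute a bumped semver and check the vulnerability count first, and the action versions and input names are best-effort from the aquasecurity/trivy-action and project-copacetic/copa-action READMEs, so verify them before use.)

```bash
cat <<'EOF' > .github/workflows/patch.yaml
name: patch-os-vulns
on:
  schedule:
    - cron: "0 8 * * 2"   # Patch Tuesday
jobs:
  patch:
    runs-on: ubuntu-latest
    permissions:
      packages: write
    steps:
      - name: Scan the unpatched 'latest' image
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: image
          image-ref: ghcr.io/example/store-front:latest
          vuln-type: os
          ignore-unfixed: true
          format: json
          output: report.json

      - name: Patch with Copa
        uses: project-copacetic/copa-action@main
        with:
          image: ghcr.io/example/store-front:latest
          image-report: report.json
          patched-tag: "1.2.2"   # really: the bumped semver computed in an earlier step

      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Push only the new patch layers
        run: docker push ghcr.io/example/store-front:1.2.2
EOF
```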
So this should be relatively quick, maybe not, because by default, Flux will reconcile every minute, and you can configure that to whatever you want. Or I can use a flux reconcile command to kind of force this thing through. So we'll wait on that. There we go, yeah, no updates made, because I'm guessing it did that on its own rather than through my reconciliation. So let's go check it out. Theoretically, at this point, my manifest should be updated. There we go. And then I'm gonna force that kustomization to be applied to the cluster. And if I now look at the Flux commits, there we go, you can see that there was a change made to my kustomization YAML, and Flux actually bumped that version for me. So now I have a new version in there, and I can watch the pods roll out. And it's probably too fast for me, because you can see here that I have this pod, which is now 20 seconds old. And if I go ahead and verify the new image, it should say 1.2.2. There we go. So now I'm fully automated, right? I now have a process in place in which I can continuously scan my images and have them patched, so that I'm always running non-vulnerable, at least from the OS perspective, container images in my cluster.

The last bit that I wanted to show you is that Roomba, right? If I run this command against the nodes, and I'm only running a single-node cluster here, you can see that images are just cached and sit on the server for as long as they need to, until Kubernetes decides it needs to clear out some storage space. And you can see here I have a 1.2.0 and a 1.2.1. These are unused images, and we know that 1.2.0 is vulnerable. We don't want that in our cluster, because if you have a malicious user in your cluster, they can actually use that container image for whatever the heck they wanna use it for. So we want that out of here, right?

And so what we're gonna do is deploy Eraser into the cluster. It's pretty easy to install: just a single YAML file that you can use to deploy. You can also deploy it using a Helm chart, but this is easy enough for the demo. What it does is install a couple of different things in here, and it's fully configurable. You can see that it deployed a controller manager, because almost everything in Kubernetes these days is controller and operator based, so it installed the controller manager. And then it also installed this thing right here, the eraser-scale-21x-demo pod, so it's like the cluster name and the control plane node. And you can see that there's gonna be one pod, three containers, okay? The three containers are going to be, number one, the collector, which gathers up all the images that need to be scanned. Then there's a scanner container, which actually scans the images. And then you have a remover, which deletes them based on the scan results, okay? So they're all running right now, and they will run to completion, okay? There we go. They've run to completion and are now terminating, so I will go ahead and bounce out of that. Then I'll go find my Eraser controller logs. There we go. You can see that it deleted images, deleted the pods, right? And then it's running job deletion, right? So you can see some of the things it's doing along the way. And now, if I run another command to verify that the unused images are gone, you can see that I only have 1.2.2, because that's what I'm currently running on, okay?
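(Two steps from this part of the demo, sketched out. The node name follows kind's `<cluster>-control-plane` convention, and the Eraser manifest URL follows the pattern in the Eraser docs; check the project's releases for the current version before applying.)

```bash
# Peek at the node's image cache on a kind cluster
docker exec scale-demo-control-plane crictl images | grep store-front

# Install Eraser from its release manifest (version is a placeholder)
kubectl apply -f https://raw.githubusercontent.com/eraser-dev/eraser/v1.3.1/deploy/eraser.yaml

# Watch the collector/scanner/remover containers run to completion
kubectl get pods -n eraser-system -w
```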
And just to show you a little bit of that configuration: this gets a little messy, in my opinion, because everything is inside of a ConfigMap, but I do wanna highlight a few things. Within this ConfigMap, and let me just drop this down a little bit so you can see, there is a scanner config. You can enable it or disable it. If you disable it, it'll just delete all unused container images all the time, regardless of scan results. Down here is where we actually configure the scanner. You can see this configuration is a little bit different from when we ran the local Trivy scan: here it's also looking for library vulnerabilities. So Copa will only patch OS vulnerabilities, but Eraser can delete containers based on any vulnerability, application-level or what not. And then here you're configuring the severities. And yeah, so that's basically it there.

Okay, so that's pretty much it. Now we're at a point where we've automated the entire thing, right? From looking for vulnerabilities, to patching that vulnerability, to pushing it up to our container registry, and then using cool tools within the Kubernetes ecosystem to automate the entire, I guess, distribution of that patched image. That's pretty much all I had today in terms of the demo.

If I just jump back over here, what I would say, just to summarize: containers are awesome. They're a great way to package your application and get it out into your, I guess, your server environments. But keeping up with container security is always challenging. If you leverage open source tools, it can help ease that pain just a little bit, right? And secure supply chain is just one aspect; that's taking care of it from the build side. There is also runtime security that you have to worry about, and that's a whole other topic.

With that, all I have for next steps is a few links here to the CNCF projects. I would recommend everybody that's here and viewing online to go out to these projects, get involved, and help out. This is why we love open source: to get everybody, the community, involved and help build the product the way you need it to work. Also, there are a few social things that I want you to follow here. I actually published a blog post this morning that'll walk you through the entire thing, so you can do it at home by yourself if you want. Also follow my teammate Josh Duffney. He did a lot of work on the secure supply chain as well, including signing your container images using Notary and then doing signature verification using Ratify, okay? The last bit here is we do have an open source Discord at Microsoft, so if you want, go ahead and scan that QR code and join the Discord community, where you can chat with some of us folks. So with that, that's all I had today. I'll open it up to questions, and if not, thank you for your attention, yeah.

Yeah, I have one. So we run the cleanup to remove the images that contain vulnerabilities, but what is stopping a malicious user from manually targeting a vulnerable image? Do we have an easy way to ban versions, so that it's not even possible to manually target them and get them downloaded and used in our environment? Yeah, so I'm not sure if I completely caught that question, but how would you prevent folks from targeting a very specific... Yeah, we said 1.2.0 is vulnerable. Yeah. And we removed the local image. Yeah.
But I can still modify my Docker Compose or Kubernetes configuration or whatever to re-download that image, as long as it exists on a registry, whether that's our registry or Docker Hub. So do we have a way, I don't know, a policy or something, that as a high-level administrator I can use to block that? Yeah, that's where runtime security comes into play: things like signing your container images, and then having something in your Kubernetes cluster like OPA Gatekeeper, things like that. Because what you can do, once you've identified that a specific version is vulnerable and you don't wanna use it anymore, is invalidate that signature, so that when you go to pull down the container, it's gonna say, we don't have that signature, we're gonna block it, right? So that would be runtime security against that.

I assume maybe you didn't show this just because it's complex and you're doing a demo, but are there ways to do staging before it gets to production, after you've patched this image, and maybe do a blue-green deployment or something to test it before you put it out at that scale? Absolutely. Flux has this, and I'm only speaking about Flux because I have more experience over there, but Flux has this cool feature or capability called Flagger, and it does exactly that. Flagger will basically say: there's a new image coming down, there's a new manifest update, so let me go and deploy that, but let me deploy it as a separate deployment, right? A deployment with a different name, because in Kubernetes, a service points to a deployment. And so what Flagger will do, if we're talking blue and green, is deploy it as the green deployment, do its local testing using, I forgot what the internal tool is, but basically it'll start hitting that service and making sure you have enough 200s, passing requests. And then once it meets a certain threshold, it'll slowly start to pivot traffic to the new version, and once it meets a certain threshold, it'll cut all traffic over to that green deployment. So yes, within the cloud native ecosystem, you can do that, yeah.

And then, can you have these tools engage in signing as well? So if Copacetic generates, sort of, I'll call it a custom image, if you will, because it didn't come from the upstream, could you sign that just to indicate it came from this unusual source as opposed to the standard one, and maybe integrate that into the security features you've had based on signing? Like have the signature indicate that it was patched rather than... Right. Or that the patch tool was actually the signer, the originator, as opposed to the upstream's authoritative source. Yeah, that's an interesting one. I'm not sure if any of the signing technology can do that. But my teammate Josh Duffney actually put together an article that's very similar to this demo, except rather than just patching and pushing, he patched, signed, then pushed, right? So once it's signed up there, it can pass the signature verification process at the cluster.

Yeah, it just strikes me that if you're an org using something like Trivy, you're probably also using signing; the Venn diagram of those two things is almost going to have full overlap, from what I've seen. And if an image comes out of the system completely unsigned, most places would have a policy that wouldn't actually let it run, aside from a demo. Right. That would be an interesting one.
Just sign it to the point where, like, okay, this is patched. It shouldn't be your full image, you shouldn't be running this forever. It's basically on a to-do list: you should look for an authoritative one with that patch later rather than run this thing forever. Yeah. Any other questions? Well, why don't you join me in thanking Paul for this great presentation. Thank you all for being here. Well done, Paul. Thanks. Thank you so much.

Check, check, check, check, check, check. Check, check. So, introducing Bernie Wu. He's the VP of strategic partnerships and business development for MemVerge. He has 25 years of experience as a senior executive for data center companies including Micro, FalconStor, and MetalSoft. He's also on the board of directors of Forces Data Solutions. Bernie has a bachelor of science as well as a master of science in engineering from UC Berkeley and an MBA from UCLA.

Well, thank you very much for that introduction. And I also want to thank you dedicated attendees for sticking around for the next-to-last session here at this event. The title of this talk is Kubernetes operators for enabling hot restart of stateful applications, but I also want to talk more generally about the memory problems that the computer industry faces, so I'm also going to be talking a little bit about CXL technology. All the stuff I'm going to be talking about today is pretty much a preview of things that are about to happen in the industry, or proof of concepts that are very close to becoming realized.

So here's the outline of my talk. First of all, I'll be talking about this concept of transparent checkpointing and restarting, giving an overview of that. I'll talk about doing CPU-based checkpointing on Kubernetes, and then I'll pivot over to GPU checkpointing and restarting on Kubernetes, and then I'll go through a demo of this actually working. After that. Yeah, hello? Or shall I talk closer? Yeah, and then in the next part of this presentation, I will talk about something called CXL technology, which stands for Compute Express Link. It's an extension of memory technology across PCI buses that's been adopted by the entire industry. I'll talk about use cases for CXL, both in Kubernetes and also for accelerating AI workloads, and that involves memory tiering, memory pooling, and memory sharing.

So first of all, let me start with the first topic, the what and why of transparent checkpointing and restart. The what: the idea is to capture and restore an application's running memory state and machine state, and coordinate that with data storage, with no application changes required. Hence we call it transparent checkpointing. There are other kinds of checkpointing technologies where checkpointing is built into the application as domain-specific knowledge. Many AI/ML workflows, for example, know how to checkpoint during various parts of their pipelines and also during training epochs. But this is transparent checkpointing, which requires no application changes and allows operators greater flexibility in managing their infrastructure. The checkpointing I'm going to be talking about is also not intended for what we call live migrations; this is really for hot restart of applications. So it does have a fairly narrow recovery point and recovery time objective associated with this kind of approach.
And the why of this is that we want to provide fault tolerance for long-running applications and workflows that do not have built-in checkpoint and restart. In the Kubernetes community, for example, people are actually starting to run HPC workloads, batch workloads, on Kubernetes infrastructure, or they're running a lot more AI workflows, same thing. Some of these can take a while, and if they fail asynchronously to the built-in checkpointing, it can still be problematic for users. Another reason is that platform and infrastructure operators would like to be able to rebalance and reprioritize workloads while minimizing disruptions. So for example, when you're autoscaling, you may want to autoscale and align GPU workloads better, or have higher-priority GPU workloads run ahead of other ones, while not losing all the work that's already been done on the existing workloads. So there's a variety of reasons we see for having this kind of transparent checkpointing in those use cases.

Another area is the ability to do branching, cloning, and preserving of workflows and pipelines. We've talked to people that want to save their... We've talked to SaaS providers that, for example, have Jupyter notebook clouds, and people go home at the end of the day, or they just forget their machines, and then they come back and they're either reclaimed and everything's lost, or whatever. This kind of technology could be used to automatically save all that stuff, shut down those instances, and then bring them up the next morning. So there are use cases like that that we find interesting for this technology.

This slide goes into some more examples of using transparent checkpointing and restarting. One is just to be able to take periodic snapshots in case there is a failure, a hardware failure, whatever; we can roll back in time to a specific checkpoint and resume operation without having to go all the way back to the beginning. The second one is time travel, where there could be a multi-stage pipeline, and you can annotate in your pipeline that you want a checkpoint done at this junction of the pipeline. And for some reason, if you want to roll back to that stage and iterate and go in a different direction, that's a possibility. We've seen people in the computational biology area, where they're doing machine calibration, literally clone instances of their compute and run them in parallel, because they have enough cores available, so they can try different calibration settings and actually do a bit of parallelization of their workflows in that fashion.

And then I think another interesting one for Kubernetes is application migration. We have people that want to burst from on-prem into the cloud. They want to checkpoint a workload and move it up to the cloud, without having to restart it from the beginning, transparently to the users. They want to run spot instances, where you can save 70, 80 percent with spot instances. They want to migrate from smaller instances to larger instances, or back down to smaller instances, because they're running into OOM problems or over-allocated instances, either way. And then lastly, clones for workgroup collaboration, like that Jupyter notebook example I cited earlier: how do you automatically preserve all these notebooks, everything, even the machine state, and bring them back the next morning so nothing's lost and people can be more productive in that fashion.
So the basis of all this, and I want to make sure you understand this, is that we use CRIU. CRIU is an open source project that's been around since about 2012, so it's been out there quite a while. I believe it's used in Docker and Podman and in Virtuozzo's VM migration and things like that. It's a relatively mature technology, and I'm providing the link to the main page of that open source project. This is a core piece of what we've developed at MemVerge that I'll be demonstrating. The way it works is that it does the checkpointing of the application at the process ID level, so it can go all the way down the tree and checkpoint all the child processes.

I've taken an excerpt out of this, but the high-level way it works is that it uses ptrace to freeze the process execution and capture some of the machine state. The rest of the machine state has to be captured by this clever technique of injecting what's called parasite code into the application itself, to drain things like file descriptors, signals, timers, and events, and bring those out. All of that gets serialized and dumped into image files, and once this is done, the parasite code either removes itself and you can resume operation, or you can just kill the process, for example if the node's being shut down or whatever.

The restore process is basically the reverse of this. You launch a master restore process, which recreates all the child processes, sorts out the shared resources, like shared memory areas, and then sets up a position-independent code mapping in each one. Each child then goes about restoring its own state and morphing itself, so it morphs from CRIU back into the original process, essentially, and gets synchronized; eventually it unmaps CRIU and restores all the virtual address mappings. And then the master restore process kind of unhooks itself, almost like a reverse of the parasite, and allows the original process to resume execution. So this is the foundational piece of doing transparent checkpoint and restore. In Kubernetes, I believe there's an alpha feature where some variant of this is used for forensic checkpointing of containers, yes.

Yeah, actually I haven't seen ptrace be the performance problem. The performance problem has been more the time it takes to transfer things around, the dump files, and I'll just show you that. And yes, we are trying to minimize that freeze time, and I'll show you in a little diagram here in just a second how that works.

So anyway, CRIU itself we consider to be a good start, but there are other things that have to be considered to make this usable by a system admin or an operator. One is the snapshot overhead: how long it takes to actually freeze. It could take several seconds. So again, this is not a live migration; this is a hot restart type of use case that we're pursuing. And so there's overhead in terms of time. For example, we've tested this with some EDA workflows, half-terabyte applications used for simulation of semiconductor designs, and the freeze time could be several seconds, so that's the order of magnitude. Then there's also the issue of the space consumed: if you take a snapshot of this memory, you've just doubled the memory footprint, and you may have just doubled the size of the instance you need, that kind of problem. And then there are the compute resources involved.
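(At the command line, the bare CRIU dump/restore mechanics described above look like this. A minimal sketch: run as root, with a placeholder PID and image directory; --shell-job is only needed for processes attached to a terminal.)

```bash
# Freeze the process tree and dump its state into image files,
# leaving the original process running afterwards
criu dump --tree <pid> --images-dir /tmp/ckpt --shell-job --leave-running

# Later, or on another machine after copying /tmp/ckpt over,
# rebuild the process tree and resume where it left off
criu restore --images-dir /tmp/ckpt --shell-job
```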
So all these things have to be taken into consideration, including the drain time. The way we've been addressing this, to make CRIU more usable in these more production-like environments, is by doing what we call an asynchronous draining of the instance, and also incremental checkpointing. Right now, in the default situation, you just snapshot the entire memory. Well, if the memory is half a terabyte and you can only dump it at three gigabytes a second, you have a big problem, especially in a spot instance situation where you get basically two minutes to drain this thing. So we're trying to address those kinds of problems, and also the storage space consumed, using compression techniques to compact these files and remove duplication.

The other trick to this is to integrate with external events. For example, in the AWS cloud you have these spot instance preemption signals; how do we integrate with that? Or the Kubernetes scheduler. We also have to work with resource managers like Slurm and LSF on the HPC side, and then on cloud platforms people have things like AWS Batch where they want to use this kind of technology. Security is another aspect of this. And then a lot of times, even in Kubernetes, people are using temporary or ephemeral files on the instances associated with those pods or nodes, and those also need to be coordinated along with data storage if you want to capture the entire state of the application and bring it up successfully somewhere else. We also have to deal with license server models; some of these licensing systems don't like it when a node disappears and pops up over here, and things like that. And then CRIU itself has at least provisions for handling device drivers, but doesn't really address them, so you have to develop a device driver plug-in or something like that to handle it. I should mention there's another GPU vendor, AMD, with the MI300X, I think it is, or MX300i or something like that; that GPU actually also has CRIU integration through a device driver.

So before I get into that, let me explain more about how we try to minimize the interruption window for transparent checkpointing. This example we developed for EDA, but it also applies to the Kubernetes use cases. We first initiate that freeze, this red area; the diagram is time versus the different parts of the stack: the software, the memory, a local SSD or an ephemeral drive, and then a file system somewhere out there, either in the cloud or on-prem, where you can deposit all these snapshots of machine state. So we first have to freeze the thing, and that takes a certain number of seconds. Then, while we're doing that, we go into what we call a copy-on-write mode, where we can unfreeze it, but any new changes are stored in memory. At the same time, we take a snapshot of the local files and copy them, along with the local memory, asynchronously to an external file system, from which we can bring it back up on another instance. So that's how we're compressing the interruption time. We can get the transparent checkpointing overhead down to, ideally, 1 to 2% of the total runtime of an application per checkpoint. So there is a trade-off there: if you take checkpoints every hour and you accumulate too many checkpoints, it becomes uneconomical.
But there's some minimum where the right number of checkpoints versus the time to replay the job from that point makes sense, and our goal is to minimize that checkpoint overhead. That's what we've been working on.

Now I'd like to talk more about Kubernetes. I see two schools of thought: some people say just run stateless apps on Kubernetes, but we also see a lot of people increasingly trying to run stateful apps on Kubernetes. Some of these are fairly long-running, or some of them are on public cloud instances; people want to use public cloud spot instances to save costs. And so that's why we've been working on this for the Kubernetes community. Besides spot instances, another major use case is just hot restarting and rebalancing of GPU or CPU workloads to boost utilization. One of the biggest problems today is GPU utilization; a lot of GPUs are only 25, 30% utilized, just because it's not easy to move workloads around, or they're actually memory bound, or other reasons, and we'll talk more about that and give you some examples. And then also, as I mentioned, people want to save and restore their notebooks. If they're a Kubernetes operator, they just want to drain a node or do autoscaling without having people get mad at them for moving their jobs around and making them take longer to run because they have to be restarted. And then, again, a lot of applications are being brought over to Kubernetes. We're starting to see that now, especially on the HPC side; there's a whole track at KubeCon later this week that's just about HPC and batch workloads being moved onto Kubernetes, where I think this kind of checkpointing would be beneficial.

So what we did was develop an operator, and the way it works is we have a DaemonSet on the worker node and a sidecar container inside the pod itself. When we checkpoint a particular pod, we can dump everything to a persistent volume designated by the user to preserve the state. And it's a two-step deployment process: first you just install this operator that we've developed, and then you annotate the manifest of the particular stateful application that you want checkpointed, and that's about it.

And so, excuse me, now I wanna talk a little bit about something new. You get to be some of the first people in the community to hear about this. MemVerge is an Inception partner with NVIDIA, and so we've been working on their GPUs and how to checkpoint and restore those. There are other techniques out there already where you basically fence the GPU area and try to capture, basically log, all the memory and capture it while it's running, but this is a more direct implementation, where NVIDIA, in their CUDA driver, is starting to implement its own built-in checkpointing capability, not that different from what CRIU is doing. And so we're collaborating with NVIDIA to bring that to the community. There'll actually be a presentation on this this week at GTC up north, and I'll also be talking about it at KubeCon over in Paris later this week. So what's going on is that the newer CUDA drivers, at some point, you won't see it officially in the latest 12.4 CUDA release, but in the not-too-distant future, you will see CUDA drivers that have the capability to checkpoint.
What happens is that the CUDA driver itself will implement its own checkpointing of the GPU area and basically dump all of its memory and state into host system memory. And then you can either resume operations, or you can just kill the process, and the restore is the reverse. So it's very similar in many ways to CRIU, except it's built into the CUDA driver architecture. We've been working with NVIDIA at a POC level to capture that state. So now we have a two-stage problem: we first freeze the GPU and copy its state into system memory, then we freeze the CPU processes associated with that GPU, and we put everything into a checkpoint image along with any file and object storage state. So it's a two-stage operation: freeze the GPU first, then freeze the CPU processes, then put everything into a checkpoint image. This slide illustrates that checkpointing, and this one, in reverse, illustrates the restoration: we first take the checkpoint image, which is stored on a file system somewhere, recover the GPU state first, and then bring up the CPU and release the whole application.

So now I'd like to show you a quick demo of this. Again, this is one of the first times we've ever previewed this demo. What we've built is a Kubernetes, actually it's a GPU and CPU, snapshot operator that will drain a node, and then the scheduler automatically migrates the workload to another node. So let me run that. And I apologize, this is a little bit of an eye chart. First, down here, we're just discovering and displaying the nodes; there are two workers there. This is a T4 GPU on Amazon. Then, up in the upper corner there, we're launching the operator, and in this case we're running a TensorFlow MNIST demonstration GPU training workload. On the upper right, you can see the container being started up on the particular worker. Again, I apologize for the size of the font here, but I have some good news for you if you wanna look at this again later on. Then we turn on the logging so you can see what's going on. So basically the TensorFlow application is being launched and compiled and all that good stuff. And then down here, what we're doing is watching all the training epochs go by. And while that's going on, the epochs are counting upwards; it's up to epoch seven or so. We're issuing a node drain command down below in this console, as if we're gonna take the node down for maintenance. And then over here, in the upper right corner, you can see that node terminating, and then it'll restart the other worker with that same workload. That's where it's going here: the container's being recreated. And then down below here, the key thing is that you'll see this thing resume right now at that epoch seven. Normally, if you restarted this thing, it would have to go all the way back to the beginning of the whole training cycle and go from there. But here, we're actually seeing it resume from where it left off. Let's see here. Yeah, there it goes. This is running, and there you can see it starting to advance again. So basically that's the end of the demo.
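(The drain step in the demo is ordinary kubectl; the checkpoint and restore are handled by the operator reacting to the eviction. A sketch, with a made-up node name:)

```bash
# Cordon and evict everything off the worker, as if for maintenance
kubectl drain gpu-worker-1 --ignore-daemonsets --delete-emptydir-data

# Watch the training pod get recreated on the other worker and
# resume from the checkpointed epoch instead of epoch zero
kubectl get pods -o wide -w
```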
So now I'd like to show you a quick demo of this. Again, this is one of the first times we've ever previewed this demo. What we've built is a Kubernetes operator, actually a GPU and CPU snapshot operator, that will drain a node, and then the scheduler automatically migrates the workload to another node. So let me run that. And I apologize, this is a bit of an eye chart. The first thing going on down here is we're just discovering and displaying the nodes. There are two workers there. This is a T4 GPU on Amazon. Then up in the upper corner there, we're launching the operator, and in this case we're running a TensorFlow MNIST demonstration GPU training load. And on the upper right, you can see the container being started up on that particular worker. And again, I apologize for the size of the font here, but I have some good news for you if you want to look at this again later on. Then we turn on the logging so you can see what's going on. So basically the TensorFlow application is being launched and compiled and all that good stuff. And down here, what we're doing is watching all the training epochs go by. And while that's going on, the epochs are counting upwards, so it's up to epoch seven or so. We're issuing a node drain command down below in this console, as if we're going to take the node down for maintenance. And then over here, in the upper right corner, you can see that node terminating, and then it'll restart the other worker with that same workload. That's where it's going here; the container's being recreated. And then down below, the key thing is you'll see this thing resume right where it was, at epoch seven. Normally, if you restarted this thing, it would have to go all the way back to the beginning of the whole training cycle and go from there. But here, we're actually seeing it resume from where it left off. Let's see here. Yeah, there it goes. This is running. And there you can see it starting to advance again. So basically that's the end of the demo.

I just wanted to show you that this is possible now: to stop a training workload on Kubernetes, do what we call a hot migration to another node, and restart that pod so it picks up where it left off. Okay, so let me move on. So yeah, again, that was an eye chart. So what I did is I recorded a bunch of demos, and if you click on that square, you can view them at your leisure. We show an example of TensorFlow in a bare-metal checkpoint restore. Here we're also running more of an HPC workload: NVIDIA has something called Parabricks. If anybody's from the computational biology side, there's this secondary analysis that goes on when you're doing DNA sequencing, where you're trying to line up the sequences with the reference, and Parabricks has now been put on GPUs, and we can demonstrate that running in a checkpoint-restore fashion. Then we have our own opinionated version of checkpointing and restore called Memory Machine Cloud, where we do a migration of Parabricks from one instance to another spot instance. And that has to be done fast on Amazon: you only have 120 seconds from the preemption signal to when the node gets terminated, so we have to do everything pretty quickly, and sometimes these nodes get pretty big. (There's a little sketch of that race below.) And then the last one is the TensorFlow example on Kubernetes I just showed you, so you can look at that again if you want to. So those are the demos.

And so what's ahead? Right now we're working with NVIDIA to complete some modifications to their CUDA driver. It's about to go into preview mode, so if you're interested, you can email me and I can let you know exactly when it goes into preview from NVIDIA. We're also going to coordinate with NVIDIA to contribute and upstream some changes we had to make to CRIU to accommodate this implementation of checkpointing. The reason being that, again, this is a two-stage phenomenon: we have to keep the GPU process IDs alive long enough to checkpoint those first, and then shut them down along with the rest of the application. So there are some modifications that will be made. And then we'll be working on production-grade applications and operators. But my callout to you guys is: what use cases do you see for transparent CPU or GPU checkpointing? I've given you some ideas of what we're thinking about, but we're always looking for people to collaborate with on this.

And so, as I mentioned earlier, we have an opinionated version of checkpointing and restore for batch operations on clouds like Amazon, and it's called Memory Machine Cloud. This is a way that we can combine not only migration based on spot instances, but also our own kind of auto-scaling or vertical resizing. And this, again, is for batch operations; it's more for the HPC community.
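To give you a feel for that 120-second race: on AWS, the spot interruption shows up as an instance-action notice on the instance metadata service roughly two minutes before termination. Here's a minimal sketch of a watcher that polls for it and kicks off a checkpoint; checkpoint_and_migrate() is a hypothetical stand-in for whatever your actual checkpoint path is.

```python
# Minimal sketch: watch the EC2 instance metadata service (IMDSv2) for a
# spot interruption notice, then checkpoint inside the ~120 s window.
# checkpoint_and_migrate() is a hypothetical stand-in.
import time
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"

def imds_token() -> str:
    req = urllib.request.Request(
        f"{IMDS}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"})
    with urllib.request.urlopen(req, timeout=2) as resp:
        return resp.read().decode()

def interruption_notice(token: str) -> str | None:
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token})
    try:
        with urllib.request.urlopen(req, timeout=2) as resp:
            return resp.read().decode()  # e.g. {"action":"terminate",...}
    except urllib.error.HTTPError:
        return None  # 404 means no interruption is scheduled yet

def checkpoint_and_migrate() -> None:
    ...  # dump state to the persistent volume and reschedule elsewhere

token = imds_token()
while True:
    if interruption_notice(token):
        checkpoint_and_migrate()  # everything must finish within ~120 s
        break
    time.sleep(5)
```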
But this gives you an example. So again, I'm going to use a computational biology example. This is the CPU and memory profile as a function of time for a metagenomics workload, and you can see it goes up to a couple of thousand seconds. If you just use an on-demand instance on Amazon to run this workload, it costs $9.10. Now, if you turn on what we call WaveRider, which is a capability that allows us to detect whether there's too much memory or not enough memory, you can see that at the beginning of this workload there's a lot of memory being used. It drops off, and we automatically checkpoint this thing and resize the instance down, and then bump it up slightly here. So there are actually three different stages here where we're migrating this thing automatically, or what we call WaveRiding it, based on the size of the memory, or in some cases also based on CPU utilization levels. So if we automatically migrate with on-demand instances, we can cut the costs by about 17%. If we just use spot instances alone, without WaveRiding, we can cut the cost by about 67%. And if we use spot instances plus WaveRiding, now we can get down to a 75% reduction in compute costs. So that just gives you an example of ways to use checkpointing in your own data centers.
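Just to sanity-check those numbers against the $9.10 baseline — this is my own back-of-envelope arithmetic from the percentages on the slide:

```python
# Back-of-envelope check of the quoted cost reductions against the
# $9.10 on-demand baseline for the metagenomics run.
baseline = 9.10  # on-demand, no migration

scenarios = {
    "on-demand + WaveRider": 0.17,
    "spot alone":            0.67,
    "spot + WaveRider":      0.75,
}

for name, cut in scenarios.items():
    print(f"{name}: ~${baseline * (1 - cut):.2f} ({cut:.0%} off)")

# on-demand + WaveRider: ~$7.55 (17% off)
# spot alone:            ~$3.00 (67% off)
# spot + WaveRider:      ~$2.28 (75% off)
```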
Now I'm going to pivot to the second topic of the day, which is Compute Express Link. How many people have heard of CXL? Anybody? Raise your hands. Nobody? OK, I'm not surprised. It's a relatively new, emerging technology, so let me explain the motivations for it. There are three. The biggest one is what we call the AI memory wall. There's just an insatiable appetite in AI for more and more memory, especially high-bandwidth memory. And this is not only for training, but increasingly for the inferencing side of the workflow as well. On the training side here, we're just showing how fast the parameter counts are growing, into the billions and trillions. Those have to be stored somewhere, they have to be retrieved quickly, et cetera, et cetera. And there's only so much of that that can be contained in the HBM memory of GPUs before they start melting, literally. There are issues like that. So that's one problem.

The other problem out there is for cloud operators. A lot of times they're allocating workloads and packing them into virtual machines or pods driven primarily by CPU core requirements. But that can cause a lot of memory to be underutilized, or stranded. Microsoft estimated that up to 30% of the memory in their servers is stranded, just allocated but not used, things like that. So there's that kind of issue. So the first problem is a shortage of memory. The second is underutilized memory, because it's trapped in a local node and cannot easily be reallocated elsewhere. And the third is just a general problem with systems: the core count is going up, and the memory bandwidth is not keeping up with it.

So the CXL Consortium — CXL stands for Compute Express Link — formed several years ago, and it includes 255 members. As you can see here, basically the entire hardware industry has joined this consortium to help solve this particular memory-wall problem the industry is facing, and I think it's going to get even more challenging in the AI world. Starting in 2019 they developed a specification called CXL 1.0, and then 1.1, 2.0, 3.0. Today the specification is at the 3.1 level, and that pretty much maps out the next several years of how CXL will be rolled out. And what I'll do now is show you what that rollout looks like.

First of all, the thing I want you to know is that DRAM memory today runs on a DDR bus. It's a parallel bus, and it takes up a lot of space in your system. And just the physics of that parallel bus don't lend themselves to scaling memory further, and that's the whole idea behind CXL. So people are starting to use a serial protocol instead, and the standard now in the x86 world is PCIe. It scales by the number of lanes. Today, the industry is shipping PCIe Gen 5. And this is the roadmap of the PCIe generations as well as the number of lanes — the lanes, like lanes on a freeway, can scale up. So you can see here, when PCIe Gen 7 comes out in about two years, we'll be up to about 512 gigabytes per second with 16 lanes of PCIe. So this is a much better way to scale coherent memory.
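To put rough numbers on that lane math (my figures, ignoring encoding and protocol overhead, so treat them as ballpark): each PCIe generation roughly doubles the per-lane rate, and the 512 GB/s figure for Gen 7 x16 counts both directions.

```python
# Ballpark PCIe bandwidth by generation and lane count. Per-lane,
# per-direction rates in GB/s are approximate (encoding and protocol
# overhead ignored): Gen 5 is ~4 GB/s/lane and each generation doubles it.
PER_LANE_GBPS = {5: 4, 6: 8, 7: 16}

def bandwidth_gbps(gen: int, lanes: int, bidirectional: bool = False) -> int:
    bw = PER_LANE_GBPS[gen] * lanes
    return 2 * bw if bidirectional else bw

print(bandwidth_gbps(5, 16))                      # 64  -- shipping today
print(bandwidth_gbps(7, 16))                      # 256 per direction
print(bandwidth_gbps(7, 16, bidirectional=True))  # 512 -- the slide's figure
```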
And then the CXL use cases — I'm just borrowing these slides from the CXL Consortium. There are three types of use cases being addressed by CXL. The first is what they call a Type 1 device: a caching device or accelerator. The classic example would be a NIC, where you can read directly from the cache on the NIC card and transfer things faster that way, and it's done in a coherent fashion — all of this is done in a memory-cache-coherent fashion. The second is Type 2, intended for accelerators and GPUs, where they can trade off sharing this memory and who has the rights to that memory. And then CXL Type 3 is for memory expansion, where you can add more memory on the PCIe bus. A lot of systems — I think the new AMD systems — may only have one DIMM slot per channel or something like that, and one way to expand those is to use these CXL memory expansion approaches.

And then when it gets to CXL 3.0, that's where it gets really interesting, because now we'll be able to create disaggregated memory pools and also allow sharing of that memory pool among multiple hosts. And that's not too far around the corner; the industry is already demonstrating prototypes of this this year. What we're showing here is a CXL switch — a PCIe-based switch that supports the CXL protocol — with multiple hosts up here accessing these memory devices down below. And these memory devices can be partitioned into what they call MLDs, Multiple Logical Devices. So, for example, each one of these could be a 256 or 512 gigabyte memory card built by your favorite memory supplier — Samsung, SK hynix, or Micron — and then you can take, I believe, 2 gigabyte slices of it and allocate them to different hosts. So all sorts of interesting use cases can come out of this. There are also cases where you can actually share the same memory address space between two hosts, which is something we're looking at as well.

This is a bit of a detailed slide, but it just summarizes the progression of the specs and their increasing capabilities. The initial CXL 1.1 spec, established in 2019 — those products are just starting to ship in the second half of this year. They will basically let you expand memory in existing servers — Intel, AMD Genoa, and some flavors of ARM processors — and allow you to create a fat memory node. Some workloads like more memory: people running graphs or other kinds of analytics may be better off with more memory and fewer partitions. And then CXL 2.0 is also being previewed along with 3.0, which will first of all allow pooling, and then add at least a single switch layer. And then CXL 3.0 and 3.1 will allow multi-tier switching, kind of like you would see in a regular fabric architecture — multi-layer switching to allow even bigger pools to be created. At the same time, there are also new memory technologies coming out, like memory-semantic SSDs, which will give us a lower-cost type of device that also speaks in a memory-semantic, load-store fashion. And I think that's going to be important as we go forward with AI and ML workloads.

This is more of an idea of the software stack. Our company has been busy contributing to the Linux kernel to enable CXL capabilities. This is a timeline: from left to right, down below, is what we have today and then future additions to the kernel to enable CXL capabilities, and up here we're showing user space and the standard mmap and malloc kinds of capabilities. First of all, the first way CXL can be used is as a DAX device. And there are already hardware vendors that can interleave — so you may have some of your memory sticks in DIMM slots and other memory now inserted into PCIe, and we can basically interleave across those, or stripe, if you're used to RAID terminology, between all those memories and get higher overall bandwidth. So that's one capability that's actually being built into some hardware.

The second approach to consuming CXL today is to have it presented as a CPU-less NUMA node. So that memory now looks like memory that's one NUMA hop away, and you can use AutoNUMA or something like that to utilize it in the application. And in some cases, companies like ourselves are building memory tiering software where we can more intelligently place memory pages in a NUMA-aware way. So we can move pages that are hotter to the point closest to the compute itself — we may be doing hot and cold page promotion and demotion. There's also a capability that has been contributed to the latest kernel called TPP, Transparent Page Placement — most of that work was done by Meta — that will also automatically move hot and cold pages around in a CXL context. And there will be further extensions to this to allow for shared memory. And then, right now there's hardware-level interleaving, but MemVerge is also contributing to Linux kernel-based interleaving, so now we can do weighted interleaving. You may have, for example, 80% of your memory in DRAM and only 20% of it on CXL — well, we need to weight how that interleave is done accordingly, not just mindlessly use up all the memory in some sort of striping fashion at the hardware level. Things like that. And then also making sure that certain things, like the kernel functions themselves, don't get subject to being moved to remote memory areas, and things like that. So anyway, there's a whole roadmap of this, but there's already enablement of CXL going on today, and this is some of it.
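If you want to see what that CPU-less NUMA node looks like on a box with CXL expansion memory, here's a small sketch that walks sysfs for nodes that have memory but no CPUs. The weighted-interleave knob mentioned in the trailing comment comes from the newer kernel work and may not exist on your kernel yet.

```python
# Sketch: find CPU-less NUMA nodes -- how CXL expansion memory typically
# appears today -- by walking sysfs on Linux.
from pathlib import Path

def cpuless_memory_nodes() -> list[int]:
    nodes = []
    for node in Path("/sys/devices/system/node").glob("node[0-9]*"):
        cpulist = (node / "cpulist").read_text().strip()
        meminfo = (node / "meminfo").read_text()
        mem_kb = int(meminfo.split("MemTotal:")[1].split("kB")[0])
        if cpulist == "" and mem_kb > 0:  # has memory, but no CPUs
            nodes.append(int(node.name[len("node"):]))
    return sorted(nodes)

print(cpuless_memory_nodes())  # e.g. [2] with one CXL expander installed

# To interleave an application across DRAM (node 0) and the CXL node:
#   numactl --interleave=0,2 ./my_workload
# Newer kernels add weighted interleave, e.g. an 80/20 DRAM/CXL split via
#   /sys/kernel/mm/mempolicy/weighted_interleave/node{0,2}
```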
Just to make this more tangible, some server vendors are starting to ship CXL-enabled servers, and there are two ways to install the memory. One is to use memory modules that look just like NVMe modules and plug into a similar slot — it does need to be CXL-enabled — in a form factor called E3.S. Those come in fixed capacities; in this example, Samsung has 128, 256, and 512 gigabyte expansion modules. And the other approach is to put in an add-in card. The add-in card here can actually hold two terabytes of memory — eight sticks of 256 gigabytes — and takes up two PCIe slots. So this is how you would create what I call a fat node, one that has a lot more memory than you would normally have. There are certain workflows where I've seen 40-plus gigabytes per core needed to run them effectively.

And this gives you some examples. In the HPC and database area, we can get faster time to result with these fatter nodes, and better price/performance ratios. And by using memory tiering software — either our software or the stuff built into the kernel — we can consolidate nodes. In one case, we had a situation with 100 nodes of 64 gigabytes of memory apiece that took two weeks to run, with lots of failures: OOM kills, data skew problems, whatever. We consolidated that into one single four-terabyte node, and it took four days to run. From an economic standpoint, you could start with a baseline of a single 64 gig node. If you want, you could scale out — the queries per second would double, but then your server costs would also double. Or, the alternative is that in some cases you can double the queries per second just by adding more memory. Some of this is a little like years ago, when Fusion IO came out with a technology where they added local cached NVMe storage to a node and could consolidate nodes or increase CPU utilization — in that case, reducing the Oracle licenses required and things like that. A similar phenomenon can happen here with memory expansion.

And there are two ways we've been working on to tier, or place, memory more intelligently. One is based on what I mentioned earlier: hot and cold pages of memory, promoting or demoting pages and using hardware profiling or other types of profiling techniques to detect when we should move pages around. The second is bandwidth-optimized memory placement. Some applications are memory capacity bound, some we find are memory latency bound, and others are memory bandwidth bound — figuring that out is part of the key thing. And for the bandwidth-bound applications, there are various strategies for weighting the interleave between the memory on the system DDR bus and the memory on CXL. We're still experimenting with some of that, but there are already some very promising results.
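Just to illustrate the hot/cold idea — this is a toy model of the policy, not our actual tiering engine: track access recency per page, keep the hottest pages in the DRAM tier, and demote the coldest to the CXL tier when DRAM fills up.

```python
# Toy model of hot/cold page tiering: the hottest pages stay in fast
# DRAM, cold pages get demoted to the larger, slower CXL tier.
# Illustrative only -- real engines work from access bits / PMU signals.
import time

class TieredPages:
    def __init__(self, dram_capacity: int):
        self.dram_capacity = dram_capacity
        self.dram: dict[int, float] = {}  # page id -> last access time
        self.cxl: dict[int, float] = {}

    def touch(self, page_id: int) -> None:
        if page_id in self.cxl:           # page got hot again: promote
            del self.cxl[page_id]
        self.dram[page_id] = time.monotonic()
        while len(self.dram) > self.dram_capacity:
            # Demote the least recently touched (coldest) DRAM page.
            coldest = min(self.dram, key=self.dram.get)
            self.cxl[coldest] = self.dram.pop(coldest)

pages = TieredPages(dram_capacity=2)
for pid in [1, 2, 3, 1, 4]:
    pages.touch(pid)
print(sorted(pages.dram))  # [1, 4] -- hot pages stay in DRAM
print(sorted(pages.cxl))   # [2, 3] -- cold pages demoted to CXL
```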
As a matter of fact, at GTC — the NVIDIA conference — later this week, we'll be showing this demonstration at the Micron booth. This is a batch inferencing operation using a 66 billion parameter model, with a batch size and prompt length of 512. What we're doing is comparing system memory plus NVMe versus system memory plus CXL memory for the offload — again, this is a batch inferencing problem. And what we find is that we can cut the total wall-clock time. What you're seeing on the bottom axis is time in seconds, and this axis is GPU utilization. The blue line here represents the CXL memory offload versus using standard NVMe disk offload. And basically what we're seeing is that we've cut the total batch execution time in half, and we've doubled the GPU utilization — it's running at about 90% instead of more like 35, 40%. So we won on both sides there, basically: faster time to insight, and better utilization of GPUs. And GPUs are very hard to find. So this could be an approach for people that do batch inferencing to speed up their throughput and performance.

And then on the pooling side, that also is at a prototype stage. There are people out there building these CXL switches that allow multiple servers — I'm just showing two here — to connect to a memory appliance, or a JBOM, just a bunch of memory, and either share it or at least allocate from a pool, to give you more of what we call composable infrastructure. So you can dynamically allocate this memory to different use cases. One example of this will be in the Kubernetes area, where instead of a pod just getting OOM-killed, or auto-scale migrating the workload from one node to another, bigger node, you can imagine another type of pooling where, instead of the action being "kill this thing and move it somewhere else," we just give it a squirt of memory, and that overcomes whatever the transient skew or out-of-memory problem is. So that's something else we're working on. I don't want to go through all of this in detail, but this is being done in conjunction with the Open Compute Project, which I'll talk more about in just a second.

And here's another example: we've been working with Anyscale on this. The Anyscale Ray architecture is interesting because it's one of the first that has a distributed shared-memory object data plane as part of its architecture. What we think we can do is accelerate its performance by not having that object store distributed, but actually shared among all the Ray nodes. So we are prototyping a project called Gizmo that allows you to share those memory objects. What that does is reduce the number of memory copies, and it probably also gets rid of a lot of communication overhead from the network traffic needed to move this data around. And I think this will be significant, because especially on the inferencing side, we expect things like key-value caches and RAG, Retrieval-Augmented Generation, are going to require huge amounts of in-memory semantics and memory-based architectures to support those kinds of AI workloads. They don't work very well with block or file access modalities; they need that granularity.
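For context, here's what that object data plane looks like from the Ray API side. Today, an object put on one node gets copied over the network when a task on another node consumes it; the idea behind Gizmo, as I understand the prototype, is to back this store with CXL shared memory so that cross-node access doesn't need the copy.

```python
# What Ray's distributed object data plane looks like to an application.
# A CXL-backed shared object store would let a task on another node read
# the object without the network copy Ray does today.
import numpy as np
import ray

ray.init()

@ray.remote
def consume(data: np.ndarray) -> float:
    # If this task lands on a different node, Ray ships the object there
    # first -- exactly the copy a shared memory pool could avoid.
    return float(data.sum())

big = np.ones((1024, 1024))          # ~8 MB array
ref = ray.put(big)                   # lands in the local object store
print(ray.get(consume.remote(ref)))  # 1048576.0
```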
So just to wrap up: if anybody's really hardcore interested, there's something called the Open Compute Project, sponsored by Meta and other players in the industry. And within the Open Compute Project, which designs general data-center-scale stuff, there is a subgroup working on CXL-enabled systems called the Composable Memory Systems group. So if you're interested in participating, this is one way to get involved in that community. I'll just leave that up there. And with that, I'll wrap up and answer any questions. Thank you. Yes?

For checkpointing? Right. So it looks like the way that works is, it's running on the Linux system that you're checkpointing, and it's modifying code or memory. And then when you go to the next diagram, where you're actually checkpointing a container — or, I guess, was it even a pod? Yeah, we can checkpoint a pod. Is this running inside all of the different instances, or all the different nodes in that pod? Only on the nodes that you designate. So if you designate this as a stateful pod, we'll inject a sidecar in there and checkpoint that only. But it looks like you said you could do it recursively down a process tree. Yeah, that's CRIU itself. But you can't start from systemd or an init, right? Because then that would include the process that's actually doing the checkpointing, and it would freeze itself, right? Yeah, I think so. So how do you actually checkpoint the whole container? Well, this may be above my pay grade, but I think there are situations where you have an init and then the actual running container, and I think we have a way to handle that. Right, I mean, I don't know if there's an init per container, but per node anyway, right? Yeah, within a pod there are sometimes multiple containers, and one is sometimes just used to init the thing, and the other one is the actual executing thing. So all the processes for a given pod are in a subtree, you think? Yeah, yeah — the ones that are running over here, not the init, we'll checkpoint and bring up elsewhere, yeah. Okay, any other questions?

Just as a data point: until recently I worked for a large hedge fund, and they used CRIU checkpointing like this to run lots and lots of simulations on spot instances all over the cloud. So, a similar kind of thing. Yeah, yeah, yeah.

In your product where you're adjusting the memory for an application — yeah, WaveRiding, what we call it, yeah — WaveRiding. I can obviously understand how you deal with memory decreasing, but do you have instances where memory jumps up suddenly and you can't catch that before the app crashes? Yeah, we can set thresholds so that before it reaches some point — before it OOMs or whatever happens to it — we act. So we set a threshold, and that's part of our policy, and there are some heuristics involved. We can also run the thing and kind of learn its behavior over time and go that way, yeah. Good question.

With these CXL memory switches, I assume that technology-wise they're probably not terribly different from a modern Ethernet switch? Yeah, there are PCIe switches out there already. Okay. And now those PCIe switches are being augmented with the CXL protocol. And quite honestly, one of the challenges will be extending the coherency domain to a much larger area, so I think that'll be rolled out gradually. Initially, what we're showing here with this shared memory object, for example, is really more for analytics or AI/ML training, where it's kind of a write-once, immutable-object model. But we're also finding use cases for this initial implementation of CXL in the pub/sub area — high-frequency trading, market tick data, things like that. So there are some interesting use cases starting to pop up now. Okay, any other questions? If not, I'm going to boogie out of here and run off to KubeCon. So thank you very much. Appreciate it. Thank you.