Okay, cool. It's 11 o'clock. I think we'll go ahead and get started. Sorry, before we begin, I just have one thing I need to do real quick. I've been working on this custom controller. I was kind of hoping to have it done before now, but I didn't, so if you don't mind, I'm just going to take a couple of minutes and deploy this thing real quick. Cool. That's going to be a problem. Yeah, live demos, right? See if this works. Cool. We're good. Sorry. This is the thing I've been working on. It's supposed to help me out during conferences like this. I called it a ConJob. It's kind of like a CronJob, but it's for conferences. Anybody? Thank you. Thank you. Thank you. All right. Let's go ahead and get started. My name is David. I am a research scientist at Applied Computing Research Labs. I'm going to be talking today about running large-scale scheduling simulations using Virtual Kubelet. So let's jump in. I want to give you a quick overview of what Applied Computing is. We're a small business doing research and development in distributed systems. I'm particularly interested in scheduling, modeling, and optimization. I've been doing this sort of thing in industry for the last eight or nine years. Earlier this year I decided, you know, my background's in academia, I'm really passionate about open source, and I wanted to take the stuff I've been doing and, rather than doing it for a company, put it out there into the open. So that's what Applied Computing has been doing. It's been a great journey so far. As I'm sure some of you know, when you're running a small business you have to wear a lot of different hats. So this is my developer hat. Today I want to talk to you about a project I've been building for the last few months called SimKube. This is a project that I initially started just for myself.
It was a tool that I thought I was going to need in order to get to my real goal around scheduling and optimization and these sorts of things. But as I started chatting with more and more folks, I realized, oh, this is something that other people actually care about and are interested in as well. Maybe this can be a thing on its own. So to hopefully motivate this talk a little bit, I want to answer the question: why simulation? I want to talk a little bit about what kind of simulation I actually mean, because that can mean a lot of different things. And to do that, I'm going to present four different user stories. First, imagine you're an infrastructure engineer. Maybe there was an incident or an outage on one of your Kubernetes clusters last night. You're not exactly sure what went wrong. You were able to mitigate it. You deployed some stuff. It stopped the bleeding. But you're not really sure if that's a good long-term solution. You're not even entirely sure what the problem is. Wouldn't it be great if you could just capture a trace of all of the events that happened during that incident, replay it on your local laptop, go in, dig into the logs, dig into the metrics, and really understand the problem? That would be awesome, right? Also, wouldn't it be great if, once you've got a fix that you think is going to work but you're not 100% sure about, you could take that same simulation, replay it with your fix applied, and then demonstrate to your manager or to whoever that, hey look, this actually did solve the problem. If we had had this in place, then we wouldn't have had the incident. That would be really cool. So, second user story: imagine that you're a CI/CD engineer. You're responsible for all of the pipelines deploying to all your different clusters.
Maybe you want to make sure that people aren't introducing regressions into your Kubernetes config, especially those darn infra engineers. And so wouldn't it be cool if, as a step in your CI pipeline, you could just run a simulation and observe particular metrics to say, hey look, after you deploy this change to your controller or to your Kubernetes config, things are still behaving the way that we want them to. Cool, right? Maybe you're a guy like me. You're really interested in scheduling. You've heard about these scheduler plugins. kube-scheduler has all these different parameters. You can change the parameters, but they're kind of hard to reason about. If you make a change here, does that actually improve your bin-packing efficiency or whatever other metrics you care about? So wouldn't it be great if you could just run a simulation on your laptop, tweak these parameters, and see how your cluster efficiency changes, again using real production data? Maybe take that a step further: maybe you even do hyperparameter optimization, where you feed this whole simulation into an ML engine that figures out the best scheduling parameters for you. Or maybe you're an ML engineer. You've got all of these batch jobs that you need to deploy. Kubernetes doesn't have great primitives for running batch jobs, especially around things like gang scheduling, which is really important for ML if you're training an LLM or a diffusion model or whatever. And so you've heard of these projects. You've heard of Volcano, which is an alternative scheduler for these types of batch workloads. You've heard of Apache YuniKorn. There was a talk yesterday about Kueue, which is introducing some new primitives around the Kubernetes Job. And you're like, I want to try out these different things and just see how they work before I deploy them into one of my clusters.
And wouldn't it be great if you could just try out all of these things on your laptop, again using data from your actual production systems? That'd be super cool. So that's what we're going to be talking about today. We're really focused on simulating the bits of your cluster that are involved in your control plane and all of your custom controllers and objects. We're not so much concerned with simulating the applications themselves. So this is a high-level architecture diagram of the project I've been working on, called SimKube. There's kind of a lot going on here. We're going to dig into this a little bit and hopefully explain a little more of what's going on. I want to start with the most important part of this diagram, which is me. Hi. I'm the problem. It's me. Any Taylor Swift fans? No? No? Okay. Thank you. Thank you. Actually, that's maybe a little bit self-centered. Let's swap that out. This is you. Hopefully on the last slide I was able to motivate that simulation is useful for everybody, regardless of what your role in the company is. Regardless of what you're doing, simulation can be helpful. There's something missing here. Oh, you'll need a fun hat. Cool. So SimKube has six different components, which we're going to dig into. These are highlighted in orange on this slide. Again, carrying over from the previous slide, the most important bit here is the way that you as a developer interact with the system. We've got a command-line utility called skctl, or "scuttle" if you prefer, that allows you to both talk to your production clusters and export data from them, and then replay that data on a simulated cluster. Going clockwise around the left, we have a component that runs in your production clusters called sk-tracer. Its job is literally just to sit there and collect data. And then when you as the user say, hey, I want to actually do some work with this data, you can call skctl export.
It will talk to sk-tracer, and sk-tracer will save a trace to some persistent storage. Okay? And then once you're like, okay, now I want to actually do something with it, you can do skctl run. This talks to some components on your simulated cluster. This cluster can be, for example, a kind cluster that's running on your laptop. It could be some other cluster that's maybe running in a dev environment. On your simulated cluster, we've got a few different things running. sk-ctrl is a standard Kubernetes controller. It's just watching for a Simulation custom resource to get posted to the cluster. When it sees that custom resource, it's going to spin up a driver. The driver's job is to download the trace for the simulation and replay it on your simulation cluster. Now, the reason why this is scalable, the reason why it's possible, is these last two components: sk-vnode and sk-cloudprov. These are the components that mock out everything else in your cluster. So let's dive in a little bit into what these look like. sk-vnode is a Virtual Kubelet-based node implementation. If you're not familiar, Virtual Kubelet is a project that implements the kubelet API but allows you to wrap kind of whatever you want behind that API. So there are Virtual Kubelet implementations for things like Azure Batch and AWS Lambda, and it allows you to treat these things as nodes in your Kubernetes cluster. In our case, we're not wrapping anything. We're just pretending to be a node. sk-vnode is going to sit there and listen for pod objects. When it gets a pod object, it's going to say, hey, your pod's running, but it doesn't actually do anything. There's no Docker, there are no containers, there's nothing. And so you can spin up hundreds of thousands of these things on a local laptop. It's very lightweight. The cool thing here is that the node properties that you're simulating are configurable.
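To make that concrete, here's a minimal Python sketch of what a pretend-kubelet like sk-vnode does conceptually. This is not SimKube's actual code (the project is written in Rust), and all class and field names here are invented for illustration: the point is just that a pod is accepted and immediately reported as Running, with no container runtime involved.

```python
from dataclasses import dataclass

@dataclass
class FakePod:
    name: str
    phase: str = "Pending"

class MockNodeProvider:
    """Pretend-kubelet: accepts pods and reports them Running immediately.

    Nothing actually executes -- there is no container runtime -- which is
    why huge numbers of these "nodes" fit on a single laptop.
    """

    def __init__(self, node_name: str):
        self.node_name = node_name
        self.pods: dict[str, FakePod] = {}

    def create_pod(self, pod: FakePod) -> None:
        # A real kubelet would pull images and start containers here;
        # we just flip the status straight to Running.
        pod.phase = "Running"
        self.pods[pod.name] = pod

    def get_pod_status(self, name: str) -> str:
        return self.pods[name].phase

node = MockNodeProvider("sim-node-0")
node.create_pod(FakePod("web-abc123"))
print(node.get_pod_status("web-abc123"))  # Running
```

Because nothing real is executed, the per-node cost is just a little bookkeeping, which is what makes simulating very large clusters locally feasible.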
So you can just export your node definition from your production cluster with kubectl, take that config, apply it into your simulated cluster, and voila, now your simulated nodes look exactly like the nodes in your production cluster. So this is pretty cool. One other thing that is important for certain types of simulations is that sk-vnode will watch annotations on your simulated pods to control their lifecycle. So you can set a particular annotation on a pod that says this pod ran for 115 seconds, and sk-vnode will report that that pod is running for 115 seconds during the simulation, and then it will report that the pod terminated. We'll see why that's useful in a couple of slides. The other component here that makes this all possible is sk-cloudprov. This is a gRPC cloud provider for Cluster Autoscaler. If you're not familiar, Cluster Autoscaler interfaces with all the different cloud providers, but it also provides a custom interface where you can write your own. And so here we've written our own cloud provider. All it does is talk to the deployment managing sk-vnode and scale that deployment up and down. So you can see, oh, my cluster has increased or decreased the number of nodes that are running there. One thing that not a lot of folks knew about, a newer feature in Kubernetes that I think is really cool, is the pod deletion cost feature for the ReplicaSet controller. It allows you to set an annotation on pods, and the ReplicaSet controller will delete pods that have a lower cost first. The reason this matters is that Cluster Autoscaler has to be able to terminate a specific node; it doesn't want to terminate nodes that have running pods on them. Since our simulated nodes are just pods in the sk-vnode deployment, the way we implement that in the simulated cluster is to annotate the pods backing the nodes we want to terminate with a low pod deletion cost. These slides are available online.
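The pod deletion cost mechanic can be sketched in a few lines of Python. The annotation key below is the real one Kubernetes uses; everything else is a simplified stand-in for the ReplicaSet controller's scale-down preference (lowest cost deleted first, missing annotation treated as zero):

```python
# The real annotation key read by the ReplicaSet controller:
COST = "controller.kubernetes.io/pod-deletion-cost"

def scale_down_order(pods):
    """Return pods in the order the ReplicaSet controller prefers to delete
    them: lowest pod-deletion-cost first. This mirrors, in toy form, how a
    cloud provider like sk-cloudprov can steer which "node" pod gets removed
    when the autoscaler scales the sk-vnode deployment down."""
    return sorted(
        pods,
        key=lambda p: int(p.get("annotations", {}).get(COST, "0")),
    )

pods = [
    {"name": "keep-me", "annotations": {COST: "100"}},
    {"name": "victim",  "annotations": {COST: "-1"}},
    {"name": "neutral", "annotations": {}},  # no annotation -> cost 0
]
print([p["name"] for p in scale_down_order(pods)])
# ['victim', 'neutral', 'keep-me']
```

Annotating the pod backing the to-be-removed node with a very low cost means the deployment scale-down deletes exactly that pod, so the specific simulated node the autoscaler chose is the one that disappears.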
There's a bunch of clickable links on here, so if you want to download these after the talk, or right now, and go get some more information, you can do that as well. Cool. So that's the simulation side. Now I want to take a look at the production side: what is actually going on inside of sk-tracer? sk-tracer is literally just a pod that sits on your production cluster and puts a watch on the API server for resources and pods. You can configure it to watch any type of resource that you want: deployments, replica sets, stateful sets, whatever. You can also configure it to watch any sort of custom resource. So if you're running Volcano and you've got your own custom controller and CRD, you can configure sk-tracer to watch those as well. All it's doing is recording a timeline of important events. What do I mean by important events? These are any times that these objects change in a way that might be relevant to the simulation. I've got a couple of examples here. On the left, you can see there's a deployment. Maybe the deployment is created at time zero. Shortly thereafter, it spins up a replica set under the hood, and then that replica set spins up a pod. That pod, pod A, runs for a short period of time. And then maybe something, a human operator or the HPA, decides this deployment needs to scale up. It increases the number of replicas and creates two more pods, pods B and C. So what sk-tracer is watching for is: it observes that the deployment was created. It then observes that pod A was started, and it's able to attach pod A back to that deployment. Then it observes that the replicas for the deployment changed some time later. It records that event, and it also records that pods B and C got attached back to the deployment as well. And so then it's able to take all of this and replay it later on. Another example here is maybe you've got a cron job that's running things periodically.
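The deployment timeline just described can be sketched as a simple event recorder in Python. This is a toy model, not sk-tracer's actual data structures; the event names and fields are made up to illustrate the idea of a timestamped timeline with ownership links:

```python
from dataclasses import dataclass, field

@dataclass
class TraceEvent:
    ts: float        # seconds since trace start
    kind: str        # e.g. "deployment/created", "pod/started"
    obj: str         # object name
    owner: str = ""  # owning object, if any (pod -> deployment)

@dataclass
class Tracer:
    """Toy version of the 'timeline of important events' a tracer records:
    object creations plus changes relevant to the simulation."""
    events: list = field(default_factory=list)

    def observe(self, ts, kind, obj, owner=""):
        self.events.append(TraceEvent(ts, kind, obj, owner))

# The deployment example from the slide:
t = Tracer()
t.observe(0, "deployment/created", "web")
t.observe(1, "pod/started", "pod-a", owner="web")
t.observe(30, "deployment/replicas-changed", "web")  # operator or HPA scales up
t.observe(31, "pod/started", "pod-b", owner="web")
t.observe(31, "pod/started", "pod-c", owner="web")

print(len(t.events))          # 5
print(t.events[2].kind)       # deployment/replicas-changed
```

Replaying later is then just a matter of walking this list in timestamp order and re-applying each change, which is essentially what the driver does against the simulated cluster's API server.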
And so here we've got an example. Pod one runs for some period of time. Then pod two runs, then pod three runs. This is where your lifecycle annotations become important, because when you want to simulate a cron job, these aren't long-running things. If you're going to run the simulation, you need to know that this pod started and then it ended. And so sk-tracer is able to capture that lifecycle information and record it so that it can be fed back into our simulation. Okay. Cool. So let's move on. Okay. This is a little embarrassing. I just got myself paged in the middle of my talk. Okay, hang on. It's saying something, one second... saying something about a bunch of pods in Pending. This doesn't really seem like my issue. Sorry, I'm going to go ahead and just reassign this to one of our infrastructure engineers. Maybe they can deal with it. The server's on fire. On fire. Sorry, I saw several of you visibly flinch. I hope I didn't impose any undue trauma here. One second. Okay, hi. I'm David. I'm the infrastructure engineer at Applied Computing. This is my infrastructure engineering hat. I just got paged for this thing. I'm really frustrated with our developer. He just kind of YOLOs stuff out to production. He doesn't test anything. So anyway, we're going to have a chat after this is over, but maybe this is an opportunity to talk about some of the tools that I've been discussing here. Let me see if this is working. It may not be working. That would be disappointing. Let me switch over to our... Oh, you know what? I don't have the port forwarding set up. Nope, it's already working. Hey, live demos, right? Okay, let me see... Maybe the Grafana dashboard isn't working. Let me see what else I can show you here. It isn't Grafana. Okay, so it turns out the controller that I deployed at the beginning of the talk is not running anything at all. So I don't know what's up with our PagerDuty config. Let's do this.
I'm going to skip the live demo, and I'm going to talk a little bit about what you might have seen if the demo were working. As I mentioned, we do have a command-line utility that allows you to talk to the various pods here. Let me just see... let me just see if I can figure out what's going on, one second. Actually, we're going to skip all of that. We're just going to go back to this diagram here, and I'm going to tell you a little bit about what's going on inside of sk-ctrl and sk-driver. What's happening here is that when you post this custom resource into your simulated cluster, we've got a controller that's watching for this thing. It does some initial config, and then it launches the driver. The driver has two components. It's got a simulation runner, and the runner downloads the trace object that we talked about previously and then replays that trace. It's able to look at all of the data in the trace and apply it to the API server in your simulated cluster. The other bit that is part of sk-driver is that it installs a mutating webhook. The reason for that is that whenever pods get created in the simulated cluster, I want to redirect them so that they're getting replayed onto the simulated nodes. So the mutating webhook intercepts all of the pods, checks whether each one is part of the simulation, and then applies particular labels, tolerations, and annotations, including those pod lifecycle annotations as well. And so it's able to track, throughout the lifecycle of your simulation, that these pods belong to this simulated object and are supposed to run for this period of time, and all of that information just kind of seamlessly happens in the cluster. So this is pretty cool. Let me see if there's one other thing I can show you here. Cool. So I want to show you what the trace object actually looks like. When you run skctl export, this is what gets saved.
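As a rough illustration of what such a mutating webhook produces, here's a Python sketch that builds a JSONPatch adding a label, a toleration, and a lifecycle annotation to an intercepted pod. The simkube.io keys are invented for this example, not SimKube's actual keys; the `~1` sequences are how `/` must be escaped inside JSON Pointer paths:

```python
import json

def build_sim_patch(duration_s: int) -> str:
    """Sketch of the JSONPatch a simulation webhook might return for an
    intercepted pod. Label/annotation keys here are illustrative only."""
    patch = [
        # Mark the pod as belonging to the simulation.
        {"op": "add",
         "path": "/metadata/labels/simkube.io~1simulation",
         "value": "true"},
        # Tolerate the (hypothetical) taint on the virtual nodes so the
        # pod can be scheduled onto them.
        {"op": "add",
         "path": "/spec/tolerations/-",
         "value": {"key": "simkube.io/virtual-node", "operator": "Exists"}},
        # Stamp the lifecycle annotation: how long the pod should "run"
        # before the virtual node reports it terminated.
        {"op": "add",
         "path": "/metadata/annotations/simkube.io~1lifetime-seconds",
         "value": str(duration_s)},
    ]
    return json.dumps(patch)

print(json.loads(build_sim_patch(115))[2]["value"])  # 115
```

In a real admission webhook this patch would be base64-encoded into the AdmissionReview response, but the shape of the mutation is the interesting part here.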
So here we have: at the beginning, we're just saving the config that we configured sk-tracer with. And then the important bits here are these timeline objects. You can see there's an initial marker in the trace that says the simulation started at this time. And then from then on, it records timestamps of interesting events. So we can see here, at a particular timestamp, it applied some objects. And we're literally just saving a sanitized version of the raw Kubernetes manifest. And then we can keep going down: here, at some later point in time, that same object got deleted. And so this is what it looks like. It's all saved in a binary format that's JSON-esque. It's, I think, reasonably efficient, so it doesn't take up a ton of storage. And this is what gets downloaded and replayed. So let's move on. I want to spend a little bit of time talking about next steps. You can all imagine that I gave you a really fascinating demo. Can I get some applause for it? Thank you, thank you, thank you. And there was a hat, so you didn't leave completely disappointed. Let's talk a little bit about what I want to do next here. One thing that I think is really important is being able to actually compare and visualize these results. Right now you can replay this stuff and look at it in Grafana or whatever your monitoring tool of choice is, which is great. I love Prometheus, I love Grafana, but it lacks some of the more data-analysis-type features that I want. You can imagine being able to take two of these simulations and compare a diff: what is the mean scheduling time, or what is the mean cluster efficiency, or something along those lines.
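Conceptually, the trace is a config block plus a timestamped timeline, and replaying it just means applying and deleting objects in order. Here is a toy Python model of that structure and the replay loop, with plain dicts standing in for both the real binary encoding and the real API server; all field names are illustrative, not SimKube's actual schema:

```python
# A toy trace: tracer config, then a timeline of timestamped events,
# each carrying sanitized manifests that were applied or deleted.
trace = {
    "config": {"tracked": ["deployments", "pods"]},
    "timeline": [
        {"ts": 0,  "applied": [{"kind": "Deployment", "name": "web"}]},
        {"ts": 45, "deleted": [{"kind": "Deployment", "name": "web"}]},
    ],
}

class FakeApiServer:
    """Stand-in for the simulated cluster's API server."""
    def __init__(self):
        self.objects = set()
    def apply(self, obj):
        self.objects.add((obj["kind"], obj["name"]))
    def delete(self, obj):
        self.objects.discard((obj["kind"], obj["name"]))

def replay(trace, api):
    # Walk the timeline in order, re-issuing each recorded change.
    for entry in trace["timeline"]:
        for obj in entry.get("applied", []):
            api.apply(obj)
        for obj in entry.get("deleted", []):
            api.delete(obj)

api = FakeApiServer()
replay(trace, api)
print(api.objects)  # set() -- the deployment was applied, then deleted
```

A real replayer would also pace itself against the timestamps rather than applying everything at once, but the apply/delete walk is the core of it.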
Being able to answer those types of questions is really important, and I would say that in the Kubernetes ecosystem we're kind of lacking those tools right now. There was an interesting Grafana talk yesterday where they were demonstrating some new visualization tools, not for simulation, which I thought was really interesting, so maybe check that one out. But I think there's a lot of really interesting greenfield work to be done here. Another thing that I really want to be able to do: right now you can take a trace from your production cluster and replay it, which is, I think, really useful. What I want to be able to do is start answering what-if questions. Not just "this happened," but "what if this happened?" What if we had a deployment that scaled up to 10,000 pods? What if we're running this ML thing and one of the jobs terminates? How does the cluster respond? So I think there's a lot of work we can do around taking these trace objects and modifying them, or even generating completely hypothetical traces, and then applying those in a simulated cluster and seeing what happens. I think that could be a really powerful thing. There's also a ton of duplicated effort happening right now. I don't know if you're familiar with KWOK. KWOK is Kubernetes WithOut Kubelet. They have a very similar setup to mine. They don't have all of the tracing and replay components, but sk-vnode and sk-cloudprov have their equivalents over there. There are actually two KWOK talks happening at KubeCon. One of them is happening right now, so if you're here, sorry, I guess you missed that one. Maybe they have a better live demo. There's another one tomorrow about doing large-scale scalability testing using KWOK. That's tomorrow at 2 p.m. I really encourage you to check that out.
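As a sketch of the what-if idea, here's a hypothetical trace edit in Python that bumps a recorded deployment to 10,000 replicas before replay. The trace shape and field names are illustrative, continuing the toy model above rather than SimKube's actual schema; the key point is that the edit is made on a copy, so the original recorded trace stays intact:

```python
import copy

def what_if_scale(trace, name, replicas):
    """Hypothetical 'what-if' edit: deep-copy a trace and rewrite one
    deployment's replica count, so replaying the modified trace shows how
    the cluster would have reacted to a much larger scale-up."""
    out = copy.deepcopy(trace)
    for entry in out["timeline"]:
        for obj in entry.get("applied", []):
            if obj.get("kind") == "Deployment" and obj.get("name") == name:
                obj["spec"] = {"replicas": replicas}
    return out

trace = {"timeline": [{"ts": 0, "applied": [
    {"kind": "Deployment", "name": "web", "spec": {"replicas": 3}}]}]}

modified = what_if_scale(trace, "web", 10_000)
print(modified["timeline"][0]["applied"][0]["spec"]["replicas"])  # 10000
print(trace["timeline"][0]["applied"][0]["spec"]["replicas"])     # 3
```

The same pattern extends to other what-ifs: injecting a job termination event, deleting a node mid-trace, or generating an entirely synthetic timeline from scratch.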
There are other projects as well, like kcp, which is just the Kubernetes control plane without any of the kubelet machinery behind it. Virtual Kubelet, of course. There are things like Kubemark, with its HollowNode implementation. So there's a ton of duplicated work here, and I really think we should stop duplicating all this stuff and settle on one thing. I'm hoping that maybe SimKube can be a part of that. The last bullet point here is a little bit interesting. One thing I didn't tell you is that about half of the stuff here is written in Rust. Maybe that's exciting to you. Maybe that's like, oh my God, what is he thinking? When I started this project, it was a thing that I thought I wanted. It was an opportunity for me to learn some Rust and learn a little bit about the Kubernetes ecosystem around Rust. Side note: the kube-rs project is fantastic. I love it. But as more and more folks have expressed interest, I recognize that Rust might be a hindrance to adoption. So I'm not committing to this, but maybe there's a Golang rewrite in the cards. RIIG, yeah: Rewrite It In Go. If that's something that's of interest to you, I'd love to hear more about your use case. I'm not making any solid promises there. There's a ton of other work to be done, but as I mentioned at the beginning of the talk, we're an open-source-first company. Everything we do is out in the open, so I would love to have more contributors to this project. I got my first PR this morning, which is super exciting. If you're interested in contributing to SimKube, or you want to use it in one of your clusters, or maybe you're interested in sponsoring some of the work that we're doing, I would love to talk with you. My website is also on these slides. You can email me. I would love to get in touch and learn more about your use case and whether the stuff I'm building is helpful for you. That's pretty much all I have today.
I'm happy to take any questions. There's also a bunch of links on here. Again, these slides are available, so you can see the code on GitHub. All the artifacts for this presentation are on GitHub, so you can go run the demo yourself if you want to. I have a blog where I publish thoughts about whatever I feel like writing about, approximately weekly, so you can go sign up for that. I'm also on Mastodon, so if you want to follow me there, that would be great. I am happy to take any questions that you all have, and thanks so much for putting up with my dumb jokes. I think there's a microphone over there, so for folks who are watching online, if we can use the microphone, that would be great. No? I think it's on. Oh, there we go. Oh, hello? Hello? Okay. We actually have a use case around simulation, and I wanted to ask you a couple of questions about the data that you're collecting. I work for IBM. We have scenarios where we need to simulate some very large clusters, and specifically, and I think I can see hints of this, we need to simulate some complex pod-to-pod affinity and anti-affinity rules. So from the slimmed-down manifest, it looked like we were going to be able to pick that up, right? Yep. And so the Virtual Kubelet will then position these virtual pods based on the affinity and anti-affinity rules. So I think that's a plus. We are looking at KWOK also, and for us one of the drawbacks is that I actually need some metrics other than the container specs. In the real world, I could talk to the kubelet and get some metrics, but I want to capture a snapshot of that and also send that as a payload. Do you think that's possible? Yeah. One of the things that I want to be able to do, so right now I'm just capturing when your pod started and ended, but it would be totally possible to inject other annotations, like: your init containers took this long to run, or your sidecar container exited with this exit code.
We could feed all of those sorts of annotations in, and then, yep, yep, yep. I'm starting small. I don't know what other metrics folks are going to be interested in, and I don't want to just capture everything, but certainly there's a lot more we can do there. Okay. Are you currently pulling in any information about node startup times and things like that? So, putting into the simulation: if I improve my 99th percentile for how long a node takes to spin up, how does that impact my simulation? So right now I'm not pulling that information in; the simulation will just report that the node comes up immediately. But again, you could inject all of that information in and have Virtual Kubelet kind of fake it out. Hi, my name is Indika. My question is: since you have the trace, can we do a kind of replay? Say some incident happened on the production cluster, or any testing cluster. If we want to replay it on this one and try to debug what actually happened, can we do that with this trace? Yep. You can do the replay on whatever cluster you want. As long as you have the sk-ctrl controller running there, you can tell it to replay the trace. And that could be on a local cluster that's running on your laptop. It could be a cluster in a dev environment. It could be some other production cluster, I guess, if you wanted to. But you should be able to do that anywhere. Okay, so that means we can pick the date and time from the production cluster we want to trace? Exactly. Yeah. Thank you. Any other questions? Okay. We'll go ahead and stop there. Thank you so much for coming. Really great to see all of you here. I'd love to chat more after the talk as well. Thank you.