Welcome, everyone, to the next talk of the session, which will be given by Michael McCune. And yeah, it's yours, Michael.

All right, thanks. My name is Michael McCune. I'm a software developer at Red Hat; I work on OpenShift engineering, and I do a lot of work with cloud infrastructure. Today we're going to talk about testing Kubernetes at scale with Cluster API and Kubemark.

First, a little forecast: we'll talk about what Cluster API is and what Kubemark is, and then about the Kubemark provider for Cluster API. I'll do a little demo of how this stuff works, which will hopefully clarify what we're doing. Then I'll show some helpful tools to make this process easier, talk about some real-world use cases, and finish with what's next in the Kubemark Cluster API world.

So first: what is Cluster API? Put a one in chat if you're familiar with Cluster API or have ever used it. I'll keep going, but I'm curious whether anyone here knows the project. From its own description, Cluster API is a Kubernetes sub-project focused on providing declarative APIs and tooling to simplify provisioning, upgrading, and operating multiple Kubernetes clusters. There's a lot packed into that sentence, so let's unpack it and see what it means.

If you're familiar with Kubernetes, you're probably familiar with this pattern: as a user, I approach a Kubernetes cluster with a manifest that describes what I would like, maybe a pod. I give it to the cluster, the cluster does some work, and it returns the pod to me, or returns a result saying it's done whatever I asked. Cluster API is very similar, but it provides a new type of primitive that people can use: a cluster. Now I can go to what we call a management cluster and ask it to create another Kubernetes cluster for me. I use a declarative manifest to define the topology of that cluster, and it produces the cluster and returns a kubeconfig telling me how to get into the cluster it created.

What's nice about Cluster API is that the project is structured so that the back ends are pluggable. With one management cluster, I can spawn Kubernetes clusters on AWS, on Google Cloud, on Azure, and on several other providers. Likewise, I can manage those clusters: I can create machines, delete machines, and change the topology of the cluster as needed. So Cluster API really gives us Kubernetes-style tools to start addressing infrastructure and clusters the same way we address containers.
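To make that concrete, here is a minimal sketch of what asking a management cluster for a new cluster can look like, assuming the Docker infrastructure and kubeadm control plane providers are installed. All names here are illustrative, and real manifests carry more detail (clusterctl can generate complete ones):

```sh
# Ask the management cluster for a new workload cluster, declaratively.
# Names are illustrative; the referenced KubeadmControlPlane and
# DockerCluster objects would be defined alongside this one.
kubectl apply -f - <<EOF
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: demo-cluster
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: demo-cluster-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: DockerCluster
    name: demo-cluster
EOF

# Once provisioning finishes, retrieve the kubeconfig for the new cluster:
clusterctl get kubeconfig demo-cluster > demo-cluster.kubeconfig
```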
So what is Kubemark? Kubemark, as described by the blog post where it was announced, is a performance testing tool which allows users to run experiments on emulated clusters. That one's a little more straightforward, but let's look at what it means. When I create a cluster with Kubemark nodes, I use a real control plane, meaning a control plane that's actually backed by a normal kubelet and so on, and I use that control plane to spawn what we call hollow nodes. The hollow nodes are the Kubemark nodes, and they run as pods or processes. They don't need to have their own machine available, so they're very lightweight, and they don't really do any of the work; they just say they've done it. If you tell a hollow node to make a pod, it'll just say: yep, I made that pod. Or attach this volume: yep, I'll attach that volume. It's not really doing anything on the back end; it's got a Docker shim that prevents it from doing anything real, but it will say it's done it. And this is pretty cool for testing scenarios, because I can go to a Kubemark cluster and say, all right, give me a thousand pods, and it immediately turns around and says, yes, I just created a thousand pods for you. Meanwhile it's actually placing load on the API server and on the scheduler, so you can test the throughput of these mechanisms of Kubernetes.

So a couple of notes to take away about the hollow nodes. They do not create containers, and they do not mount volumes, even though they say they will. Yeah, like a mock, right: they do this by running a Docker shim that just pretends it's doing things. Now, if that Docker shim is unable to understand an image spec or something, it might barf, so there are reasons these Kubemark nodes might throw an error, but usually they won't. Also, because Kubemark is a process, a single host can run multiple Kubemark processes, so you can use your hardware to make it look like you've got hundreds of nodes even though you really only have a few to work with. They also place a true load on the API server, which is really helpful when you're trying to test throughput and see the interactions that happen in a live Kubernetes cluster. And the other thing is that they're able to advertise resource capacities: a Kubemark node can look like it has a GPU, like it has accelerated networking, or like it has a hundred CPUs. You can really make them look however you want, and that opens up interesting possibilities when we talk about testing.
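A quick way to see that make-believe for yourself is to ask a hollow node what it claims to have. The node name below is illustrative, and the capacity values are whatever the hollow kubelet was configured to advertise, with nothing physical behind them:

```sh
# Inspect the advertised (entirely fake) capacity of a hollow node.
kubectl get node hollow-node-0 -o jsonpath='{.status.capacity}'
# Example output: {"cpu":"1","memory":"4Gi","pods":"110"}
```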
So what happens when we mix the Cluster API provider style with Kubemark as the infrastructure provider? In my mind, peanut butter meets chocolate. For those of you who aren't familiar, this is the Reese's Peanut Butter Cup; they are quite delicious if you like peanut butter and chocolate.

So what does that look like when we deploy it? This is one methodology of deployment, especially for testing, and it can be used on a single host. It requires the host to have Docker. Perhaps in the future there'll be better support for Podman, but right now, because of the way Cluster API creates its workload clusters, we haven't quite worked out all the details of making Podman do this, so for now Docker is the path forward. What we do is use kind to create a cluster in Docker and then load the Cluster API management tools into that kind cluster. From there, we can use it to spawn a new workload cluster, just like we talked about before. That workload cluster is actually running on something we call the Docker provider. Previously and currently in Cluster API testing, they use the Docker provider because, instead of needing cloud infrastructure like OpenStack or a public cloud to go to, it treats the local Docker daemon as if it were infrastructure and creates new machines by spawning new containers in your local Docker. So in step one, the management cluster creates a Docker cluster, which is the workload cluster. And then in step two, we tell the workload cluster to create Kubemark nodes, and those Kubemark nodes, based on the way the provider is set up, are just spawned as pods in the management cluster. I'm going to walk through this, and we'll see what it looks like in a live scenario.

To review the Cluster API Kubemark configuration: it needs an actual management cluster. For our purposes we're using kind, but it could be a real management cluster on physical hardware you own, or something in the cloud; it doesn't have to be this kind-based solution. It also needs a real control plane, because Kubemark actually requires at least one machine to be a real control plane, so you have to have at least one real node, and that's where the Docker workload cluster we created comes in. That doesn't have to be a Docker cluster either; it could be another real physical cluster, or it could be something in the cloud. And then, likewise, the Kubemark nodes become pods in the management cluster. That's the way it's configured now, but in the future we're going to have some flexibility about where those pods end up landing.

So let's get into a little demo action. I realize I'm going to have to unshare and reshare, so this might get a little tricky; I've got to make sure I grab the right window. Yeah, exactly. Thank you, Peter. Okay, so we're looking at a VM here. This is an Ubuntu VM that I created because it has really good support for Docker and works the way I like it to. I've got some Ansible scripts that set up this VM for me, and I'll share those later in case anyone wants to replicate the same thing. I've also got a local directory of script files here that I use for doing things.

So what I'm going to do is start by kicking off my management cluster, which will take probably 20 or 30 seconds. All these little script files are in another Git repo that I'll share at the end, so if anyone wants to replicate this entire process, you'll have all the files available. Right now you can see we're using kind to spawn a cluster in Docker, and it's a 1.22.2 cluster because that's what I've just been testing with. I'm also going to apply a couple of other things here. This puts a local registry inside my Docker cluster; I've got a container running a kind registry, and this is a way for me to get images into that cluster.

Now I'm going to start the Cluster API tooling installation. This is another script file; it uses the clusterctl command, which comes from Cluster API, to initialize the various components I need. You'll see it's going to put cert-manager on, and it's putting in several different components of Cluster API. You can specify the core version, which bootstrap controller you'd like, and which control plane controller; most of these we don't need to change. And then finally the infrastructure controllers, and that's where you see we're installing Kubemark and Docker as our infrastructure controllers here.
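Condensed, the setup scripts amount to something like the following. The cluster name and node image tag are illustrative (pick a kindest/node tag that matches your kind release), and this assumes a clusterctl version that knows about the kubemark provider:

```sh
# Create the kind management cluster.
kind create cluster --name capi-mgmt --image kindest/node:v1.22.2

# Install the Cluster API components, with Docker and Kubemark
# as the infrastructure providers.
clusterctl init --infrastructure docker,kubemark

# Watch the controllers come up across all namespaces.
watch kubectl get pods -A
```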
Okay, so at this point I've got a cluster running, and if I do a watch kubectl get pods -A, what we should see is... okay, everything's running. I was waiting to make sure all the CAPI components are running here, and they are. So the next thing I'm going to do is start that initial Docker cluster, and I've got a manifest here for it. I'm not going to go through the whole manifest because it's kind of long; maybe if we have extra time I can get into it, but it's just a bunch of YAML, and it may not be that exciting to look at. So we'll create that control plane, and if I do a watch kubectl get machines here, we'll see that right now it's provisioning the machine.

Now, machines are a type created by Cluster API. There are three primary types that Cluster API users want to be aware of: machines, machine sets, and machine deployments. You can think of a machine kind of like a pod, a machine set kind of like a replica set, and a machine deployment kind of like a deployment. So a machine deployment defines the way you deploy a set of machines, a machine set describes that set, and a machine is the individual machine itself.
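To make that analogy concrete, here's a minimal sketch of a MachineDeployment for Kubemark nodes. Names, the Kubernetes version, and the API versions are illustrative, and the referenced KubeadmConfigTemplate and KubemarkMachineTemplate would be defined alongside it:

```sh
kubectl apply -f - <<EOF
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: kubemark-md-0
spec:
  clusterName: demo-cluster
  replicas: 1
  # Left empty here; Cluster API's defaulting webhook fills in
  # matching labels.
  selector:
    matchLabels: {}
  template:
    spec:
      clusterName: demo-cluster
      version: v1.22.2
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: kubemark-md-0
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: KubemarkMachineTemplate
        name: kubemark-md-0
EOF
```

MachineDeployments implement the scale subresource, so growing the emulated cluster later, as we'll do in a minute, is the same gesture as scaling a regular deployment: kubectl scale machinedeployment kubemark-md-0 --replicas=10.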
So what's happening right now is that it's creating the Docker control plane and setting things up inside there. Usually this doesn't take more than 30 or 40 seconds, so it seems like I might have tempted the demo gods too much here. Eventually we should see a provider ID once it's been able to create that node, but for some reason it's taking a really long time.

Now, the nice thing about this setup, and I'm going to switch to a different window here just so we can look at some things, is that because I'm using kind, it's really easy to tear down and rebuild. So if this doesn't work, I can tear it down and rebuild it really quickly, and we'll see what happens. Okay, this is taking longer than I would have expected, although when I look at kind, I can see that I do have that second cluster. So I think what I'm going to try here is to use this other script file to get a kubeconfig for that cluster, if I can type; it's still a little early here for me. Okay, it looks like this other cluster has not been created properly, so I'm going to tear down the whole thing and really put this tooling to the test.

I'm thrashing the machine pretty hard here, but what I've done is destroy both clusters I just created, and I'm recreating the management cluster now. I'll give this another minute or two, but if it doesn't work, I've got a video of the demo, so maybe we'll switch to the video and just see what happens. You never know, I could get lucky here. So we'll install the Cluster API tooling again. cert-manager is usually what takes the most time here, because it has to deploy onto the nodes and then make sure everything is running properly. Does anybody have questions, by the way? Feel free to drop them in chat; I'm looking at the chat now, so I'm happy to take questions or comments or whatever. So we can see it installing all the various CRDs and everything. Let's just make sure everything is configured. Okay, everything is running. Let's try this again. So we'll do a kubectl... I keep mistyping that. All right, now hopefully this time we'll see it actually make us the node, and if not, I'm going to switch to running my video and we won't get the nice interactive demo.

Sometimes this happens because there are networking issues: it has trouble creating the second container and then connecting it back to the first one. Normally, at this point, I'd start looking at the log files to see why it's having problems, but I just did this an hour ago and it worked, so maybe I've upset something with the VM. It looks like we're having a similar problem. This is a real bummer for demos, I guess. I'll try one more time. Okay, I'm going to abandon this and get my video out. Stop sharing.

Okay, so now we're looking at the video version of this. It's very similar: you can see I do pretty much the same thing here, starting up the control plane. I'm going to fast forward a little bit because I know we're running short on time. You see it go through the rest of the steps; same stuff, it's setting up cert-manager, which takes a long time. Skip forward here. All right, so here you can see we've created that control plane, the KubeadmControlPlane, and we're waiting for it to provision again; this is the step where we were just stuck. What eventually happens is that we get that provider ID, and then I'll move on to the next part. Let's see if I can just skip ahead. Okay, so it gets a provider ID, and now we're looking at what clusters it's created. I'm going to get the kubeconfig for the second cluster.

Let me see, I just saw Eric put a question here: are there public services that run these management clusters for people to use? I don't think there's a public service that creates these for people, but they're pretty easy to install following the quick start in the Cluster API book; I'll have a link to that at the end.

So what we see here now is the Docker cluster I've created, the workload cluster. I've listed the pods to show that CoreDNS is pending, because no container networking has been started yet. So what we need to do is deploy a container network, and I use Calico here just because it works fairly well with what we're doing. Eventually you'll see these nodes become active: the containers get created, and if we look at the nodes, they become ready. I'm skipping forward a bit here.

The next part is creating the actual Kubemark workload cluster. What we're looking at here is a get nodes; you can see this kubemark-md-0 is my machine deployment for the workload cluster, and we're just waiting to see the container network deployed to it. Now that it's ready, the container network has been deployed and we've got a Kubemark node running.

Now I'm going to show how the cluster is set up. We're going to watch the pods, and I'm going to create a workload. This is just a standard workload I use, a little sleep routine that requests two gigs of memory (sketched below). It's nice for this purpose because, when I use it with the autoscaler, I can use it to push out new nodes: only one replica fits on a node if the node doesn't have enough memory. And what I'm going to do is scale the replicas up here.
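The workload itself is nothing special. A minimal sketch of that sleep routine might look like this, with the name and image being illustrative; the 2Gi request is the important part, since it forces roughly one replica per emulated node:

```sh
kubectl --kubeconfig demo-cluster.kubeconfig apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sleeper
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sleeper
  template:
    metadata:
      labels:
        app: sleeper
    spec:
      containers:
      - name: sleep
        image: busybox:1.34
        command: ["/bin/sh", "-c", "while true; do sleep 3600; done"]
        resources:
          requests:
            memory: "2Gi"  # big enough that one replica fills a node
EOF

# Scale it up to pile pods into Pending:
kubectl --kubeconfig demo-cluster.kubeconfig scale deployment sleeper --replicas=20
```

On hollow nodes the image is never actually pulled or run; the pods are simply reported as running, which is exactly what makes this kind of scheduling test so cheap.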
And what we should see is that we'll have a bunch of pods in Pending, because at this point we only have one Kubemark node set up in our workload cluster. So what I'm going to do now is scale the machine deployment of the Kubemark hollow nodes, and we're going to see how quickly this turns around: the nodes get created and the pods get assigned to them. And it happens really fast, much faster than you could ever expect in an actual Kubernetes cluster. So I'm going to scale the replicas on this, and remember, this is the same operation as scaling the machines in my cluster: I am growing my cluster right now. Even though it happens very quickly here, you can see how fast all those provider IDs came up; I'll go back real quick if I can catch it. Watch as the provider IDs come up: as soon as I've run the scaling, those nodes are already created. There's no way this would happen normally; you saw that even with the Docker provider it took a while, and imagine how long it would take with AWS or GCP or something like that. Here you see these nodes are all not ready, then they go ready as the pod network is deployed, and then the containers get deployed right away and they're running.

Now, if my demo had worked a little better, I would have gone into some autoscaler stuff and shown you how that works, but I'm going to cut things short now. I will make this video available if people want to watch it. I hope that gives you a taste of why this is exciting in the testing space: these things happen rapidly, and these nodes are real nodes; I can apply taints and labels to them. I'll talk about what we're doing with that in a moment. So let me go back to the slide deck. I know we've got just a few minutes here, but if you've got more questions, please feel free to toss them in chat; we're running at breakneck speed here.

Okay, so there are some helpful tools you can use; kind is one of them. I've got some Ansible playbooks that I've shared that will help you set up the virtual machines you might want to use, and I've also got those shell scripts. You'll find this presentation linked on the DevConf schedule page for this talk, so you can get links to all these things there.

I wanted to talk about some real-world use cases, too, for how people are using Kubemark, and some of these are kind of old. Currently, the Kubernetes SIG Scalability runs a set of Kubemark jobs in their CI; this link here will show you that, and you can see they run these regularly to check whether they're introducing any regressions. Most of them are scaling tests, looking at large numbers of nodes and pods and whatnot. Several years ago, the Apache YuniKorn project, which is an alternative scheduler that works with Kubernetes, used Kubemark to test their scheduler throughput, and they were able to show how they could increase scheduling throughput with it. Also, on the Kubernetes blog, when they first released Kubemark, they showed how they could create a 2,000-node cluster and launch 60,000 pods. And then my personal bugaboo is this last Red Hat OpenShift bug here. We're currently working through a problem with GPU deployments and the autoscaler, and I'm using Kubemark as a way to help me test that, because I can test nodes that appear to have GPUs without ever having to provision them, and I can control how the nodes make it look like they have GPUs. And that's really important to us in diagnosing this bug.
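As a rough illustration of why that matters (the names here are hypothetical): once a hollow node is configured to advertise nvidia.com/gpu capacity, a pod like this schedules against it with no physical GPU anywhere, which makes autoscaler GPU scenarios cheap to reproduce:

```sh
kubectl --kubeconfig demo-cluster.kubeconfig apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: fake-gpu-test
spec:
  containers:
  - name: worker
    image: busybox:1.34
    command: ["/bin/sh", "-c", "while true; do sleep 3600; done"]
    resources:
      limits:
        nvidia.com/gpu: "1"  # satisfied by the hollow node's advertised capacity
EOF
```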
I realize we're at time here, but this is what's coming next for the Kubemark provider. We're going to add continuous integration and some automatic image builds. We're also adding support for external cluster selection, so you can choose where the Kubemark nodes land. And we're adding a scale-from-zero implementation to work with the autoscaler as well; this will be the first such implementation for Cluster API, so hopefully it'll show people what to do.

So thank you, everybody, for bearing with me through all the technical problems. These are some links you can use to follow up and stay in touch with me: the Cluster API book, Kubemark's hollow mode, and the Cluster API Kubemark provider, and then a couple of things from me: those script files, the Ansible playbooks, and a blog post I wrote describing how all this works. So yeah, thanks, everybody.

Thanks, Michael. That was a pretty ambitious demo, and I'm really sorry it didn't work out well. Unfortunately, we are over the time allocated, so if there are any more questions, or if you want to ask Michael about something cool about this, please go to the WorkAdventure space we have set up. Thanks again to all the audience, and thanks again, Mike. Thanks, Peter, that was awesome.