So next session, we have Lukas, who is the CEO of Loft Labs, and also Brandon from CoreWeave. All right. Is that good? Okay.

Hi. Yeah. So yeah, I'm just an engineer. This is Lukas. He's more important than me. But yeah. So like I said, I'm Brandon and this is Lukas. We're here to talk about how we power AI labs, how we do inference, how we do training at CoreWeave. So to introduce myself, I'm an infrastructure architect. I've been at CoreWeave for almost two years, basically from the start to where we are now. We've made a lot of changes along the way, and we'll talk about that today along with how we do things now.

Yeah. And I'm Lukas. I'm the CEO of Loft Labs. I got the company started about three years ago, and I'm involved with different open source projects. We got vCluster started. We got DevSpace started and contributed it as a sandbox project to the CNCF. And we recently launched another project called DevPod. But today it's all about vCluster.

So, I'm sure maybe some of you know who CoreWeave is. For those that don't, this is the only sales pitch that anybody probably needs. CoreWeave is a specialized cloud, purpose-built for GPU-accelerated workloads on top of bare metal infrastructure. We don't do hypervisor layers at all; you access the GPUs directly. And we allow people to scale from zero to whatever they need in a matter of minutes. That's what we're built for: purely GPU-based workloads.

So how do we serve AI/ML? To give you a basic rundown: we don't have a hypervisor layer. We run Kubernetes on bare metal. We do support KubeVirt for virtual machines on top, so for people who do need that, we support it. We have an extremely fast autoscaling response for our inference, thanks to a KNative-based stack that we run, which seems to do really well. There are also some more talks about that this week, with some of our team contributing to it, if you'd like to learn more. And our stack is GPU-optimized: everything you need to run GPU workloads is managed by us, the drivers, the health checks, all that stuff. So you just come with your applications, download your model weights, and you're good to go.

And we're open source friendly. Everything we do is basically represented at KubeCon. Everything we run, you don't have to learn anything new. You can just ask questions. We're very responsive, we're very friendly in the open source ecosystem, so there's no proprietary stack here. We want to call out Tensorizer, which is a really incredible open source project; well, we're going to open source it soon. It allows us to basically stream models directly into GPU memory. And actually, Wes Brown in the front here is one of the lead engineers on it. He'll be happy to talk more about that this week if you're interested, so come by our booth to learn about it. But yeah, it's just Kubernetes. It's really simple. We love Kubernetes. Plug.

Yeah. We have just two little demos to showcase what we're going to talk about. The keyword here is Catalyst. Catalyst is what we're calling our next generation of how we're going to support customers, large and small, giving people the same experience. So hopefully we can see this. What this is representing is: we've worked with Loft Labs on virtual clusters and on customizing that experience, which allows people to deploy their own full cluster experience without needing to actually provision nodes, right?
That was kind of what the power of vCluster gave people. So we take it a step further: we give you the same experience whether you don't want to buy and pay for nodes or you need a dedicated isolated environment; we manage the control planes exactly the same way. All this demo is doing is showing how quickly we can spin up about 25 fully conformant Kubernetes control planes in our environment. We use a lot of operators, a lot of custom stuff, but overall it's nothing fancy; it just gives you the idea that we can deploy clusters pretty quickly. It takes about two minutes to get the first one or two ready, at least. So we run etcd, and we have a custom operator for etcd. We deploy the API server, the scheduler, and the custom components that are required. And this is basically what would start your experience on CoreWeave.

And then the more fun one, which I think you're more interested in, is on the next slide: a quick little KNative LLM demo of Mistral 7B, which is an open-source model that was actually trained on CoreWeave. Seven billion parameters, about 14 gigs in size, and it's actually pretty awesome. This one is just a real quick demo of taking one of those clusters I just provisioned; I went ahead and pre-installed cert-manager, KNative, the small things you need. As you can see, we're running on RTX A5000s with min replicas set to five. As it goes to deploy, you'll see on the bottom the remote cluster, which I'll talk about in a second, and at the top of the screen you'll see the revision and the pods coming up. This is going to take about 60 seconds because we're not using our accelerated object cache, but it takes about 60 seconds for the models to actually go ready for inference, which is not bad. With our accelerated object cache and our Tensorizer product, we can load about a 20 gig model in about 12 to 15 seconds to get your inference pods up and running. And that is confirmed; you can talk to us this week about it.

So we'll just skip ahead. Once they're ready, you can see the revision has become ready, and then we go and do an inference request. Voila. So in a matter of one to two minutes, you've got your inference up. It's going to scale to zero, and you can scale up to as much capacity as we have. You didn't have to provision any nodes. You didn't have to do anything. You just come with your application, come with your weights, come with whatever you need, and you're good to go. That's really the power of what we're trying to present today: you can really do whatever you want, and you don't have to care about any of the boring stuff. Unless you want to, then of course you can.

But so, to give some context, Catalyst is just CoreWeave's iteration of a Kubernetes control plane enhancement. It gives you full customizability. You can keep it virtual if you don't care about the underlying nodes, or you can go bare metal: you can get a full set of, like, 100 nodes, you get SSH access, you get full network isolation thanks to the NVIDIA BlueField DPUs, and a bunch of stuff like that. So those are the two offerings we have, and then we also combine them, which I'll talk about. And on the right, you'll see we make use of a lot of CRDs.
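To give a rough idea of what the model streaming Brandon mentioned looks like in code, here is a minimal sketch using the open-source Tensorizer library's documented serialize/deserialize calls. The file path and model ID are placeholders, constructor options vary between versions, and this is an illustration of the pattern rather than CoreWeave's production setup.

```python
# Sketch: serialize a model once, then stream the weights back at pod startup.
# Paths and the Hugging Face model ID are placeholders; device/streaming options
# differ between tensorizer versions, so check the project's README.
import torch
from transformers import AutoConfig, AutoModelForCausalLM
from tensorizer import TensorSerializer, TensorDeserializer

MODEL_ID = "mistralai/Mistral-7B-v0.1"   # the open-source model from the demo
TENSOR_PATH = "mistral-7b.tensors"       # placeholder; in practice this would live in object storage

# One-time step: write the weights out in the tensorizer format.
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
serializer = TensorSerializer(TENSOR_PATH)
serializer.write_module(model)
serializer.close()

# At inference-pod startup: build an empty model skeleton, then stream the weights in.
config = AutoConfig.from_pretrained(MODEL_ID)
empty_model = AutoModelForCausalLM.from_config(config)
deserializer = TensorDeserializer(TENSOR_PATH)
deserializer.load_into_module(empty_model)
deserializer.close()
```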
On the CRD side, we've got a couple that represent clusters and control planes, which is just a fancy way of saying: what's your IP space? How many API servers do you want? What does your etcd backend look like? Pretty boring, but that's all you need to define, that's all you need to deploy. It's very simple. And this allows us to do things like training clusters, inference workloads, extensive CPU compute jobs, Spark jobs; it doesn't matter, you can do anything. So we like Kubernetes. We run Kubernetes on Kubernetes. Everything's Kubernetes. I know it's like a meme at this point, but it makes things simple because everything's Kubernetes.

To give you some context, we've re-architected our infrastructure because we do run on bare metal, right? We have no hypervisor, so it's a little bit different. We run an internal eight-node cluster with etcd on-prem. On-prem Kubernetes has always kind of been a pain because you have to manage it yourself, but running Kubernetes on Kubernetes is nicer because then you get to use the power of Kubernetes to manage things. So you're like, this is awesome, let's do that. So we have a bare cluster that stands up our day-zero provisioning, and then we have a set of other clusters on top.

And the real power of this new framework is in the top box here, right? As a customer on CoreWeave, you'll get your own control plane that's deployed into basically a namespace, and then... oh, that's the wrong way... and then it'll connect to what we call these real clusters, right? So these worker clusters are the ones that have 1,000 GPUs, 2,000 GPUs, 3,000 GPUs. Today, our cluster has 5,400 GPUs, sorry, 5,400 nodes. I think it's like 30,000 to 40,000 GPUs, I believe. Something like that, I could be wrong. But basically, each customer gets their own isolated control plane, thanks to the vCluster Pro distribution being able to synchronize into those clusters. So you get isolation, and you get to deploy as many GPU workers as you want. The other thing that's cool, though, is that in this environment you can also do a bare metal cluster. Like, we can give you the real thing if you want, and then you can switch; we can do that for you. If you decide one day you get funding and you want real nodes, sure. So it's pretty cool.

So yeah, it's a lot of operators. We like Kubernetes, so we do a lot of that stuff. We use Kubebuilder. We use Argo CD extensively under the hood for day-two operations and management. We love GitOps. That's one of the cool things we do here: we're very in line with the industry standard. So when people come to us, we're very in tune with our customers on how we should do things and how we do them, and it makes it really easy for us to communicate and get people started off the ground.

And one of the best things about this flexible deployment model is that vCluster Pro allows us to power the burstable on-demand environment that I discussed. But everything between that and a true dedicated Kubernetes cluster is identical, right? In terms of what we provision and what we give to you, it's the same; the only difference is what you tell us you need. That makes it really powerful, because we don't have to build two stacks. It's one stack. And some of the other key things: it's really flexible. Certificates via cert-manager. Autoscaling via Prometheus. We have a custom etcd CRD.
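As a purely hypothetical illustration of what declaring one of those control planes could look like, here is a sketch using the Kubernetes Python client. The group, kind, and field names are made up to mirror the questions above (IP space, API server count, etcd backend); they are not CoreWeave's actual CRDs.

```python
# Hypothetical illustration only: the group, kind, and spec fields below are invented
# to show the idea of a declarative control-plane resource; they are not CoreWeave's CRDs.
from kubernetes import client, config

config.load_kube_config()  # assumes a kubeconfig pointing at the management cluster

control_plane = {
    "apiVersion": "controlplanes.example.com/v1alpha1",
    "kind": "ControlPlane",
    "metadata": {"name": "customer-a", "namespace": "customer-a"},
    "spec": {
        "serviceCIDR": "10.96.0.0/16",                          # "what's your IP space?"
        "apiServerReplicas": 3,                                 # "how many API servers do you want?"
        "etcd": {"replicas": 3, "storageClass": "local-nvme"},  # "what does your etcd backend look like?"
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="controlplanes.example.com",
    version="v1alpha1",
    namespace="customer-a",
    plural="controlplanes",
    body=control_plane,
)
```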
We also have network isolation from the BlueField DPUs, and currently we run Calico, so we get to use network policies. It's pretty great because everything's open source, we're in the community, and it just works.

And then finally, probably one of the cooler things is that we also have a lot of tooling specific to GPU-style management of the nodes. We have a really extensive HPC verification workflow framework, based on Argo Workflows, for ensuring that things like the H100s and A100s are healthy. There are a lot of burn-in tests you need to run; you need to make sure they're actually performing, especially with new systems. We take care of all that for you and make sure that it works. The two other cool things I mentioned: we have regional, co-located accelerated object caches that speed up your model downloads, which I hinted at earlier. That comes standard, and we can integrate it into your inference services. And then we're going to be open sourcing Slurm on Kubernetes, which is called SUNK. That's coming out, I think, Q1. Basically, we can run Slurm on Kubernetes and it's really nice, so it's really cool. And now I'll hand it over to Lukas to talk about how the syncing actually works.

Let's dive a little bit under the hood. It's really great to have a cloud provider like CoreWeave here be so open about their internal workings. You rarely find that folks go out and actually tell you how their cloud is built. And we're diving a little bit deeper today. So what CoreWeave is using internally is a project called vCluster. We got vCluster started in 2021. It essentially allows you to create virtual Kubernetes clusters. In fact, it's the only certified Kubernetes distro for creating virtual Kubernetes clusters. We've seen over 40 million virtual clusters created. In the first 12 months after we launched the project, there were only about a million virtual clusters around. Now it's two and a half years later and we just celebrated 40 million virtual clusters created. So that's what we're doing. You'll see the GitHub link down here if you ever want to check it out yourself; you'll also find it at vcluster.com. There are about 3,500 people who have starred the project, so definitely make sure you check it out if this is for you.

What does vCluster do at its core? It essentially allows you to create secure multi-tenancy for Kubernetes. And I don't envy you in having to translate that into sign language. Essentially, when you're looking at Kubernetes clusters today, you have a lot of replication. Each one of these clusters has cert-manager, Open Policy Agent, Istio, Vault, all of these different components. And a lot of enterprises today are essentially spinning up Kubernetes cluster after Kubernetes cluster. Some of our customers come to us and tell us: we have 500 Kubernetes clusters, we have 1,500 Kubernetes clusters. Just because every time somebody needed a cluster, AWS told us to spin up a new EKS cluster, and that's what we did and handed it out to our engineers. Initially, they all look nice and uniform, and over time, they really don't. And then this becomes a whole mess. We've identified this problem, and we essentially created vCluster to solve it. vCluster allows you to create multi-tenant clusters. That means you have one cluster that runs your platform stack, so you run cert-manager, OPA, et cetera, in one cluster.
And instead of handing out namespaces, which would be the default multi-tenancy approach in Kubernetes, you're launching these vClusters. And a vCluster is nothing else than a pod that runs a Kubernetes control plane, and you can make that available via load balancer, ingress, et cetera. So people talk to this control plane now, rather than to the real cluster's control plane. That essentially virtualizes Kubernetes and allows you to use the underlying platform stack across these virtual clusters. There's a lot of sharing that's possible.

When we're looking at the standard vCluster, it runs the tenant workloads inside the vCluster and the control plane alongside each other, in one namespace in the same Kubernetes cluster. That's the default. The underlying cluster we typically call the host cluster. What CoreWeave does, though, is a little bit different, because obviously as a cloud provider they're very advanced users; you can tell from Brandon, right? And what they're doing is this here, right? I'm not sure if you saw that, so I'll go back and do it again. So essentially they run a control plane only inside one Kubernetes cluster, inside the multi-tenant cluster. We call that an isolated control plane. That means the workloads that you're scheduling on these Kubernetes clusters may not be in that same cluster; they may be somewhere different, but the control plane is essentially in that same Kubernetes cluster, right? And that's a really advanced feature. That's something that we have in our commercial distro. It's called vCluster Pro. It's essentially open-source vCluster, but it packages some advanced features like this isolated control plane into a new distro.

And why are they doing that? Essentially security and resilience, right? If you have a faulty workload running, or you have a malicious user, they could try to attack your control plane, but if the control plane runs in a separate cluster, that's much harder to do. So it's easier to ensure SLAs for the control plane, which obviously as a cloud provider you want to do. And it also allows advanced workload topologies. That means you can sync the workloads to different locations; it doesn't have to be the same cluster.

What does that mean? We'll take a look at how CoreWeave has done this. There are essentially two deployment options they have for their customers, and Brandon was alluding to this earlier. So let's take a closer look. Option one is shared workload clusters. This is closest to the standard vCluster, where you are sharing the resources of the underlying cluster, with the exception that CoreWeave is actually using a separate cluster for this. Again, isolated control plane: we have one cluster running all of our control planes, the vCluster control planes, and then we have another cluster that runs our workloads. And that happens via the so-called syncer. I'll dive into that in a second, into what syncing workloads to another cluster actually means. But essentially, it means that we start workloads in a different cluster than the one we're currently in.

Nodes in a vCluster are a very interesting question. I front-loaded this because every time I give a talk about vCluster, it's the first question that comes up: which nodes do I see? What does it run on? And the answer is, like everything in IT, it depends. There are multiple options that you can configure in vCluster. You can configure it to see all the nodes in the connected cluster; here we have a shared workload cluster.
We could expose all the nodes to the vCluster. We could choose to show only some nodes, via node selectors, for example. We could also join dedicated nodes, which is something really interesting. And we could also change the level of node visibility. So depending on which nodes we're syncing, we can either say: give me the full copy of the node and expose everything to the user, which may not be something that a cloud provider can do. Or we can use the other option here, modification. That means you can remove metadata, you can alter metadata, you can add additional labels. You may have some internal logic in terms of tagging and labeling your nodes that you don't want to expose to the customer. So inside the vCluster the tenants don't see that, but outside, the nodes actually have these labels; what the tenant sees can differ from what the actual node looks like. Those are the standard options, and all of this works in standard vCluster. And then the second option that CoreWeave provides for folks, as Brandon said: if you have the funding and the money to get your dedicated GPU nodes at CoreWeave, then you can also schedule workloads to dedicated nodes. So we're seeing here that one option is syncing, which is the default in vCluster. And then we also have the regular scheduling mechanism, where we obviously have to join that node into that Kubernetes cluster, into that virtual cluster, and then it's really dedicated to that virtual cluster.

What does syncing mean? How does it work under the hood? Essentially, if we take a real Kubernetes cluster, you would talk as an admin to that API server, and behind it there's etcd, the controller manager, the scheduler: a regular Kubernetes control plane. That gives me a kube context, right? So I have a context and a namespace with nothing else in it. I could hand that namespace out to one of my teams now, but instead what I'm doing is using virtualization: I'm deploying a vCluster. The vCluster is nothing else than a pod running in that namespace. Inside that pod we have an API server, a data store, a controller manager, and a syncer, which is essentially our equivalent of a regular scheduler, and it allows you to sync workloads to other clusters or to the same underlying cluster it's running in. So the vCluster doesn't necessarily need to know the nodes, right, which is a really interesting concept in terms of how to share and allocate resources in a very flexible manner. But essentially we can see this is a separate control plane running as a pod. And we can now have our tenants talk to this API server instead of the real API server. That means the tenants have another kube context. They don't see the vCluster pod or the real namespace that we created down there. That means the Istio, the OPA, anything we're running in the underlying cluster, they don't see it, they can't touch it. We can make things available to them by exposing CRDs and enabling sync for CRDs. So for example, if we're saying, hey, we have a shared ingress controller and we want it available for all of our tenants, or for just some of the vClusters, we can essentially enable syncing for ingresses. But by default we actually just sync pods under the hood.

How that looks is: if I'm now talking as a tenant to my API server here and I'm creating a namespace, that's an entry in my data store, right? And you can see we support multiple data stores: etcd, SQLite.
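To make that concrete, here is a minimal sketch of what you could observe with the Kubernetes Python client and two kubeconfig contexts, one pointing at a vCluster and one at its host cluster. The context names, the host namespace, and the exact rewritten pod name are assumptions for illustration, not guaranteed vCluster behavior.

```python
# Sketch: the same objects look different depending on which API server you ask.
# "my-vcluster" / "host-cluster" contexts and the "vcluster-my-vcluster" host
# namespace are assumptions for this example.
from kubernetes import client, config

virtual = client.CoreV1Api(config.new_client_from_config(context="my-vcluster"))
host = client.CoreV1Api(config.new_client_from_config(context="host-cluster"))

# A namespace created inside the virtual cluster only lives in the vCluster's own
# data store (etcd or SQLite); it never appears in the host cluster.
virtual.create_namespace(client.V1Namespace(metadata=client.V1ObjectMeta(name="team-a")))
print("team-a" in [ns.metadata.name for ns in host.list_namespace().items])  # False

# A pod, however, is what the syncer copies down by default, so a counterpart shows
# up in the host namespace the vCluster runs in, typically with a rewritten name
# that encodes the virtual namespace and vCluster name.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="demo", namespace="team-a"),
    spec=client.V1PodSpec(containers=[client.V1Container(name="demo", image="nginx")]),
)
virtual.create_namespaced_pod(namespace="team-a", body=pod)
for p in host.list_namespaced_pod(namespace="vcluster-my-vcluster").items:
    print(p.metadata.name)  # e.g. something like "demo-x-team-a-x-my-vcluster"
```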
On the data store side, we also have an embedded etcd option coming out in a couple of weeks; really, really nice feature. And essentially, when we create that namespace, it doesn't exist in the underlying cluster. It's only virtual; it only exists in that virtual cluster. The same goes for CRDs created in the virtual cluster: those CRDs won't exist in the underlying cluster. So there's a lot of autonomy for these tenants inside the vCluster. You can make them cluster admin, which is something that CoreWeave obviously wants their tenants to be, for full flexibility.

And then you can obviously create a deployment, which is just another entry in our data store, and then our controller manager sees: okay, the replica number is one, I'm going to create a pod for this, right? Typically you need a scheduler for this, and the scheduler needs to be aware of all your pods, and that's actually what creates a lot of the problems in Kubernetes around managing that; having workloads distributed and multi-tenant makes this really hard. So what we essentially have is a syncer, and the syncer copies the pod from the virtual context to the underlying cluster's context, or to a different cluster's context, as in the case of the isolated control plane. And vCluster is actually a certified Kubernetes distro. So we went through all the conformance checks from the CNCF, right? We run them with every new version of vCluster. So you can be sure, whether you're using a vCluster or a real cluster, you won't really be able to tell the difference. Sometimes we see companies actually switch to handing out virtual clusters, and their tenants don't even realize that change: that they don't get an EKS cluster anymore, that they actually get a virtual cluster.

vCluster Pro, which we talked about with the isolated control plane, also offers a really nice UI that you can check out. There's a free tier available for it as well. There's lifecycle management for virtual clusters baked in, so you have some CRDs and a controller for managing virtual clusters. You have templating capabilities and what we call apps. So essentially what CoreWeave and others can do is define what should run in each virtual cluster; you can do that with this templating mechanism, and you can manage and upgrade virtual clusters. It's kind of like an EKS, but you're running it yourself to essentially manage these virtual clusters. And then you can monitor virtual clusters. There's cost optimization baked in. One of the biggest features is sleep mode, where you can actually say: hey, if there's no traffic coming into that virtual cluster for 20 minutes, turn it off, scale it down. And all the time you can reallocate these nodes, or you can autoscale your clusters if you're in a public cloud or with folks like CoreWeave. And then you can also use this dashboard as admin access for internal teams, support tickets, folks debugging virtual clusters. Everything they do is hooked up to your SSO, and there's audit logging in place, so you have a consistent log of what everybody's doing across your clusters. Yeah. Cool.

So, yeah, a lot of stuff. So what's next for this? 2024 is going to be a busy year for us. Basically, we talked about option one and option two. You've got virtual on-demand, kind of like what CoreWeave does today, where you pay for just the GPUs; you don't have to reserve nodes. We also have the isolated environments if you need, I don't know, 10,000 GPUs: you can get that. But in the future we're going to combine those.
We are working on a hybrid environment that allows you to have a dedicated bare metal environment, but also be able to schedule into that on-demand burst environment, through the use of the DPUs, VPC stitching, and a lot of really cool networking technology that's going to be coming out next year through a custom CNI. So that's really what's on the roadmap for us: what we're aiming for is giving you a single interface into anything you need. Whether it's an isolated environment or a shared environment, you should have one interface and one cluster, and that's what we're aiming for with this setup. So you can have a training cluster, you can also spin up inference workloads, and you can also spin up one-off things like Redis clusters or whatever your developers need for something random, and you don't have to buy the nodes for that. So that's what we're aiming for, and that's what's coming next for us.

So thank you for listening. These are our booths if you want to swing by and talk to us. Also, this is posted online if you want to get started on CoreWeave, if you're curious about some of the inference examples or the ML examples we have, and then obviously vCluster is up here to check out as well. So thank you.