All right, good morning everyone. This is a huge room; feel free to move forward so it doesn't feel quite so big. Hope y'all had a good first, or maybe second, day of KubeCon this year. We're going to talk today a little bit about GPUs with Kubernetes and virtual kubelets, and, well, let's get going.

My name is Dean Troyer. I'm with Salad Technologies. We're a relatively new GPU cloud provider: distributed clouds in all sorts of locations. Distributed GPUs, I should say. This talk was actually proposed and primarily written by Gotham Venomah, who is unable to be here today, so it is just me.

We're going to talk a little bit... I've got a story about a mythical app that needs some AI capabilities, and we're going to walk through how you might add that capability to an existing platform. We'll look a little at what they've got on the back end and what their exact needs are. Then we'll examine one specific solution that, spoiler alert, is a virtual kubelet, and look at how it can solve their problems. And if the demo winds are favorable for us today, we're actually going to walk through bringing one up.

So our story is around an app called Slackers at Work. This is a mythical app that has a feature called the Slacker Tracker. Everybody's worked with a guy, probably named Joe Doolittle, who takes 30 minutes to get his coffee in the morning and spends two hours at lunch.
He's got to make the rounds of all the departments before coffee break, and this app is for people to keep track of where he's at at any given time. We've all worked with somebody like him, probably more so before 2020 than now. Anyway, that's the situation we're in. It's just your basic mobile app, and product has decided that they want to add QR code tracking to this app: all you've got to do is point your phone at Joe to mark exactly where he is at any given time. Like too many product teams, they leave getting the QR code onto Joe as an implementation detail that someone else will solve.

Now, more realistically, it's a cloud-based app, so they don't really have any local hosting abilities; they don't operate any of their own servers. It's all cloud-based, which means they also don't have any data center resources or any of that usual stuff. And their existing cloud provider... well, let's just say they don't have an affordable GPU offering without minimum spend commitments and all sorts of other things that a small app like this just can't afford to deal with. Which also means they're not heavily invested in much beyond the base orchestration of their stuff. And of course, their app is currently all based around Kubernetes.

Now, if, unlike Slackers, you do have some local hosting abilities, one of your options is to get your own GPUs. That alone seems to be part of the trick, at least to do it affordably, but the considerations are pretty much the same as for hosting anything else yourself: you need your physical facilities, your ping, power, and pipe, and you've still got the problem of orchestrating workloads onto those GPUs.

In terms of looking for a cloud provider, obviously you're going to look first at leveraging whatever existing relationships and investments you may have. Some folks are eligible for free credits, which are
useful, but free credits expire. You've got, like I said, the minimum spend commitments as an issue, and you may have new APIs to deal with: possibly with an existing provider, and clearly with a new provider you're probably going to have a different API. So being able to use your existing orchestration tooling, your Ansibles, your Terraform or whatever, is of great benefit.

So in summary: Slackers are looking for a solution. They've got existing Kubernetes clusters, they're managing them mostly with Helm right now, they're pretty basic, and they're fully buzzword compatible otherwise. They need these additional external GPUs to build their QR codes. They want to use Stable Diffusion with QR Monster to make fun codes, or more specifically, they want to try to disguise them so you don't see them on Joe quite so easily.

So our solution, as I mentioned, is a virtual kubelet that will connect to a GPU cloud. Today we're going to use one that, oddly enough, I'm kind of familiar with. This particular one, Salad, offers thousands of distributed GPUs running containers on gaming PCs around the world.

So what is a virtual kubelet? It's essentially a translation layer that takes Kubernetes control plane events and translates them into something else, most often another API, to allow you to orchestrate external resources via the commands and the tooling that you're already used to. You can run more than one, so you can tune and schedule things in particular ways, and you can control what gets scheduled onto them. The virtual kubelet shown here is a
We've got the three nodes running the regular cubelet process this is a Just a process and you could run it in your cluster You know, we've we've all probably heard the stories about people who brought up their first cloud And they put their DNS in their cloud and then they couldn't cold-start it We're not gonna have that problem because The virtual cubelets not part of the critical path and bringing up your cluster So we're gonna extend Kubernetes to use the existing primitives the existing tooling to orchestrate the workloads out to these GPUs and Most external providers like this are gonna have provider specific metadata that clearly Kubernetes doesn't know anything about there are mechanisms of course to get that into your specs and Pipe that information down to the external API that you're talking to One of the other things that that's a benefit of doing this with external Stuff, especially if now in in slackers cases isn't it? But if you're dealing with something that's a publicly visible endpoint putting these out side of your cluster even outside of your entire network Can be a benefit and I mean it also can be a drawback for some workloads You you want that close network connectivity and that is one of the downsides of virtual cubelets is It's if if anything at all it's weak in being able to orchestrate your network back into your regular cluster But for things that fit that model well That don't need, you know a tightly coupled network. This works pretty well basically we're gonna we're gonna take multiple external things and and in this case GPUs and Present them to your Kubernetes cluster as a single pool Basically cloud 101 type stuff But we can also label, you know, if we're running multiple virtual cubelets. We can label them There's you know all the usual ways of determining what workloads run on which systems So this is your basic Kubernetes diagram. 
The left side should be very familiar to all of you: a control plane with some nodes around it, and a user, an application, out front. The green box in the middle is the virtual kubelet, and as I said earlier, that is just a process. This is a logical diagram; physically it's probably going to run on one of those other physical nodes. It is taking the control plane events meant for it and translating them into, in our case, SaladCloud API calls that drive those GPUs. They look like pods to Kubernetes, and they look like they're all running on one giant node.

The virtual kubelet process itself is probably a single Go binary. Virtual Kubelet, the project (there's a repo), is a library that defines a set of Go interfaces that you have to implement to write your back-end provider; that's what provides the standard Kubernetes side of the connector. I'm not aware of this being done in any other language at this point. It would be interesting to know, but I think pretty much everybody's going to do this in Go. So it's a relatively lightweight thing to run. If your budget is being soaked up by paying for GPUs, then maybe you need to run it on a Raspberry Pi, but again, no need for that; we can just run it in the cluster itself.

In our case, we have a little Helm chart included in the repo for our virtual kubelet that does a very basic deployment of it. It's a sample Helm chart, but basically we're using Helm to bring up the virtual kubelet directly.

Let me check one thing real quick. That's green awesomeness. So let's try it; let's see what we can do. I am running Docker Desktop with Kubernetes on my laptop here, so this is all a very local demo, and I've scripted a few of these things just for simplicity. When you start it up clean, you're only going to have one node, the control plane, and no pods. Let's do a start, and because the helm install command is a little long, can I highlight this for you?
That's the helm install command that we use. The configuration, at least in our case, is all done via command line options, so you'll see some Salad-specific things in there: the namespace that we're going to use, naming the process itself, and so on. And what that looks like is: at the bottom we now have the pod running our virtual kubelet process, and up in the nodes you'll see kc-demo-vk. That is the kubelet agent, ready to receive things.

Let's see... it looks like I put the wrong tab up on this side. This is... oh, come on. There we go. This is the SaladCloud portal. This is where we're going to look, from the Salad side, at what we're running. A container group, in our context, maps to a Kubernetes pod.

So back over here: kubectl apply -f demo-qr.yaml. That's pretty straightforward. Let's see what it did. We have two pods that are coming up, and again, if the winds are blowing in the right direction, we will see those pods appear over here soon. Nice. I should know better than to try this; I should have recorded it. A lesson to us all: don't let yourself get too cocky.

Just for fun, I actually have... this is just to show you what it might look like; this is what it should look like once you get in to see what is running. We've got an instance of a QR code generator, and, fortunately, because this is the one we're actually going to look at, this image is six gigabytes and takes a few minutes to get set up, and I didn't want to wait on that. But this is what it looks like: we've got five GPUs running behind this thing right now.

Still nothing, dadgummit. I didn't realize this was going to be so close. So anyway, obviously Slackers at Work would not be using a form like this, but... oh, I've already got it in there: Joe Doolittle. We're going to make his QR code. So that's what the code would look like by itself, but... yeah, plaid flannel shirt. Joe likes to wear flannel.
So let's... there it is: build one. There's a little checkbox down here, too, to validate it, to make sure that it's actually readable. Generating, generating, generating... I don't know, does that count as a plaid flannel shirt for Joe Doolittle? Let me see if it works. Oh, good grief. Yeah, works for me. And I say that because we have had some people on Android find these things not quite as reliable; there's tuning, there are all kinds of things that go into that.

This demo app was put together by our solutions architect, Sean Rieszewski. We were talking about this last night, and we decided not to spend the time trying to tune it further. But it also gives us this: this ran on a 3060 Ti with 8 gigabytes of VRAM and took about three seconds, and down here at the bottom we have the data content. So anyway, do you think somebody could print that on a sticker and get it on Joe's back without him knowing it?

Let's see... oh, just for grins, I want to see if this thing ever came up. Of course not. But at least to show you that even if it doesn't work, we still have the standard commands at our disposal: let's get rid of it, and now they're terminating. So that is what it looks like, in a very quick sort of application.

Now I've got to find that window and go back to full screen. This is another simplified diagram of what it actually looks like: we've got Kubernetes on the left, the virtual kubelet in the middle, and then the SaladCloud API and, like I said, GPUs around the world waiting to do fun things.

And by golly, without a demo, that didn't take as long as I thought it was going to. We are here at booth E27 if you want to know more about Salad itself, and if you want to know more about the kubelet stuff, especially the proof of concept that we built, get ahold of me and we can go from there. So... geez, do we have any questions? We've got a microphone over here. This is a huge room; if you wouldn't mind, can we get the
This is a huge room if you wouldn't mind the can we get the Question mic up Well speak up and I'll repeat your question. We don't need to wait All that at the level that we did here today. You won't have that you can always inside your container Build that in you would have to build that into your can container to tunnel back into your internal infrastructure unfortunately virtual kubelet itself doesn't do much for networking the The implementations that I've looked at that the claim to support a little bit of networking They're all pretty much doing it themselves one way or another So yeah, this isn't necessarily going to work for every job, but for the ones that fit it it works Question I forgot to repeat the first question. I apologize. The question is what sort of network isolation do we have? For the for the GPUs in our GPU network The GPUs our network back to our internal Hubs and things via wire guard so we've got that all they're they're completely isolated the We've got level of organization Organization and project so we can isolate that at a project level and we can isolate it individually So the GPUs may or may not be able to even talk to each other if they're in the same thing like the five that I showed Right now. They can't even talk to each other. They're completely isolated. You can of course always build that into your container But by the default it's it's completely isolated to just that GPU Back to our our front-end our load balancing front-end. I think it's working now. All right My question is about the GPUs. Are you able to schedule multiple? Clients, let's say on one GPU. Can we split the GPU with the way that you're arranging things? Right now. I'm gonna say to clients. No, you're gonna get one workload per GPU if If you Have a workload that can handle multiple things at once at that level. 
yeah, you can do that. Basically, what we're doing is taking a container that needs a GPU to do whatever it's going to do, and running it on that machine locally. Like this demo app: of course you can have multiple people hitting that endpoint at the same time; it's not single-threaded or anything like that. It's what you would expect.

I guess what I'm looking for is scheduling, fragmenting, and running multiple workloads on a single GPU.

Gotcha. Thank you.

Hi, I was a little bit curious about the relationship between one virtual kubelet that then maps to all these different physical machines under the hood, physical GPUs. Just intuitively, it seemed to me that I would want to have a virtual kubelet per physical machine, to avoid running into situations where you might request resources that cross a boundary to multiple physical machines, which would then incur high latency overhead. Whereas if you would just scale your request down a little to what fits on a single device, you could pack it all together more efficiently. So I'm curious: is that something that's easy to configure, having multiple of these virtual kubelets, or is that the best practice for this approach?

I'm not sure I can speak to what the best practice is, but you can run multiple virtual kubelets. In our example, if you wanted to segregate all of the GPUs with 16 gigabytes or more of VRAM, you could configure things so that those are behind one virtual kubelet and all the smaller, maybe less powerful GPUs are behind a different one. Our system has the ability to do that through filtering, the attributes that we would add to the workload, and then we'll handle that sort of thing; we can select what class we want to run on. But beyond that, you could run multiple. The other thing is you could run one virtual kubelet per provider. In fact, I think there might be one out there that does multiple providers built in, one per
process, but the same code will handle multiple providers.

Okay, so virtual kubelet itself, the library, doesn't restrict any of that. But is there any particular reason to prefer the way that you showed it, where you have just one virtual kubelet mapping to a potentially infinite pool of physical machines?

That's, I guess, the common way, partially because if you're going to run one virtual kubelet per physical node, now you have to bring those up and down. That's your control plane; you don't want to be managing that alongside workloads or anything like that. I mean, it's like adding a node to your cluster. That's not something that... well, I guess in some cases it is something that you just scale out automatically, but I don't think you have to worry about it. Whatever your control plane load is, that is the volume of data the virtual kubelet handles; it's not like it's going to scale badly there. In this particular case, it's just simpler, and I think that's the way the Azure one works: it's just one pool.

Now, for us, we configure the virtual kubelet down to the project level. We've got an organization, and an organization can have multiple projects, and we have to give the virtual kubelet those two bits of information. So if you want to run multiple projects, you're running multiple virtual kubelets. In terms of breaking up what's behind it, in our case that would be the way you would have to do it; below that, yes, it's your choice.

Got it. Thank you.

Morning. Good morning. First off, I feel attacked by the flannel comments; I'm really upset about that.

Apologies. I hope your name isn't Joe.
No, not Joe, unfortunately. How does this look for on-prem? If I have on-prem GPUs and a virtual kubelet, can you reach out to an on-prem... another Kubernetes cluster, for instance, or leverage GPUs in a different cluster?

In what I showed, you cannot. As I said earlier, you'd have to build that into your container: I don't know, set up your own WireGuard or something to come back into your infrastructure. If you're pointing the virtual kubelet at another internal cloud, let's say you've got a Proxmox or something sitting over to the side, then whatever it's got available to it network-wise applies. Again, the virtual kubelet isn't doing much for your networking; if you can set up your network so that the cloud it's talking to can do what you need, that's how you'd have to go about it. That's external to the kubelet.

Okay, I was thinking more along the lines of the GPUs. Can I just have GPUs in one cluster and then have another cluster that calls the GPUs across? Do I have to use something more complex like Salad or some other virtual GPU?

No, you can do it yourself. I think I might have buzzed past it pretty quickly: there are other solutions for doing things like this natively within Kubernetes, without using a virtual kubelet to get there, and that might make more sense for that situation.

Okay.

Again, this is what we are; we tailored it to that.

All right, thank you.

Hello. Hello. A quick follow-up to the previous question; it's a technical one. This kubelet, the virtual kubelet, presents me with, say, 10,000 GPUs, potentially. So what if I schedule a pod that requests a hundred GPUs? How do you handle that? That pod will never get scheduled.
But how do I know about it? Do you use an admission controller, or what prevents it from trying to do something like this?

As our code sits right now, you would get a hundred pods; it's one GPU per pod. That's one of the things we just haven't gotten to yet. Salad itself can run... I think I showed you, in that one workload we had five GPUs underneath one endpoint.

Sure, but my question is, if I schedule a single pod requesting 100 GPUs...

Right, we can't right now; this can't do that.

Of course, but my question is: how do you prevent it? It just stays pending, because the scheduler will try to place it on the virtual kubelet, since it has 7,000 GPUs. If it doesn't have the resources, it should fail. But it has the resources, technically.

You know, I'm afraid I'm not going to be able to answer that well. That seems to be more of a Kubernetes internals thing, and I am not a Kubernetes expert, unfortunately.

No worries. Thank you.

All right, apologies.

Nice talk. This is [inaudible] from Apple. First, a comment on the previous question about sharing a GPU. As some of you may already be aware, with the recent NVIDIA GPU device plugin, and also Kubernetes support, we can actually share or virtualize a physical GPU: there's time slicing, multi-process sharing, and probably the most interesting is MIG, multi-instance GPU. Okay, so let me come to my question. The virtual node is definitely a good solution for flexible capacity from a provider; you mentioned serverless and edge computing. But when we compare it with the native GPU support now:
There's already the device plugin, and most recently the community has something called dynamic resource allocation, with resource claims. So can you comment on, or compare, this native GPU node support against using virtual nodes in Kubernetes? In particular, what about the complexity and administrative overhead introduced by the virtual kubelet? If I want to run a GPU cluster, what would be the better solution, or does it depend on the use case?

Unfortunately, like I said, I'm not a Kubernetes expert, so I'm not sure I can compare the two. The management overhead that we introduce is basically the virtual kubelet process, plus whatever you have to manage on the back end that you're using. I'm not familiar with the other things that you mentioned, unfortunately.

Okay, that's fair, but what I'm getting at is that you can definitely run a supported GPU node natively.

I would guess, and this is my personal opinion, it probably really depends on the use case. I think both will have their applications.

Okay, thank you very much.

Yeah, not everything is going to fit this. Thanks.

Thanks for the talk, it's a great idea. I'm [inaudible] from Google. I have a question: do we need to change the workloads to specify that I want to use SaladCloud for the pod? So in my pod specs, do I need to make some change?
Yes. Let me show you that; we've got a couple of minutes yet.

Also, to share some background: I think today the Kubernetes GPU API is not that standard. It's just "specify how many GPUs I need for this pod"; there's no GPU type and no GPU driver version. How can I know the GPU I get will have the desired driver version and the desired GPU type installed?

That's a hard problem, and that's one thing we do have already: we've got a relatively standard management container that we push out, in terms of the drivers that are pre-installed on our GPUs. We can tell you what version of CUDA it has, for example. So in this case, that's handled. This is the YAML that I used to attempt to launch it, and those annotations there in the middle are the Salad-specific information that we needed for our workload. Basically, there's the GPU class; unfortunately that's a UUID, but it represents, in Salad, a class called "stable diffusion compatible", so it's a wide range of cards and memory capabilities. The rest of it is just the configuration for the app itself. The other bit in here that is different is at the bottom: the tolerations. I don't have in my head exactly what it was we set on the virtual kubelet itself to match that, but that's how you control which virtual kubelet the workload lands on.

Yeah, that's my question. Thanks.

Okay. Yeah. Anyone else? Great. Well, thank you everyone for attending; we really appreciate it. And like I said, we're at E27, and hopefully that scans for you. Anyway, thank you.
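For reference, the pod-spec pattern described in that last answer generally looks something like the sketch below: provider-specific metadata goes in annotations, and a toleration matches whatever taint the virtual-kubelet node registers with. The annotation key, its value, and the image here are illustrative placeholders, not Salad's actual keys; the toleration key shown is the conventional virtual-kubelet one, but the taint your own virtual node advertises is what must actually be matched.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qr-generator
  annotations:
    # Provider-specific metadata the virtual kubelet forwards to the
    # external API; this key name is a placeholder.
    provider.example.com/gpu-class: "<gpu-class-uuid>"
spec:
  containers:
    - name: qr-generator
      image: example/qr-generator:latest
  tolerations:
    # Must match the taint the virtual-kubelet node registers with,
    # so only intended workloads land on it.
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: example
      effect: NoSchedule
```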