Hi everyone, and welcome to Notes from the Field, the discussion with KubeVirt end users. I'm Alona, a principal software engineer at Red Hat. I'm part of the KubeVirt networking team and a maintainer at Red Hat. Before we start, I wanted to ask the audience a small question: can anyone who has heard about KubeVirt raise their hand? Okay, makes sense, you're here. And can the ones that actually used it raise their hand? Okay, nice, nice.

So I'll briefly present what KubeVirt is, and then we'll go to the panelists. KubeVirt is an add-on to Kubernetes. It extends Kubernetes with resource types for virtual machines, so it enables running virtual machines alongside pods in a Kubernetes cluster; the pod actually has the virtual machine inside it, and the virtual machine is connected to the pod networking. So the virtual machine has communication with the cluster network: any component in the cluster can communicate with the virtual machine, and the virtual machine can communicate with any component that is on the cluster network. Managing the virtual machines is done using the same standard tools Kubernetes has, like kubectl; a minimal sketch of such a virtual machine resource follows the introductions below.

So let's go to our panelists. Can you please introduce yourselves: where you work and for what company, what your role is, and what your company does?

So I'm Ryan Hallisey, a software engineer at NVIDIA. I'm sure everyone here knows what NVIDIA is, but if you don't, NVIDIA makes GPUs and provides various services around those GPUs. I specifically work on a product called GeForce NOW, which is a cloud gaming service. The way to think about it: if you ever wanted access to, say, an RTX 3080, and you wanted to stream a game on it over Wi-Fi on any device, anywhere, you could do that using GeForce NOW. And I specifically work on the infrastructure part of GeForce NOW.

Hi, good morning. My name is Dinesh Majrekar. I am CTO at Civo. We're a cloud native service provider offering Kubernetes clusters on infrastructure that we manage around the world, focused on developer experience, on being really, really simple and cost-effective, and on reimagining how cloud computing is provided today.

Yes, hi. My name is Kim. I'm the founder of Killercoda. We are a very recent startup; we've only existed for one and a half years. We provide sandbox environments. You can just imagine it: you open your browser, you get direct access to a Linux VM or to a running Kubernetes cluster, and from there on you can use it as a sandbox environment. You can use it to show users your tools; we, for example, have a lot of CNCF projects running that you can just test out in your browser. And we do this using KubeVirt running under the hood. That's why I'm here today. Thank you.

Unfortunately, Peter from CoreWeave couldn't join, and we also have Howard from Arm who couldn't join either, but since NVIDIA is an end user of Arm and they have collaborated together, Ryan will try to represent them.
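As a concrete illustration of the resource types mentioned in the introduction, here is a minimal sketch of a KubeVirt VirtualMachineInstance built with the kubevirt.io/api Go types. All names and the image are illustrative, and exact type and constant names vary somewhat between KubeVirt releases, so treat this as a sketch rather than a definitive spec.

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	kvv1 "kubevirt.io/api/core/v1"
)

// minimalVMI defines a VM as a plain Kubernetes API object. KubeVirt
// runs it inside a virt-launcher pod, attached to the pod network.
func minimalVMI() *kvv1.VirtualMachineInstance {
	return &kvv1.VirtualMachineInstance{
		ObjectMeta: metav1.ObjectMeta{Name: "demo-vm", Namespace: "default"},
		Spec: kvv1.VirtualMachineInstanceSpec{
			Domain: kvv1.DomainSpec{
				Resources: kvv1.ResourceRequirements{
					Requests: corev1.ResourceList{
						corev1.ResourceMemory: resource.MustParse("1Gi"),
					},
				},
				Devices: kvv1.Devices{
					Disks: []kvv1.Disk{{
						Name: "rootdisk",
						DiskDevice: kvv1.DiskDevice{
							Disk: &kvv1.DiskTarget{Bus: kvv1.DiskBusVirtio},
						},
					}},
					Interfaces: []kvv1.Interface{{
						Name: "default",
						// Masquerade binding: the VM shares the pod's IP.
						InterfaceBindingMethod: kvv1.InterfaceBindingMethod{
							Masquerade: &kvv1.InterfaceMasquerade{},
						},
					}},
				},
			},
			Networks: []kvv1.Network{{
				Name:          "default",
				NetworkSource: kvv1.NetworkSource{Pod: &kvv1.PodNetwork{}},
			}},
			Volumes: []kvv1.Volume{{
				Name: "rootdisk",
				// containerDisk: an ephemeral disk shipped as a container image.
				VolumeSource: kvv1.VolumeSource{
					ContainerDisk: &kvv1.ContainerDiskSource{
						Image: "quay.io/containerdisks/fedora:latest",
					},
				},
			}},
		},
	}
}
```

Applied to a cluster, this object shows up in kubectl get vmis the same way a pod shows up in kubectl get pods, which is the point made above about reusing standard Kubernetes tooling.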
Sure. So Howard Zhang is the engineer who was going to be represented here. Howard is a senior software engineer at Arm, and I'm sure many people also know what Arm is, but if you don't, it's a CPU architecture. Everyone here has probably come across an Arm machine in some way or another; you probably have one in your pocket right now, your cell phone. Most cell phones use Arm cores, and if you're in that five percent that don't, you probably have a Raspberry Pi, and that's an Arm core. Howard has been working for about the last two years on KubeVirt to add support for the Arm architecture, and for about the last six months NVIDIA has been collaborating with Arm to make it a fully supported architecture in KubeVirt releases.

Thanks. Thanks for presenting him. So all of you are end users, and we know you chose KubeVirt. We're really curious to know: why did you choose KubeVirt over other virtualization solutions?

So at NVIDIA, for GeForce NOW, we like virtual machines. For a long time our first-generation infrastructure used a lot of virtual machines, and we like them. We wanted to move to the Kubernetes world, where we could get microservices, containers, and the orchestration that comes with Kubernetes, but we still wanted to hold on to our investment in virtual machines. And so for us it made a lot of sense to look at KubeVirt: it provides a way for us to onboard our virtual machines and run them in a Kubernetes cluster. For our specific use case, we lease GPUs to our customers, our end users, so we want to launch virtual machines, attach GPUs to those guests, and make them available to our end users. And we do this with Kubernetes and KubeVirt.

Okay, but GPU passthrough can be done to pods as well, so why virtual machines?

Sure. There are a lot of reasons, and I think the one we state most often is that we like the security layer. We like having a kernel that sits between us, our control plane, and the end user. You can imagine: we're providing a service, we're releasing a GPU to play games on, and we don't trust the end user. So we want that kernel layer there. It's partly compliance and a lot of other reasons, but it helps us sleep at night, so I don't get paged and wake up to someone taking over our entire data center.

Thanks, Ryan.

Yeah, so at Civo, we're providing end users with virtual machines themselves; that's the product we're selling. We did have other technologies before we started moving over to Kubernetes, and the main motivation for the move was, again, making sure engineers were sleeping at night, because the power Kubernetes gives you with auto-healing and workload placement means we get fewer pages. But we still don't trust end users. I mean, I can see all of you here, and I'm not sure how many of you I'd trust with root access to any of our clusters. The virtualization layer that KubeVirt provides via libvirt has been thoroughly tested for years and years and years. So the ability to leverage that security isolation, the ability for you as an end user to run any kernel, and giving our engineers more sleep at night: I think KubeVirt was a great choice for that. Thank you.
If I understand correctly, most of our use cases are people that have Kubernetes and pods and also need virtual machines, so they choose KubeVirt. But in your use case, I understand that you chose KubeVirt and got Kubernetes with it. How come?

Yeah, I think that's another way of looking at it: KubeVirt provided that, and Kubernetes was maybe a nice add-on on the side that provided some nice features.

Yes, kind of similar here. There was already a lot of Kubernetes knowledge where we came from, and then we were looking at how we could use this to also provide isolated environments. Same thing: we don't want to trust our users. They should have access to completely isolated VMs, not containers, where there's direct kernel access. So that was one thing. The other thing was Kubernetes itself. With Kubernetes we can create VMs through the Kubernetes API, so we have, for example, Golang applications, and we can simply use the Golang client library to create virtual machines. That library is really well maintained, which is really nice from the software development perspective. And one more thing is that we can quite simply run multi-cloud. We run on GKE managed Kubernetes clusters, where we enable nested virtualization so that we can create multiple virtual machines per host using KubeVirt, and we also run on dedicated servers, which are not as scalable but much cheaper. Those were the main reasons. We were also looking at Firecracker at the time, but there the whole integration into the Kubernetes world was not available.

Thank you. We would really like to know how you use KubeVirt. What scale do you have? How many clusters, and how many VMs in a cluster? What features do you use? How do you deploy KubeVirt: do you build it from source, or do you use one of our operators, the HyperConverged Cluster Operator or the KubeVirt operator? Did you have to customize KubeVirt for your special use case, or did you just take it as is and it worked out of the box? And any other interesting information about your specific use case.

Our cluster sizes vary a little bit. In our production data centers we're in the hundreds of nodes, many hundreds, from about 500 up to over a thousand. Our total VM count can also vary with the node size, because obviously the nodes only hold so many GPUs; that can go from a few hundred up to about 2,000. And in addition to that we have a lot of pods, so our overall size in the largest zones will be maybe 2,000 VMs and 10,000 pods. We deploy vanilla Kubernetes, no fork there, and for KubeVirt we just deploy and manage it ourselves using existing tools, like an operator, to roll out, deploy, and upgrade. We do have a fork of KubeVirt; we're based on KubeVirt 0.50. And we're actually in the middle of an upgrade right now: we run Kubernetes 1.23 in most of our zones and we're moving our way to 1.26, and we're moving from 0.50 to 0.59 for KubeVirt.

I think we're in a very similar situation, at least in how we deploy it. We use the KubeVirt operator to deploy, and we roll that out using Argo across all of our regions. So we're managing KubeVirt currently in five different data centers, and we try to cookie-cutter them as well, so that they're all the same.
And we do have a custom build of KubeVirt that we run internally, just for some of the strange storage requirements that we have. Realistically we do need to commit that back upstream, but for us it's about getting the time to do it, because we want to give good-quality, tested code back to the project. And while our test coverage is good, I don't think it's good enough for the project yet, because KubeVirt has some really, really good test coverage, and we're really confident pulling the upstream images down, so we don't want to give back less than we're getting from the project. In terms of density, we're running between 60 and 100 VMs per compute host. We have changed some of the defaults there: I think there's the default 110-pod limit, and we've gone beyond that so we can run up to 400 pods on each of these compute hosts for additional services around the side.

Thank you. Kim?

Yes. So we also use the KubeVirt operator to run KubeVirt, and we have right now eight Kubernetes clusters in three different regions. We are kind of on the lower end: we maybe have at most 500 VMs running at the same time across all of them, because our VMs are not long-running, right? Our VMs run at most maybe four hours and then they're gone again; these are disposable sandbox environments. So we don't have many VMs at the same time, but we have a lot of cycling. We are constantly creating and deleting VMs, and we also want to do this fast. Everything has to be fast; no one wants to wait. And this is why we also have a customization when it comes to the storage handling of KubeVirt. So the CDI, what is the CDI called again?

The Containerized Data Importer.

Yeah, the Containerized Data Importer, thanks. You can use it in combination with KubeVirt to provide the images on volumes, which is really convenient and works really well. But we actually moved completely away from it, just for speed. We have a local image cache on each host, and we actually modified virt-launcher to load the images directly from there, for speed. So that's one thing we did for our use case.

Yeah, our infrastructure for GeForce NOW is actually really similar, Kim: we use local storage to cache our images. Our use case is also very similar in the way that we lease machines. You can imagine we only give people some amount of time to play games, so you have a workload that runs for, whatever, say six or eight hours, and then eventually it disappears. So you can imagine there's a lot of churn. You get periods of time where people think, oh, I really want to play games, you know, it's midnight, this is my time to play. So we get a lot of people showing up at midnight wanting to play games, but then at four in the morning, seven in the morning, it's time to go to work, so we don't get a lot of people playing. So we see lots of inflows of traffic during certain periods, lots of pressure on our zones, and then quiet times, with this constant churn of VMs moving in and out of the data center.

Ryan, can you please answer for Arm? Yeah, sure. So Arm has a really interesting use case for KubeVirt.
Arm is a CPU architecture, right? So they need to test against a lot of different operating systems and kernels. The way Howard explained it, Arm uses Kubernetes for their orchestration, and they use KubeVirt to spawn a bunch of virtual machines to test against: they launch different OSes and different kernels on Arm cores to validate and test them. So they use it for their test framework. Thank you.

I was just going to add that obviously the two use cases here are for more short-running VMs and short-running workloads, but KubeVirt is still able to run longer-running workloads. We actually have some VMs in our infrastructure that, while they have been rebooted and moved around, are kind of two years old; the original volumes are still there. Customers have been applying updates or whatever they need to do, but the actual underlying PVC is two years old and has been rolling through our infrastructure. It's been through KubeVirt upgrades and Kubernetes upgrades as we've been upgrading our infrastructure, and customers have been really, really happy that everything has been stable.

Thanks, Dinesh. It's really great to hear your experience. And did any of you have the opportunity to contribute to KubeVirt? It can be code contributions, of course, but we as a community really want to highlight the importance of non-code contributions. It may be updating docs, or even reviewing PRs: we have release notes on our PRs, and the users may know better than us whether a release note is clear or not. It may be hosting meetings, opening issues and bugs, answering questions in the different channels, on the mailing list or Slack, or any other contribution. So did any of you happen to contribute something?

So I think we were working quite closely during the release of hot-plug volume attachments, which I think was about a year and a half ago. Hot-plug volume attachment is where a running virtual machine is able to have, effectively, a USB stick plugged in and out of it while the pod is still running. The way that works under the hood is that a separate pod is started, co-located with the running VM, which updates the mount points on the underlying compute host; that is bind-mounted directly into the running VM, and then some libvirt magic makes it appear as a disk inside. For us that was a really, really important feature, and we worked closely with Alexander Wels in the Red Hat community to get that tested and pushed through. We got some really good feedback from Alexander on Slack as we got it tested and got our feedback in, and it was a really nice experience working closely with the maintainers.

Thanks for this contribution.
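For readers who want to see what that hot-plug operation looks like from the client side, here is a hedged sketch against the kubevirt.io/client-go library. The VM name demo-vm and claim name data-pvc are illustrative, and the AddVolume method signature has shifted across client-go releases (recent ones also take a context), so check the version you build against.

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	kvv1 "kubevirt.io/api/core/v1"
	"kubevirt.io/client-go/kubecli"
)

// hotplugPVC asks KubeVirt to hot-plug an existing PVC into the running
// VM "demo-vm". Under the hood this triggers the flow described above:
// an attachment pod is scheduled next to the virt-launcher pod and the
// volume is bind-mounted into the running VM.
func hotplugPVC(virtClient kubecli.KubevirtClient) error {
	opts := &kvv1.AddVolumeOptions{
		Name: "data-disk",
		Disk: &kvv1.Disk{
			Name: "data-disk",
			DiskDevice: kvv1.DiskDevice{
				// Hot-plugged disks are typically attached on the SCSI bus.
				Disk: &kvv1.DiskTarget{Bus: kvv1.DiskBusSCSI},
			},
		},
		VolumeSource: &kvv1.HotplugVolumeSource{
			PersistentVolumeClaim: &kvv1.PersistentVolumeClaimVolumeSource{
				PersistentVolumeClaimVolumeSource: corev1.PersistentVolumeClaimVolumeSource{
					ClaimName: "data-pvc",
				},
			},
		},
	}
	// Roughly what `virtctl addvolume demo-vm --volume-name=data-pvc` does.
	return virtClient.VirtualMachine("default").AddVolume("demo-vm", opts)
}
```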
Yes, so we don't have any direct code contributions, but we are linked in the KubeVirt documentation in some places. So I worked together with the KubeVirt team to provide an interactive introduction to KubeVirt: you can actually go to killercoda.com/kubevirt and try out KubeVirt in the browser, which is kind of cool, because Killercoda runs on KubeVirt, but you can then also try out KubeVirt on top of it. So it's kind of like a sandwich.

Cool. So at NVIDIA we've made a few contributions in the community, and we've had a really good experience. I've maintained a few pieces of the codebase; I'm actually one of the KubeVirt maintainers. A few different areas I've contributed to: we care a lot about scale and performance, and probably these guys do too, and being a large end user, this has been a particular interest of mine. So we have a SIG-scale group that meets weekly, where we talk about how we can improve and maintain performance and scale in KubeVirt; that's been a big area of focus for me. And I've also had a lot of good collaborations. We contributed VSOCK support to the community; we worked collaboratively with Google on actually building it. We have also worked with Red Hat: we collaborated on a new API in KubeVirt that was released, I think, in 0.50 as alpha, called virtual machine pools. It's something we're really excited about consuming. Think about it like auto scaling groups in AWS, kind of an idea like that. And you might also be thinking, okay, this maybe sounds like a few other components and APIs in Kubernetes, and you'd be right: it's inspired by a few things, like StatefulSets, ReplicaSets, and Deployments. We took little pieces of those and made something specific to virtual machines. So the one-liner for what VM pools is: it's a way for you to manage a large number of similar virtual machines that are pets. That's one of the useful cases for us, since that's how we want to manage our workload. Thank you.
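As a rough illustration of the VirtualMachinePool API Ryan describes, here is a sketch using the kubevirt.io/api pool types. The API was alpha (pool/v1alpha1) at the time of this panel, so field names may have changed since, and all names and counts here are illustrative.

```go
package main

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	kvv1 "kubevirt.io/api/core/v1"
	poolv1 "kubevirt.io/api/pool/v1alpha1"
)

// gamingSeats sketches a pool of similar VMs, the "auto scaling group"
// analogy from the discussion: a replica count plus a VM template,
// much like a ReplicaSet declares a replica count plus a pod template.
func gamingSeats(replicas int32) *poolv1.VirtualMachinePool {
	labels := map[string]string{"pool": "gaming-seats"}
	return &poolv1.VirtualMachinePool{
		ObjectMeta: metav1.ObjectMeta{Name: "gaming-seats"},
		Spec: poolv1.VirtualMachinePoolSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			VirtualMachineTemplate: &poolv1.VirtualMachineTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				// The VM template itself, e.g. wrapping a VMI spec like
				// the sketch earlier in this transcript.
				Spec: kvv1.VirtualMachineSpec{},
			},
		},
	}
}
```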
Thank you. Did any of you migrate to KubeVirt from another virtualization platform? If so, from what, and why did you choose to move to KubeVirt? And how did you do it: did you build the new cluster from scratch, or maybe use some tool to do the migration? And what can others who may want to do this learn from your experience and from the difficulties you had?

Sure. So our original first-generation architecture, like I mentioned, was heavily VM based, exclusively VM based. For us, when we were looking around, we saw Kubernetes and the shift toward containers and microservice models, and there were a lot of appealing things about it. What was important, though, like I mentioned earlier, was maintaining that investment: we really like VMs, and we really wanted to maintain that investment, while maybe considering a move toward a container-based approach at some point. So for us it made a lot of sense to take our existing investment and move toward this new technology that we like, and KubeVirt provided the opportunity to continue using virtual machines while also entering the Kubernetes ecosystem. Then slowly we build more pods and microservices and eventually try to make our way toward a more cloud native approach.

Great. So we actually started about six years ago with an OpenStack deployment; that was the first way we were deploying virtual machines and giving customers access to them. We didn't really migrate a huge amount of those VMs over to KubeVirt; a lot of them we were able to build net new, which was really, really handy. But with a few of the long-running customers we had, we did a really boring project where we provisioned a new VM, rsynced some data over, and restarted some services. Luckily it was a small enough, and a really technical, customer set, which allowed us to migrate over. There was some marketing and messaging around it being more secure, and wouldn't it be great, rather than us having to work out how to get OpenStack-style images over into KubeVirt. I'm pretty sure it's possible, but it didn't feel like the engineering time was going to be well spent versus rsync.

Yeah, so we didn't have to migrate from anywhere. I mean, we just started development like two years ago, so we had the fun of being able to use KubeVirt directly from scratch.

Lucky you. How can we all learn from each other? Maybe KubeVirt can learn from the end users: features you want us to add, the most exciting feature you'd like, or some bug that really disturbs you and you want us to prioritize. And also end users can learn from other end users, because you have a lot in common. I think I heard Dinesh talking about some collaboration with NVIDIA about GPU passthrough, so maybe we can start with you.

Yeah. I think getting involved in the community calls... for me, unfortunately, most of my collaboration with the community happens at KubeCon. It's a really nice place for me to actually talk to people face to face, where I'm not being bothered by Slack, or I am, but not as much as when I'm in the office. So Ryan and I had a really good conversation at the Solutions Showcase about what we're doing. At Civo, we're looking to introduce GPUs into the virtual machines we're offering, and talking to Ryan about his experience already doing this and running it at scale was really, really valuable knowledge. And some of our ideas and use cases are slightly different from yours, and we were able to think about the stuff coming up in 1.27, as you said.

Yeah. So one of the things we were specifically discussing: his use case is machine learning workloads, passing through GPUs to guests. For us at NVIDIA, we pass through GPUs, and we also provide guests with vGPUs. And one of the things I recall is that at some point you may also want to pass through vGPUs or other devices.

Absolutely. So yeah, in our use case we've got a data center with some A100s, which we're really privileged to have access to and to offer to customers, but not everyone needs the power of an A100 in one go. So we're interested in whether we can create some vGPUs and pass one into a VM, but only while a customer requires it, and afterwards pull that vGPU back out and offer it to someone else. And that's currently not possible, right?

Well, it is possible, but there are some challenges, and I can speak to them. So specifically today, if you want to do this, you use the device plugin framework. And the use case I'm talking about here is where you want to dynamically switch devices between different drivers: you want to be able to set a device up for passthrough, or set it up as a mediated device, as a vGPU, so different drivers for this stuff. The issue when you use the device plugin is, imagine we have two users that want access to one physical GPU. User one comes along and requests a physical GPU, so we pass it through to the guest. For a period of time they are using this device, and then they're finished. At that point, when they're done, the next user comes along and requests, let's say, a vGPU.
So now what do we have to do? We have to change the configuration of the device to make it available. And you can run into some problems here, because the device plugin framework has an allocate API, which is great: it gives us the ability to provide the device to the guest. The problem you run into when you want to do this dynamically is that there isn't a deallocate API. There isn't a way to say: okay, at this point the workload is done, we need to change the device to look like something else so that we can pass it through as a vGPU for user two. And this becomes a problem where, for a period of time... we actually have an internal technical term for this, we call it leaking the GPU. There's a period of time where we lose track of the GPU, because we're trying to switch the device driver but we don't know the right time to do it; even though we have a user asking for it, we haven't yet advertised that the device is available again, so we can't switch it. So there's some exciting work, and they're sitting right here: colleagues of mine, Evan Lezar and Kevin Klues, have been working on DRA, dynamic resource allocation, in the community, and it's something we're excited about. It would allow you to do this dynamically. Kubernetes 1.27 is where you'd want to look to consume DRA, so that you can dynamically provision, allocate, and deallocate these resources, switch the drivers, and set up the devices so that you can advertise them correctly without having this leak.
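Ryan's point is easiest to see in the upstream device plugin API itself. The sketch below, written against the k8s.io/kubelet device plugin gRPC types with illustrative device paths, shows the Allocate hook where a vendor could bind a driver; the interface defines no matching deallocate hook, which is exactly the gap he describes and one of the problems dynamic resource allocation is meant to close.

```go
package main

import (
	"context"

	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// gpuPlugin sketches the server side of a device plugin.
type gpuPlugin struct{}

// Allocate runs when a workload is granted a device; this is the moment
// a vendor could bind a driver (for example vfio-pci for passthrough).
func (p *gpuPlugin) Allocate(ctx context.Context, req *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
	resp := &pluginapi.AllocateResponse{}
	for range req.ContainerRequests {
		resp.ContainerResponses = append(resp.ContainerResponses,
			&pluginapi.ContainerAllocateResponse{
				// Expose the device node to the container (illustrative path).
				Devices: []*pluginapi.DeviceSpec{{
					HostPath:      "/dev/vfio/42",
					ContainerPath: "/dev/vfio/42",
					Permissions:   "rw",
				}},
			})
	}
	return resp, nil
}

// The DevicePlugin service defines Allocate, ListAndWatch,
// GetDevicePluginOptions, GetPreferredAllocation and PreStartContainer,
// but no Deallocate: a plugin is never told that a device was released,
// so it cannot safely reconfigure it (passthrough to vGPU) between
// users. That window is the "leaking the GPU" problem described above.
```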
You can leak the GPUs, yeah. And I don't think I want to lose a GPU, just in case it goes walking somewhere and I never see it again. But, you know, these conversations we were having at the Solutions Showcase were really, really interesting for me, and I learned a lot from that one conversation. The community around this project is really, really friendly and really open, and I know I need to get more involved, so I can put in my feedback, my requests, and my spin on it, so that the project can be used by a larger set of users.

I mean, for me, all I can say is that if anyone is interested in using KubeVirt for a new project or use case, I'm happy to talk about it: also why we decided for it or against it, or how it compares to other technologies out there, et cetera. So I'm definitely up for anyone to chat about it.

No requests for us? A feature you want? Nothing annoying? Cool, then we're the best. You are great. So we have just five more minutes, so I think it's a good opportunity for questions from the audience. If any of you have any questions?

Thank you. So you said you have a couple of long-running virtual machines. Can you, or do you, live migrate those machines during maintenance?

So currently, at the moment, we have a limitation on our side: live migration is not supported on our particular implementation, so at the moment we cold migrate them. Fortunately for us, 95% of the workload that we and our customers run is our Kubernetes nodes, so with our managed product we have the ability to reach into the tenant API and do a graceful migration of the workload before we cold migrate the node. The VMs themselves that make up that other 5% of the workload receive a graceful shutdown and a startup. Our thinking on this is: it's a cloud, so we can get away with it, which is very fortunate, but we're working over the next 12 months to add that live migration support. It is supported in KubeVirt at the moment, I think, is that right? I don't know if anyone on the panel is using it.

Yes, it is supported, of course.

So we don't use live migration right now, and there are various reasons you can imagine. I mean, you have a hard time live migrating physical GPUs, so there's a limitation there. We also use bridge networking, and there are some limitations with doing that; there is actually a proposal to make this work with bridge networking, but because we want to maintain low latency, it's difficult for us. So we sort of live with the idea that, since our workloads are short-lived, so to speak, we're okay with that churn. We just take advantage of the periods of low churn: we make our nodes unschedulable, and then we do maintenance on those nodes when all the workloads have been evacuated.

So I just wanted to say that, yes, we do support live migration, but it depends. For example, if you use bridge networking for your main pod network, it's currently not supported, though there is a PR from the community that may enable it. But if you use masquerade, you can live migrate, and if you don't use the pod network, just secondary interfaces, you should also be able to live migrate.

Hello. How do you manage virtual machine creation? How do you actually create the virtual machines? Can you tell a little bit more about that?

I mean, in the end, you install the KubeVirt operator in the Kubernetes cluster, and then you have custom resource definitions, so you can create new resources, for example a VirtualMachineInstance. It's kind of like creating a pod, but you provide some other details; for simplicity, say the path or the URL to a disk image for the virtual machine. Then all the magic happens in the background: KubeVirt creates a virt-launcher pod in which the virtual machine runs. So what we then simply do at Killercoda is create a Kubernetes resource. In our case we have a Golang application that uses the Kubernetes client and simply creates the resource object.

Yeah, it's very, very similar for us. We have a build process where we create raw images for things like Ubuntu, Rocky, Debian, and Kubernetes distributions, and we push them to an OCI registry. So, very similar: we distribute those to all of the local compute hosts for fast build times, and then we use CDI, the Containerized Data Importer, to just create a new PVC and spin up a VM. We have operators that run inside our Kubernetes clusters which respond to a custom resource to create a virtual machine.

So the way we create virtual machines is we actually have, I guess you'd call it a service, or an operator, or a controller, and it's responsible for creating the services and creating the VMIs. The way you can imagine this is that we have different loads on our system at different times, so we have a service that understands when it's the right time to start ramping up and getting these sessions available to us, and when it's time to scale back and bring the load in the cluster back down. So we have a service that handles it; it interacts directly with the API as an authenticated client, similar to what you guys were saying.
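A minimal version of what Kim and Ryan describe, creating a VM programmatically through the KubeVirt API, might look like the following sketch with the kubevirt.io/client-go library. Method signatures vary a bit between releases (newer ones take a context), and the VMI here is deliberately stripped down; in practice you would fill in a full spec like the one sketched near the start of this transcript.

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	kvv1 "kubevirt.io/api/core/v1"
	"kubevirt.io/client-go/kubecli"
)

func main() {
	// Builds a client from the usual kubeconfig or in-cluster config.
	virtClient, err := kubecli.GetKubevirtClient()
	if err != nil {
		panic(err)
	}

	// A (deliberately incomplete) VMI object; see the earlier sketch
	// for a fuller spec with disks, networks, and volumes.
	vmi := &kvv1.VirtualMachineInstance{
		ObjectMeta: metav1.ObjectMeta{Name: "demo-vm"},
	}

	// KubeVirt reacts by scheduling a virt-launcher pod that boots the VM.
	created, err := virtClient.VirtualMachineInstance("default").Create(vmi)
	if err != nil {
		panic(err)
	}
	fmt.Println("created VMI:", created.Name)
}
```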
Yes, maybe you can take the mic. Yeah, thank you. I would like to know how you route external traffic to the virtual machines. Which solutions do you use? Load balancer services, or a layer 2 network?

We've got overlay and underlay networks. We use OVN-Kubernetes for our networking, and we attach different interfaces to the VMs; that's how traffic gets in.

You use Kubernetes services?

No; we use OVN-Kubernetes to do this, and we use NetworkAttachmentDefinitions to actually attach the interfaces.

Yeah, got it. So you use external CNI plugins, additional interfaces through Multus?

Yes, we use Multus, yes.

We've written our own CNI to create isolation and do the routing in from the public internet and give isolation. So that's how we've done that.

What is your CNI called? Maybe others could use it.

It's kind of proprietary at the moment, because of what we're doing in providing that isolation. But I'm happy to say it is a CNI: it interacts directly through the CNI interface, very similar to any of the other ones, but it obviously does some magic behind the scenes to give that isolation and the routing in from public networks.

Do you use it as the main CNI plugin, or is it attached through the network attachments?

It's a network attachment that we use.

So you don't use the primary pod network at all?

No, we don't use the primary pod network. Thank you.

Yeah, for us, we use the default Kubernetes networking; we have some clusters, I think, running Weave and others Calico, and not too much customization other than isolation with, for example, network policies.

Thanks a lot. I don't know if we have more time. Sorry. Well, we'll stay here at the end if you want to come up and ask questions.
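To illustrate the secondary-interface pattern from that last exchange, here is a hedged sketch that adds a Multus-backed bridge interface to a VMI spec, again with the kubevirt.io/api Go types. The NetworkAttachmentDefinition name my-vlan is hypothetical and would be defined separately by whatever CNI setup is in use.

```go
package main

import (
	kvv1 "kubevirt.io/api/core/v1"
)

// addSecondaryNIC wires a VMI to a secondary network via Multus, the
// NetworkAttachmentDefinition approach discussed above. Bridge binding
// on a secondary interface leaves the primary pod network untouched
// (and, per the earlier live migration discussion, a VM that only uses
// secondary interfaces can still be live migrated).
func addSecondaryNIC(vmi *kvv1.VirtualMachineInstance) {
	vmi.Spec.Domain.Devices.Interfaces = append(vmi.Spec.Domain.Devices.Interfaces,
		kvv1.Interface{
			Name: "vlan-net",
			InterfaceBindingMethod: kvv1.InterfaceBindingMethod{
				Bridge: &kvv1.InterfaceBridge{},
			},
		})
	vmi.Spec.Networks = append(vmi.Spec.Networks, kvv1.Network{
		Name: "vlan-net",
		NetworkSource: kvv1.NetworkSource{
			// References a NetworkAttachmentDefinition named "my-vlan".
			Multus: &kvv1.MultusNetwork{NetworkName: "my-vlan"},
		},
	})
}
```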