All right, so welcome everyone to another session of our research user group. Today's topic has been requested for a long time, but we never actually got around to covering it, so we'll get into it in a moment. We have Jamie and Scott from G Research to present how they are handling this and to teach us how we should do it.

All right, thanks. Right, let me see if I can work out how to share my screen; I don't think I've shared on this platform before. There we go. Can you see that before we kick off? Yeah, looks great. Great. OK.

So, Scott and I are going to talk to you a little bit about bare metal Kubernetes at G Research. Not necessarily telling you exactly how to do it, or that this is the only way it can be done, but talking a bit about our adventures with it, what we're up to, and what we've learned along the way. In terms of introductions, both of us have actually been to CERN to visit Ricardo and co, and we managed to get the obligatory photo in front of, I think that's ALICE, so we've put those next to each other. I'm Jamie Poole. Most of you probably know me because I co-host this with Ricardo every other week anyway, although I haven't been around for a few weeks, so apologies for that. I'm the Compute Platform Engineering Manager here at G Research, so responsible for all things Kubernetes, batch compute, the Calc Farm and that kind of thing. And I've got Scott here with me, who I'll let introduce himself. Hi, yeah, I'm Scott. I'm a cloud engineer. I work mostly on OpenStack, and at the moment a lot of that is on Ironic.

Cool. Very briefly about G Research, for those of you who don't know: we're a fintech company based in London. We run a large distributed research platform for teams of quants to look for patterns in real-world, noisy financial data sets on behalf of our clients. And currently we're still, I've been saying this for a while but it's still true, migrating large amounts of our batch compute workloads from Windows and HTCondor onto Kubernetes, Linux, containerization and all that good stuff. So without further ado, we'll go straight into the Ironic portion of the presentation, which I'll hand over to Scott for.

OK, so a little bit of background first of all. What is Ironic? Ironic is an integrated OpenStack service which aims to provision bare metal machines instead of virtual machines. Ironic supports vendor-specific plugins, which implement additional functionality such as moving machines between different networks. The main thing for this talk is to focus on the different states we have in Ironic. It's not limited to these, but the main ones are enrolling, cleaning, holding, and provisioning.

So how does it work under the hood? Ironic is pretty straightforward: it's a mix of IPMI and PXE booting into a RAM disk image, and then it turns machines on and off and moves them between different networks as they move through different parts of the build. Ironic can be deployed standalone, but the most common way to do it, and probably what you'd do in a production environment, is to sit it beside the other OpenStack projects, such as Nova, Neutron, and Glance. For a bit of background: Nova is used for deploying VMs, virtual machines; Neutron is your networking; and Glance is the image catalogue. Ironic uses those services to get images or change networks or whatever else it needs to do.
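To make that lifecycle concrete, here is a minimal Python sketch of the flow Scott describes. This is not Ironic's real state machine (which has more states, such as manageable, available and active); the state names simply follow the simplified flow used in this talk.

```python
# A minimal sketch of the node lifecycle described in the talk, not Ironic's
# actual state machine; names follow the simplified flow used here.
ALLOWED_TRANSITIONS = {
    "enrolling":    {"cleaning"},      # new node registered, then scrubbed
    "cleaning":     {"holding"},       # firmware, BIOS/iLO checks, disk wipe, burn-in
    "holding":      {"provisioning"},  # ready in the pool, waiting for a user
    "provisioning": {"in-use"},        # image written, VLAN moved, server rebooted
    "in-use":       {"cleaning"},      # user deletes it -> clean and return to pool
}

class Node:
    def __init__(self, name: str):
        self.name = name
        self.state = "enrolling"

    def advance(self, target: str) -> None:
        if target not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(f"{self.name}: cannot go {self.state} -> {target}")
        print(f"{self.name}: {self.state} -> {target}")
        self.state = target

if __name__ == "__main__":
    node = Node("gpu-node-01")
    for state in ("cleaning", "holding", "provisioning", "in-use", "cleaning", "holding"):
        node.advance(state)
```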
The good thing as well is that when a bare metal machine is deleted by the user, it's cleaned and then returned to the available pool, and someone else can just pick it out of that pool.

This is a really high-level diagram, just to show the enrolment stage that we've got. If you look on the left there, there are a few open source products we use. One is Kayobe, which is a sub-project of the Kolla Ansible project in OpenStack; that's used to deploy OpenStack, and we also use it to enrol new bare metal nodes into Ironic. Essentially it's just a bunch of Ansible, and we use Jenkins to orchestrate it.

If we look at what the enrolment phase actually does: we go through pre-inspection first of all. That's the preliminary steps before you can actually look at the nodes and work out what's going on in there. We create a record of the node in the OpenStack API, then we set the resource class, and we apply some baseline BIOS and iLO settings. The important bit here is the resource class. A resource class essentially defines a type of node. You might have a certain type of GPU node or CPU node or specialist hardware; you define that as a resource class. It basically says, this is what my server should look like: this much RAM, these disks, and all that stuff. So we define all of that and say, this is what I expect these new nodes to look like.

Then we go to the next phase, which is inspection, an Ironic state. It will turn the server on, PXE boot it into the RAM disk, and then it will discover what hardware is there, check for things like cabling issues, and identify the switch it's plugged into, so that when we move it we know which switch to log into to actually move the port. Then it will create those ports in Ironic. What that allows us to do is cross-check between the resource class and what the server actually has inside it. Because if you've got a big pool of servers and you hand one back and take a new one, you want to make sure that you're getting the same server back. Well, not literally the same one, but one that has the same spec.

Once we've done the inspection, we know where it's plugged in and which switch port. Neutron then moves it to what's known as the cleaning VLAN, which essentially corresponds to the cleaning state, and then we go through what's known as cleaning. Cleaning runs inside the RAM disk image again. It boots into it and runs through a set of steps, basically Python scripts, in order of priority. We do things like update firmware, verify that the iLO settings are all correct, set up NTP and the storage, and wipe the hard disks, and if there are GPUs in there we check their health as well. Once that's done, it should be almost good to go, so we finally run some burn-in tests on it, then we move it to the holding VLAN and it goes from cleaning into the holding state. That essentially means it's ready for a user to pick it up on the other side.

If we look back at that diagram, you should understand it a little bit better now. Nodes come in from the left through our automation into the Ironic API, we inspect them, and then they get moved into where they live in the data center.
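Here is a small sketch of the cross-check Scott describes: comparing what inspection found against what the resource class says the node should look like. The field names and the ResourceClassSpec shape are made up for illustration; this is not Ironic's actual inspection data format.

```python
# Sketch of cross-checking inspected hardware against a resource class spec.
# Field names are hypothetical, chosen only to illustrate the idea.
from dataclasses import dataclass

@dataclass
class ResourceClassSpec:
    name: str
    min_ram_gb: int
    disk_count: int
    gpu_count: int

def check_node(spec: ResourceClassSpec, inspected: dict) -> list[str]:
    """Return a list of mismatches between the spec and the inspected hardware."""
    problems = []
    if inspected["ram_gb"] < spec.min_ram_gb:
        problems.append(f"RAM {inspected['ram_gb']}GB < expected {spec.min_ram_gb}GB")
    if len(inspected["disks"]) != spec.disk_count:
        problems.append(f"{len(inspected['disks'])} disks, expected {spec.disk_count}")
    if len(inspected.get("gpus", [])) != spec.gpu_count:
        problems.append(f"{len(inspected.get('gpus', []))} GPUs, expected {spec.gpu_count}")
    return problems

if __name__ == "__main__":
    gpu_class = ResourceClassSpec(name="gpu-large", min_ram_gb=512, disk_count=2, gpu_count=8)
    inspected = {"ram_gb": 512, "disks": ["nvme0n1", "nvme1n1"], "gpus": ["gpu0"] * 7}
    for line in check_node(gpu_class, inspected) or ["node matches its resource class"]:
        print(line)
```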
A conductor is basically a microservice in Ironic whose purpose is to look after a group of nodes. You might have a common set of nodes, or an area in a data center like an availability zone, and they all get grouped together under one conductor. And then it's all ready for people to use on the other side.

Moving on to the deployment side. There's a person there; they pick a flavor, a network, an AZ and an image, then some Nova stuff happens in OpenStack, and out pops a bare metal node on the other side. Let's take a slightly closer look at that. What happens is the user requests the new bare metal machine, via Terraform in our case, but you can just do it via the API if you really want to. The flavor they select maps to the resource class. Earlier I said a resource class is like a type of server; the flavor is basically the user defining what type of server they actually want to pull from the pool. The network and the availability zone they select map to some location within a data center. That allows you to scale this to quite a lot of servers. So if we flip back here, that's just that first bit up here: the user selects, and it goes into the OpenStack process.

On the OpenStack side of the process, you hit the Nova API, which talks to placement and the scheduler. That basically looks in the pool and says, what's available? Give me the first node off the top, or the first hundred nodes, or the first thousand nodes, or however many they select. Neutron will then go and move all of those into the provisioning VLAN, so they go into the provisioning state, from holding back into provisioning, and this is the state where we get the machine ready for the user.

In machine provisioning, we turn the server on using IPMI, we PXE boot into the RAM disk image, and then we apply a few BIOS settings. That might be hyper-threading on or off, which is probably the most common one, but you can configure anything you want as long as it's available via the API. Then we pull the user image from Glance. The user specifies that image when they actually build the machine; they don't want to run the RAM disk image because that's got all our tools in it and not theirs. In our case it's Flatcar, which gets pulled from Glance, the image service in OpenStack. That basically explains the box there on the right: the request comes in, it gets scheduled, Nova compute coordinates some stuff in Neutron to move it to the right VLAN, and then the Ironic conductor pulls the image down and puts it on the node.

From there, all we need to do is move the VLAN again into the VLAN the user requested, and then we restart the server, it boots into an OS and presents a login prompt the user can log into. So hopefully by then everyone's got their metal servers and they're happy to go and use their fleet.

That is all good until the user is finished with the server. The idea behind Ironic is cattle, not pets: you use a server for however long you need it, then you hand it back, it goes through cleaning, and it becomes available, ready for someone else to use. So it's really, really flexible.
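Jamie and Scott drive this through Terraform, but as Scott says you can hit the API directly. A minimal sketch of what that might look like with the Python openstacksdk is below, with Nova, placement, Neutron and Ironic doing the work behind the scenes; the cloud entry, flavor, image, network and AZ names are placeholders, not G Research's real ones.

```python
# Sketch: requesting a bare metal server straight from the OpenStack API using
# the Python openstacksdk instead of Terraform. All names here are hypothetical.
import openstack

conn = openstack.connect(cloud="mycloud")  # reads credentials from clouds.yaml

flavor = conn.compute.find_flavor("baremetal-gpu")      # maps onto an Ironic resource class
image = conn.image.find_image("flatcar-stable")         # the OS image held in Glance
network = conn.network.find_network("research-tenant")  # the VLAN the node should end up on

server = conn.compute.create_server(
    name="bm-worker-001",
    flavor_id=flavor.id,
    image_id=image.id,
    networks=[{"uuid": network.id}],
    availability_zone="dc1-az1",
)

# Bare metal builds take minutes rather than seconds, so be generous with the timeout.
server = conn.compute.wait_for_server(server, wait=3600)
print(server.status, server.name)
```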
So the user deletes the server, Neutron moves it to the cleaning VLAN, and we go through those same cleaning steps. If the firmware has changed since the user was handed the machine, that will get updated, and all the disks get wiped, so it's all nice and secure when the next user is given the server. We also check it hasn't been tampered with, just as an extra security check. Then Neutron finally moves it into holding, and it becomes available again.

Just to recap on those states: first of all you enrol the node into Ironic, then it sits there ready for a user, and we've got cleaning sitting between holding and provisioning, really. So: enrolling, cleaning, holding, provisioning. That's about it really. Over to you.

Thank you. All right, now more on to how we use this for Kubernetes and research purposes. Historically, we've always run Kubernetes at G Research on OpenStack. For a long time we were doing it on VMs, but more recently we've started building clusters on bare metal. The process is pretty much the same regardless; we just use a different flavor, as Scott was talking about. The way we tend to do it is we define our clusters as Terraform code in GitHub. We then use Terraform Enterprise, in our case, to build the clusters into OpenStack using Ironic. Machines are built and configured with the Flatcar operating system, and Flatcar uses Ignition to pull down the user data and configure a very minimal Kubernetes installation. So our initial bootstrap task basically gets us a small collection of bare metal servers running a pretty vanilla Kubernetes cluster. Currently we still run the etcd nodes as virtual machines, but for our larger and higher performance clusters we now use Ironic and bare metal for the master nodes and all the worker nodes as well.

Once we have a minimal cluster, we apply our more detailed Kubernetes configuration on top. Typically we do that these days using Jenkins or Argo CD, or a combination of the two actually; we're undergoing a bit of a migration at the moment. That's how we deploy all of our desired-state Kubernetes configuration: things along the lines of ingress controllers and Calico and all the other bits and pieces we want in our clusters to make them look and feel like our desired G Research clusters. Once we've done that, we then deploy Armada. This is an application I've talked about a bunch of times in this forum, so I won't go into too much detail now, but this is the overall architecture diagram of the application we typically deploy on top of these clusters. The blue boxes at the bottom of the screen are Kubernetes clusters; in this new world of bare metal these are all high performance bare metal clusters with quite a large number of nodes. We tend to scale up to about a thousand, and then there's one Armada server sitting on top, which allows our users to submit jobs to run on the hardware.

A couple of notes on some benefits we've seen so far, which are really the reasons for us moving to this model in the first place. It's still early days, but the sorts of things we've seen include increased stability.
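As a rough illustration of the Flatcar plus Ignition step mentioned above, here is a sketch of the kind of Ignition user data a node might consume at first boot. This is not G Research's actual bootstrap configuration; the unit, file and cluster name are invented, and it shows only the general shape of passing a systemd unit and a file through user data.

```python
# Sketch of Ignition user data for Flatcar first boot; contents are placeholders.
import json
from urllib.parse import quote

kubelet_unit = """[Unit]
Description=Placeholder kubelet bootstrap unit
[Service]
ExecStart=/usr/bin/echo "start containerized kubelet here"
[Install]
WantedBy=multi-user.target
"""

ignition_config = {
    "ignition": {"version": "3.3.0"},
    "storage": {
        "files": [
            {
                "path": "/etc/kubernetes/bootstrap.env",
                "mode": 0o600,
                "contents": {"source": "data:," + quote("CLUSTER_NAME=research-bm-01\n")},
            }
        ]
    },
    "systemd": {
        "units": [
            {"name": "kubelet-bootstrap.service", "enabled": True, "contents": kubelet_unit}
        ]
    },
}

# This JSON would be handed to the server as user data (e.g. via Terraform's
# user_data argument) and Ignition applies it on the node's first boot.
print(json.dumps(ignition_config, indent=2))
```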
So certainly for things like GPU-intensive workloads, we had seen some issues when running on virtualization that have just completely evaporated since moving to bare metal. It's been a lot simpler to move to bare metal and not worry about it than to try to debug kernel-level issues within the virtualization layer. Some other benefits are things like increased network throughput between nodes and external resources, and being able to use BGP peering very easily; that can be done with VMs, but it's a little more complex. For us as well, we typically end up with much larger nodes, because bare metal servers tend to be a bit bigger than your average virtual machine, by definition I suppose. And it's just simpler state management: we have far fewer layers between our workloads and our hardware, and fewer, bigger machines tend to be slightly easier to manage than tens of thousands of smaller virtual machines.

However, there are some limitations as well. Some of the things we've noticed so far: certainly a slower provisioning time, which is actually completely expected. As you can imagine, when you're provisioning a bare metal server you're actually turning on a real machine and you have to wait for it to power on; with a virtual machine all of that is abstracted away from you and you don't really see it. There's a lot more precise quota management required. I think you can be a little bit more fast and loose when you're running a large virtual estate; you can over-subscribe CPUs and things like that. That's much harder to do in a bare metal environment: you're very much constrained by the physical resource you actually have. It is a little bit less flexible in some ways, and there are some features of virtualization which we lose as a side effect. Things like being able to snapshot a VM are quite useful; we can't do that natively on a bare metal server, so you have to roll something yourself or use some other tool. And we have also noticed that in a Kubernetes world it can be a little bit tricky sometimes to mix and match virtual machines and physical machines in one cluster. So we've tended to take the approach of building new clusters as bare metal from scratch, from the beginning, rather than trying to add it into existing virtual clusters.

In summary, for us, we're now using bare metal Kubernetes for our highest performance workloads. We're still making heavy use of virtualization where appropriate: for our more classic Kubernetes clusters, if you like, for services and so forth, we still make good use of VMs. But for the clusters where we really care about performance and we're running lots and lots of high throughput jobs, we are now moving to bare metal, and OpenStack Ironic is our metal-as-a-service of choice. And I think that's a bit of a whistle-stop tour, but are there any questions?

Awesome. Thank you, Jamie and Scott. That was a nice summary. Does anyone have any questions? Feel free to just go for it and ask. I have a couple, but I'll leave the floor to others first.

Hey, I have a question. Did you look at any other tool suites besides Ironic, or were you set on Ironic? Because I believe it's an OpenStack project, right? It is, yeah. Okay. We have looked at some other things.
We're relatively opinionated about it, I suppose, because we've already got a foothold in OpenStack and use lots of other OpenStack services, as Scott mentioned. We have actually, I think independently and ahead of Ironic, rolled our own metal-as-a-service system internally, which does work as well, but it's nice to be able to use off-the-shelf open source tooling that fits nicely with the rest of our ecosystem. I know there are other things out there as well, like MAAS and others, that we haven't evaluated in depth, but Ironic seems to work well for us. Thank you.

All right. Okay. So Alex, did you have a question as well? So you want me to add some points. I was just wondering whether, you mentioned the provisioning times are slow, whether you had looked at pre-provisioning the images you expect to spin up. I know that when we had the OnMetal service at Rackspace, before we switched over to Ironic, that was part of the whole plan, to pre-spin-up these bare metal servers. It was supposed to come back in Ironic, but that was years ago, and I don't know whether it actually has.

It's not something we've used yet. Certainly there are places in our processes where we can save time, maybe you can cover this, things like BIOS settings, where we want to eliminate the need for reboots during the process. Yeah, so there's a little bit of fat we can trim. The way Ironic is designed, you have one big pool of nodes and you can have multiple tenants using it. Where we don't have that, there's some stuff we can pull out. For example, the BIOS settings are applied at provision time; if they're static, you don't have to reapply them every time, you just have to check during cleaning that nobody has tampered with them, and you can save a bit of time there. Also, there's a lot you can do around caching images and that kind of thing. For us that's been relatively easy, because these Kubernetes clusters tend to use the same image, and we have lots of them using the same image, so everything gets cached and it's all kept hot pretty much at all times. There are more things we could do if that becomes slow in the future, like moving Glance closer to the actual metal nodes, but at the moment we're finding the cache is pretty warm and it's performing pretty well.

One thing which I noticed, which surprised me in my own reaction to it: the first time I saw it take 20 minutes to build a server, I thought, oh shit, this is a nightmare, this is going to make everything really slow and difficult, because I'm used to a VM spinning up in 30 seconds or so. But actually, once you're used to it and you're doing things at large scale and in bulk, it doesn't really matter if one server takes 20 minutes when you can build hundreds or thousands simultaneously. You end up caring a lot more about reliability: being comfortable that your automation will just work, that you can walk away from it, come back later, and everything will be up and running. It would be much worse if it were faster but less reliable. So I always err on the side of reliability over performance, personally, for the build, that is.
Once it's up and running, we want performance as well, obviously. I mean, for us, if we have long-running jobs that take days, 20 minutes is neither here nor there. Indeed, yeah. And 20 minutes is on the high end, I would say; that's our current experience for a certain type of flavor. It's on the order of minutes rather than seconds, put it that way.

So is that the mode of operation that everybody wants? If somebody submits a particular workload, they get provisioned a particular resource or resource type? It's not that some things are long-lived and people interchangeably use the same standing resource? It depends how you choose to use it. In our model, we have a bunch of hardware built into clusters ahead of time, which sits there and is used relatively constantly by a collection of different users. So in effect, a large pool of hardware is being shared by lots of different people. We're quite lucky in the sense that we've got, in the grand scheme of things, a relatively small pool of researchers all doing quite similar things, so we can be quite prescriptive about the hardware they'll get. We have a smallish number of flavors of CPU nodes, similar for GPU nodes, and potentially other accelerators in the future. It might be the case that other companies, or other organizations even, want to do it more like metal-as-a-service or cluster-as-a-service offered up to users to create their own; that would be a possibility. But for us, we take the approach that we provision it, we being the infrastructure and platform teams, and then our users within the organization just use what we've provisioned for them. But this would let people provision their own if they wanted to.

How does it work, do they submit a ticket and you take care of that? It depends what you mean. Generally speaking, the way we have it is we have these pools of compute, whose flavors and qualities we understand, and then we have a bunch of tools and software which allow users to run jobs on them. So it's not a ticketing system; they are already set up with access to this large pool of compute, and they can submit jobs to use the hardware as they see fit. Effectively they run jobs as pods in Kubernetes, ultimately, on top of the hardware that happens to have been provisioned, through our queue.

Okay. So to get a sense of the timescale, how long do the clusters live, or if it's a dynamic cluster, how long do the nodes typically live? Like, six months for a cluster, and every few weeks things shrink and grow? Yeah, so it looks loosely like this. We actually have multiple of this whole picture, in fact, but if we just look at one of these as an example, imagining this is a data center, we have many of these clusters under here. The cluster itself may last for years; we might have created it a couple of years ago, it's still running now, and we'll still be running jobs on it. The nodes themselves we tend to rebuild quite frequently, because I think we have a bit of a fetish for rebuilding stuff at G Research and making sure everything comes back clean and tidy.
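Since the user workload ultimately lands as pods on these clusters, here is a sketch of the kind of pod a research job might become. The node label, GPU resource name and image are placeholders; in practice G Research submits work through Armada rather than writing pods by hand.

```python
# Sketch of a job pod targeting a bare metal GPU node; all names are hypothetical.
import json

job_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "sim-run-42", "labels": {"team": "research"}},
    "spec": {
        "restartPolicy": "Never",
        # Hypothetical label mapping onto the bare metal flavor the node was built from.
        "nodeSelector": {"example.com/flavor": "baremetal-gpu"},
        "containers": [
            {
                "name": "worker",
                "image": "registry.example.com/research/worker:latest",
                "resources": {
                    "requests": {"cpu": "32", "memory": "256Gi", "nvidia.com/gpu": "8"},
                    "limits": {"nvidia.com/gpu": "8"},
                },
            }
        ],
    },
}

print(json.dumps(job_pod, indent=2))
```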
So we actually have a separate project at the moment to ensure we're constantly rebuilding things and that there's a maximum lifetime for the actual nodes, but clusters themselves can last for quite a long time. We'll probably also eventually move to a more rolling cluster rebuild process, because obviously the cluster is long-lived state itself which could get dirty or out of sync somehow. It shouldn't, but it's possible. But generally speaking, the clusters themselves live for quite a long time, and the nodes within them are on the order of tens of days, a couple of months maximum.

So this architecture is really at that facilitator level, where you're building environments for individuals, you're moving with whatever the ongoing research is, and the short-timescale resources are the pods. Yes, the pods, exactly. So we've got layered timescales: the pods themselves are anything from seconds up to a couple of weeks, say, and then the nodes and the clusters last a lot longer and are just running this primordial soup of user workload. But using Ironic, or any kind of metal-as-a-service, is also just a useful thing if people have high performance requirements or simply a different estate management process, I suppose. I know Ricardo and the guys at CERN don't do this model. We have a model where we create clusters and then effectively offer what you can think of as namespace-as-a-service; a tenancy is the thing we offer people on the existing clusters. Whereas, certainly last time we were talking about it over at CERN, they're doing more cluster-as-a-service, so people can ask for their own clusters, which then may use something like Ironic. In fact, do you do that, Ricardo? Do you have Ironic as an option underlying clusters as well? Yeah, so we do exactly that. You can even have mixed clusters with node groups or node pools in VMs and additional node pools using bare metal. Makes sense.

I had a question, a kind of follow-up to the last one. You described the workflow with GitHub and then the provisioning using Terraform. Do you also use this for cluster upgrades, or do you just rebuild a new cluster from scratch? That's a good question. The cluster bootstrap is kind of a one-time thing to build a cluster. If we have quite a long-lived cluster, we do all of our upgrades from that point onwards, if that makes sense. Things like upgrading Kubernetes itself, we have a bunch of tooling for, so we can do in-place cluster upgrades, even the kubelet on all the nodes, because that itself is containerized. Similarly, operating system upgrades we can do in a rolling fashion, because we have this model where, underneath the long-lived cluster, the nodes get rebuilt sequentially, with error budgets and so forth, so we don't do the whole thing at once. But we have options. We can also, if we want, coordinate, wait for stuff to drain, and then completely blow it away and rebuild it all if we want to do upgrades. I guess the separation we have is that Terraform is used for the cluster and node build process, and everything afterwards is through Argo and Jenkins. Oh, okay, that's cool. Okay.
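Here is a toy sketch of the rolling rebuild idea Jamie describes: cycle nodes under a live cluster while never taking more than a small error budget's worth of nodes out of service at once. The drain-and-rebuild call is a stand-in for the real drain and reprovision flow, and the numbers are illustrative only.

```python
# Toy sketch of a rolling node rebuild with an error budget; not G Research's tooling.
import time

ERROR_BUDGET = 2          # how many nodes may be out of service at once
MAX_NODE_AGE_DAYS = 60    # nodes live tens of days, a couple of months maximum

def drain_and_rebuild(node: str) -> None:
    print(f"draining and reprovisioning {node} ...")
    time.sleep(0.1)  # placeholder for the minutes-long bare metal rebuild

def rolling_rebuild(node_ages: dict[str, int]) -> None:
    due = [n for n, age in node_ages.items() if age >= MAX_NODE_AGE_DAYS]
    for batch_start in range(0, len(due), ERROR_BUDGET):
        batch = due[batch_start:batch_start + ERROR_BUDGET]
        for node in batch:  # at most ERROR_BUDGET nodes are down at any time
            drain_and_rebuild(node)
        print(f"batch {batch} back in service")

if __name__ == "__main__":
    rolling_rebuild({"node-01": 75, "node-02": 12, "node-03": 61, "node-04": 90})
```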
The other question I had: there's quite a lot of activity in the community around managing clusters as if they were Kubernetes resources, and then building on things like Argo to make everything uniform. Is this something you've looked into? I was just searching now for the integration of metal-as-a-service components into Cluster API. Is this something that would simplify things, or something you would not consider?

I would definitely consider it, I'm very excited about it, and I would like to do it at some point, but it's just never quite been high enough up the priority list for us. I think what it would end up doing is effectively replacing Terraform. Yeah. Basically, we would be going straight from GitHub; well, I suppose we'd have something to bootstrap our initial cluster somehow, and then Cluster API would go off and talk straight to OpenStack. Hopefully everything we've already done would then continue to integrate nicely, and we would just use that directly. So I think it's really a question of how well supported OpenStack is by Cluster API. I haven't checked recently, but it would be very interesting to do. Certainly a limitation we've found, not of bare metal specifically, but of large-scale Kubernetes, or any large configuration in Terraform, is that it's a bit slow. Terraform effectively maintains a big graph of resources, and it has to walk that graph every time you make any kind of change, and when every resource is actually a remote thing that has to be checked, you can imagine that translates into a lot of API calls, which can be quite slow and expensive. So if we could turn that into something a bit more elegant using Kubernetes itself, I'm all for it. Makes sense.

I'm checking here if there are other questions in the chat. There you go. I have another question. Does your team manage the networking equipment? Do you have broad control over that, or do you work with a networking team? We have a networking team who's responsible for that. Within our organization we have a few different functions, and different areas are responsible for different things. We've got an infrastructure function and a platform function; Scott and I are from those respectively. There is a team within the infrastructure function who deals with networking specifically. But what we're definitely finding is that having more cross-functional teams is really powerful. I've got people in my team with really strong networking skills who understand that kind of stuff, including down to the hardware. And I actually suspect that over time we'll probably need to develop some kind of special cross-functional team that just looks at performance and tuning of the estate, because we need to be able to do it all the way from top to bottom, especially now that we're dealing with metal. We actually need to understand how everything's configured, all the way down to the BIOS.

Yeah, we have a lot of bare metal and VMs, and there's this friction with the networking team, professional disagreements maybe, over how the switches should be managed. And I saw in your presentation that you were switching VLANs at the beginning; it seemed like you had a decent amount of control over the network. Yeah, I think we do. Our networking team is quite up to speed with everything that we're doing as well.
I mean, like you say, there's always some friction between teams, because different teams operate at different rates and responsibility is shared across groups; it can be tricky. But we've got quite a singular purpose at the moment: there's a large project happening which involves a lot of this stuff, so we've got a lot of people from different teams all working together to make it happen, which is quite nice.

And these clusters are fairly large? Dozens or hundreds of servers, is that what you were saying? Yeah, up to about a thousand nodes in a given cluster. We could go further, but we've decided to generally stop about there. That was in fact one of the reasons for the Armada architecture, so that we could have many of these clusters, because we're aware that past a certain limit Kubernetes can't really scale much further. I think the official limit is still 5,000 nodes, but anyone who's been to conferences knows you have to do quite a lot, and bend over backwards, to get that far. So we have a model where we just go to about a thousand nodes and then plug in more clusters horizontally, and it scales quite well that way.

And is Armada a G Research project? Yes, yeah. It's open source, but it's come from G Research. Probably further back in the list of meetings there'll be a recording covering it specifically, if you're interested. Yeah.

I have one more. You mentioned the issues with GPUs and stability, and the improvements from moving to bare metal. Were you doing PCI passthrough in VMs, and do you remember which specific issues you had? I think we were, yes. I can't remember the specific issues, but we were basically getting unexpected errors, things being reported as not-a-number and that kind of thing, mathematical errors which shouldn't have been happening, and under quite niche circumstances as well, when we were running out of memory and you'd have to have a few different failures happen in a certain way. But somehow we managed to hit this scenario quite frequently, and we just thought, well, rather than try and debug all this, let's just see what happens if we run it on bare metal, and lo and behold, the problem went away. So sometimes it's just not worth it.

The reason I ask is because we have been seeing issues with simulations on virtual machines recently. Right, and yeah, that is a tempting solution. I guess it makes sense, doesn't it? You're going through this whole extra layer that maybe you don't really need, and there's a lot more software involved, ultimately. Yeah. The issue is, how many GPUs do you have per node on average? For us, up to about eight. Right. The issue is that for Kubernetes clusters this is easy to handle, but if you have a mix of VMs and Kubernetes clusters using those GPUs, virtualization actually lets you expose a multi-GPU node quite easily, whereas if you dedicate bare metal nodes to people directly, as you would with VMs, you're potentially giving them a really nice way to waste precious resources. Yeah, that's true. I think that's the reason why, for Kubernetes clusters, it's kind of a no-brainer that you can go bare metal for GPUs and just schedule directly on them. Yeah. All right. That was pretty cool. I have one question.
I don't see anyone else raising their hand, so I guess the question that a lot of people will have is: if I have a bunch of nodes arriving at a new data center, or on premises, and I just want to do Kubernetes on bare metal, what's the best and least complicated option to get stuff up and running?

Good question. I mean, I don't know, because I know what we do, and we're obviously quite opinionated about using OpenStack and Ironic. One thing that can probably be said of Ironic and OpenStack is that it can be quite complex and it's probably quite difficult to get up and running. There are two projects you should probably look at, which lower that barrier to entry. One is called Bifrost, which allows you to run Ironic standalone from pretty much just a laptop; that's good for bootstrapping new environments where you don't already have a control plane. The other is what we actually use here to deploy all of our OpenStack, which is Kolla Ansible. That is basically a collection of Ansible roles where a lot of the hard work has been done for you, so a lot of it just works out of the box. You can deploy OpenStack really easily onto a couple of VMs on your machine, or if you've got a couple of bare metal nodes, you can deploy a control plane there with relatively little OpenStack experience. Tuning it and getting it to large scale takes a lot of time and experience and working through issues, but if you just want to get started, the barrier to entry is not really that high with either Bifrost or Kolla Ansible.

I'd be really interested if we could do some kind of questionnaire for a wider group, obviously our group but as wide as we can get it, to find out, when people are using metal-as-a-service products, what they're using. Because even knowing what's out there is a challenge sometimes. Yeah, I think that comes back to this idea we've had for a while, which is to do these recipes for different sorts of workloads that are specific to research environments. I think deployment on premises and on bare metal is something that is maybe not super common, because most users will be using public cloud providers or some commercial virtualization solution that is already available. So for research institutions that have a bunch of nodes and want to get something up and running, maybe there's something we can provide with some ideas or pointers. We don't even have to have a full recipe for it; we can say, we as a collective have done these things, we know that they work, or can be made to work. I should mention that in this presentation all of our compute is on-prem; I suppose anyone using a cloud provider can also just use bare metal through whatever they support, and I think they all do now. I think that'd be very valuable for the university community. From what I can tell, most of the bare metal clusters are hand-created by various methods, so knowing what works, and what works well, would definitely be useful. Yeah, I think as soon as you do it at any kind of scale, the hand-cranked method just doesn't work; you'd spend your whole time doing it. I guess the dream is really this idea of Cluster API, where you put some effort into the bootstrap cluster that you do by hand, and then everything else comes out magically via Cluster API.
I don't know how far that gets you, because you still need this kind of metal-as-a-service component somewhere. Yeah, there are enough bits of surrounding infrastructure you still need; they'd have to be set up by something. Maybe some of this will become more ubiquitous as time goes on, I don't know. Yeah, so maybe we take this as an action, to send around a survey like we did the last couple of times, asking specifically about bare metal deployments. Yeah, that'd be good. Also, for those going to KubeCon, actually doing some research there would be really valuable. Yeah, actually we don't have a talk this time for the group.

I don't know if there are other questions on the topic. We've probably got time for one more if there is one, otherwise I'll stop sharing. I think we're good. Five second rule, fine. All right, thanks a lot again, Jamie and Scott. That was pretty interesting.

We don't have anything else today. One thing I was going to mention is that for KubeCon we still have another session in two weeks, but we don't have a talk this time, so we should probably just circulate something like a lunchtime slot where we all get together. Yes. It all started in Barcelona, so we might as well get together. Yeah, I'm not actually going, I'm a bit gutted. I've got three people from my team who are going to be there, so I'll make sure I send them your way. Still escaping this, Jamie, I can see. I know, I'm sorry. Saving it up for Detroit. Oh, that sounds good. The blues. All right. I don't have anything else for today, unless anyone else wants to raise something. I was just going to say, I've realized I need a haircut as a result of looking at myself on this call, so I'm going to go and get that before next time. All right, otherwise we have another session in two weeks and after that KubeCon. So thanks everyone for attending, and we'll follow up in the Slack channel as well. Thank you everybody. Great to see you. Thank you everyone. Bye bye. Peace.