Cool. Well, I thought I would talk today about a nice project that some of the people in our team have been working on, relating to how to do Slurm, your conventional batch-queued environment, in a cloud-native way but on bare metal. There's plenty of prior form for why and how people run Slurm in the cloud, and we'll skip through that briefly, but I just wanted to talk about a couple of the driving motivations.

I guess a lot of people will be familiar with the kind of siloed data center where we have hardware specifically assigned to HPC, hardware for big data (the Hadoops and Sparks and what have you), and maybe hardware for AI and deep learning as well. The thing that OpenStack brings us in the research computing space is this concept of consolidation. But if we were to consolidate everything, bringing HPC, AI and big data hardware together under one system, then we also need to be able to run a Slurm service on that kind of cloud environment; we need to be able to provide a batch queue as a service as well.

This is by no means novel work. There is a huge amount of prior art in creating Slurm in cloud environments, ranging from the kind of snow-globe environments, where there might be a four-node Slurm cluster running some basic MPI or hello-world jobs, up to fairly dynamic and fluid, almost functions-as-a-service approaches to queuing jobs and going off to create things. But when we focus on creating a service, I think there's something interesting in this work, which is why I wanted to talk about it today.

It's based on a set of very detailed discussions with the HPC services team at Cambridge University. They manage a set of supercomputers, the largest of which is called CSD3, one of the biggest in the UK. So they have long and extensive experience in how to, and how not to, manage a service for providing Slurm to a wide range of users. They were very patient with us and put together a very useful set of requirements: if you want to provide this service, you need to be able to do this. The team at StackHPC were then able to implement something, based on cloud-native methods, that meets the requirements of the operational experience at Cambridge.

I've got a couple of things to talk about, but you can assume that all of the usual features of a Slurm cluster are present here; time doesn't really give us the opportunity to go into any of them in depth. All of this is open source. It's based on OpenHPC v2 and CentOS 8, and you can find everything on Galaxy, our blog and other places. It does all the usual things you would expect of a Slurm environment, it integrates with the other services of your data center, and it also integrates well with things like InfiniBand, bare metal compute and high-performance virtualization. So all of those things exist, and I've got a couple of minutes to talk about the cool stuff.

The first bit that we really liked, and probably the major point from operational experience, was: if I have a thousand nodes and there is a zero-day vulnerability announced, I need to be able to patch those thousand nodes as fast as I possibly can. Running a playbook over a thousand nodes soon becomes unwieldy and a bit of a bottleneck. So there's a slightly different approach taken in this system, in which we use image-based deployments. Those are triggered through an update process in which a high-priority job jumps the queue in the batch scheduler and runs as a phased rollout across the system. So we can say that we have a concurrency of ten nodes and that it has to run on all the nodes. At the front of the queue, as nodes become available, Slurm executes this job, which takes each node out of the configuration and performs the reboot, which behind the scenes is actually an OpenStack reimage, brings the node back with the updated master image installed, and then the node rejoins the configuration more or less immediately. This was one of the benefits of working with a team who had long experience of operating at scale: they knew this was the sort of approach that had to be taken, and everything was shaped around fitting that.
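To make that a little more concrete, here is a minimal sketch of what one step of such an image-based update could look like, assuming the openstacksdk cloud layer and the scontrol command are available wherever it runs. The cloud name, image name and node name are illustrative assumptions, not the actual configuration; the real system drives this from a high-priority Slurm job rather than a standalone script.

    # Minimal sketch: drain a node, rebuild it onto a new image, bring it back.
    # "openstack" cloud, "openhpc-compute-latest" image and "cpu-001" node are
    # made-up names for illustration only.
    import subprocess
    import openstack

    def reimage_node(hostname, image_name, cloud="openstack"):
        conn = openstack.connect(cloud=cloud)
        image = conn.image.find_image(image_name)
        server = conn.compute.find_server(hostname)

        # Take the node out of the Slurm configuration before touching it.
        subprocess.run(["scontrol", "update", f"NodeName={hostname}",
                        "State=DRAIN", "Reason=rolling-reimage"], check=True)

        # The "reboot" is really an OpenStack rebuild onto the updated master image.
        conn.rebuild_server(server.id, image.id, wait=True)

        # The node rejoins the configuration more or less immediately.
        subprocess.run(["scontrol", "update", f"NodeName={hostname}",
                        "State=RESUME"], check=True)

    reimage_node("cpu-001", "openhpc-compute-latest")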
The other point that I wanted to show you is what we can do when we deploy things in a cloud-native way: we can bring in some fairly cool other pieces and stick them together using a parameterized configuration deployment. We can use things like the Prometheus node exporter on our compute nodes, and Elasticsearch or Open Distro for collecting job queue data from the machine, and then we can link those together to provide the users of the service with dashboards of their jobs and the times at which they ran. Those dashboards are interactive, in that users can click through to a dashboard of node exporter telemetry which is time-boxed and presents only the compute nodes their job was running on, for the duration of the job's run time.

The way that looks is, if we have a look at the presentation of the data that has gone into Elasticsearch, we can see the list of jobs that a user has run. The URLs on the screen are constructed into the fairly deep-linked thing at the bottom, which my colleague Will has hopefully elaborated and annotated. We can then click through this deep link into a dashboard created on the fly, which shows things like the typical node exporter stats, and you can see these red boxes marking where the job started and completed. So, in a fairly user-friendly way, the coarse-grained telemetry of node exporter is published back to users, who can then see what their jobs were doing for the duration of the run. It's a fairly user-friendly way of getting another view on performance telemetry in the system.
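As a rough illustration of how such a deep link could be built from a job record, here is a sketch that turns a job's start time, end time and node list into a time-boxed Grafana URL. The dashboard path and the var-host template variable are assumptions made for the example; in the real system the links are generated as the job data is shipped into Elasticsearch.

    # Sketch only: build a Grafana deep link scoped to a job's nodes and run time.
    # The dashboard UID/slug and the "var-host" variable name are illustrative.
    from datetime import datetime
    from urllib.parse import urlencode

    GRAFANA_DASHBOARD = "https://grafana.example.com/d/slurm-job/node-exporter"

    def job_dashboard_url(start, end, nodes):
        params = [
            ("from", int(start.timestamp() * 1000)),   # Grafana expects millisecond epochs
            ("to", int(end.timestamp() * 1000)),
        ] + [("var-host", n) for n in nodes]           # one entry per compute node in the job
        return f"{GRAFANA_DASHBOARD}?{urlencode(params)}"

    print(job_dashboard_url(datetime(2020, 10, 19, 9, 0),
                            datetime(2020, 10, 19, 11, 30),
                            ["cpu-001", "cpu-002"]))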
As for where this is all running: a lot of the development work has been done at Cambridge, but also on another system, the Scalable Metal Service. This is a project we're working on with Verne Global, who have been providing hardware, what you might call second-life or maybe third-life hardware, and we have 160 blades for a bare metal compute farm. The system is available for what you might call good causes: open source projects that are interested in having access to compute resources, things like perhaps the CI node pools for OpenDev infra, but other things as well. It's running a bare metal cloud with a sort of restricted-trust tenancy service, and the idea is that users whose projects are vetted and approved can make use of this service for their own good causes.

The way this is done is that users get to create bare metal instances, or virtualized VMs, in their project, but they don't get access to things like networks or routers or other things; they simply get to create compute resources on the system. Coming up next on the Scalable Metal Service cloud, we have the ability to create hypervisors out of the bare metal compute nodes, so that if there is an overwhelming urge to create lots of VMs, we can actually create hypervisors on the fly in service of those. We've been doing a fair bit of work with the Blazar project to enable projects to book, for example, ten bare metal nodes for ten weeks, and then have access to that resource with a guaranteed commitment of availability. And we've also been working on and evaluating the Adjutant project with our friends at the Mass Open Cloud; I think Kristi Nikolla is going to give a demo on Adjutant in the next session.

So I think that's probably everything my time will permit, a bit of a whistle-stop tour, and I've probably squeezed a bit too much into the time. But all of this is going on with our system. The other item of note is that this is 20 kilometers away in Iceland, and it is truly boiling over with excitement. Thanks everyone, and any questions?

Personally, I'd like to see the URL where you store the project. I'm going to stop the recording, by the way. Sure, I'll post it into the...
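For reference, a booking like the ten-nodes-for-ten-weeks example mentioned above might be expressed as a Blazar lease along roughly these lines, assuming the blazar client is installed and the cloud exposes physical host reservations. The lease name, dates and node count here are purely illustrative, not a real booking.

    # Sketch only: reserve ten bare metal nodes for roughly ten weeks via Blazar.
    import subprocess

    subprocess.run([
        "blazar", "lease-create",
        "--physical-reservation", "min=10,max=10",
        "--start-date", "2021-01-11 09:00",
        "--end-date", "2021-03-22 09:00",   # about ten weeks later
        "smc-good-cause-lease",             # hypothetical lease name
    ], check=True)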