So what am I going to talk about today? To begin with, I'm going to introduce you to something called micro-VMs and explain what they are. I'm then going to talk about the Liquid Metal project, which is maintained and sponsored by Weaveworks. I'm going to talk about why the Liquid Metal project and micro-VMs are useful. And then I'm going to demo the thing.

So let's begin with an introduction to micro-VMs. Let me just make my notes a little bit bigger. There we go. Micro-VMs are exactly what they sound like: they are smaller VMs. They are a smaller subset of virtualization tailored for a specific need, so you don't have any unnecessary overhead. This makes them almost, almost, as fast to bring up and tear down as containers. If you think of your standard VM, it has to be ready to run any operating system for any possible use case. You don't know what people are going to throw at it. It has to be ready for anything, which means the hypervisor has to do a lot of work and allocate a ton of resources to accommodate all of those possible scenarios. With a micro-VM, you tell the hypervisor to basically half-ass it and only set up the exact things that you need. So if you don't need any hardware devices, or you only need one or two, you only set up those.

Micro-VMs are designed to give you the best of both worlds. VMs give you security; containers give you speed. So in this case, you get something almost as fast as a container, with the security of having your own kernel and your own operating system. I know containers give you your own operating system layer, but here you have your own sandboxed environment to do whatever it is that you like. And because you're excluding a lot of unnecessary functionality, you have a smaller attack surface anyway. So those are micro-VMs.

So what is the Liquid Metal project, and why are micro-VMs relevant to it? Liquid Metal is a set of tools to declaratively provision Kubernetes clusters on lightweight VMs, so micro-VMs. It's built, maintained and sponsored by Weaveworks, and it's comprised of the four components that I'm going to go through right now.

The first one is Flintlock. And I'm now regretting not having any notes here, because I have to keep looking up to see what's on the slide. Flintlock creates and manages the lifecycle of micro-VMs. It's a gRPC server written in Go, and it runs on a host which can technically be anywhere, but here I'm talking about bare metal hosts. You say, I would like a micro-VM, I would like an ephemeral throwaway environment started really fast to run a very specific use case, and Flintlock will handle that for you. See, it's too small here. Flintlock can technically be used independently of Liquid Metal, and I actually do this myself sometimes: if I want to create a micro-VM real quick to run some tests, and I don't really want to use a container because it doesn't give me enough functionality for the test I want to run, I'll quickly spin up Flintlock. But it is primarily designed to run Kubernetes nodes, and it works with Cluster API Provider MicroVM. So CAPMVM is a CAPI provider, an infrastructure provider. When you're creating a brand new CAPI workload cluster, you say, I want to use CAPMVM; I want to create my Kubernetes nodes in micro-VMs on bare metal hardware. Both this and the Flintlock component are open source in the Weaveworks Liquid Metal organization. And yeah, that's all I've got on that slide. Sorry again.
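If you're curious what talking to Flintlock directly looks like, here's a rough sketch using grpcurl. The address and port, and whether gRPC server reflection is enabled on your build, are assumptions on my part, so treat it as illustrative rather than gospel:

```bash
# List the gRPC services a flintlockd host exposes (assumes server reflection
# is enabled and flintlockd is listening on this address -- adjust to taste).
grpcurl -plaintext 192.168.30.10:9090 list

# Describe the services to see the create/delete/list micro-VM RPCs and their
# request shapes before scripting against them.
grpcurl -plaintext 192.168.30.10:9090 describe
```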
And then that brings us to Firecracker and Cloud Hypervisor. These are both open source VMMs: Firecracker was created by AWS, Cloud Hypervisor by Intel. They're written in Rust, and they're basically the core process executors; they're the things that actually create the micro-VM. Flintlock wraps either one of these two, so you can choose the one you want. They're based on KVM in Linux. Flintlock will call out to either one of them and create and start a process, which runs as a micro-VM, and once that's booted, you can then start your Kubernetes node.

The last component I want to mention is containerd. You all know what this is; you've seen it in Kubernetes and Docker running containers and stuff. In our case, in Liquid Metal's case, we use it to pull down images. So when you start your micro-VM, you say: I want this kernel, I want these kernel modules, I want this operating system. containerd goes and fetches those and prepares them for mounting into the micro-VM.

So let's just go through what I said, but with pictures. This is me, you, whatever, talking to a CAPI management cluster. Let's pretend it's on my Dell and it's a kind cluster. I've got the CAPI controllers running, I've got CAPMVM running. I apply my manifest and I say I want my three Kubernetes nodes, whatever, I want a cluster. CAPMVM will look at my list of bare metal hosts and it will go and talk to each one of them. It will talk to Flintlock, the gRPC service running on each of those hosts, and say, hey, I want a micro-VM. Flintlock will say okay, go to containerd and say, right, I need this kernel, I need these kernel modules and I need this operating system, go get them. containerd does that. Once that's downloaded — went too fast — once that's downloaded, flintlockd will talk to Firecracker, and Firecracker will then boot the process. So it will start a process, and it looks a lot like a container actually, because as you know, containers are just processes, they're not actually real. So when you're looking at it on the command line, it looks like a process. Firecracker will start the micro-VM process, at which point — I assume you all know how CAPI works — CAPI will have set up some user data to bootstrap the machine into a Kubernetes node. That will run, the Kubernetes node has started, and then I can use it like a cluster, doing stuff, whatever you use a cluster for.

So, use cases. Why would you want to use Liquid Metal? Why are micro-VMs cool? Why all of this? These are all kind of related. The first obvious one is edge computing: micro-VMs have a lower footprint, so you can just use them at the edge. The second is lower-resource environments, which makes sense. Third, similar again to number two, homelabs: I've got a homelab here, this is my Raspberry Pi setup, and I'm going to get on to that in a minute. I know that's what you actually came to see, not to listen to me talk — it's coming, it's coming, I promise. Then bare metal again; it's all really related, I know, I was just trying to pad out the list, right? And the last one is an actual practical use case, which is self-hosted CI runners. Because if you want to run tests more efficiently, but containers aren't quite doing it for you, this is actually a very practical use case. You don't have to do it on Raspberry Pis — again, this is a toy — you can do CI runners on actual larger systems.

So, demo. It's a fake out. There's more stuff, don't worry, there are more slides.
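To make the "boot a process" step a bit more concrete, this is roughly what a VMM like Firecracker is doing under the hood when Flintlock asks it for a micro-VM. It's a hand-rolled sketch against Firecracker's documented HTTP API over a Unix socket, not what Flintlock literally sends, and the kernel and rootfs paths are placeholders:

```bash
# Start the VMM with an API socket (one firecracker process per micro-VM).
firecracker --api-sock /tmp/fc.sock &

# Tell it how big the machine is...
curl --unix-socket /tmp/fc.sock -X PUT http://localhost/machine-config \
  -H 'Content-Type: application/json' \
  -d '{"vcpu_count": 2, "mem_size_mib": 1024}'

# ...which kernel to boot...
curl --unix-socket /tmp/fc.sock -X PUT http://localhost/boot-source \
  -H 'Content-Type: application/json' \
  -d '{"kernel_image_path": "./vmlinux", "boot_args": "console=ttyS0 reboot=k panic=1"}'

# ...which root filesystem to attach...
curl --unix-socket /tmp/fc.sock -X PUT http://localhost/drives/rootfs \
  -H 'Content-Type: application/json' \
  -d '{"drive_id": "rootfs", "path_on_host": "./rootfs.ext4", "is_root_device": true, "is_read_only": false}'

# ...and then boot it.
curl --unix-socket /tmp/fc.sock -X PUT http://localhost/actions \
  -H 'Content-Type: application/json' \
  -d '{"action_type": "InstanceStart"}'
```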
We'll get to the demo in a minute, okay? So yeah, this is the use case. This is a proof of concept, and I cannot stress that enough: it is a proof of concept that I wrote just for this conference. I'm using self-hosted GitHub Actions runners, and I've got a Liquid Metal cluster running on these Raspberry Pi boards, so a bare metal Liquid Metal cluster. What I'm going to do is run some example CI jobs on a mixture of ephemeral pods running inside my Liquid Metal cluster and dedicated ad-hoc micro-VMs. There are some links for people who are watching the slides later.

So what components am I using? I'm using something called the actions-runner-controller. This is an open source project; I can't remember this stuff, but I believe it was sponsored by HelloFresh, and it's now part of the official GitHub Actions organization. At least I think it is, because I noticed the URL has moved. And it's quite cool: it lets you create a runner deployment where you can have a pool of ephemeral runners that are just pods. They get registered as runners in GitHub Actions, and when you start a job, it runs in the pod, the pod self-destructs afterwards, and you get a brand new pod. Very exciting.

The other component — this is the one that I made for this demo — is the micro-VM action runner. It's a proof of concept. It's an HTTP service that responds to a GitHub webhook, and it creates ad-hoc micro-VMs. So whenever you run a job saying that you want a dedicated micro-VM, it will spin up a new micro-VM on one of my boards, run the job, and then kill it afterwards. I had hoped to do this really nicely, like have a controller with maybe warm pools and some scaling, but I quit Weaveworks like a month ago and I haven't actually had time to do the thing, so I'm really sorry. You get the HTTP service, but dream big: imagine it's going to be really cool. Well, cooler. Okay?

So, benefits. Why would anyone want to go through the hassle of setting up a separate CI system? Why not just use Jenkins? There are a lot of benefits; I've only got a certain amount of space on the slides, so I'm just cramming them all on. When you think about CI infrastructure in a cloud-native software shop, it's kind of a bit of an add-on: we use Jenkins, we set up CI, we use GitHub Actions, and it's all a bit old-fashioned. There's a lot of traditional legacy infrastructure going on there, which is a bit of a bottleneck. Micro-VMs are actually more performant, and because they have a lower setup overhead, you get faster feedback on your tests, and because they're smaller, you get higher utilization of your runner infrastructure. You're using more of the box it's running on, rather than starting up a huge VM which takes up more room than it needs while your tests end up using very little of it. So traditional runners have long waits due to spin-up times; micro-VMs reduce that. Traditional CI provides speed at the cost of complexity and/or safety, because yes, you can run your CI in Docker — I know GitHub actually gives you this — but it kind of encourages Docker-in-Docker, or encourages privileged host access and other workarounds, which isn't massively secure.
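Before I carry on with the benefits: for anyone who wants to try the pod-based half of this at home, a runner deployment for the actions-runner-controller looks roughly like the sketch below. This is from memory for the ARC release I was using, so check the project's docs for the current API version, and the repository name is obviously a placeholder:

```bash
# A minimal pool of ephemeral pod runners via actions-runner-controller.
# API group/version and fields are as I recall them; point `repository`
# at your own repo.
kubectl apply -f - <<EOF
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runners
spec:
  replicas: 2
  template:
    spec:
      repository: my-org/my-repo
      ephemeral: true
EOF
```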
As I said before, micro-VMs provide the flexibility of container builds with actual sandboxing: they don't share a kernel with the host and they don't require any privileges. They have everything they need to test all the crazy stuff that you're doing without anything interfering with the host.

CI infrastructure often requires that runners are available from a hot pool of nodes. I know I said earlier that traditional CI takes a long time to spin up, and you're thinking, no it doesn't, my Ubuntu VM is ready immediately. Yes, because there's a hot pool running all the time. Think of GitHub Actions, for example, because that's what I'm using — I think they're using Azure or EC2 VMs, I can't remember which — they are running constantly to accommodate hundreds of thousands of builds a minute, probably. So this is stuff running all the damn time. With micro-VMs you don't actually need that, because they start fast enough that, unless you need an answer within three seconds, you can wait 20 seconds for the thing to start, then run, and you're golden. You don't need all these massive VMs just sitting there spinning, waiting for something to pick them up.

Micro-VMs let you test things that require low-level features: init systems, systemd, other low-level kernel things. Containers and many other CI runners don't let you do this. For example, when I was testing Flintlock itself on GitHub Actions, I couldn't, because they don't enable KVM. So that was a pain in the ass. But now we can test Flintlock on this instead, which is actually really neat. And other things: if you're running tests against the kernel, like you've got an eBPF program, you can run those in micro-VMs, but you can't run them in containers or on a lot of traditional CI. So I need to drink now. Okay, there we go. Nikki, am I speaking slowly enough? Sorry, she needs to check for me.

Okay, other benefits are environmentally related. Because you've got faster build and start times, you're reducing your average time to results, so that's always nice. As I said before, warm pools aren't necessary unless you desperately want them, and even if you do have them, they're taking up less space, so you're using your equipment more efficiently. And again, because you're using the right-sized environment, the smallest environment for your tests, you end up reducing the overall cost, because you can fit more micro-VMs onto the same hardware than you would regular VMs. Build caches can be shared between runners because they can be mounted, which means you're not constantly writing unchanged dependencies to disk. I know hacks can get other runners to do this, but it's not by design; it's a hack. And finally, micro-VMs require lower disk usage in general, because they basically act like containers: you've got all the kernel layers, all the operating system layers being snapshotted and shared, so you end up having less on disk to begin with.

Okay, here's another picture; still no demo for you yet. I've just reiterated what I showed you before so it's fresh in your mind. What we've got here on the left is my laptop, the Dell, running my kind management cluster. It's running CAPI and CAPMVM. Then I've got four Raspberry Pi boards. These are all Model 4Bs, running Ubuntu 20.04, because that's the one I like.
They've got four gigs of RAM and microSDs, so they're a little bit slower, simply because I didn't want to bring more stuff like SSDs here, but we'll live with that for now. Three of these boards are running a Liquid Metal cluster, which I triggered before I got on stage, just because the demo is quite long already. I've only done one node per board because these are very low-resource boards — four gigs of RAM, really small microSDs, there's not a lot going on there — so it's one node per board, just not to push it. And more importantly, this is fun. Again, this is fun, this is a toy; don't use this in a professional setting.

Okay, so some interesting stuff about the network, if anyone cares. When micro-VMs start, they need an IP, naturally. I could have just let them connect to the conference Wi-Fi and get something automatically allocated as they come up, but I wanted to control it, so I have a DHCP server running on my Dell, and the IPs are allocated from a private range in my dedicated VLAN, which is configured on this managed switch. Each micro-VM gets a macvtap interface with that IP, which is parented to the board's VLAN interface, which is parented to the board's Ethernet interface. Traffic to and from the VMs is then forwarded via NAT rules configured on the Dell, so any traffic coming in on the VLAN subnet on these boards gets sent out through the Dell's Ethernet and then back again.

More pictures. This is a zoomed-out view of the hackery to get this working, and this is what nearly broke me this morning when I tested it at 10 a.m. and it did not work, because the network would not connect. The hackery originally began — because I've done this at previous conferences — as a Wi-Fi extender which would connect to the conference Wi-Fi, and then I could plug an Ethernet cable into the switch, then further Ethernet cables to all the boards and my Dell. But the Wi-Fi extender trick wasn't working. Fortunately, the day was saved by the AV guys, who showed me there was an Ethernet cable here. So you don't have to watch a recording — because if you watch a recording, how do you know it's using the boards at all? It could just be using EC2 and it could all have been a fake out. To be fair, I could still be faking it: I could be using EC2 and these could just be lights that I turned on. You don't even know. You can come find me after to verify, if you really want to.

So why did I go through all this drama? Why did I not just connect to the conference Wi-Fi and be done with it? Four reasons. One, I wanted my own private network: I wanted to control the NAT traffic, I wanted to control the DHCP and the IP pool, I wanted it all to be private. The second one is that when the Liquid Metal cluster starts, it needs the API server endpoint IP, and I need to know what that is in advance because I wouldn't be able to do DNS or whatever. I couldn't exactly go and ask the conference people, can I look at your router to see where a free IP is in your pool? That would be awkward. This way, it comes from my private network, I know what it is, and that's fine. The third reason is I wanted to power the boards over Ethernet, so I'm using a Power over Ethernet HAT for each board; I didn't want a mess of cables just to turn them on, so each board gets power out of the same cable, which is really cool.
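For anyone who wants to recreate the networking at home, the shape of it is roughly this. Flintlock creates the macvtap devices itself when it makes a micro-VM; the commands below are just a hand-rolled sketch of the same layering, and the interface names, VLAN ID and subnet are made-up examples:

```bash
# On a Raspberry Pi: a VLAN interface on top of the board's Ethernet port.
sudo ip link add link eth0 name eth0.30 type vlan id 30

# A macvtap device for one micro-VM, parented to that VLAN interface
# (this is the part Flintlock normally does for you).
sudo ip link add link eth0.30 name macvtap0 type macvtap mode bridge

# On the Dell: forward traffic from the private micro-VM subnet out of the
# laptop's uplink and NAT it on the way back.
sudo sysctl -w net.ipv4.ip_forward=1
sudo iptables -t nat -A POSTROUTING -s 192.168.30.0/24 -o eth0 -j MASQUERADE
```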
And the last reason is that, as I said before, when the micro-VMs start they are created with macvtap interfaces, and macvtap interfaces do not work wirelessly; they require a wired connection. If you want to know more about that, come find me after, bring a cookie, and I will answer the question. Also, by the way, I apologise for this really attractive shoebox. I wanted something to lift it up so you could see it, and my running-shoe box is kind of all I had. So, sorry about that.

Okay, so I created the cluster ahead of time. If you want to see how that actually works, I demoed it at Kubernetes Community Days UK in November; I should have linked it here, but I forgot, really sorry. This is what I'm going to be showing. The first job is going to run in an ephemeral pod: I'll trigger a job in GitHub Actions, which will go to the action queue, which will be picked up by one of the runners started by the actions-runner-controller. It will run the job, self-destruct, and a new runner will come up. Easy. For the next one, I'll ask for the job to be run in a dedicated micro-VM: I'll trigger the job in GitHub Actions, it'll be picked up by a webhook, which will be sent to the micro-VM service, which will talk to Flintlock on the top Raspberry Pi board, and it will create a dedicated micro-VM registered as a GitHub Actions runner for the job. It will run the job, it will self-destruct, and we're done. By the way, the jobs they're going to run — I did have them initially running actual Go tests, but then you have to wait for all the Go stuff to download, so instead they're just echoing a string congratulating you that your tests passed, so we get to feel good about it.

Okay, yeah, let's demo the thing. Let's do it. That's the backup video, which you don't have to watch. So let me just — you know what, actually, I'm going to mirror now, I think, so I can see what I'm typing. No, that's not what I want. Can someone hum a theme tune to something while I'm doing this? Just some mood music? Where are my display settings? There we go, mirror. All right, how do I make it bigger? Like that, no? Does anyone know how to work Ubuntu desktop? I don't usually use this. Not that. Oh, there we go: zoom in, be bigger. Is that big enough for everyone? Can people see stuff? Cool, okay.

So, lucky you, this is all scripted, so nobody has to watch me type. I did this ahead of time, but I'm just going to show you what I did, so if you're following along at home, these are the commands that you run to get this up and running. I've created my kind cluster — and yes, that's exactly how fast kind starts on this machine. I then set some CAPMVM variables, and I've done my clusterctl init. You all know how this works. And yay, that happened really fast; not even cert-manager took that long to install this time. Amazing. I'm setting some more settings now for my workload cluster, so this is the cluster that will be applied — let's pretend I haven't done it yet — that will be applied to my Raspberry Pi cluster here. So there we go: I've set my IP, I've set my worker machine count, set my control plane count. Just one control plane, two workers, nothing massive here. I've applied it, and yay, it works. Cool.

Let's have a look at what that is. This is your workload cluster manifest if you're creating a CAPMVM cluster. Most of it you already know if you've done CAPI. The key points are: this is my list of bare metal hosts.
Right now we've only implemented static pool allocation, so I had to actually name the addresses where my Flintlock hosts are running. In future we're going to have some sort of clever scheduler which will know when new machines come and go, so it can find those bare metal hosts and put more nodes onto them. It's going to be cool. Other things to note: oh yeah, I am using a local image registry, because this is not my first time doing a conference demo — I'm not pulling a single image from the internet today. We can see that we've got our replicas, one control plane and two workers, and here are some settings for our micro-VMs: you can see that I've specified my kernel and my operating system. So that's all very exciting. Let's go back out of there.

What's next? Oh yeah, so this already exists. I'm going to go get the secret, and there are my nodes, and let's use K9s to look at our stuff. So there is my Liquid Metal cluster running. We can go and have a look at that, actually. These four panes are SSH sessions onto the boards — again, you're going to have to trust me on that. It's not EC2, it's not anywhere else, it's literally right here, it's these ones for real, okay? That's definitely what's happening here. The top left is the bottom board, so that's Raspberry Pi zero, the bottom left is Raspberry Pi one, the top right is Raspberry Pi two, and the bottom right is Raspberry Pi three. I've got the control plane running here on Raspberry Pi one, and you can see these are the boot logs of Firecracker starting the micro-VM process. It's not massively exciting, it's just Linux booting, but here you can see that it was registered as a Kubernetes control plane. And then these two are actually showing the directory structure of Flintlock's state directory. It doesn't prove anything, it's just a directory, but this is how I check to see what exists at any given time. So we've got a worker node there, and a worker node here. Raspberry Pi three, in the bottom right, has nothing going on because that's the one I'm going to use for the ad-hoc micro-VMs in just a moment.

So let's go back here and let's use our cluster for stuff. Obviously I'm going to bootstrap it with Flux; I'm not going to sit here manually applying manifests to my cluster. This is where we actually put the network to the test. What this is going to do is install the Flux components, and it's going to install the actions-runner-controller, or ARC — oh, there it goes, stuff is happening. Again, local registry. It's also going to install the micro-VM service, so yeah, talk amongst yourselves, it should be quite quick. And luckily it's behaving today, cool. So all components are healthy. I did this with Flux because I was in a hurry to get it working; you could use something else. There are lots of GitOps tools, like Weave GitOps, that you could use to do this, which will give you some nice visualization as well. So that's cert-manager coming up; the actions-runner-controller will use that. And let's give it a moment. I should have had some background music for this. At least my K9s colour scheme is pretty, right? There comes the actions-runner-controller. I think down the bottom — yeah, the micro-VM action runner down there at the bottom is already running, so that's cool. I just need to wait for ARC to get going, and then it will deploy some runner pods whenever it gets round to it. There we go. And now if we jump back to — not that, go away.
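For people following along at home later, the scripted steps I just clicked through look roughly like this. It's a sketch rather than my exact script: the provider name, cluster name, Kubernetes version, repo and the CAPMVM environment variables I exported beforehand are placeholders or from memory, so check the Liquid Metal docs for the real workflow:

```bash
# Management cluster on the laptop.
kind create cluster

# Install CAPI plus the microvm infrastructure provider (CAPMVM).
# (CAPMVM expects a few environment variables first -- the control plane IP
# and the Flintlock host addresses -- which I've omitted here.)
clusterctl init --infrastructure microvm

# Render and apply the workload cluster manifest
# (one control plane node, two workers, spread across the boards).
clusterctl generate cluster lm-demo \
  --infrastructure microvm \
  --kubernetes-version v1.23.5 \
  --control-plane-machine-count 1 \
  --worker-machine-count 2 | kubectl apply -f -

# Once it's up, grab the kubeconfig and bootstrap Flux into the new cluster,
# which pulls in the actions-runner-controller and the micro-VM runner service.
clusterctl get kubeconfig lm-demo > lm-demo.kubeconfig
flux bootstrap github --owner=<me> --repository=<demo-repo> \
  --path=clusters/lm-demo --kubeconfig=lm-demo.kubeconfig
```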
Where's my internet? There we go. We go back here — what do I actually want? I want my — not that — action runners. So there are no runners configured yet. If we keep an eye on this, we'll see runners eventually pop up. How am I doing for time? Okay, I'm fine. What are you doing? Okay, nothing, it's doing nothing, great. Are you up yet? I don't know; it's got everything it needs, it's just thinking about it. I wanted to show you the pod one first because it's actually less exciting than the micro-VM one, but the micro-VM one is ready sooner. Conundrum.

What else can I show you while I'm waiting? Do you want to see some network interfaces? Yes, no? Yeah? Okay, let's do that. I bet as soon as I click over, it's going to magically become ready. Okay, so if I Ctrl-C out of there and look at that: as I said before, we're on Raspberry Pi one here. This is the VLAN interface there, which is parented to the general Ethernet port. And then here we can see there are two interfaces that were created for the micro-VM. One is the macvtap interface, which was created and parented to the VLAN interface; this is how the micro-VM gets its IP and internet access and all of that. And this is another one, which I think is actually for the metadata service — I can't really remember, but I think it's that.

There we go, now it's not ready. God damn it, what are you doing? Fine, I'll do the micro-VM one first. Be like that. Oh wait, no, it began. Okay, fine, fine, cool, it's fine. Everyone focus up again; get off your phones. Cool, so we've got the micro-VM side, and now we've got a pod runner. There it is, it's offline; get online. This is actually something that I learned while I was doing this: GitHub is — I mean, you all know this — not massively reliable, right? Which is a great thing to use for a live demo with lots of moving parts. Very high risk. Anyway, let's trigger a job and hope it comes up eventually. So we go here, I want to run an ARC workflow. What this is going to do is — well, a pod already exists — it's going to run the job in the pod, which should be online. Be online — there we go, it's active, it's running, it's running the job, how exciting. So we look at that; it's actually going to be really quick because it's literally echoing a line. So this is running inside the — yes, thank you, I was quite pleased with that. So yeah, okay, the job already ran and yes, I am awesome, so the tests passed.

Right, let's go on to the next one. If we go back in here, you'll see that the original pod has been destroyed and now we get a new one coming up. So that's fine: it's running tests in a pod on a Liquid Metal cluster. That's the less exciting one. We're now going to look at the ad-hoc micro-VM one. So we're going to go to "run test with micro-VM", we'll run that, do a thing. So that's going to start. And if we go to here — oh yeah, the previous one died, don't worry about that, it should come back. So that should be running there. Oh no, you know what I forgot to do? I forgot to port-forward the service. I could have been doing that while it was waiting. God damn it. Do the thing. I don't know if it's going to pick up what I did, so I'll just try it again, why not? Do it again, do the thing. Please work. I should also say, by the way, I'm doing a hack: I'm using ngrok to expose the service and the webhook, because I wasn't able to set up DNS. And I really hope that works. It's done something.
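In case it's useful, the hack I'm apologising for is roughly the following two commands — the service name and ports are placeholders for whatever your webhook receiver is actually called in the cluster:

```bash
# Expose the in-cluster webhook receiver on localhost...
kubectl port-forward svc/microvm-actions-runner 9090:80 &

# ...then let ngrok give it a public URL that the GitHub webhook can reach,
# since I couldn't set up proper DNS for the demo.
ngrok http 9090
```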
I wouldn't worry about that 502, because that's just — oh yeah, there we go, it created the micro-VM. You get to watch me have a panic attack in real time. And now if we look at Raspberry Pi three down here, we can see that a new micro-VM has been created just for this job. If we go back to our runners, we can refresh that, and now we have to wait for the micro-VM to boot and the GitHub Actions runner to register itself, which is reliant on the GitHub Actions API, which is just what you want. If we go here, we can actually watch it boot, all the way down in Firecracker's standard out — follow that. It seems to think it's fine. Go up. Yay, it did a thing: runner successfully added. Though, where are you then? There we go. Oh, it's active, it's already been picked up. See, literally, I click away and it's fine, it's doing its thing. Do the next bit. Wait, did it already run? There we go. Yes, there we go: my tests passed on a dedicated Liquid Metal micro-VM. The runner should have self-destructed — and don't worry, that's the pod runner coming back, really slowly; what was that doing? And if we look back here, my micro-VM should be being deleted. Oh no, it failed, never mind. Let's pretend that didn't happen, it's all cool, right? Don't worry about it. Okay, I think that's the demo. How do I get to the next screen? There we go. Okay, you did that earlier. I've actually got a slide that tells you when to applaud.

Okay, so, learnings. As I said, GitHub is not in the least reliable. A couple of times when I was testing this, I hit a known issue where runners were available and jobs were queued, and yet GitHub didn't match them up at all. So I'd sit there thinking, okay, when's it going to pick up? Then I'd go and do a chore in the house, and twenty minutes later, oh, now the job runs. Excellent, well done, GitHub. So there were a couple of moments where I wondered, should I really be using GitHub for a live demo like this? I went for it anyway.

When I was experimenting with all this, I wrote a 40-page analysis of how this — not this Raspberry Pi thing, but a general bare metal Liquid Metal setup — could be more efficient than your standard enterprise CI package. And theoretically it is more cost-efficient, if you set it up carefully. But then there's the overhead of maintaining it, so it's swings and roundabouts; it depends on what your organization can afford to do. But, I don't know if you've noticed, I keep seeing loads of comments saying we're moving away from cloud, we're going back to on-premise — like, really? Going back to that seems to be hot these days, so I think this is a really interesting area. There are lots of other tools coming out that use micro-VM technology, and Liquid Metal is one of those options. Now you can clap.

So here's some documentation. The one on the left is how to get this built; the one on the right is the Liquid Metal documentation. You can run Liquid Metal on any bare metal server — I test on Equinix sometimes, or on my Raspberry Pi setup. But yeah, that's all I've got for you. And I am literally 30 seconds from the end, so there's no time for questions, which I'm devastated about. Come and find me afterwards. Thank you so much.