Hi, everyone. My name is Anton Smith, and I'm Director of Product at Spectro Cloud. I'm going to talk to you today about something I find extremely interesting, and I hope you're going to enjoy it too: on-prem costs, how to slash them, and how to unleash the beast. So let's jump straight into it. A little bit more about me: I'm a networking nerd. I've spent basically all my career working with networking and other infrastructure-related technologies. For the last three years or so, I've been working on Kubernetes and open source. I spent some time at Canonical, where I was the product manager for MAAS, which is why I love talking about MAAS, or Metal as a Service. I love Stone Temple Pilots, and I'm currently the Director of Product Management at Spectro Cloud.

Looking at the agenda, what I've got for you today is, first up, exploring the control plane conundrum: why do we have this problem on bare metal when we don't have it in the cloud? After that, I've got a solutions section where I want to talk about some of the different ways you can solve it, and then I'm going to show you a demo of one way to solve it.

For this discussion to be more meaningful, we should also revisit why we're here: why are people doing on-prem, and why are we talking about on-prem and bare metal? I won't go through all of these; I just want to touch on a few of the biggest reasons. The biggest one is probably data sovereignty, data security, and compliance. This is simply where some companies or organizations don't have any choice: they actually need to have it on-premises. Take, for example, COVID data in Iceland. Google Cloud and AWS don't have data centers in Iceland, and the data wasn't allowed to leave Iceland, so it had to be processed on-premises. There's performance and latency as well, processing things near where the data is generated. Intellectual property protection is another big one. And finally, there's organization, culture, and skill set. In short, there are many reasons why an organization might choose to do on-prem. The trick is, when you do on-prem, how do you do it well and still get all the benefits we get from the cloud?

Taking that one step further, we should also talk about why bare metal Kubernetes. There are quite a lot of reasons, and performance is not the main one; several other reasons are actually more important. The first is that resource efficiency is better: you're not slicing hardware up into VMs and then stitching them back together into a cluster. You should be able to run around 10% to 20% more workloads on bare metal than on VMs. Total complexity is a bit lower, because you don't have multiple orchestrators at work, in some cases competing with each other: take vMotion trying to move things around at the same time as your Kubernetes cluster is trying to organize resources. Every host is guaranteed to be a physical machine, and that's good because you have more direct access to the hardware. The cluster can read hardware data more meaningfully; there's no shield between it and the actual hardware, which means that when you're on-prem and have to manage all of that hardware anyway, it becomes easier to get visibility into it.
You also get very fine-grained control of resources, so you can use the Kubernetes resource management capabilities as they're intended, without something happening under the hood like CPU oversubscription, or noisy neighbors where VMs next to each other compete for resources, and you don't have to do things like CPU pinning to deal with that problem. So with that said, I think there are many reasons to do bare metal Kubernetes. It doesn't mean there's a right and a wrong and you have to pick one or the other, but in some cases people will choose bare metal Kubernetes. And finally, there's one more: accelerators. It's easier to pass through things like GPUs, or just use them directly in Kubernetes, rather than fiddling around with a VM layer.

Okay, so there are definitely reasons to do bare metal Kubernetes, or bare metal in general, but there are also some big challenges, and the one I'm going to talk about now is what I call the bare metal control plane conundrum. It arises because of the way Kubernetes is built. By default it uses etcd, the key-value store for Kubernetes, which lives on the control plane nodes in the cluster: the nodes that decide where all the workloads in the entire cluster are going to go. Now, you have two options here. You can have a single-node control plane, just one box. Or you can have a highly available cluster, which is what most people want in production, and for that you always need a minimum of three control plane nodes.

So what does that mean at different types of sites? At the edge, it doesn't mean much: you're going to have to run your control plane and your workloads side by side, especially if you only have one or three nodes. For medium and large sites, it's more meaningful and more nuanced, because there are different things we can do. First, I want to illustrate exactly what happens when you try to run multiple clusters on bare metal in a medium or large data center.

Let's take large sites first. I'm going to go with a contrived scenario of 900 machines and 100 clusters. Say we go in and create 100 clusters, and we want them all to be highly available, so that's 100 highly available control planes. In total, 300 of those machines are going to be control plane nodes. Out of the 900, we lost a third to the control plane, and only two thirds are available for workers. If we dive into a bit more detail, what we see is a lot of resource wastage. Just to hit that home: assuming these servers each have 128 cores and 512 gigabytes of RAM, that means 38,400 cores and 150 terabytes of RAM allocated to nothing but control planes. Clearly this is not the best way to use these resources. We want as much of these machines as possible focused on actual workloads that are related to the business.
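To make the waste concrete, here's that arithmetic as a quick Python sketch, using the example's own numbers (900 machines, 100 HA clusters, 128 cores and 512 GB of RAM per box):

```python
# Dedicated HA control planes: three nodes per cluster, straight off the top.
HOSTS, CLUSTERS = 900, 100
CORES_PER_HOST, RAM_GB_PER_HOST = 128, 512

cp_nodes = CLUSTERS * 3                               # 300 of the 900 machines
print(f"{cp_nodes * CORES_PER_HOST:,} cores")         # 38,400 cores
print(f"{cp_nodes * RAM_GB_PER_HOST / 1024:.0f} TB")  # 150 TB of RAM
print(f"{HOSTS - cp_nodes} hosts left for workers")   # 600
```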
So what about medium sites? Here it's even more difficult, I would say, because a medium site could be somewhere around 9 to 12 hosts, and I'm going to use 9 to really illustrate the point. At a minimum, 3 of those machines are lost to the control plane, and we have very limited flexibility. As a trick question: how many HA clusters could you actually build here? I'll show you. We take one control plane of 3 nodes, then another, and now there are only 3 nodes left, so we can't build any more highly available control planes. We'd end up with just 2 clusters, and only 3 out of the 9 machines would be running workloads.

There's actually a much better way to do this, and I'm going to show you now. What we'd like to do is move to control plane pools. Instead of a full third of the nodes being taken, we end up with approximately 5% of the nodes allocated to control planes. But how can we do that while also observing things like availability zones? Because in a data center of this size, you most likely want those. The target position, in an abstract way, looks like this: about 5% of the nodes held back for control planes. That figure is entirely up to the organization; I've just picked 5% for today, and it could actually be less when the machines are this powerful. You want the majority available for worker nodes. So what you end up with is one slice of the machines hosting all of the control planes, and this is what we call control plane pooling: I put those machines into a resource pool, and all the rest of the machines stay as bare metal worker nodes.
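Continuing the sketch from the large-site example, here's what the pooled layout buys you on the same 900 machines; the 5% figure and the packing density are just today's illustrative assumptions:

```python
# Control plane pooling: the same 300 control plane nodes become VMs,
# packed onto a ~5% slice of the fleet instead of 300 dedicated machines.
HOSTS, CP_NODES = 900, 300

pool_hosts = round(HOSTS * 0.05)                       # 45 machines
print(f"{CP_NODES / pool_hosts:.1f} CP VMs per host")  # ~6.7 each
print(f"{HOSTS - pool_hosts} hosts left for workers")  # 855, versus 600 before
```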
Okay, but how can we do this? I'm going to talk about three different types of solutions today: virtualization, heterogeneous hardware, and allowing workloads on control plane nodes.

Virtualization has pros and cons. The pros are that you have full freedom to allocate resources flexibly, and it's easy to move control planes around. The cons are that you need a virtualization solution, and there's no bare metal management: most virtualization solutions don't give you a bare metal fleet management system. And of course you have licensing and other costs associated with the virtualization solution.

With heterogeneous hardware, you can right-size the hardware. What you're saying is: I'm going to buy separate, smaller boxes that are right-sized for the control plane function; they're not big 128-core things. The problem is that it requires managing different types of hardware, which messes up your logistics and your spare parts management, for example, and it's inflexible: if on day one I say I want 100 clusters and on day two I say I want 200, I can't just go and deploy another set of control plane nodes, because I don't have the hardware for them.

Finally, there's allowing workloads on control plane nodes, which is a common technique and absolutely necessary at the edge, where you don't really have a choice. It does give you more flexible usage of resources, but at scale, in a large data center, it's untenable: it creates very messy operations, and it can create resource contention that requires constant, very careful management.

So what I'm going to do now is dive a little deeper into virtualization; that's the topic of the day. How can we do it together with bare metal management? Imagine I told you that you can have just a sprinkling of virtualization, plus full bare metal lifecycle management for all of the machines, without the costs and the complexity. Here's the solution I'm going to demo today: MAAS, Metal as a Service. The thing a lot of people don't know is that MAAS can actually do VM management as well, in a lightweight way. That lightweight way is LXD, another Canonical project, which lets you manage VMs on Linux boxes. The setup we're going to end up with is a control plane pool of VMs managed by MAAS together with LXD, and another pool where we keep bare metal workers. This gives us control plane pooling and avoids burning hardware on control planes. MAAS also has a very rich API. It has Terraform support, it has a Cluster API (CAPI) provider that was built by Spectro Cloud and is open source, and it has a beautiful UI that you'll see later on.

When people ask me what MAAS does exactly, which makes for interesting party conversations, I say it does this: it puts operating systems onto bare metal, and it brings DevOps to bare metal. Imagine I gave you a USB stick on your first day on the job at a data center and asked you to take that stick and put Ubuntu on all 900 of those machines. Well, there's a better way, and that's basically what MAAS does. MAAS works with a wide range of hardware, it's open source, and you can build and provide your own custom images, or use the operating systems for which nice images, or ways to build them, are already published.

But bare metal is not only about provisioning. Day zero and day one matter, but as we know from Kubernetes, they're not enough. Day two is where it gets really hard; it's like running a marathon. So MAAS doesn't just provision operating systems onto boxes, it covers a really wide range of other functions that are all about day two. With hardware you have to manage the physical reality: hard drives fail, new boxes arrive and need to be set up, boxes get replaced. MAAS offers all the day-two capabilities needed to manage bare metal. First, automation: it provides an API upstream, and it also works with a number of other APIs to talk to devices, so it can manage power, do network booting, and provide network services with IPAM, DHCP, and DNS. It's also very fast: it takes only a couple of minutes to completely provision a new box. It has inventory management, so you can see every single device in the data center: every graphics card, every network card, the serial numbers, the drives, and so on, which is very important when you're dealing with, say, hard drive failures. It supports storage layouts, and importantly it also supports hardware testing, which means you can test boxes before you declare them ready to become part of the data center.
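Because all of that sits behind the MAAS API, these day-two tasks script nicely. As a flavor, here's a minimal sketch using the python-libmaas bindings (the Python binding I'll mention in a moment); the URL, API key, and tag are placeholders, and treat the exact parameters as assumptions to check against the library's docs:

```python
# Minimal day-two automation sketch with python-libmaas.
# The MAAS URL, API key, and "demo" tag are placeholders for this example.
from maas.client import connect

client = connect("http://maas.example:5240/MAAS/",
                 apikey="consumer:token:secret")

# Inventory: every machine MAAS knows about, with its current state.
for machine in client.machines.list():
    print(machine.hostname, machine.status)

# Grab a Ready machine by tag and push an OS onto it.
machine = client.machines.allocate(tags=["demo"])
machine.deploy(distro_series="jammy")
```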
Now, it also supports bare metal DevOps: as I mentioned it has Terraform, but also Ansible, Chef, and Juju support (Juju is also from Canonical), and it has a Python binding that's very nice to work with. It provides network monitoring and LDAP integration. It has support for composable systems like Cisco UCS, HP Moonshot, and more. And it supports cloud metadata, cloud-init, which is actually some of the secret sauce for supporting CAPI with Kubernetes. Lastly, it supports KVM micro-cloud integration through LXD. That KVM management is provided by LXD, and there's an integration out of the box: when you deploy boxes with MAAS, you can simply choose to enable LXD. This gives you virtual machine and system container management. It's Linux-based, it has a very rich API of its own, which is what MAAS uses to integrate with it, it has very flexible storage and network configurability, and it's easily deployed and managed by MAAS, which I'm going to show you.
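MAAS drives all of this through LXD's own API for you, but to give a feel for what that integration is doing underneath, here's roughly the equivalent direct call using the pylxd client; the instance name, sizing, and image source here are illustrative, not what MAAS itself sends:

```python
# Roughly what "compose a VM on an LXD host" boils down to, done directly
# with pylxd rather than letting MAAS drive it. Names and sizes are examples.
from pylxd import Client

lxd = Client()  # local LXD over its Unix socket
vm = lxd.instances.create({
    "name": "bare1-cp1",
    "type": "virtual-machine",
    "config": {"limits.cpu": "2", "limits.memory": "2GiB"},
    "source": {"type": "image", "alias": "22.04",
               "protocol": "simplestreams",
               "server": "https://cloud-images.ubuntu.com/releases"},
}, wait=True)
vm.start(wait=True)
```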
Alright, time for the fun stuff: demo time. What I'm going to show you today is based in my home lab, so I'm keeping it small; unfortunately I don't have 900 servers under my desk, I have six. I'm going to create a resource pool with bare1, bare2, and bare3, three physical machines that will be added to the control plane pool, and on each of those I'm going to create three VMs. I'll use those VMs to create three control planes: one, two, three. Then I'll take machines from the other resource pool down here, called wp for worker pool, m1 through m3, and allocate one of those machines as a worker to each cluster. At the end, you'll see three clusters, each with a highly available virtualized control plane and one worker node. I'll also be using CAPI together with MAAS: the open source CAPI provider for MAAS by Spectro Cloud, which I'll drive from our Palette tool, and it will integrate with the MAAS API and do all the magic. So at the end you'll see what I described: each cluster drawn across the availability zones, with one virtualized control plane node in each of the three zones, replicated three times, and one worker node each.

The first thing we'll do is take a look at the MAAS interface; I'll give you a really quick tour. What you're looking at right now is the machines page, showing all the machines available to us: bare1, bare2, and bare3, which I've prepared, plus m1 through m3. All of them are in the Ready status, which means they're available to be used and provisioned. We can see things like how many cores and how much RAM each has; as I mentioned, this is my home lab, so they're not gigantic boxes, but they're big enough for today. There's the KVM section over here, where we can manage the hosts and create the VMs if we want to. If you click on one of these boxes, let's take a look at bare1.maas, we have information about storage, memory, and CPUs. We can initiate tests on it. We can see how power is managed, because we need to power cycle the boxes; in this case it's done through a webhook. There's networking information here, and also other system information. Of course there's more detail if we go into storage, and we can see the inventory I mentioned: all the PCIe devices and USB interfaces. We can see the results of what we call commissioning in MAAS, which is a bunch of tests (you can also define your own), and even things like LLDP information captured from the network, to see what else is out there, is saved here. All you'd need to do is look at the details or outputs of a test and you can see everything that's going on; in this case there wasn't much, because I don't have anything else on my network. There are also logs for the machine; as you can see, I've been preparing this demo, so this machine has actually been used and recently released back to the pool of available machines. So that's the MAAS interface.

Now I want to start showing you how to actually provision some of these machines. First, filtering: I'm using the demo tag today, so I can easily filter down to the six machines I mentioned, which is super nice. Let's grab all of them, m1 through m3 and bare1 through bare3, and simply say we want to deploy those machines. Remember, I need to deploy Ubuntu on them to bootstrap LXD, so we can then create the virtual machines. All I have to do is select "Register as MAAS KVM host with LXD", just like that. I'm totally fine with Ubuntu 22.04 for today; I could also select other kernels if I wanted to do something special, or other operating systems, including custom images, but I'm not going to do that right now. If I wanted to provide cloud-init data, I could do that here too. Right now I'll just start these machines deploying. Straight away, you see it starting to power on the machines, and as part of the boot process, because MAAS is managing the network, it intercepts their DHCP requests and initiates a PXE boot. All of these machines are configured to PXE boot, and that's how they get their operating system installed.

After a while, about five minutes, all of these machines have been deployed with Ubuntu, which is really cool. They're all on and available, and they've all had LXD installed on them automatically for us. So we can go straight to the KVM tab, and if we click that, we see each of those hosts available as an LXD host, with resources available for creating VMs. Let's check out bare1 and go to KVM settings. One thing I'm going to do is overcommit the CPU a little, just so I don't have any problems with the number of cores. I probably wouldn't do this in production, but you could if you wanted to; I have to because of my lab. Now that we're set up, I can go to Virtual Machines and start creating machines. I'll add a VM and call it bare1-cp1, give it two cores, two gigs of RAM, and 30 gigs of disk, which should be more than enough, and compose the machine. What happens next is that MAAS automatically commissions it, and we'll see a new machine pop up on the machine listing page, ready for use.
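If you'd rather not click through the UI for nine VMs, the same compose operation is available over the MAAS REST API. A hedged sketch with requests follows; the VM host ID, sizes, and exact parameter names are assumptions to verify against the MAAS API reference, while the key format and OAuth1 PLAINTEXT signing are standard for MAAS:

```python
# Sketch: composing a control plane VM through MAAS itself, so it shows up
# in the machine listing and gets commissioned like everything else.
import requests
from requests_oauthlib import OAuth1

consumer, token, secret = "consumer:token:secret".split(":")
auth = OAuth1(consumer, "", token, secret, signature_method="PLAINTEXT")

resp = requests.post(
    "http://maas.example:5240/MAAS/api/2.0/vm-hosts/1/",  # the LXD host's ID
    params={"op": "compose"},
    data={"cores": 2, "memory": 2048, "storage": "30"},   # MiB RAM, GB disk
    auth=auth,
)
resp.raise_for_status()
print(resp.json())  # includes the new machine's system_id
```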
Back on the machine listing page, we can see the virtual machine we just created, and you can also see very nicely from the power driver that it's LXD-backed. It's currently commissioning: MAAS still goes through the process it calls commissioning, discovering everything it needs to know about the machine, even though the machine was actually created by LXD. MAAS puts LXD machines through the same process it uses for physical machines. Now I'm going to run off and configure all the other machines, and then we'll come back and see them all in this list when they're ready.

Here you see me looking at the three control plane nodes we're creating on bare1; two of them are currently in progress. If you want to allocate them to a resource pool, you simply click here, and the same goes for zones. I'm grouping by zone right now, and all of these are going to be in az1. So as I allocate those... oops, that was not the right one to allocate. I want each of these control plane nodes, so I take this one and put it into az1, and this one into az1 as well. We'll use the zones later for placement, to ensure the clusters are built across the availability zones. The physical hosts themselves I'm not putting into a resource pool or a zone, because we won't be using them directly. As for the worker machines, m1 through m3, I am going to allocate those: let's put all of them into the worker pool, like so. They're not in an AZ yet, so let's do that too: m1 into zone 1, m2 into zone 2, and m3 into zone 3. Now you see these availability zones starting to populate, and here we have everything in zone 1 ready to be used as part of a cluster. I need to go and complete the rest for the other availability zones, and I'll be right back.

Okay, I've completed that, and you can see all the machines currently going through the commissioning and testing phase. They've all been created and added to the right resource pool and the right AZ, so in a little bit they'll be ready and we can start building clusters. And that's finished: they're all Ready right now, all powered off and waiting, in the right pools and the right zones. Just a quick flashback to the slide so you can take a look: what we've done is prepare bare1, bare2, and bare3 with three VMs on each, and our bare metal worker machines are ready. So it's time to look at the CAPI part of this and start creating some clusters.

Alright, I'm going to use Palette, the tool built by Spectro Cloud for managing Kubernetes clusters. This is the console; you can see my testing from the previous couple of days, a bit of historical information about my clusters. Really quickly: we have the concept of cluster templates, cluster profiles we call them, and I've prepared one in advance called maas-bare-demo. It's going to deploy Ubuntu using MAAS, Kubernetes 1.28.3, Calico, and Longhorn for storage: a fairly basic cluster profile. So to create a cluster, we come in here and say Add New Cluster. In my case I select the MAAS type; we also have others, like AWS and so on.
But we're focusing on MAAS today. I've also created something called a gateway, but we don't need to talk about that right now. Let's give this cluster a name, bare-demo-cluster-1, and go to Next. Now we select the profile, maas-bare-demo, and we get an overview; I have a chance to override some of the settings here if I want to. Now the important part: the cluster config. Right here we have the master pool and the worker pool. I want three nodes, remember, because we want an HA cluster, and we don't want worker capability on those nodes because they're VMs behind the scenes. Here I can actually specify the resource pool and the AZs, and you see it's picked these up from MAAS, which is great. So I take the cp resource pool, I enable each of the AZs because I do want it spread across them, and I turn the CPU and memory down to match the resources I created earlier. For the worker pool configuration, I just want one node, so I take the resource pool wp, and I pick az1, because remember, we had one of our bare metal machines in there. This looks good to me, so I continue. There are some other options here, like keeping the operating system up to date, enabling scans, scheduling backups, and RBAC. I actually do want something here: I want to make sure I'll be a cluster admin on this cluster once it's finished, so I'll add that. Now we validate, it says everything looks good, and I finish the configuration. What happens now is that Palette starts working with MAAS, the MAAS that's managing all of these machines, and starts creating a cluster.

Okay, great. After all of that, and all the little spinning dials working away while we watched, we end up with this: three clusters, all running quite happily, bare-demo-cluster-1 down to bare-demo-cluster-3. They've got four healthy nodes each: three control plane nodes and one worker node. If we drill into one of them, say number three, we can see some more information. We can see that it's healthy, that there are things going on in the reconciliation (because it works via CAPI, if you recall), the stack we deployed, and total CPU and memory usage. If we click on Nodes, we can actually see all of them, with their host names: bare3-cp2, bare2-cp1, bare1-cp2, and m3. The reason some of these names don't match up exactly is that we only said they should be in different AZs; there's no guarantee the control plane nodes will be numbered identically, because they're in a pool within an AZ and can be taken dynamically. So these clusters are nice and happy right now and ready to start taking workloads. We'll take a final look at MAAS, where we can see that all of the machines in the different AZs have been allocated and deployed. They're all running a custom image we created, and this satisfies all the needs of CAPI; these are kubeadm clusters under the hood. We can also see the memory allocation, storage, and so forth, as well as the resource pools, worker pool and control plane. So that's a very successful setup of three healthy clusters.
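If you want to double-check that shape outside the UI, a quick look with the Kubernetes Python client against one cluster's kubeconfig shows the three control plane nodes and the lone worker; the kubeconfig filename here is made up, so use whatever Palette hands you:

```python
# Sanity check: expect three control-plane nodes and one worker per cluster.
from kubernetes import client, config

config.load_kube_config(config_file="bare-demo-cluster-3.kubeconfig")
for node in client.CoreV1Api().list_node().items:
    roles = [label.split("/", 1)[1] for label in node.metadata.labels
             if label.startswith("node-role.kubernetes.io/")]
    print(node.metadata.name, roles or ["worker"])
```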
And so with that, you've seen the clusters being built. We've created a control plane pool using virtual machines, without any commercial software whatsoever, extremely simply, with all the best of bare metal management as well: complete flexibility to divide things between a virtualized control plane pool, availability zones, and resource pooling, with worker nodes running on bare metal. Pretty much what I described to you earlier about a data center where you want to divide things flexibly and easily, and it works in medium sites as well. I hope you've seen in this demo that it was really nice and easy. I'll also add that we at Spectro Cloud use this technique ourselves, to get better utilization of our hardware and our labs during development and testing, and we have a lot of interesting discussions with customers who are investigating the same approach without getting into purchasing expensive licenses for commercial software. Again, they want the best of both worlds: to mix and match virtual machines and bare metal flexibly, without the complexity of multiple different solutions.

With that, I'll just pull up this next slide with a couple of QR codes. I'd encourage you to try MAAS: go to maas.io, where there's a nice tutorial that lets you try it on your laptop if you're running Linux. Follow us on LinkedIn; we're always posting a ton of interesting articles, not just about bare metal but about almost anything Kubernetes. And finally, you can find our resource center with the bear icon there; all you have to do is tick "bare metal" and you'll find all the articles we've written about bare metal, which is quite substantial, as well as all the other materials we've created. One final tip: add me on LinkedIn and send me a message. I'd love to hear what you thought about this presentation, as I'm always looking to improve, as well as any thoughts and general ideas you had around it. And the very final tip, in case you're wondering how I have six servers under my desk: I get my hardware on eBay. I find old small-form-factor enterprise machines, the kind that strap onto the back of a monitor, buy those, add a bit of extra memory and storage, and then I can manage them with MAAS, which is really nice for having your own lab. So with that, I'm going to say adieu. I hope you have a fantastic day and that you really enjoyed this presentation.