Okay, welcome everybody to this talk at KubeCon + CloudNativeCon Europe 2021 Virtual. Despite having been involved with the CNCF for four years, this is actually the first time I'm speaking here, so I'm very excited about it. The title of today's talk is Taking Bare Metal to the Clouds with Tinkerbell, and let's jump into what that's going to look like.

So I'll explain a little bit about myself. I think one of the questions a lot of people are going to have is: why would anyone be interested in running their own hardware anyway? So I can share something around that. Assuming you do want to do that, I want to explain how Tinkerbell fits into the picture, which is sometimes a little complex. I'm going to look a little into why we decided to open source Tinkerbell from Equinix Metal in the first place. There's a lot of exciting new stuff in our 0.5.0 release that I'm going to go through. And then last but not least, I'll talk about how people can get involved if they want to, or where to go if they want more information.

All right, so a little bit about me. I'm the senior director of developer relations at Equinix Metal. Some of you may remember Equinix Metal as Packet, and up in the top right you can see this little logo that I really love, which is the Packet box with a sword riding on top of the Equinix Metal fortress. I was involved in the CNCF from 2016 until December 2020, and I put the little wink there because, when speaking with Priyanka, she said I could call myself the CNCF marketing chairperson emeritus for a little while. I'm not quite sure how long that one's going to last. I live in North London. And importantly for this talk, until I started at Equinix Metal I hadn't really touched any hardware since about 2001, when I was working on gaming PCs. We're going to look at some of the things I've learned along the way in this talk.

So why are people interested in running their own hardware again? Well, actually, it's not really "again" at all: people always have been. But I think there is a perception in some quarters that the companies still running their own infrastructure are what the Crossing the Chasm model would describe as laggards. This may not be as it seems. Public cloud adoption has accelerated the development of many technologies that help companies change the way they deploy software more easily, and Kubernetes, of course, is an excellent example. But there are actually many reasons why people would still be running on-prem hardware, and I'm going to look at some of those now.

As you can see here, despite the apparent hegemony of the public clouds, most companies run their workloads across a wide range of infrastructure, and there are a few really good reasons for that. One is kind of obvious: any company that's been around for a while will tend to have infrastructure of various types. They may have had large on-prem deployments back in the day; they've moved some of that to the cloud, but some of it has remained for various reasons. Acquisitions are a huge driver of hardware heterogeneity, in that if you acquire a company, you're often faced with a choice: do we want this company to spend the next two years moving to our cloud or our way of deploying software, or do we instead want to focus on making money? Very often the answer is the latter. So through multiple acquisitions, companies end up with lots and lots of different types of infrastructure, including on-prem, but also multiple clouds.
And I'd argue that one of the primary drivers for hybrid cloud is actually acquisition: if you're running on Amazon and you buy a company that's a GCP shop, suddenly you're hybrid cloud, or multi-cloud I should say. And last but not least, many companies have requirements that aren't best suited to public clouds, whether that's around custom hardware, compliance in their jurisdictions, performance, or special workloads that they have. There are lots of good reasons to run your own hardware, but we'll see that it does come with some challenges.

Also, edge computing is something I've been watching for what feels like forever now, since 2017. Often it was conflated with 5G, but I think what we're starting to see this year in particular is edge really becoming a thing, finally. So we're seeing all sorts of companies wanting to put hardware closer to their users, whether that's in your office, in your store, or in your baseball stadium. There are many reasons why having a data center of sorts near your users would be a good idea. And of course, what that means is that you're going to be running it in a way you perhaps wouldn't be able to in a public cloud: you're going to be running your own hardware.

I think what this creates is sort of interesting. There's this idea that anyone who didn't go to the cloud is potentially a laggard. But really what we're seeing now is an opportunity for those who were using on-prem because they decided not to go to the public cloud, or because they had a special reason to, or often because they went to the public cloud and then found that, at the scale they were running, it made more sense to come back off the public cloud and run on-prem. With some of the recent developments, and Tinkerbell is just one of them, there's a real opportunity for those "laggards", a label I obviously don't agree with, to become the early adopters of a new wave of cloud-native innovation: taking advantage of a lot of the tooling in the cloud-native ecosystem to run their own hardware at scale without having to use a public cloud.

And of course, I'm not the only person who's noticed this. You'll notice that all the public clouds and a lot of the major software vendors are creating their own hybrid, multi-cloud and edge tooling. Top left here we have SAP's Gardener, Google Anthos, VMware Tanzu, Azure Arc, Red Hat OpenShift, and Kubermatic, a smaller but important player; there are many of these. And then of course the big news from the end of 2020 was Amazon's EKS Anywhere, which, as they say, allows you to run EKS in your own data center. Now, there's a reason why they're all doing this, and it's because of some of the things I alluded to earlier: running your own infrastructure is sexy again.

I'm going to look at some of the challenges of that. If you look, for example, at the Google Anthos web page (Anthos is a project we've done a lot of work with), you'll see that under the requirements they list a bunch of servers: go buy these servers from Dell or Supermicro or whoever you want, and then you can run Anthos. But as we'll see in a moment, it's a little more complex than just getting those servers shipped to wherever you want them to be, if you're going to be successful with this. Okay, so you want to run your own cloud now.
Great, how does Tinkerbell fit in? Well, first let's start a little further out. I wish I could tell you that Tinkerbell is a one-stop shop. I found that I was Googling "one-stop shop" and I found this: apparently it's a chain of stores in the north of London, where I live. I've never been in one. It's a convenience store, but there we go, one-stop shop. I wish I could tell you that Tinkerbell was a one-stop shop for running your own infrastructure, but it isn't. It's a complex stack, as we're going to see, and Tinkerbell is just one part of that. Later in the talk I'm going to explain why we specifically focused on that part of the stack, but for now let's take a 50,000-foot view of what it looks like to actually run your own cloud.

I want to first emphasize that this is a massively oversimplified view. There's a lot of stuff going on here, and you can probably imagine that at every layer of this proposed stack there are many different vendors and all sorts of options; some of the layers I've left out entirely just for brevity. But let's start at the bottom.

One of the things that's often overlooked, if you're using VMs through Amazon for example, is real estate. Those VMs have to live somewhere, and if you're going to be running your own infrastructure, you're going to need real estate to put it in. Now, that would be fine if it could be your bedroom, but if you're running at any kind of scale, it's going to need a bunch of other things that maybe don't fit in your bedroom. For example, air conditioning becomes very important, and air conditioning is, of course, an entire industry in itself: very mature, with its own vendors and its own best practices, and it's something you're going to have to get good at. On top of that you need power, and that can't necessarily just be the power coming out of the plug in your bedroom; you're probably going to need backup generators and all sorts of things. Again, it depends on the scale of your deployment: there are smaller deployments that would work perfectly fine with normal household electricity, but when you get to a couple of racks, suddenly that becomes difficult, especially if you need to guarantee supply and deal with things like brownouts.

Networking is kind of obvious, both internal and external. Your machines are all going to need to speak to each other, which means they're going to need networking: switches, routers and everything else. But also, externally, you're going to need to get that data center deployment connected to some sort of internet connection, or some set of connections onto the internet, so that you can communicate with the rest of the world. Again, at very small scale that might work in your bedroom, but your ISP might be a little upset if you start running data-center-scale bandwidth out of there, so you might look for a different provider.

Perhaps most obviously, you need servers and server racks. Now, all of you will have seen servers and server racks, either on television or in person. We don't have time to go into it today, but server racks, even though they seem like pieces of metal that you slide things into, are themselves areas of great disruption. One of the projects we're very excited about at the moment is Open19: Equinix and a few other companies have open sourced chassis designs for server racks.
And we've set up the Open19 Foundation within the Linux Foundation so that others can benefit from that, but we'll come back to it. Once you have a whole bunch of servers and switches and racks and everything else in there, you're going to need data center infrastructure management, or DCIM, software. DCIM software can be seen as a sort of catalog of all the things in the data center, a kind of metadata service, but it also often takes care of monitoring things like heat, utilization and so on.

Coming back to the networking idea from earlier, you're going to need IP addresses. The last time I ran a data center, well, I didn't even run it, I was part of running it, but it was a long time ago, we used to track IP allocations in a spreadsheet. Again, that's not ideal, but it works at small scale. When you get to larger scale, you're going to need IP address management software, so that you're not giving out IPs to the wrong people and so that you can defrag the IP space and make the best use of it.

If this is a commercial offering, a commercial cloud, or a cloud whose usage you want to be able to measure, even if it's an internal project, perhaps with internal billing, or just so that you can give parts of your cloud to different projects within your organization, you're going to need metering and billing. What that basically means is: I need to know how much of what you've used.

Next up is server provisioning. When you have all of the below in place, you now need to take those shiny new boxes in your racks and make them do something. That basically means taking an essentially inanimate server and getting it into a state that somebody can use. We're going to talk more about this later.

Then we get above the waterline, to the bits people are typically familiar with. You need a user-facing API, with access controls and a whole bunch of other things, so that people can actually request resources from this cloud or data center you've created. They're going to want operating systems on top of that, and those operating systems need to be repackaged to make sure they work with the hardware you have, with all the correct drivers. You're also probably going to need to enter into relationships with operating system providers, to make sure you can get the latest images from them and comply with their usage contracts. And then last but not least, and of course there's a lot more on top of this, there's a deployment layer. Most people want to put something on top of those servers, on top of the OS: it could be a virtualization layer, or a containerization layer exposed through Kubernetes. Whatever it is, you may also be responsible for this part.

Okay, so where did we decide to focus with Tinkerbell? Well, Tinkerbell is specifically around what we call server provisioning. Internally we often call it bare metal provisioning, but it's the layer of the stack that takes your servers, which are networked, in racks, with power, real estate and everything else, and takes them from being essentially inanimate objects to being provisioned in such a way that they do something useful. You may want, for example, to update the firmware: all of your servers will have all sorts of firmware in them.
And at any kind of scale, you don't want to be walking around your data center with a USB stick and a keyboard trying to update all of those, so having a central way to address those servers and ask them to perform update tasks is very important. Perhaps more interestingly for the end user, those servers only really become interesting once you put an operating system onto them. Very few people are going to be interacting with these machines before they have an OS, but there are a few things you have to do before the OS: for example, you may need to format some disks, set up some RAID configurations and a whole bunch of other things, then install an operating system and present it to the user. We're going to talk about why we focused on that bit a little later, but now we're going to look at exactly what Tinkerbell does.

A full walkthrough of provisioning servers is a bit too much for this talk, but I would point you to two specific resources. First, on the right: Alex Ellis wrote an incredible blog post with The New Stack in 2020 where he talks through the actual netbooting example of taking a server which doesn't know what it is, who it is or what it's supposed to do, and getting it into a state where we can deploy an operating system. I'd recommend reading that. You can also go and look at our Tinkerbell 101 video, which takes you through specifically how all of that works. But what I'm going to do now is give you a high-level view of how the various parts work.

Imagine you have a bunch of servers, with Tinkerbell deployed somewhere in that network, and they want to do something. The first thing they're going to do is boot, and when they boot, they're going to send out a DHCP request. This is where Tinkerbell gets involved: Tinkerbell intercepts that DHCP request through a service called Boots, and it uses iPXE to start something called netbooting. Essentially, the way this works on the back end is that you have your fleet of servers, which are probably all in some sort of DCIM (data center infrastructure management) tool, and in Tink what we have is a list of hardware definitions. The hardware definitions say: if a server with, for example (though it doesn't have to be limited to this), this MAC address sends us a DHCP request, I want to run this provisioning workflow; that's how we map the two together. So in the happy path, we get a DHCP request from a server and we say: we have a workflow that matches that MAC address, we know what to do with it.

The workflows are relatively simple, and they all live in Tink, the element you can see at the top. Tink will then say: great, I have a workflow for you, but I need a way to execute commands on that machine to get it into the state it needs to be in, which is where OSIE comes into play. OSIE is our operating system installation environment: an in-memory operating system that we run on the server so that we can run all of these other actions on it. Hegel is related to this: it provides the metadata we need to understand what the server is, what it's intended to do, and maybe some adjacent facts about it. But essentially, what we're going to do is run OSIE on the server, controlled by Tink, to run actions like the ones I mentioned earlier: format the disks, all sorts of stuff, but most likely install an operating system so that you can use it.
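To make that MAC-to-workflow mapping a little more concrete, here is a minimal sketch in Go of the kind of hardware record Tink keeps. The struct fields, ID and MAC address below are illustrative assumptions, not the exact Tinkerbell schema (the real hardware definitions carry quite a bit more), but they show the core idea: a record keyed by something like a MAC address that tells Boots and Tink how to treat the machine that just sent a DHCP request.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// Illustrative shapes only: the real Tinkerbell hardware definition is richer,
// but the essence is a record that identifies a machine (here, by MAC address)
// and says whether it is allowed to netboot and run workflows.
type Hardware struct {
	ID      string  `json:"id"`
	Network Network `json:"network"`
}

type Network struct {
	Interfaces []Interface `json:"interfaces"`
}

type Interface struct {
	DHCP    DHCP    `json:"dhcp"`
	Netboot Netboot `json:"netboot"`
}

type DHCP struct {
	MAC  string `json:"mac"`
	Arch string `json:"arch"`
}

type Netboot struct {
	AllowPXE      bool `json:"allow_pxe"`
	AllowWorkflow bool `json:"allow_workflow"`
}

// A hypothetical hardware record: when Boots sees a DHCP request from this MAC,
// Tink can look it up and decide which provisioning workflow should run.
const record = `{
  "id": "example-worker-01",
  "network": {
    "interfaces": [
      {
        "dhcp": {"mac": "08:00:27:00:00:01", "arch": "x86_64"},
        "netboot": {"allow_pxe": true, "allow_workflow": true}
      }
    ]
  }
}`

func main() {
	var hw Hardware
	if err := json.Unmarshal([]byte(record), &hw); err != nil {
		log.Fatal(err)
	}
	iface := hw.Network.Interfaces[0]
	fmt.Printf("hardware %s: MAC %s, PXE allowed: %v, workflows allowed: %v\n",
		hw.ID, iface.DHCP.MAC, iface.Netboot.AllowPXE, iface.Netboot.AllowWorkflow)
}
```

In a real deployment you would register a record like this with Tink rather than parsing it in a standalone program; the snippet is only meant to show the shape of the data that drives the DHCP-to-workflow matching.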
Once those actions have run, we use a service called PBnJ, our power and boot control service, to restart the machine, and it then comes up into the fleet in the configuration the user wants it to be in. At that point, it's ready for use. Again, that's a massive oversimplification of how this works, and you can check out Alex Ellis's blog post and the Tinkerbell 101 video to see it in a bit more detail. But essentially what we've done is gone from a machine that was waiting for a purpose to a machine that is installed the way the user wanted, which we can then deliver to them.

Everybody by now will be familiar with Marc Andreessen's idea that software is eating the world, but when I hear this, I hear something else. What I think is happening is that the amount of hardware that can be eaten by software is expanding, and software is extending to fill those gaps. The reason we want things in software is obvious: it means we can experiment more quickly, we can share our work, and we can build on the work of others. When we were looking at the stack for Tinkerbell, we specifically thought: a lot of those other areas either have reasonable solutions already or are too far away from the user to be interesting. So how do we open up that next bit of surface area that people will be interested in hacking on?

A related question is: why would we then submit it to the CNCF? We open sourced it in, I believe, May of 2020, and we didn't get into the CNCF Sandbox until something like September, October, maybe November, I can't remember exactly, towards the end of 2020. The reason we wanted to do this is pretty clear: if a piece of software is this critical to your stack, you need to know that the project you're relying on has open governance, and that you can get a seat at the table to influence the roadmap. I'm sure there are open source libraries out there that people use every single day without thinking much about the governance model, but if it's the software that turns your servers into something your users can use, then you're going to care about that a lot. I can talk later about the incredible uptick in adoption we've seen since joining the CNCF, but broadly, that's why we made that decision.

All right. So we've gone from "why would anyone run their own hardware in the first place?", we've seen how people running their own hardware are actually at the beginning of a new adoption curve, the early adopters of a new wave, whether it's hybrid cloud, edge, or on-prem deployments they need for specialized workloads or other things. We've looked at what Tinkerbell does and where it sits in that stack, and we've looked at why we open sourced it. Now I want to talk through some of the exciting new features announced in April around what Tinkerbell can do for you.

You've heard me say operating systems a lot, and that is one of the key workflows we enable: getting a server to the point where you can put an operating system onto it. We've done a lot of work in Q1 2021 to make the process for adding new operating systems a lot easier, and what that has meant is that we can now install pretty much anything you want. Out of the box you'll see Red Hat, Windows Server, Debian, Flatcar, CentOS, Ubuntu, NixOS.
I believe Alma Linux is in the works, and there are a few others as well. The reason that's happened is the Crocodile project, and you'll see down there the GitHub link for it. Crocodile is a tool that makes it a lot easier to package and deploy operating systems using Tinkerbell, to the point where, once we got the first few working, it's often less than a day's work to get the next one working. That's a significant improvement: previously it could take days, weeks, sometimes months to get these things working. And now this is all open to the community, so they can add their own as well.

Another area where we've been focusing is the Cluster API. As many of you here will know, the Cluster API has become the de facto way to interact with Kubernetes clusters, and we wanted to make sure that people who want to use Tinkerbell to get from bare metal all the way up to Kubernetes can do it in a way they're accustomed to. The work we did with the Cluster API in Q1 of 2021 is what we're calling experimental: at the moment it's at a proof-of-concept stage, and it allows you to do some actions but not all. Early community feedback has been very positive, and because of that we're going to continue investing in this throughout 2021 so that we can get to parity with other providers, so that you could essentially bring your own servers and use Tinkerbell plus the Cluster API to create a Kubernetes cluster from a bunch of machines that didn't yet know what their purpose was going to be.

As we've been working on Tinkerbell with the community, one of the fantastic things we've seen is that the community has shown us where the pain points are, both in getting started and in implementing extensions to Tinkerbell, whether that's workflows or working with the core projects themselves. Hook is a drop-in replacement for OSIE. You'll remember earlier I spoke about OSIE, our in-memory operating system where we run all the actions in a workflow to get a server from the state it's in to the state it needs to be in. The results have been incredible: we can now deploy in something like 10% of the time it took previously, so we're often talking about minutes to deploy a new operating system onto a bare metal machine, including a reboot, even for some of the bigger operating systems.

Now, you'll remember earlier when I spoke about workflows. Workflows in Tinkerbell are essentially the actions that Tink runs in OSIE to get your server from being just a server to something people can actually use. Those workflows consist of atomic elements called actions, and each action is essentially a Docker container that runs something. One of the things we were very excited about early on was making it so that these actions could be contributed to in public, and that's why we now have what we call the Action Hub, hosted on the CNCF Artifact Hub. What this allows people to do is compose workflows out of known public actions; they can also extend them and add their own. This is already opening up a huge amount of energy around building new actions and improving existing ones, and it's all very similar to Docker Hub: you can use one of our actions, or you can extend it.
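To give a feel for how a workflow composes those actions, here's a small, simplified sketch in Go. It models a template as a named list of actions, each of which is just a container image plus some environment. The action names, image references and environment variables are made up for illustration; the real Tinkerbell template format is YAML with more fields, and ready-made actions (disk imaging, file writing, kexec and so on) live on the Action Hub.

```go
package main

import "fmt"

// Simplified model of a Tinkerbell workflow template: an ordered list of
// actions, each of which is a container image run with some environment.
// The real format (YAML, with tasks, volumes, timeouts and worker selectors)
// is richer; this only illustrates the composition idea.
type Action struct {
	Name    string
	Image   string            // container image that performs one atomic step
	Timeout int               // seconds
	Env     map[string]string // parameters passed into the container
}

type Template struct {
	Name    string
	Actions []Action
}

func main() {
	// Hypothetical workflow: stream an OS image onto a disk, then boot into it.
	// The image references are placeholders for the kind of actions you would
	// pull from the Action Hub or publish yourself.
	installUbuntu := Template{
		Name: "install-ubuntu",
		Actions: []Action{
			{
				Name:    "stream-image-to-disk",
				Image:   "example.org/actions/image2disk:v1",
				Timeout: 600,
				Env: map[string]string{
					"IMG_URL":   "http://images.example.org/ubuntu-20.04.raw.gz",
					"DEST_DISK": "/dev/sda",
				},
			},
			{
				Name:    "boot-into-new-os",
				Image:   "example.org/actions/kexec:v1",
				Timeout: 90,
				Env:     map[string]string{"BLOCK_DEVICE": "/dev/sda1"},
			},
		},
	}

	for i, a := range installUbuntu.Actions {
		fmt.Printf("step %d: %s runs %s\n", i+1, a.Name, a.Image)
	}
}
```

The point is simply that each step is an independent container, so picking up a community-published action from the Action Hub, or publishing your own, doesn't require changing Tink itself.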
One of the other areas that's interesting is that firmware updates for servers are often a bit of a headache, and we're going to be looking at how we can work with server manufacturers and OEMs to have them provide their own actions for firmware updates through the Action Hub, so that anybody can take advantage of those no matter what servers they're running.

Okay, and then last but by no means least, we got into the CNCF Sandbox at the end of 2020, and one of the things we're now excited about is moving to incubation within the CNCF. We're going to be actively exploring that in Q2 2021. The reason we're doing that is that, as we start to see more and more partners from the industry coming in and adopting Tinkerbell, having that open and clear governance through the CNCF becomes more and more important, and we want to keep stepping through that process so that everybody involved can see that this is a group effort, which it increasingly is. I'd encourage you to go and look at our dev stats: it's been amazing to see the difference between, let's say, December 2020 and now in the sheer number of contributions we're getting from companies outside of Equinix Metal. It looks like, if we keep tracking the way we're tracking right now, external contributions are going to hit parity with internal ones within about two months. We also have some big announcements around certain logos coming on board, so look forward to that.

Okay, so I realize that was a bit of a whistle-stop tour. Let me give you some information on how to learn more or get involved. First and foremost, we have a community Slack at Equinix Metal: you can sign up at slack.equinixmetal.com and find us in the Tinkerbell channel. That's where we have a lot of our live conversations. We also have community calls; I believe they're biweekly, and we hold them in US- and EU-friendly time zones. If you go to the Google Group that's listed there, you can join those, and all of this will be sent out in the PDF slides afterwards, so you can get the links without having to screenshot this and type them in. On GitHub, tinkerbell is the org, and you can find all the microservices there with their own documentation. Twitter is a good place for news. On the YouTube channel you'll find not only the recordings of our previous community calls, but also some of the 101s and other educational content that we've got coming up soon. And if you want to email us, it's hello@tinkerbell.org; that goes to me, Dan and a couple of other people, and you'll get a response pretty quickly.

On that last one, tinkerbell.org is the main website. We've done a phenomenal amount of work on documentation in the last few months, and it's really in a state now where people can very easily get from zero to hero, so to speak. Now, a lot of people don't have a data center at home, so one of the other things we did is create a local Vagrant installation that you can run. You're going to need a bit of RAM, I think at least 16 gigs, but then you can run your own Tinkerbell setup, with the Tinkerbell stack plus some example servers. You can see how the workflows work, insert different options, configure some of the ways things are deployed, and see how you can take servers, in this case virtual ones, to being useful by installing operating systems onto them. Last but not least, we also have the sandbox. There are a lot of different microservices within Tinkerbell.
And it was a little unwieldy for people at first to figure out exactly how to get it all working together, so we introduced the sandbox. The sandbox is a known-good state of all the microservices that you can run together, so you know they've been tested as a set, and I believe this is also what the Vagrant setup is pinned to. So you can try it all out locally, easily, without having to crack out some servers in your bedroom.

And that's it. Like I said, you've got all the contact details. If you want to talk to me personally, you can email me at mark.coleman@equinix.com, I'm @mrmrcoleman on Twitter, and if you go into the aforementioned Equinix Metal community Slack, you can find me there as well. Thank you.