So, to get an idea, how many of you here know what BMCs, or Baseboard Management Controllers, are? Cool, that's quite a bunch. So that's nice. Juliana and I work at Booking.com and we deal with a lot of bare metal servers. I'm part of the team that helps provision these bare metal servers and get them into production. Yeah. Yeah, and our presentation is about BMC-Lib, and the idea is to have an abstraction layer across all the different BMC vendors. Yeah, so to simplify things and make it accessible: a BMC is a device that sits alongside your server. In computational power, it's roughly equivalent to a Raspberry Pi or an Arduino or something like that. They run on a system on chip, which basically has various computer components on a single die. And you find these on servers, chassis, switches, JBODs and, yeah. And we have different flavors with a single function, right? So the idea of the BMC is to give you access to manage this device outside of your data network interface, right? So: out-of-band access to servers, switches and JBODs; a last resort for power cycle, reboot and hard reset; IPMI, VNC, iKVM, serial console, inventory information, hardware logs. And it's now becoming a bit more important with iLO 5, because it becomes a root of trust, right? So now you can also sign your firmware and make sure that the hardware is only running an accepted firmware. Yeah. So, to look at the most common BMC chip: this one is from ASPEED. It runs in a lot of Supermicro and Quanta servers, and it basically has very simple components: a 400 MHz CPU, some RAM, some built-in VGA capabilities, and its own NIC, which provides out-of-band access. But what's special about this thing is that it has access to the main board.
It's able to speak over the PCIe bus to a lot of components, and over I2C and SPI, and it's also able to interact with the network cards directly, so it can receive and send packets. Yeah, and about BMCs, right? So what are the standards? Everyone knows about IPMI. It's old, it's useful, but it's buggy. And although it's a standard, not all of the vendors follow it in the same way, right? So you always have extensions and so on. It also means that you will never get the same behavior when you're calling things across different vendors. And there is SSH, right? We will talk about it later as well. When you talk SSH to BMCs, there are no standards. Each vendor can implement a function with whatever name they decide, and you will need to figure out what you need for each one of these vendors. Web interfaces: they are slow and buggy. For instance, with one of the vendors that we were testing, when you tried to open the remote console using Firefox, it would destroy your machine. It would grind to a halt and force you to reboot. So this gets interesting, right? And now there is something coming that should help, right? It's like an improvement over IPMI: that's Redfish. The idea is awesome, but we run into the same issues, right? For some of the extensions, you will need to go through OEM. One example is storage, right? If you want the storage extension but your vendor doesn't offer the standard one, they will have something inside of OEM, and you will need to know how to call it for each one of the vendors that you have. About Redfish: it's nice because it's an API, but, you know, a BMC is something really simple, and when you think about using OData for that, it's a bit overkill. And, as we mentioned, there are unreliable, buggy implementations. I can give an example: the standard definition tells you that you need to have the host name, right?
As part of your Redfish payload. One of the vendors just doesn't add that, right? So then you cannot have reliable information across all of them. Yeah, so what is the problem that we are actually talking about here? It's when you are working at scale and you're trying to work with a whole lot of bare metal servers, and not just from one vendor, but from multiple vendors and multiple generations. The problem itself starts to magnify. The team is basically a bunch of engineers and there are so many servers, so how do you deal with so many servers? We really want to treat these servers as light bulbs, essentially, or as cattle. You plug them in, they should work, and then you want to be able to... Reliably provision a machine, right? So when you tell a machine that you want it to PXE boot, it actually should PXE boot. If you want to reboot a machine, it should reboot properly, right? You want to be able to get your inventory properly, right? These are things that sound really simple, but they don't always happen. Another thing is managing BMC configuration, right? We have configuration management for a bunch of different operating systems, but there was nothing available for BMCs. Yeah, and when you want to diagnose hardware problems, most of these are logged to an area that the BMC has access to. And how do you do all of this at scale, right? Without much manual intervention. Which is why we actually... Juliana here started writing a tool which gave birth to BMC Lib. It currently supports all of this different vendor hardware, and for configuration, for example, it abstracts away user account creation, syslog, NTP, LDAP, and all of these into a single set of configuration methods. And what BMC Lib is, right? We will give some examples of applications written on top of BMC Lib, but BMC Lib gives you this idea, right?
So it doesn't matter which vendor you use: there is a set of expected actions that you will have for each one of them, right? You want to get how much memory a given piece of hardware has, you want to be able to reboot properly, and so on. One example with rebooting a machine: depending on the vendor, if you are using UEFI and you ask it to PXE boot and reboot, it won't work, because for this vendor you need to ask it to PXE boot, reboot, and power on, even though the machine is already powered on. This is a bug, right? But if you don't have a library that deals with that, you will always need to have exceptions for each one of the vendors that you deal with. Yeah. So one of the tools that we built is called BMC Butler. It basically reads a bunch of configuration declarations, looks at the inventory, and then applies this configuration to all the assets in the inventory. It can execute some actions as well, and it is currently in production and able to manage a lot of BMCs. So this is Dora, which is an inventory and explorer. This is actually the tool that gave birth to BMC Lib. We wrote this tool to have a dynamic data center inventory of the hardware that we purchased. And after playing with Dora, we noticed that all of these BMC actions could actually become useful. So from there we extracted the core components and moved them to BMC Lib. The payload that we have on the right gives you an example of the type of data that we can collect, and the relationships, right? So if your machine has NICs, you can call the endpoint /nics and so on, and it will give back all the information that you have for that type of hardware. It could be an HP device or a Dell device, it could even be Supermicro, and the inventory information is presented in one standard way. Yeah. Another example is Actor. We couldn't find a better logo. So this API works like a proxy, right?
So let's suppose you call the API, and specify the endpoint and the BMC or the chassis. It will connect and trigger the action that you expect, right? One of the useful things that we added there is that it can take a screenshot from the BMC and give it back to you. That also makes it possible to build more interesting applications. Yeah, so this is one of the things we actually built on top of BMC Lib. It was possible because we abstract away all these different BMCs. The BMC exposes a screen preview of the console, which is actually just around 300 by 300 pixels in size. You can't make out any information: you can't read it, and you can't run it through OCR to try to understand what is on the screen. But if a sysadmin actually looks at the screen, they are able to judge and understand: okay, this is stuck installing CentOS, or it's stuck in the BIOS. You can't read stuff on the screen. But we thought, okay, since we can capture images, what if we could train a model, or retrain a model, and get it to recognize what is on the screen? We didn't want to know exactly which screen it was, but we wanted to know what state the machine is in at a given time. And this was possible because we were able to get Actor to speak to BMC Lib, fetch the screenshot, and run it through image classification. And then, it's not very clear here, but it gives you the probability of which state the machine is in: is it installing the OS? Is it in the BIOS, or is it stuck somewhere else? That's based on a bunch of labels, yeah. A good example of that: we deal with a bunch of servers, right? Just to log in to the BMC and troubleshoot what happened takes from six to eight minutes, right? Because you need to log in, you need to download the Java thing, and you need to open it and check.
And this can show us in about five seconds whether our issue is that someone didn't plug in the machine properly inside the data center. The way that we cable things, we always know that the out-of-band network will work, but the data network might not be connected properly. And the classifier will tell us exactly that there's no media detected. So we already know that someone needs to go and check the physical connection. Yeah, that basically ends what we wanted to show, but the takeaways here are: BMC-Lib tries to abstract away vendor BMCs into a single API. And if you are looking to inventory, configure, or update servers, you should check out the BMC toolbox. We're trying to get more people who are working with hardware involved, to see if we can add support for their hardware. The idea is to create an open-source library that will force vendors to support that open-source library, instead of them forcing us into a vendor lock-in situation like OpenManage. But yeah, also: BMCs are a fundamental part of a server's lifecycle, so we must manage them by now. Okay, to repeat the question: it is whether we expose a different API or just re-expose Redfish, that is, whether our tooling is close to Redfish. What we do is try to find a better way, right? For instance, if you want to reboot a machine and we know that it's much more reliable to do it over SSH than over IPMI, we'll do it over SSH and we'll ensure that it happens, right? So in the end, the API that we expose is not like Redfish. We just expose a library that's written in Go, and you'll have these functions to call. But what we do is choose the better way to get the data that you require. It doesn't matter if it's exposed through IPMI, Redfish, or some HTTP endpoints. Yeah, so just to add to that: BMC Lib just exposes a bunch of functions that you call, say power off, and that works across multiple vendors in a consistent way. Thank you again.
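The answer above, one function such as power off that picks whichever transport is most reliable for a given BMC, can be illustrated with a minimal Go sketch. This is not BMC Lib's actual API: the `PowerController` interface, the `transport` type, and the `PowerOff` helper are all hypothetical names made up for this illustration, and the transports are stubs rather than real SSH or IPMI clients.

```go
package main

import (
	"errors"
	"fmt"
)

// PowerController is a hypothetical vendor-neutral interface. Each
// implementation stands for one way of reaching a BMC (SSH, IPMI, ...).
type PowerController interface {
	Name() string
	PowerOff() error
}

// transport is a stub used only for this sketch. A real implementation
// would open a connection to the BMC instead of consulting a flag.
type transport struct {
	name string
	fail bool // simulate a transport that is unreliable for this vendor
}

func (t transport) Name() string { return t.name }

func (t transport) PowerOff() error {
	if t.fail {
		return fmt.Errorf("%s: power off failed", t.name)
	}
	return nil
}

// PowerOff tries the transports in order of preference and returns the
// name of the first one that succeeded. Callers never need to know which
// transport was used for a given vendor.
func PowerOff(preferred []PowerController) (string, error) {
	lastErr := errors.New("no transports configured")
	for _, p := range preferred {
		if err := p.PowerOff(); err != nil {
			lastErr = err
			continue
		}
		return p.Name(), nil
	}
	return "", fmt.Errorf("all transports failed: %w", lastErr)
}

func main() {
	// SSH is preferred here but simulated as failing, so IPMI is used.
	used, err := PowerOff([]PowerController{
		transport{name: "ssh", fail: true},
		transport{name: "ipmi"},
	})
	fmt.Println(used, err)
}
```

The ordering of the slice is where the per-vendor knowledge would live in this sketch: a vendor whose SSH reboot is known to be more reliable than IPMI would simply list SSH first.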