Say hello to Joel and Juliano and their talk on BMC firmware management.

Hey, thanks for joining us today. This is a bit outside Linux userspace: the talk is about how we manage a lot of the hardware that enters the data center, its lifecycle, and the toolbox we built to do that. I'm Joel Rebello, I work at booking.com with this guy. Hi, Juliano Martinez. What we're trying to cover is the set of challenges you hit when dealing with bare metal at scale, the kinds of problems you see with different vendors, the baseboard management controller itself, the toolbox we built that works with these controllers, and some projects we ended up building on top of the toolbox.

Booking is in a bit of a weird position: we're not a hyperscaler like Google or Facebook, but we're also not a small company. We have a huge number of servers, but we can't just start using Open Compute, and we can't stick to only one vendor. So we ended up with four different vendors and about 50,000 (and growing) bare metal servers. We would like to treat servers as light bulbs. In the past it wasn't like that: people treated hardware as pets, so it was pretty tricky for them to move between servers, but we were able to move past that point. We would also like to simplify vendor validation and adoption. As I mentioned, we're growing, and sometimes we need to validate a new vendor and then adopt it, so how do you define standards to know whether a vendor will be good enough to work with us?
How do you inventory this data? I'll come back to this later, but it gets interesting: because of network connectivity and so on, you can actually lose hardware inside your own data center. Reliably interacting with the hardware is also interesting, because when you have all of these vendors and a huge amount of hardware and you try to use IPMI, sometimes it gets funny: rebooting a machine sounds really simple but can get really complex. We also want to reduce manual intervention. The team that manages these bare metal servers is five people, and just accessing a BMC takes time: it needs to load the interface, and if it's a chassis interface on hardware that was developed eleven years ago, it won't be fast; it can take ten or fifteen minutes just to open and access it.

To expand on "treat servers as light bulbs": the idea is that if something fails, we don't want to spend even ten minutes looking at it, because this is a lot of hardware. The failure rate of hardware in the data center is around 5% in the first year alone, so with that many servers you have a lot of hardware failing. If it fails, put it in the broken bucket and take care of it later.

A bit on how the lifecycle works for us: you receive hardware and inventory it first. While people are not using it, you keep it up to date, to make sure that when it goes to production it has the latest firmware and everything else. You install the machine and people use it. If they stop using it, or if it has spent more than X days running in production, it comes back to be repurposed. After three to five years, we retire the hardware.
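The lifecycle just described can be sketched as a tiny state machine. The state names and allowed transitions below are our own illustrative labels for the talk's description, not taken from any Booking.com tool:

```python
# Hypothetical sketch of the hardware lifecycle described above; state names
# and transitions are illustrative labels, not a real tool's schema.
LIFECYCLE = {
    "received":    {"inventoried"},
    "inventoried": {"pool"},                 # kept up to date while unused
    "pool":        {"installed", "retired"}, # retired after 3-5 years
    "installed":   {"production"},
    "production":  {"pool"},                 # repurposed after X days or when freed
    "retired":     set(),
}

def can_transition(state, nxt):
    """True if the sketched lifecycle allows moving from `state` to `nxt`."""
    return nxt in LIFECYCLE.get(state, set())
```

Encoding the transitions explicitly is what later makes it possible to ask "is this machine where it should be?" for the whole fleet.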
That covers the challenges; I hope it gives you an idea of the environment and the kinds of problems we see when dealing with this hardware. Next up is the baseboard management controller. The BMC sits alongside your server (or your JBOD, for storage hardware) and provides out-of-band access; you've probably heard about it a lot lately in the news. The chip that a lot of vendors like Supermicro and Quanta use is an ASpeed AST; they all take the software development kit provided by ASpeed and modify it to display their own logo. Essentially it's a system on a chip that has, on a single die, flash and a VGA graphics controller. It has access to all the PCI devices over the south bridge, and it can also interface with the NIC itself and send packets on its own. Because it has access to all of these components, even when the machine itself (or maybe the data network) is not responding, you can still do a whole bunch of things through the out-of-band connection. Since it has access to all that hardware, you can also inventory whatever is on the server, and you can trigger various hardware actions as a last resort.

And how do you access this thing? There's IPMI, which is kind of common across all the vendors. IPMI has evolved over time, but not much. If you go and check ipmitool's source code, it's pretty interesting: for a lot of things it needs to do magic, because although IPMI is a standard, that doesn't mean vendors follow it properly. For a lot of the data you want over IPMI, you end up issuing raw commands, and it gets a bit weird that you need to know all the raw commands to get specific information for every single vendor. Those are the issues with IPMI, and I hope in the future we don't need it anymore; that's the hope with Redfish, which I'll talk about later. There's SSH, which is better for our cases, but some actions don't work there: for instance, if you're using UEFI with a specific vendor and you trigger PXE and reboot through SMASH CLP over SSH, it won't work, so you need IPMI. There's the web interface, with different versions across vendors and generations. It's getting better and better, but if you consider that the hardware lifecycle is three to five years, we always need to manage things that are not so awesome and deal with buggy, slow hardware from the past. There's Redfish, which is the future, but not just yet. It's much better than when we started bmclib; bmclib actually started out using Redfish, and then we decided not to use it for now, because it wasn't really adopted by the vendors. And there are the undocumented APIs, which we use a lot: the same APIs that are called by the web interface, we ended up using to retrieve data.

Now, the bmc-toolbox: the library and tools. What happened with the library is that we started with Redfish, and it was nice; one vendor had support for almost everything, but then we couldn't get that vendor's BMC license, and we saw the Redfish spec had no definition for licensing. Okay, cool, we can fall back for that part. Then we checked another vendor: it doesn't expose the hostname. Okay, for the hostname we fall back to the old APIs. Cool, now we're retrieving the network interfaces; oh, this vendor doesn't support network interfaces yet. Redfish will be a thing, but at least a year and a half ago it wasn't there. So we decided: let's make a library that uses the best available connection method to do what you need, and that ensures the behavior is the same regardless of vendor. For instance, with some vendors you need to reboot and then PXE, and with others you need to PXE and then reboot; we make sure that when you ask for something like that, we do it the proper way so that it works.

So this is an interface, and we have different providers for each interface; when we want to integrate a vendor, this is the common interface we use. For the BMC interface we have configure actions, BMC data collection, and PXE-once. What do the providers look like? These are the ones in our master branch: Supermicro X10, a dummy provider for tests, iLO, iDRAC9, iDRAC8, and for the CMC — the chassis management controller — the M1000e and the C7000. This is interesting because sometimes, if you're using a chassis, it's much better to trigger actions through the chassis than through the BMC itself, because it's more reliable. With pizza boxes, the only way to cut power to a machine is to go there and pull it; with a chassis you can do a virtual power reset — actually remove the power and put it back — and that gets pretty interesting for us.

Here's an example of PXE-once with iDRAC9. With iDRAC we saw that we can use SSH and it works all the time: we do an SSH login, connect to the iDRAC, issue the command to set PXE, and power cycle, and the machine reboots properly. With HP, we saw that if you're using UEFI on Gen9 servers, you cannot use SSH, and there's also a bug: you cannot just set PXE and power cycle the machine, because it will fail. You need to issue a power-on first; that power-on action will fail, but if you don't do it, the whole thing won't work. We encapsulated this type of thing so you don't need to deal with it. We used to have a code base as part of our CMDB that had to know all of these quirks, and it doesn't make sense for a CMDB to know them.

The other bunch of functions available in the BMC interface is the configure actions, which are used to manage configuration on the BMCs. Why would you want to do this? Mainly passwords: we had machines running for a long time with ten-year-old passwords. Also, the BMC interface itself doesn't have a certificate, and it's not trivial to set all of this up in a standardized way across multiple vendors. So we have an interface declaring a bunch of functions that should be implemented per vendor. For example, on iLO — the HP hardware — you can generate a CSR on the device, that is, on the BMC, sign that certificate, and upload it back onto the BMC; it's very simple, you just fetch the CSR and post the resulting signed certificate. With Supermicro it's different, because Supermicro does not support generating a CSR on the device: you have to generate the CSR yourself, sign it, and then upload it. The whole idea with these examples is that we encapsulate all of these differences into these functions, and you don't have to worry about it.

What do we support currently? bmclib supports data collection for Dell, HP, and Supermicro. iLO 3 is in there too; I'd actually like to remove it because it's too old and we don't have that hardware anymore. Configuration is supported across all of them, and with some of the Redfish support this will extend to most BMCs at some point.

Now the tools. After we built bmclib, the idea came up of building tools on top of it. The first tool — actually the tool that inspired bmclib — is Dora. At the time it was called Thermalnator: a tool to collect thermal and power data from the data center, to understand how much heat we were generating across machines. From that we thought: this is interesting — if we can collect that, what else can we know? If you look at the payload, there's some interesting stuff there. We have faulty slots: Dora connects to a chassis and tries to read the information, and if there are blades inside the chassis that are failing, we get an array with the slot numbers. So our data center operations team doesn't need to go searching for things inside the data center; they have a list of everything that should be fixed. Dora also collects information about the blades, fans, NICs, PSUs, and storage blades. It summarizes known issues as well: it has a status, and you shouldn't try to provision a machine you know is broken, because that generates toil for us, so we use the status information from Dora to know whether hardware is usable. Another thing it helped with relates to how your cabling is done at the data center. For us, each machine has two interfaces — one for data and one for out-of-band, to make sure these are isolated — and a chassis receives one out-of-band cable and spreads it over sixteen connections, one for every BMC. The BMC is always reliable, so we know that if the BMC connection is there, we'll be able to collect information about the machine.

After we deployed Dora, we found old hardware, bought quite a few years back, with wrong cabling on the data side — this hardware was simply missing in the data center. So Dora also generates a dynamic inventory of what you bought versus what you should have received, and now we use that information to verify that the deliveries we get from vendors actually match what we're buying. Dora can also scan the whole data center very quickly. We used to have a similar process that took a week to scan the data center; after we moved to Dora it took six minutes. Because some vendors are slow, nowadays it takes 24 minutes — but in those 24 minutes we scan about 50,000 assets.

The other tool in the bmc-toolbox is bmcbutler. The idea was a tool focused purely on configuration for BMCs, like I mentioned earlier: it retrieves data from the inventory, which is Dora, and attempts to configure those BMCs. This way we can maintain a consistent state across all baseboard management controllers, and it can apply the whole range of configuration you saw in the interface. It's not that you cannot configure a BMC by hand, but think about the installation lifecycle: you set all the configs when a machine is installed for the first time, and then imagine you need to update the CNAME for your syslog server, or you'd like to change the password. Pushing that back out gets tricky with every single vendor; with bmcbutler it becomes much simpler. Another tool that we built on top is actor.
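The speed of the full-fleet scan described above comes down to concurrency: querying a BMC is slow but I/O-bound, so many sessions can be kept in flight at once. A minimal sketch of that idea, where `collect()` is a hypothetical stand-in for a real per-BMC query:

```python
from concurrent.futures import ThreadPoolExecutor

def collect(ip):
    # Placeholder for a real per-BMC query (Redfish, SSH, vendor API);
    # here it just fabricates a record so the sketch is runnable.
    return {"ip": ip, "status": "ok"}

def scan(ips, workers=100):
    # Sweeping tens of thousands of slow BMCs serially takes days; a
    # worker pool keeps many sessions in flight at once.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(collect, ips))
```

With a few hundred workers and a few seconds per BMC, a 50,000-asset sweep lands in the tens-of-minutes range the talk mentions.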
The idea of actor is to be able to trigger actions across these BMCs and CMCs. If you want to PXE-boot a machine, instead of using IPMI or calling the BMC directly, you call actor: it's an API, pretty simple, where you send a payload saying what you want done. For instance, if you want to power cycle, sleep for 30 minutes, and power cycle again, you can send that as a job and actor takes care of it. Another useful thing it can do for you is take screenshots: if you want a screenshot from the BMC to understand what state a machine is in, it can do that.

So we've covered the tools and the controller itself; this last part is just something we did in a hackathon that we want to share — it was an interesting idea, so you might find it fun. You saw this picture earlier: it's the machine lifecycle. A machine transitions between these states, and you want to make sure each transition is actually successful, or, if it wasn't, to know which state the machine is in. We're talking about 50,000 machines here, and we reboot or install a hundred or two hundred of them at a time. Our install flow now aims to ensure machines only live for three weeks, so sometimes we have from 200 up to 1,000 or 1,500 machines being reinstalled, and that's a lot of reboot actions — hardware is not reliable, so one thing you do a lot is reboot machines.
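A job for an actor-style API — the power-cycle, sleep, power-cycle sequence mentioned above — might look like the sketch below. The field names are assumptions for illustration, not actor's actual schema:

```python
import json

# Illustrative job payload for an actor-style API: a sequence of actions
# executed in order. Field names are assumed, not actor's real schema.
job = {
    "actions": [
        {"action": "powercycle"},
        {"action": "sleep", "seconds": 1800},  # wait 30 minutes
        {"action": "powercycle"},
    ]
}
payload = json.dumps(job)  # body of a hypothetical POST to the actor API
```

The point of job-shaped requests is that the caller fires one request and the service owns the retries and the waiting, instead of a script holding a connection open for half an hour.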
So the idea was: could we grab the screenshots and try to classify what state the machine is in? As a sysadmin, I can look at a screen and know it's the BIOS screen; and when we update firmware on these boxes, a lot of them would get stuck — like on that Dell screen there — and it takes a lot of time to look at each box and figure out why. So can we automate this? If you go back, there are different stages: there's the OS install — you can probably identify the CentOS install — there's a PXE boot screen, and there's a GRUB screen. These are all different stages the boxes would fail in. We grabbed all of these screenshots, tagged them, and trained a model — honestly, I didn't really know what I was doing, but let's try. Oddly enough, we got actor to grab the screenshot, run it through this trained model, and give us a probability of what the machine is stuck in, and it was quite successful: a 97% success rate across close to 200 different checks, which was quite high, and it was just fun to build. It's interesting, because at the beginning we didn't believe it would work that well, but it predicted a lot. If you figure each stuck box costs around 20 minutes, and 12 or 15 of them get stuck during the day, cutting 97% of that is a lot of time we can spend doing something more interesting.

And what are the next steps for the bmc-toolbox? One thing we'd like to do: we now have a lot of API services, and they work for us, but if other people want to use them it's a bit difficult, because there are all these different components and you need to understand every single one of them to get started. So we want to create a CLI that allows you to interact with all of the services; then it becomes easier to use. The same goes for the front end: one thing that's useful is generating a report of the number of machines with faulty PSUs, or machines with an odd number of DIMMs — that one's an easy way to spot a failure, because if you have an odd DIMM count, it's very likely that one socket or one die is failing. We also want to integrate Redfish for generic vendor support using gofish. We started working on gofish; it's a pretty nice Go library — actually the best one, and the first for Go that can decently parse all the data formats — and we implemented a lot of the features that were missing. What we'll do is have a generic Redfish driver for bmclib, but that doesn't mean it will work across all vendors. To give an idea: on iDRAC 8, the Redfish version there is v1.0.2, and with that version you cannot collect the disks inside a machine, so for that type of hardware we can't use Redfish. iDRAC 9 is a bit better — it can actually tell you whether the link on the data network is up or down, so it's more useful — but it also has missing bits, and the same goes for iLO, Supermicro, Quanta, and so on. I think that as vendors evolve and start using Redfish more, it will become the default standard, but I think we'll need to keep the drivers for all the other vendors forever.
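The per-feature fallback described above — prefer the generic Redfish path, drop to a vendor-specific API when the BMC's Redfish version lacks the feature — can be sketched like this. All class and method names are hypothetical, invented for the sketch:

```python
# Sketch of per-feature fallback: prefer a generic Redfish call, fall back
# to a vendor-specific method when the Redfish implementation lacks it.
class RedfishError(Exception):
    pass

def get_disks(bmc):
    try:
        return bmc.redfish_disks()       # hypothetical generic Redfish path
    except RedfishError:
        return bmc.vendor_api_disks()    # hypothetical vendor-specific fallback

class OldIDRAC:
    # Stand-in for something like iDRAC 8, whose Redfish v1.0.2 cannot
    # enumerate disks, per the talk.
    def redfish_disks(self):
        raise RedfishError("disk inventory not available in this version")
    def vendor_api_disks(self):
        return ["sda", "sdb"]
```

Callers only ever see `get_disks()`; whether the answer came from Redfish or a reverse-engineered web API is hidden behind the library boundary.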
The cycle at which hardware vendors release hardware versus the DMTF releasing Redfish specs is different: you get specs more often, and hardware vendors take a long time to implement a spec, so they're basically behind all the time — and they also make mistakes in the implementation, or implement it incorrectly. Hopefully we get to a stage where there's some constant across all of them and then it's usable. What we will do is this: every time we integrate a vendor and something doesn't work with the generic Redfish driver, we'll have a way to retrieve or set that information as required. Certificate management is a good example: not every vendor supports it, but we can always do it through the APIs their web interface calls.

And how can you help? It's open source: please use it, review it, submit bug reports. If you have vendors or hardware you use and you'd like to contribute, it's actually not difficult if you follow the interface; there are tests to help you understand it, and we're there to help as well. And get in touch if you work with Ironic, MAAS, or just a bunch of bare metal servers. We've looked at some of the implementations that Ironic and MAAS (and even Foreman) use to interface with BMCs: they have a redfish.py, and then it's just calling an ipmitool command or doing a Redfish call. But there is a lot of context involved when you're interacting with BMCs if you want your action to be successful, and that's why I think it makes sense to have a separate library for this.
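The "separate library" argument — a common interface per action, with each provider encapsulating its vendor's ordering quirks — can be sketched as follows. Class and method names here are illustrative, not bmclib's actual (Go) API; the iLO Gen9 quirk is the one described earlier in the talk:

```python
from abc import ABC, abstractmethod

# Sketch of the provider-interface idea: callers ask for "PXE once" and the
# vendor provider hides the ordering quirks. Names are illustrative only.
class Provider(ABC):
    def __init__(self):
        self.log = []          # records the low-level steps taken

    @abstractmethod
    def pxe_once(self): ...

class IDRAC9(Provider):
    def pxe_once(self):
        # iDRAC: SSH works reliably; set PXE, then power cycle.
        self.log += ["ssh-login", "set-pxe", "powercycle"]

class ILOGen9(Provider):
    def pxe_once(self):
        # HP Gen9/UEFI quirk from the talk: a power-on must be attempted
        # (and may fail) before the power cycle takes effect.
        self.log += ["set-pxe", "poweron", "powercycle"]

def pxe_boot(provider: Provider):
    provider.pxe_once()        # one call, same behavior regardless of vendor
    return provider.log
```

A thin `redfish.py` can issue the calls, but it cannot carry this kind of per-vendor sequencing knowledge; a library boundary is where that context naturally lives.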
In summary, here's why we started the bmc-toolbox and the message we want to pass: the idea is to simplify management and adoption of multi-vendor hardware, ensure the same behavior and state through the provided interfaces, reduce bare metal troubleshooting time, make bare metal interaction more reliable, and give a single view of hardware health across your fleet. There's one tool we didn't mention here: BMC FW update. We haven't completely tested it on all hardware — we do use it internally — but we need more feedback, and the only way to actually test firmware updates is by having access to the hardware. So if people are interested, have a look at the toolbox (there's a link) and maybe give us your input on that. So, do you have any questions?

Q: Thank you guys. I developed some stuff in Ironic regarding this — I spent about two years doing it and was a disaster afterwards. I had one very specific use case that was driving me crazy: I had 30,000 Supermicro boxes, X10 generation, and what was happening — throwing random numbers around — was that 10 to 20% of the BMC controllers would just go down randomly when a lot of commands were issued at them, because everything in my workflow was using purely ipmitool — no SSH, nothing fancier. You have a workflow that issues ten ipmitool commands; after the second one, the BMC just dies on the box. And this is so fundamental that once it dies, you cannot revive it without sending someone there physically. If you have 10 boxes it's pretty easy, but if 10% of your 30 or 50k fleet starts behaving like that, what do you do? What do you do in your scenario, in your setup? Because I never found a solution, and I still keep this vendor on a list: if I can decide not to buy them, I will not.
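One client-side mitigation for the burst problem the questioner describes — not something the speakers propose, just an editorial sketch — is to throttle commands sent to a fragile BMC so they never arrive in rapid succession. The interval and the `send` callback are assumptions:

```python
import time

# Sketch of client-side throttling: space out commands to a fragile BMC so
# bursts of IPMI traffic don't wedge it. The 2-second interval is an
# assumption; real tuning would depend on the BMC model.
class ThrottledBMC:
    def __init__(self, send, min_interval=2.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.send = send                  # callable that issues one command
        self.min_interval = min_interval
        self.clock, self.sleep = clock, sleep
        self.last = None                  # time of the previous command

    def command(self, cmd):
        now = self.clock()
        if self.last is not None and now - self.last < self.min_interval:
            # too soon since the last command: wait out the remainder
            self.sleep(self.min_interval - (now - self.last))
        self.last = self.clock()
        return self.send(cmd)
```

Injecting `clock` and `sleep` keeps the pacing logic testable without real delays; in production the defaults apply.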
A: Yes, there are bugs all over the place in this hardware; we found some where you send packets to HP iLOs and they just drop off the network completely. So it's kind of tricky to say which vendor you should pick, because essentially all of them have bugs, and this tooling tries to abstract those issues away. With Supermicro — with most vendors, really — you need a constant relationship where you give them feedback and try to get the stuff fixed. Some vendors have been trying to use our toolbox to validate the information they provide through the BMC; so instead of them giving us the tools, maybe we build the tools and tell them to use them, because that's more standard. But I don't think I've answered your question of which one you should pick.

Q: So the library, most of the code, is I guess just a set of rules for how to behave when you have a specific vendor. Do you think we, as a community of people doing exactly this stuff — from this side, not the vendor side — have any means to impose more standardization on vendors? Because they don't care at all that IPMI should be standardized; they all do whatever they want, and they're in the position of "whatever happens, you have to buy from us anyway". Can we do something so they start behaving in a more mature way?

A: This is what we're trying to do with our relationship with them — constant feedback about the issues. But as you said, if you check the ipmitool code, you see it's tricky, because some things are just fixed in the code, since that's easier, and this is how we do it as well. We keep reporting bugs, and a lot of them get fixed, but their release cycles are three to six months, and you're not going to wait six months for a bug fix — you need to keep installing these machines. So what we do is create a workaround to make it work; as soon as we get the fix, we change the behavior. But I think feeding this information back is essential, because if they don't know, they won't fix it. And if you buy a lot of hardware, you have influence on who you're dealing with and what you're negotiating for. That's what we see when we have meetings with our vendors: we bring all of this up, all the bugs we faced, and that gets them rolling on fixing these things. But it's tricky.

Q: I'm Ed from Packet. We have a lot of bare metal hardware, so this is near and dear to my heart. We've been working one level down, replacing the BMC software with tools like OpenBMC. Have you met with that community to figure out how your toolset can validate their work, and vice versa? If not, let's talk about it.

A: Yes — we were actually doing a proof of concept with OCP, and one of the things we would like very much is to be able to contribute to OpenBMC, because then we wouldn't need to wait for people to fix bugs; we could fix them ourselves. But we're not big enough to start buying these things: we're not building data centers, we rent space, and it's not designed for this type of hardware, so there are constraints. For instance, during our tests we tried to put in a rack and it was too heavy — it wouldn't fit through the door. For the hardware where we can actually use OpenBMC, we'd be more than happy to, because then we can manage it in our own domain; for some of the hardware, we're just not able to.

Q (follow-up): The other piece we're working on is a project called Open19, which takes a lot of the ideas from OCP but puts them in a standard 19-inch rack format, which by design fits in the same sort of rented data center space that we have access to — the world of standard-data-center-shaped stuff, not purpose-built data centers. So we'll talk afterwards.

A: Also, to answer part of the first question about integrating with OpenBMC: what I've seen is that OpenBMC is implementing Redfish as a service, so depending on which spec version they implement, we will have support for them.

Q: I just want to reiterate: you spend so much time working around the bugs in vendor firmware — one should really push to get OpenBMC. If you buy a lot of servers, you have so much leverage that the vendors already talk to you to fix firmware bugs. Everybody here who buys servers should put in their requests for offers that they want OpenBMC or u-bmc, because in the end, as you said, the BMC is a fully fledged operating system, and it's better to standardize it too. I know Facebook, with their Open Compute stuff, I think actually runs CentOS on the BMC, and then they can use all their normal tools. So yes, we should all put pressure on the vendors; there's no reason to have different BMC firmware at all.

Q: Hey, just last week I was looking at another library for managing Redfish, called go-redfish, from Nordix, which is some kind of open source foundation — take a look at it, it looks cleaner than gofish. And I have a question about the screenshots: how do you take a screenshot out of band? I know you can do Serial over LAN, but not the graphics.

A: The vendors expose an API which they use to show the screen preview in the BMC's web interface, so yes, it's all vendor specific — and because we have bmclib, we can abstract it. If you look at the bmclib code: we couldn't get these things through Redfish, so we reverse engineered it. You open the page and ask: how does it get this screenshot, where does it come from? Oh, this endpoint — okay, cool. And then you see that to retrieve it you need to set these X- headers; okay, so we do that. With newer generations, and by talking with the vendors, things get better, but everything already sold they won't fix. For instance, there's another thing we had to implement: we wanted to use LDAP, but if you check LDAP configuration across all these vendors, it's a nightmare, so it's actually simpler to have a proxy. You make the BMC call your LDAP proxy, and the proxy does what should be done properly; your life will be easier. That's what we did. Some of them don't even implement LDAP correctly — it's software written twelve years ago, and what they say is: this thing goes end-of-life next year, so we can make it better in the next one, but this one we cannot fix.

Q: Maybe you should request that Redfish adds the option of requesting a screenshot from the BMC.

A: Yeah, I think I've seen it in the OEM section for HPs; I don't know about the other vendors — it should be standard. Same thing with the license. Cool — thanks for joining us, thank you!