Sorry for the late start, everybody, and thank you for coming. My name is Peter Martini; I work in operations tooling at Bloomberg, where one of my responsibilities is helping to automate deploying infrastructure. I'm also a core Perl dev, but please don't hold that against me. One of the perks of my position was that I got to work on the Open Compute hardware when we first got it in, so it's always fun for me to play with new toys and see how all these things work. I also work closely with our in-house OpenStack team. We do our best to contribute back to the community, and we post everything we do online at github.com/bloomberg/chef-bcpc. As you can guess from the name, we build and deploy our stack with Chef and publish our recipes for anyone else who wants to use them, or contribute back. Before we can start Chefing the servers, we need to bootstrap them, and that's also in our repo: we have Chef recipes for actually deploying the servers, as well as a bunch of scripts to stand up the very first node. We have multiple independent clusters, and each one needs a bootstrap before we can do anything else. This is OpenStack, and since Ironic is part of OpenStack, I wanted to see if we could take advantage of Ironic to simplify some of our bootstrapping issues and use it to deploy the bare metal that underlies the rest of our stack. Of course, Ironic requires OpenStack to run, so bootstrapping from Ironic is a bit upside down. But using a normal bootstrap to get a working cluster, and then using Ironic to expand it or create new clusters, makes perfect sense and makes our lives a lot easier. And using Ironic is easy: it integrates with Nova, so you treat the machines like any other virtual machine. Except they're not virtual machines.
The very first thing you have to do is figure out how to manage the hardware at all, so let's walk through the life cycle of that hardware: getting it in, testing it out, brand new, fresh out of the box. It needs some testing to verify it. For those who aren't aware of Open Compute, it's a relatively young project to open-source hardware design. That means taking existing hardware and existing specs, but creating new recipes from them to make industry-standard configs. And because it uses industry standards, it's also something of a lowest common denominator: what ends up in Open Compute is what you'll see everywhere else. That makes it a perfect fit for Ironic, because Ironic uses industry-standard protocols by default, and that's all we have available on Open Compute anyway. So let's meet the servers. Open Compute is a family of servers; the ones I'm talking about now are the Winterfell. For those of you who are familiar with servers, these may look a little long and skinny; there are actually three of them side by side in the rack. The front is your I/O, then the system boards and CPUs, with a connector in the back to a bus bar, so the power supplies and, for the most part, the cooling are outside the server itself, driven from the rack. We also have Open Vault disk trays. Part of BCPC is using Ceph for our storage pool, and for Ceph we need a lot of local disks; this is the solution for that. A 2U tray with 15 disks in each U, one U to a server, so plenty of disks to work with. And this is the actual rack itself. I took this picture from the spec; I kind of wish I had a live shot, all blinking lights. What you're looking at here is no plastic: you are looking at the bare metal. You can look inside this thing and see it all running, vanity-free.
And the back is even more interesting, because the power and the cooling are really part of the rack, not the server itself. That has some interesting implications for Ironic: one of the things you can do with Ironic is use IPMI to pull sensor data and feed it to Ceilometer, but with the Open Rack, since the power and fans are largely outside the servers, there's not much to pull. So, walking through our example, we've got this server on the floor and I want to boot it. I need to get in, poke around, and see how to set it up to deploy. I go to the server and want to pop in a CD to boot from. No CD drive. Maybe I can boot from USB; okay, I plug in the USB, and now I'll plug in a monitor and see what comes up. No monitor port. Okay, I'm a Unix guy anyway; I'll hook into the console port and see what happens there. But I don't see a console port. Wait, there it is: there's a console port on my desk. The console port for this particular configuration is a separate debug card, detachable and hot-pluggable. So if you want a physical console, you have to take this little card, walk over to the rack, plug it in, then plug a cable into a mini-USB-to-RS-232 serial adapter to get the console. I don't want to do that. I know we have IPMI, and the network cables are already hooked up, so let's boot it: DHCP will give me an address, and I can use IPMI to get in and work with the server from there. So I plug it in, turn it on, and something funny happens. I look at the switch and I see two MAC addresses showing up. That's a little weird; why do I have two MAC addresses? Well, the first MAC address is the BMC, the out-of-band controller, so I can use IPMI to target it. The second MAC address is the host itself: I'm watching the host use the same physical NIC to try to PXE boot.
That means every host is on the same physical link and could potentially talk to all the other BMCs on the same network. I don't know how comfortable I am with that. So our network design has to be a lot more paranoid and do a lot more separation to make sure the hosts can't talk to the BMCs, because only my admins should be talking to the BMCs. But network management is not my realm, so I'll glide past that. Now, I mentioned we have IPMI. Great: we can do a lot with IPMI, it's standard, and I know how to use it. But it's a lot friendlier if I can use other tools to manage this too, so let's check what else we have. SSH? No; no command line for me. HTTPS, a web GUI? Nothing there. Maybe plain HTTP? Nope. Telnet? Not even that. And SNMP, if I wanted to treat it like a network device? No. So the only way to manage this is IPMI, which fortunately is exactly what Ironic is designed for. And technically it's not IPMI, it's DCMI, the Data Center Management Interface. DCMI 1.5 is basically IPMI 2.0 with a little extra sugar on top. That's important to understand, because ipmitool gives you two options for network interfaces: IPMI 1.5 and IPMI 2.0. Since DCMI 1.5 is really IPMI 2.0, you want to make sure you're using IPMI 2.0 for everything you do, for two main reasons. The first, more practical reason is that IPMI 2.0 is where Serial over LAN was added, so if you're not using IPMI 2.0 with DCMI, you don't get a serial console, and we can't proceed. The second reason is that IPMI 2.0 is where encryption was added: a whole set of cipher suites to allow better authentication and encryption of the traffic to the BMC. Which hopefully isn't an issue, because I shouldn't have customers or anyone else seeing that traffic anyway, but still, better safe than sorry. Except there's a thing called cipher zero.
If any of you are dealing with IPMI right now: cipher zero means no cipher. No authentication, no encryption. Anyone who uses cipher zero, if they know the right username, gets full admin on your box. So yes, I like having BMCs isolated from the network. If any of you have ever had to worry about this, here's an example of the ipmitool command to check whether cipher zero is supported. It shouldn't be; I hope it's not; if you have it enabled anywhere, fix it, please. With ipmitool, we specify the user, and any password, because it's ignored; I like fluffywabbit. Cipher zero, the host, and 'lanplus', which means the IPMI 2.0 interface. And 'mc info' is just Management Controller Info, a harmless test command to run so nothing actually happens. Testing new hardware, I tried this on the Winterfell, and it happily told me to go away. Yay. For the rest of this talk I'm going to reference ipmitool multiple times, so for convenience I'm going to use this shell variable: always the same user; -E, capital E, means get the password from an environment variable, because that makes things easier for me; the IPMI 2.0 interface; and the host. And since I like security stuff, for your reference, these are the three cipher suites that DCMI specifies everything must have. It still lets you use the additional suites from IPMI 2.0, but those are all optional; these are really the only three you want. So how do you check what you're actually supporting? There's a command to see what ciphers are there: ipmitool channel getciphers ipmi 1. It looks cryptic, and it is cryptic, and that's kind of the point of why I want to show you these: IPMI is a difficult protocol to work with.
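The convenience variable and the cipher-zero probe described above look roughly like this; the BMC address and user here are placeholders, not values from the talk:

```shell
# Placeholders -- substitute your own BMC address and admin user.
BMC_HOST=192.0.2.10
IPMI_USER=admin

# Convenience wrapper: always IPMI 2.0 (lanplus), same user,
# -E pulls the password from the IPMITOOL_PASSWORD environment variable.
IPMI="ipmitool -I lanplus -U $IPMI_USER -E -H $BMC_HOST"

# Cipher-zero probe (run against a live BMC):
#   ipmitool -I lanplus -U admin -P fluffywabbit -C 0 -H $BMC_HOST mc info
# -C 0 requests cipher suite 0; the password is ignored if cipher 0 is
# enabled. A healthy BMC refuses the session; a vulnerable one happily
# prints its controller info.
echo "$IPMI mc info"
```

From here on, `$IPMI` stands in for that full command prefix.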
Once you get familiar with it, it gets easier, but if you ever have to troubleshoot Ironic, or really just troubleshoot the servers, it helps to understand what IPMI is doing under the hood. This one in particular: channel 1 is the network session, and I'm asking which IPMI cipher suites are supported on channel 1. It gives me back the same three I listed before, so I'm happy. All right, I've got IPMI working, I had the USB stick plugged in, I had the server powered on, so let's see if it actually booted from the USB now that I can get the console. 'sol activate' turns on Serial over LAN. I'm very familiar with these commands; I believe Ironic can manage the console itself, but it's still admin-only, and by the time you need this it may be better to just work on the hardware directly anyway. One way or the other; it depends on personal preference. So anyway, I turn on the serial console, I think I'm booting, I look, and I see nothing. Okay, what next? I know the box supports PXE, because I saw the host MAC address show up on the switch, so let's go back and actually PXE boot this thing. PXE gives me a lot more information: I can see every network request going over and figure out where it stops. Which means bootstrapping a PXE server again. So, back to the drawing board: I build the PXE server, power off the chassis, check the status, and it tells me the chassis power is on. BMCs are very tiny and not designed for server-class reliability; they can be difficult to work with, I'll say that. So it tells me it's on. I'm testing the server anyway, trying to figure out how well it works, so: power off, power off. I may as well keep hitting it until I see it turn off. Eventually it went and said, you can't turn this off, the state is already off.
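What Ironic's power driver does for you here can be sketched as a simple poll loop instead of mashing 'power off' by hand; the wrapper values and the retry/timeout numbers are illustrative, not from the talk:

```shell
# $IPMI is the convenience wrapper described earlier; placeholder values.
IPMI="ipmitool -I lanplus -U admin -E -H 192.0.2.10"

# Request a power-off, then poll until the BMC actually reports it,
# since the state change can lag behind the command.
power_off_and_wait() {
    $IPMI chassis power off
    for attempt in 1 2 3 4 5 6 7 8 9 10; do
        # 'chassis power status' prints e.g. "Chassis Power is off"
        $IPMI chassis power status | grep -q 'is off' && return 0
        sleep 3
    done
    echo "BMC never reported power off" >&2
    return 1
}
```

Ironic's IPMI power driver does effectively this internally, with proper retry and state handling.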
Ironic obviously does that a lot more cleanly: it has to handle the fact that powering off may take a while, so it waits, and the driver does all of that for you. I'm not suggesting you power off over and over again; but while I was testing it out, that's effectively what Ironic is doing. So, now to PXE boot. I know this PXE boots by default, because I saw the MAC come up, but if I wanted to force it, I could try to set the boot device to PXE, or set the boot flag to force PXE. Honestly, I haven't gotten ipmitool's set-to-PXE command to work on this hardware. While I've been here, I've heard there is a trick to it: sending some raw bytes to the actual IPMI controller. Yes, fun. So IPMI can be difficult to work with, and of course this is why we have Ironic to manage it for us. Now I've got it configured; I power on, activate the serial console, and I see this stuff coming out. The very first thing is the BIOS boot codes, in hex: the same thing it would show on that little debug header, a two-character numeric representation of where the boot process is. Same thing I'd see if I plugged in that little card. All right, it's booting, it's past the BIOS, I get my DHCP, my TFTP, it starts to boot, and then I lose the console. Oh, sorry, I missed a step. I'm watching it boot, and it asks you to press F-whatever to continue, F11 to select a boot device, F12 to PXE. And I can't send the F keys through my session; they just don't go over Serial over LAN. There's a neat little trick there: Escape-2 for F2, Escape-Shift-1 for F11, Escape-Shift-2 for F12. Little quirks of the hardware; the stuff you care about but don't want to get too deep into. Anyway, back to my console problem.
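The two ways of asking for a PXE boot mentioned above might look like this; the raw-byte fallback is the commonly cited IPMI "set system boot options" encoding, not something from the talk, so verify it against your own BMC before relying on it:

```shell
# Placeholder wrapper, as before.
IPMI="ipmitool -I lanplus -U admin -E -H 192.0.2.10"

next_boot_pxe() {
    # The standard command; on some BMCs (as in the talk) it silently fails.
    $IPMI chassis bootdev pxe
}

next_boot_pxe_raw() {
    # Raw fallback: NetFn 0x00 (chassis), cmd 0x08 (set system boot options),
    # parameter 5 (boot flags), 0x80 = flags valid, 0x04 = force PXE.
    $IPMI raw 0x00 0x08 0x05 0x80 0x04 0x00 0x00 0x00
}
```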
On the versions I was testing, for CentOS I had to add console=ttyS1: all of the console output goes to the second serial device, not the first. For Ubuntu, it was the fifth. Of course, that may change depending on your kernel and your implementation; the only safe bet is that it's probably not going to be ttyS0 unless you go and configure that yourself. And that's kind of the point. The rest of the build was normal, so we'll skip ahead. Now I've got a perfectly good platform I can play with to automate this; I need a build image that irons out these quirks. I've got my beachhead: I can land there and figure out the rest. But I'm also not comfortable leaving the default username and password on my BMC. The vendor knows them; they set them for me; and a lot of these defaults are widely known. So the first thing I want to do is change that. Which is a little awkward in IPMI. This is plain IPMI, so it's standard and will work everywhere; if you have something with more functionality, there are cleaner ways to create users. The oddity is that IPMI has a fixed number of user slots, so you actually have to pick the user ID first, then set the name, set the password, enable the user, and then turn them on as admin. Which had the neat little side effect of disabling the original account, so enabling the new one before setting the privilege level was a very good thing to have done. And then there's one last step: now that I have the account and I'm using the new credentials, I have to do 'sol payload enable', otherwise I won't be able to actually use Serial over LAN. All of this is done remotely; but now that I have an OS, there are also tools to manipulate this locally. Those aren't core kernel drivers right now; I'm hoping that will be fixed.
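The user-creation dance above, in order; the slot number and the new username are placeholders, since plain IPMI just gives you fixed numbered slots:

```shell
# Placeholder wrapper and a free user slot.
IPMI="ipmitool -I lanplus -U admin -E -H 192.0.2.10"
SLOT=3

make_admin_user() {
    $IPMI user set name "$SLOT" opsadmin     # pick the slot, then name it
    $IPMI user set password "$SLOT"          # prompts for the new password
    $IPMI user enable "$SLOT"                # enable BEFORE touching privileges
    $IPMI user priv "$SLOT" 4 1              # priv 4 = ADMINISTRATOR, channel 1
    $IPMI sol payload enable 1 "$SLOT"       # or Serial over LAN stays off
}
```

Note the ordering: enabling before setting the privilege level matters, per the side effect described above.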
Depending on your OS and everything else. Anyway, there's a PPA from the Open Compute developers, the OCP certification tools, that has the MEI driver and a DCMI tool to manipulate this from the local host. That's very useful if you don't want to go through the network authentication part. The question was whether the local ipmitool works: the main problem is you need the IPMI driver. ipmitool or the DCMI tool talks through /dev/ipmi0, /dev/ipmi/0, or /dev/ipmidev/0, and those device nodes are only there if you have the driver installed for your hardware to talk to the BMC; that's the MEI component of this. So, some commentary here: Rackspace is supporting me; they're using very similar hardware, as they alluded to in their previous presentation. Thanks for the shout-out, guys. We kind of need special drivers for this, but it's all open source, and it's going upstream as we go. Drivers and firmware are really the biggest pain in all of this. Open Compute is a wonderful thing: it standardizes the configuration and the hardware. But the firmware is specifically left to the ODM, which means that for anything you need to do with firmware, you still have to talk to your hardware partners, and you need good hardware partners to work with in the first place. So anyway, now we have a host; it's time to build this into something we can build repeatedly. Let's get Ironic. Before we can use Ironic, we need the main components: Neutron for networking, Glance for images, and of course Keystone and Nova. In this use case, we're just deploying into an existing cluster, so the only thing we really need to provision is a single flat network for the management interface.
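For hardware whose BMC interface is supported by the mainline kernel drivers, getting local (in-band) access is just two modprobes before the /dev/ipmi* nodes appear; the Winterfell in the talk needed the out-of-tree driver from the OCP PPA instead, so treat this as a sketch for the mainline case:

```shell
# Load the in-kernel IPMI drivers so /dev/ipmi0 (or /dev/ipmi/0,
# /dev/ipmidev/0) exists; requires root and supported hardware.
load_ipmi_drivers() {
    modprobe ipmi_devintf    # character-device interface for userland tools
    modprobe ipmi_si         # system-interface (KCS/SMIC/BT) driver
}

# With the device node present, no network session or credentials needed:
#   ipmitool -I open mc info
```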
So we didn't have to get into any in-depth Neutron work; we just needed a basic DHCP server to hand out an address and say "PXE boot that." Chef takes care of the rest. And I'll refer you to the install guide for the rest of the nitty-gritty, with the following exceptions. Getting the Ironic agent running is only half the battle. Ironic pushes a deploy image first which, if you're using the PXE driver, exposes the root disk as an iSCSI target, and the conductor then uses that to write down the real image. But for either of these images to work, you have to have enough drivers to actually write them over the network. So even if OpenStack provides standard packages, those are the kinds of things that will probably still have to be customized: if you need any kind of custom network driver, you may not be able to reach the root disk at all, because there's no network. Anyway, TripleO has the diskimage-builder package, which includes ramdisk-image-create. That tool will create a deploy ramdisk with the pieces needed to expose an iSCSI target; it really just runs through a sequence of scripts for what needs to be installed in this basic boot image. I went and tweaked that to add extra things, so you can add any other drivers you want to your deploy image. We took an alternative approach to IPA: we just did configuration in the deploy image. We're talking to a bunch of external disks, and external disks mean that when we provision a node to a customer, it has to have the drive configuration; if there's RAID, it has to have the RAID configuration. If the customer wants that done for them, it has to be done early in the process.
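Building that deploy ramdisk with diskimage-builder looks roughly like this; the element names are the stock ones from the Ironic docs of the era, and the local-elements path is a hypothetical place for your own tweaks (extra drivers, health checks):

```shell
# Hypothetical path holding locally written elements (extra drivers, etc.).
export ELEMENTS_PATH=/opt/my-elements

build_deploy_ramdisk() {
    # 'deploy-ironic' pulls in the pieces that expose the root disk as an
    # iSCSI target; output is deploy-ramdisk.kernel / deploy-ramdisk.initramfs.
    ramdisk-image-create ubuntu deploy-ironic -o deploy-ramdisk
}
```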
We took the liberty of abusing the deploy image to do those kinds of things ahead of time, as well as ensuring the firmware is right and the basic health checks pass. So the deploy image runs through our start-up scripts before anything else and then exposes a nice iSCSI target; if any simple hardware validation step fails, it fails, and Ironic will just move on to the next node. I'm happy to find new ways to make it fail. The only reason I didn't use the Ironic Python Agent was that I didn't know about it sooner; I would love to do more with it. The main difference between the Ironic Python Agent and the PXE boot approach is that IPA puts an active agent on the node, which means I'm talking to a mini-server that can do smart things for me, instead of relying on repeatable scripts. It's useful to have something to talk to on the hardware for more intelligent discovery, or more intelligent troubleshooting if something breaks. And managing firmware is painful. Whether you're using the Ironic Python Agent or doing something clever in the deploy image itself, getting the firmware, finding the right versions, and figuring out what breaks which driver compatibility is the big challenge of working with the hardware, and you need to work with your partners. And when I say partners, I do mean the customers that are using Ironic, too. Because when they deploy on your hardware, it's, well, let's say very beneficial for them to have some sense of what hardware they're using. If they know what the hardware is, they can participate and understand what kinds of things to watch out for, and how the hardware can impact them.
So, we have our deploy image, or IPA, whatever. To actually create the customer image, again TripleO tools: diskimage-builder, disk-image-create. I'm calling this one the chef image because it's just being used for our Chef process. About the only thing we need is whatever network drivers we need, and I found it useful to have the Chef client built into the image itself; it just made bootstrapping a little easier. Adding -p means I can add more packages to the base image and provide a cleaner canned image to customers. One other thing I had to change: in ironic.conf, there's a parameter, pxe_append_params. I mentioned before that by default the serial console goes to ttyS0, and ttyS0 is no good for me on any of the platforms I tested, so I can either specify a variety of consoles and see how that works, or specify one or the other depending on the platform. But this is per conductor, which means that if I have one conductor managing a whole fleet of servers, those parameters go to all the nodes, regardless of whether they're running CentOS or Ubuntu. That's something I came here hoping to address with the team. Now, when we're actually creating the flavor for Ironic, we specify on the node and on the flavor the RAM size, the CPUs, the size of the root disk, and the architecture, to make sure your disk image is going onto hardware that supports it. But there can be a lot more parameters than that. This is just the base, bootable OS; we're attaching lots of other disks, for example, or we may have a variety of other hardware configurations. If somebody wants bare metal, it's usually because they have specific configuration tweaks and special hardware in mind.
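The chef-image build and the console workaround above might look like this; the element list, package name, and console arguments are illustrative, and the ironic.conf fragment shows the per-conductor limitation being discussed:

```shell
build_chef_image() {
    # -p bakes extra packages into the base image; here, the Chef client,
    # so bootstrapping has one less step. Names are illustrative.
    disk-image-create ubuntu baremetal -p chef -o chef-image
}

# And in ironic.conf, on each conductor -- note this applies to EVERY node
# that conductor manages, CentOS and Ubuntu alike, which is the problem:
#   [pxe]
#   pxe_append_params = nofb nomodeset console=ttyS1,115200
```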
So the flavor has to reflect the configuration somewhere. On the Ironic side, we add properties/capabilities to the individual Ironic nodes, and that becomes a flavor key for the flavor we're using to deploy. As long as super_fast is set to yep on the node, anyone who chooses the super-fast bare metal flavor will get the right image and will only land on nodes that have super_fast set to yep. That's about it for setting it up; then we have to run the cluster. So: maintenance. I'm providing hardware to my customers, I'm providing a basic level of service, and this is hardware, so it will fail, and it can fail in interesting ways. I want to give a better experience, so I want to stay ahead of the curve on hardware monitoring. But there's only so much I can do from out of band. IPMI gives me basic sensors: I can get thermal and CPU-type sensors, and if the box crashes and burns, I can get a basic log of CPU and DIMM failures. Even those can be iffy; you tend to see hard failures, but IPMI is not the ideal protocol for monitoring servers. By the same token, the people using the images are consuming a service from us; they're consuming this bare metal from us. So there has to be a handoff. I can't fully understand what they're running and what it can do, and I can't just give them a generic box and say, go monitor this. Well, we can, but it's helpful to tell them: this is the kind of hardware you have, and these are the ways it can fail, because things can fail and be monitored in interesting ways. In this case in particular, the Open Vault tray, the extra attached disks, has fans on it for cooling. But the disk tray doesn't have any network monitoring; it only has the storage connection back to the server node.
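The capability-to-flavor plumbing described at the top of this section can be sketched with the python clients of the era; the super_fast key mirrors the talk's example, while the flavor name and node UUID are placeholders:

```shell
NODE_UUID=00000000-0000-0000-0000-000000000000   # placeholder node

tag_node_and_flavor() {
    # Tag the Ironic node with a capability...
    ironic node-update "$NODE_UUID" add properties/capabilities='super_fast:yep'
    # ...and steer a Nova flavor to nodes carrying that capability.
    nova flavor-key baremetal-superfast set capabilities:super_fast='yep'
}
```

With both set, instances of the baremetal-superfast flavor schedule only onto nodes tagged super_fast:yep.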
Which means that when a fan fails on your storage tray, it gets reported as a storage error. So if you want to monitor for fan failures on that chassis, you have to be monitoring the storage array for them, and that's not visible from IPMI; you can only see it from the host OS itself. There are various little things like that. In our case, because this is all private cloud, the people I'm deploying to are people I can walk over and talk to, which makes it very easy to cross those boundaries and help them understand what the hardware they're working on looks like and acts like. And of course the last stage of the life cycle is decommissioning, which I don't have to talk about, because the Rackspace guys just spent 40 minutes in the other room talking about it. Thank you again for sending everybody over here. I think that's a very nice note to end on. Any questions? Yes. The question was how we manage the electrical side, because these Open Racks take 48-volt DC coming in; they're not like the normal racks you're used to. I'd have to leave that to the facilities guys to discuss. I do know there are other Open Compute specs intended to address the fact that a lot of customers don't want DC; they want to put things in standard cabinets with standard power sources. [Audience:] As one of the only other publicly discussed Ironic installations, welcome to the club. Can you give us some idea of the scale? I understand not talking specific numbers, I don't do that either, but maybe an order of magnitude, a general idea of how large you've scaled it with the PXE driver today. That's not something I can really speak to, I'm sorry. Yes; the question was how long until this becomes broadly consumable.
Obviously I can only speak for myself and purely hypothesize on that one. I like the direction the project is going, and I'm very excited about Open Compute. They have a lot more interesting designs now and a lot more people contributing. At the last OCP summit, which happened right before this summit, they announced a Microsoft chassis, so Microsoft is in the game designing hardware; Facebook is designing hardware; there's hardware that's a pure DC play like this, and hardware geared more toward traditional data centers. It's getting there; I'm excited. Yes. The question was, it looks like we're really pioneering here, out at the bleeding edge: do we have plans? Well, I'd say I'm a little too low in the food chain for plans, but I do like playing with the new toys, and I'm happy to get involved in all the other stuff. And actually, again, personal experience only: I'm specifically here to make sure the direction Ironic is going in is the direction we're going in, so that whatever hardware we're using, we can automate it and treat it like the rest of our cloud infrastructure. Anybody? Oh, yes: what could Ironic add to make my life easier? That's cheating. Let's just say we'll be going through that more tomorrow and Friday. In particular, as you mentioned in your last session, IPMI only does local accounts and passwords. For chef-bcpc, we generate random passwords, because I don't want to know or care about the passwords anywhere I don't need to. So could Ironic do password management for me? Knowing how to reset the password to something random and strong, especially on a per-server basis, would make things a lot more interesting. And of course that's only for things that are pure IPMI, like Open Compute, where I can't necessarily count on anything more than local accounts. Yes.
Okay, so the question was: from your background, BMCs often have tagged-VLAN support to separate out-of-band management from the rest of the host traffic. Again, I defer to the people with the network expertise. Personally, I'm much more comfortable when I can say it's a physically separate partition: there's no way a firmware bug, or someone maliciously tampering with the firmware, can change that kind of separation. That kind of VLAN tagging is almost certainly done in firmware, not hardware, which means if I provision someone a physical root server, I can't necessarily count on it staying the same. Anything else? It's already six o'clock by my watch, and I don't want to hold everybody up from dinner. So, thank you all for your time.