All right. Awesome. So first of all, thanks for having me here today, guys. My name's James Denton. I'm a principal architect at Rackspace Technology. We're based out of San Antonio, Texas. I've been involved with OpenStack for the last 10-plus years, primarily in the private cloud space and especially scoped to networking, both physical and virtual. So today I want to share with you some experiences in using Bifrost to perform remote server provisioning, which is the ability to deploy an operating system against a set of nodes across a VPN tunnel with no local PXE or DHCP server or provisioning system in place. A little disclaimer: I'm a master of inefficiency. I learned some of these tools just enough to perform a POC, so if you see anything here that looks inefficient or wrong, feel free to bring it up in the Q&A afterwards. I'm always soliciting advice.

So for those that aren't familiar with it, Bifrost was introduced around the Mitaka timeframe and is a sub-project of the Ironic project, which is the service responsible for bare-metal provisioning. It's a collection of playbooks that can install what is essentially a lightweight, standalone Ironic, as well as handle enrollment and deployment of bare-metal instances in an automated fashion. So with Bifrost, the Ironic bits are completely decoupled from the rest of the OpenStack stack, save for some optional integrations with Keystone and possibly other services. Here at Rackspace, we're just dipping our toes into the Bifrost waters after having run Ironic for many years now as part of our OnMetal offering in public cloud, as well as our private cloud offerings based on OpenStack-Ansible and Red Hat OSP.

Where Bifrost is really useful for me is its ability to answer the question of how do I bootstrap the node that bootstraps the nodes? Whether it's one node or 100, Bifrost does seem to be up to the task, and because it is Ironic under the hood, I don't think that's any surprise. Coupled with some additional automation, you can find yourself standing up an environment in relatively short order if you've taken the time to develop an inventory and a strategy, and we'll go into that a little bit later. Traditionally, in a remote environment, the bootstrapping process consists of a deployment engineer with an ISO and a few days or weeks on-site or remotely to handle the operating system install and network configuration. Or in some cases, the vendor might be asked to deploy the operating system and basic network config before the servers are even shipped. Tools like Bifrost, Ironic, and even Canonical MAAS, if we can say those words, help bring a cloud-like deployment strategy to bare metal.

For the rest of my time here today, I'll share with you some basic steps for getting started with Bifrost, as well as cover how we are using it to deploy bare-metal nodes across the United States via a hub-and-spoke IPsec VPN with no changes to Bifrost and some limited changes to dnsmasq. Before we can actually proceed with the installation of Bifrost, one must know their environment, as some of those details are required as part of the installation and deployment process. Pictured here is a remote deployment in a data center in Dallas, Texas, consisting of an edge firewall and a handful of bare-metal servers connected to some local switching infrastructure. The use case we're solving for here is the ability to bootstrap this particular environment using Ironic-based infrastructure located in another DC.
The R740 that's pictured here has a dedicated PXE interface on eno2, which is a 1-gig interface on that two-port NIC. In addition, there's a single 2x10-gig bond for production traffic, which may or may not use that same Cisco ASA. But for the purposes of Bifrost, and Ironic in particular, we only really care about that PXE interface. In lieu of a DHCP server that would normally exist in the same local VLAN as this PXE client, our Cisco ASA is going to act as a DHCP relay. It's going to forward all of the PXE and DHCP requests across the VPN tunnel to the other data center.

In what we'll call the centralized Chicago data center, Bifrost is hosted on a virtual machine that's dual-homed. One network interface is used for the management of the Bifrost VM itself, and the other network interface will be used for PXE and DHCP client requests coming over the VPN. The former is the black line named "external"; the DHCP interface is the red one named "PXE relay." The management interface is not only used to reach the Bifrost host itself for management, but also to allow that host to reach out and tickle the out-of-band interfaces of the bare-metal hosts for power-on and reboot commands. In our environments, out-of-band networks are truly out-of-band, but for some remote environments, they might be reachable via the same VPN tunnel. As long as they're reachable from this Bifrost host, whether local or routed, Ironic will be happy. Since we intend for these PXE clients to hit the backside interface of this Bifrost VM, static routes or some policy-based routing rules are necessary to ensure that the response traffic is actually sent back out the same interface it came in on, towards the ASA and back over the VPN. So in this example, the client in DFW will source from the 192.168.192.0/25 CIDR, go through the DHCP relay and over the tunnel, and the dnsmasq on this Bifrost host will respond accordingly. To summarize, we have these two data centers, DFW and ORD, and they're linked by an IPsec site-to-site VPN tunnel between two Cisco ASA firewalls. Everybody with me so far? Yep. Cool.

So now that we know what the environment looks like, let's take a look at how to install Bifrost. The installation of Bifrost can occur in one of two ways: either via the bifrost-cli command-line client or via the Ansible playbooks. Both of these options are available from within the Bifrost repo, and the documentation upstream gives you more guidance on using the CLI versus the playbooks. The takeaway here is that those playbooks are totally customizable, and you can integrate them into whatever system you deem necessary. In either case, you're going to need to know your environment, which at a minimum includes the DHCP address pool for handing out addresses to the bare-metal nodes seeking leases, as well as the network interface on the Bifrost node used to respond to those requests. In the DFW environment, the bare-metal hosts' PXE network is 192.168.192.0/25. The gateway device is a Cisco ASA with the .1 address. The DHCP pool could be a subset of that /25, and then the hosts themselves can have static IP assignments from the rest of that /25 once provisioned. It's important to remember that none of these addresses are actually local to the Bifrost host or to that ORD environment; they're all routable and reachable over the VPN tunnel.
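As a concrete illustration of that routing requirement, here is a minimal sketch of the kind of static route and policy rule involved. The interface names and addresses are assumptions for this sketch, not values from the actual environment.

# Illustrative only: make replies to relayed PXE/DHCP traffic leave via the
# "PXE relay" interface instead of following the default route on the
# management side. eth1, 172.16.0.10/24, and 172.16.0.1 are placeholders.

# Dedicated routing table for traffic sourced from the relay interface
echo "100 pxerelay" >> /etc/iproute2/rt_tables
ip route add default via 172.16.0.1 dev eth1 table pxerelay
ip rule add from 172.16.0.10/32 table pxerelay

# Static route so the remote DFW PXE CIDR is reached via the relay side
ip route add 192.168.192.0/25 via 172.16.0.1 dev eth1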
Pictured here, I've done myself a favor and made a small bash script that takes some environment variables and passes them as overrides to the installation playbooks, namely the ability to define which version of Bifrost I want to install and the network interface. I have chosen here to deploy the stable Antelope version of Bifrost. But sometimes there are bugs, and in testing I had to revert to Zed; I kind of avoided master for reasons. Once this installation is complete, you can run baremetal commands against the Bifrost cloud defined in clouds.yaml, which is actually installed by the playbooks. This is essentially the Python Ironic client decoupled from the OpenStack client.

The installation of Bifrost installs and configures a core set of services, including the Ironic API and conductor, ironic-inspector, dnsmasq, MariaDB, and Nginx. dnsmasq serves an important role here as the DHCP server and the TFTP server, directing those clients to download the initial ramdisk over the VPN. The Ansible playbooks are responsible for configuring all of these components, so care should be taken to avoid changing configuration files manually, as they may be overwritten on a subsequent playbook run. The exception to this, however, is dnsmasq, as there are some changes necessary to facilitate the DHCP relay we've implemented. Thankfully, dnsmasq does allow for the use of additional configuration files in a conf.d folder that won't be overwritten by Ansible, but instead get aggregated with all of the other configuration files, and we'll demonstrate that in a few minutes. A quick glance at this ironic.conf file shows that it references the local host and directs clients to itself for the download of the ramdisk and some of the other necessary components. And then we can see here from MariaDB that it really is limited to just the Ironic-related databases; none of the other OpenStack components are installed.

By leveraging a separate dnsmasq configuration file that won't be overwritten by the playbooks, we can define multiple sites that will be serviced by this Bifrost host over multiple VPN tunnels using tagging. Here we have the DFW site defined, as well as a site in Hong Kong. Every additional environment we want serviced by this particular Bifrost instance needs these options with a unique IP space and appropriate routes in place on the host. A separate Bifrost host can be used per site as well, for further segregation and customization. But ideally, this particular Bifrost host could service 30 different data centers in a hub-and-spoke VPN, as long as each of these configuration options is listed for those individual sites. So whenever you perform a baremetal node list, you'll see all of the nodes sort of intermingled.

Once Bifrost is installed, the next step in the journey is to enroll the nodes. The enrollment process includes creating the nodes within Ironic and defining various properties, such as the out-of-band addresses and credentials, image locations, and more. The images need not be local to either environment; in fact, we've uploaded them to our Swift-based object storage system, known as Cloud Files, for easy access. In the DFW environment, we're working with an HP ProLiant Gen 9 and have defined the out-of-band address, credentials, host name, and MAC address of the PXE interface.
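For reference, a wrapper along those lines might look roughly like the following. The branch name and option values are examples, and the exact flags should be checked against ./bifrost-cli install --help for the branch in use; this is a sketch, not the actual script from the talk.

#!/usr/bin/env bash
# Illustrative wrapper: pin the Bifrost branch and pass the environment
# specifics as install options (values here are placeholders).
BRANCH="stable/2023.1"   # Antelope; stable/zed was the fallback when bugs hit

git clone https://opendev.org/openstack/bifrost -b "$BRANCH"
cd bifrost
./bifrost-cli install \
  --network-interface eth1 \
  --dhcp-pool 192.168.192.10-192.168.192.100

And the multi-site dnsmasq drop-in described above could look roughly like this, with one tagged dhcp-range per relayed subnet. The file name, subnets, gateways, and lease times are illustrative only.

# /etc/dnsmasq.d/bifrost-sites.conf (illustrative)
# The subnet in the relayed DHCP request selects the matching tagged range,
# and each tag gets its own router option pointing at the local ASA.

# DFW
dhcp-range=set:dfw,192.168.192.10,192.168.192.100,255.255.255.128,12h
dhcp-option=tag:dfw,option:router,192.168.192.1

# Hong Kong
dhcp-range=set:hkg,192.168.200.10,192.168.200.100,255.255.255.128,12h
dhcp-option=tag:hkg,option:router,192.168.200.1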
When enrolling the node via the Bifrost playbooks, these properties are defined in an inventory file, like the one shown, and are used to create the bare-metal node within Ironic and configure such properties as driver_info, the driver, and other info. Upon enrollment, the node may be cleaned automatically, depending on the configuration, and may sit awaiting the next command. Bifrost can even be configured to inspect the node upon enrollment, and that resulting information can then help the operator be a little bit more explicit with the deployment details we'll see in the next step. Enrolling the node is as easy as using the Bifrost CLI command along with the aforementioned inventory file, which then triggers Ansible to perform the enrollment tasks, and once complete, the node is available for deployment. Here we have an inventory that I've created for a particular storage node, but you can actually have an inventory of 100 nodes, and it will enroll them all at once; for every task, you'll just see it repeated for each individual node.

To deploy a bare-metal node, the Bifrost CLI command is used with the deploy directive to deploy against the node specified in the inventory file. You can either target a single node or an entire environment for enrollment or deployment based on that inventory, but the throughput may be limited if you do a bunch at once, especially over this VPN, because remember, the ramdisk and the kernel are being shipped over the VPN tunnel. One thing to keep in mind is that root device hints are really helpful, as they allow Ironic to target a particular hard drive to deploy the image onto in case you have multiple disks. This being a storage node, it's got a RAID 10 for the boot disk and then a bunch of RAID 0s for all of the storage disks for something like Ceph. Doing the inspection allowed me to identify how those disks were identified by Ironic, and then I could update the root device hints to make sure I target the RAID 10.

To save time, I pre-recorded a bare-metal deployment process that I'll just kick off real quick. Once the PXE process begins, the ramdisk and the kernel are downloaded from the Bifrost host over the VPN. It takes approximately three minutes to copy the bare-metal image to disk, which includes downloading the image from object storage, before the node is rebooted. And then the remaining seven minutes, in typical HP fashion, is the pre-boot and boot process. As we look at this, we can see that 192.168.192.239, port 8080, is actually the IP address of our Bifrost node, I'm sorry, the TFTP, and in this case HTTP, server hosting that iPXE file. So we're downloading the ramdisk right now. It will boot, and then the Ironic Python Agent kicks off and starts doing its job, and we'll download the image from Cloud Files. I'll speed this up a little bit. So if you're familiar with Ironic, this is Ironic; it's doing the same thing that it would do on a typical Ironic deployment. As we get a little bit closer to the three-minute mark, we'll see the shutdown process start to begin. There we go. And then a reboot will occur, and then the node will come back. What I really want to show you is that our node is now up after about 10 minutes, and we have a host name set here. That host name is provided by the metadata and user data of this particular host, which is provided by a config drive that is actually a partition on the hard disk.
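To make that concrete, a single-node inventory of the general shape described might look like the sketch below. The field names follow the Bifrost inventory schema as I understand it, and all addresses, credentials, and URLs are placeholders rather than the values used in the talk.

---
# Illustrative Bifrost enrollment inventory for one storage node.
storage01:
  driver: ipmi
  driver_info:
    power:
      ipmi_address: 10.0.0.50          # out-of-band (iLO/iDRAC) address
      ipmi_username: admin
      ipmi_password: example-password
  nics:
    - mac: "aa:bb:cc:dd:ee:01"         # MAC of the PXE interface (eno2)
  ipv4_address: 192.168.192.130        # static address applied after deploy
  properties:
    cpu_arch: x86_64
  instance_info:
    image_source: https://objectstore.example.com/images/ubuntu-22.04.qcow2

Enrollment and deployment would then be driven by something along the lines of the commands below, with a root device hint set so the image lands on the boot RAID rather than one of the Ceph disks (node name and size are examples).

./bifrost-cli enroll /path/to/inventory.yml
baremetal node set storage01 --property root_device='{"size": ">= 900"}'
./bifrost-cli deploy /path/to/inventory.yml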
And cloud-init in this cloud image pulls the relevant metadata out. So let's revisit that earlier diagram and take a look at the high-level process being followed. Once the deploy is triggered, Bifrost in ORD tickles the bare-metal node in DFW via the out-of-band interface and powers on the device. The bare-metal node, which is set to boot from the network, initiates a DHCP request. That DHCP request is relayed over the VPN tunnel to the Bifrost VM, and a response is sent back. The bare-metal node boots from the ramdisk indicated in the DHCP response, and then the IPA copies the cloud image to disk and reboots the node. Once the server is rebooted, cloud-init gathers the user data from the attached config drive and a basic network configuration is applied. So the IP address that we actually defined in the inventory as ipv4_address is what's configured on eno2 upon reboot. The RSA key that was provided as part of the inventory is also installed on this host, so I can SSH from that Bifrost VM, as long as that routing is in place, and access this device.

User data can be leveraged to provide additional configs and commands that can be executed by cloud-init, including netplan, bash scripts, and others. For our purposes, this base connectivity is sufficient to allow us to run additional Ansible playbooks that will set up networking bridges, bonds, and other IPs to stage the node for an eventual OpenStack deployment. By leveraging that other set of custom playbooks, we can prescribe specific network configuration details for each node that will eventually be kicked by Ironic. We leverage the Ansible netplan role and apply a very specific netplan configuration that fits our OpenStack-Ansible deployment model, along with all of the individual IP addresses and routes that are needed to facilitate that. Once I run those custom playbooks against the IP on eno2, we can log back into the machine and see that all of our routes have been updated. And now if I ping Google's DNS server, it's actually leaving the bonded interfaces and not the PXE interface.

The use of Bifrost has considerably simplified the process of provisioning remote deployments by not requiring a full-blown OpenStack or Ironic deployment or any other provisioning system on site. By simply relying on standard IPsec VPNs and DHCP relay functionality, we can theoretically stand up an entire environment in about 30 minutes, minus any cabling or other issues. The takeaway here is really knowing your environment beforehand and developing a plan based on a proof of concept like I did here. That's the key to success. So thanks again for your time. If you have any questions, I'd love to hear them.

There is a question from someone on the call: are there any architecture-specific dependencies in Bifrost, or does it work on arm64? So really, you're going to be limited to whatever Ironic supports, because Bifrost itself is simply playbooks and automation around leveraging a very compact Ironic deployment. If you have bare-metal images that support arm64, and the kernel and ramdisk and all of the other components necessary to actually bootstrap a bare-metal node support arm64, then I don't see why that would be a problem. I think someone asked that question at one of the OpenInfra Summits, maybe a year ago. And the Ironic community, they don't focus too much on the architecture.
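For a sense of what those follow-up playbooks lay down, a netplan layout of the general shape described might look like the sketch below. Interface names, VLAN IDs, and addresses are purely illustrative and not the actual OpenStack-Ansible configuration.

# Illustrative netplan sketch: LACP bond, a VLAN on top, and a bridge for the
# OpenStack management network. The PXE interface keeps its provisioned address.
network:
  version: 2
  ethernets:
    ens1f0: {}
    ens1f1: {}
  bonds:
    bond0:
      interfaces: [ens1f0, ens1f1]
      parameters:
        mode: 802.3ad
  vlans:
    bond0.10:
      id: 10
      link: bond0
  bridges:
    br-mgmt:
      interfaces: [bond0.10]
      addresses: [10.240.0.11/24]
      routes:
        - to: 0.0.0.0/0
          via: 10.240.0.1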
But there were some voices from the group, I remember, stating that it just works out of the box, though it's not really tested or certified by a vendor or by any kind of institution. But again, as someone mentioned, I think it was a person from Australia, maybe from Red Hat, they said they had a use case for validating arm and they weren't able to do it. If it just works with the regular Ironic, and Bifrost is just using the Ironic bits, I don't see why this wouldn't work. That's right. That's right. Yeah. I forget now if the Bifrost playbooks download Ironic packages for that particular operating system or if it's actually compiling from source; I can't recall offhand. I suspect it's probably pulling debs and RPMs. So for your Bifrost host, I don't think there's any reliance on that being an arm64 box, but certainly the images that you want to deploy, and the kernel and ramdisk, would need to support the arm architecture.

OK. So I have a question for you, James. Since you're talking about these distributed architectures, where you have some Ironic boxes in Dallas and the other ones in another part of the country, where do you store all of these images for Ironic? Do you distribute them as well? You may have mentioned that, but maybe I missed it. Yeah, sorry if I glossed over it. They are in object storage, right? Rackspace Cloud Files is a public object storage service, and it has some built-in CDN functionality. I don't know if it automatically distributes those images across the various sites based on user demand, but it's certainly not something I had to do manually. But if I were deploying bare-metal nodes in other regions of the world, then it would certainly behoove me to have those images a little bit closer to those sites. And for every node that you define in the inventory, you can be specific with the image location. So if I know I'm deploying against a bunch of nodes in Hong Kong, well, for those particular nodes, I'm going to specify a location that's a little bit more localized. In the US, I could have all my images hosted in Virginia, and Dallas and Chicago wouldn't really mind too much. But I can see that being a problem somewhere else.

I've got a question about the choice of IPsec as the tunneling protocol for your demo and potential alternatives. For example, I understand that the way you set this up with the DHCP relay, only unicast gets forwarded across that IPsec tunnel, which makes it pretty simple. You could probably use any other tunneling technology in place of IPsec as well. But if we compare that to some kind of layer 2 tunneling, what are the pros and cons in your view? What guided the decision to pick IPsec? Yeah, so I'll tell you, the decision to use IPsec was based on just prior experience in configuring, especially, the Cisco ASA. I used to live and breathe those devices earlier in my career, so it became just familiarity and a fairly simple configuration on the side of the ASA. I think I spent a few hours trying to figure out the dnsmasq side of things, a lot more time than the ASA side of things. So that's really it. I don't see why this type of solution wouldn't be solvable with another tunneling protocol or another strategy; for me, this was just the quickest path to success. Basically, any tunnel that can forward unicast traffic would do for this use case? I would think so, yeah. And for our remote deployments, a Cisco ASA or a Firepower running ASA code is a pretty standard part of our bill of materials.
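As an illustration of that per-node flexibility, the enrollment inventory can point each node at the closest copy of the image. The node names and URLs below are placeholders, not the actual Cloud Files endpoints.

# Illustrative: per-node image locations in the enrollment inventory.
node01-dfw:
  instance_info:
    image_source: https://storage-us.example.com/images/ubuntu-22.04.qcow2
node01-hkg:
  instance_info:
    image_source: https://storage-hkg.example.com/images/ubuntu-22.04.qcow2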
So if this was another edge device, like a Palo Alto firewall or something, the strategy might look a little different. But for our deployments, IPsec is a pretty standard component. Another question about the two network interfaces on the Chicago side, where the Bifrost VM was living: you had two NICs, one for external access, and the other one was responding to the DHCP requests. Any reason you couldn't handle that on the ASA using a split tunnel? So yeah, I did dual-home that Bifrost VM, and looking back, that could have been just a single interface. Right. Yeah, there's no real technical reason why it wasn't, and the reason now escapes me. I think I mentioned earlier in the slides that once I get it working, I tend to move on without questioning those decisions too much. But if I were to redesign this for a more production use case, then certainly streamlining it, I think, would be in my best interest. Yeah, I guess it wasn't the extra playbook you ran at the end that set that up; that was on the other side, on the bare-metal side. Right, right. So with the dual-homing of that Bifrost VM, like you mentioned, I have to add static routes to get back out that interface. Yeah, and certainly with one interface, I wouldn't have to worry about that anymore, and moving forward, it may just be a good idea to eliminate that step. Yeah, it opens up the possibility of asymmetric routing, I think. Yeah. Right, right. Yeah, it's really not a necessary thing anymore, and that might have just been me initially anticipating some additional difficulties that never manifested, right? And it works the way it is. I won't touch it; it ain't broken. Exactly, it's not broken, yeah.

So, you guys go first, yeah. I was going to ask, does Bifrost have some kind of support for multi-conductor for scalability? My understanding with Ironic is that the way you scale nodes is by increasing the conductors. Each conductor is responsible for polling the IPMIs and making sure the nodes are up and in the state that they're in. So if Bifrost supports that model, you could scale out per zone. I mean, probably not in its current form. It seems to be written specifically to assist in bootstrapping a limited number of nodes to then deploy a full-blown Ironic setup, right, that would then be used to do the rest. So I don't get the feeling that that's a use case that's really asked for, and if we look at the playbooks, they've got things prescribed to localhost, so it doesn't really seem to want to fit that multi-conductor model. So would you want to scale it by using multiple Bifrost nodes, something in that manner? Yeah, probably not. Maybe if this became something where you can set up a Bifrost environment in under five minutes. If you were going to use an Ironic-based system in a similar model, then maybe you set up a full-blown Ironic deployment with all the other bells and whistles, and then you get the scalability. Sure. One other question for you. Does Bifrost expose the Ironic capability to do kexec, so you can skip the reboots after the image is deployed? There's a feature in Ironic you can use so you don't have to reboot every bare-metal node a second time; you can kexec into the new kernel on the current disk. So it skips the reboot time, and at scale that can save you a ton of time. Yeah, that's a good question. You're talking about the reboot after the deployment or the reboot after the clean? OK. Yes.
After it's written, you can kexec into exactly the kernel that's on the new system; it's just been written. You know, I don't know. I would think so. I mean, I think whatever's available to Ironic from a conductor and API standpoint probably holds true here. It just comes down to your ability to toggle that and turn it on or off. I don't think everything is exposed via an Ansible override, but you could make it so, right? Yeah, sure. Sorry, I have no experience with Bifrost, but I've previously deployed Ironic for about a 1,000-node HPC cluster, so we leaned on this capability back then. And that was a while ago, so it's been quite some time. Yeah, I mean, if that's just a setting in the conductor parameters, then it might be worth trying. I know that whenever a node is cleaned, it'll hang out and await further instructions, whether that's the deploy command or whatever that happens to be. Inspection will actually reboot the node into the inspection kernel and then do its thing and reboot after that. But yeah, I'm curious to try that out. Thank you.

I was actually going to ask you a question about scalability. I think a rule of thumb we see with our customers is like one conductor per 200 bare-metal nodes, right? If you only have one Bifrost node, then, I don't know, have you maybe done any scalability testing at all with Bifrost? Not with this Bifrost setup, no. It's really just meeting that use case of, I don't have any localized kickstart system in place, so how do I quickly at least spin up a handful of nodes to then maybe build a little bit more robust setup that can then do the rest? And I think that's where it's targeted. When I mentioned bootstrapping the node that bootstraps the nodes, I think that's the primary use case. Because otherwise, I mean, if you get into all of the other Ironic things, I don't know how much dependency there is on Nova or Keystone, even with a more scalable Ironic architecture. But it'd be worth testing out. Because you don't have a load balancer either, right? Yeah, I guess there is Nginx. But yeah, there's probably a bit more work to do to figure that out.

Sorry, that was cool. Just speaking from the last version I got to, which was Ocata, when I still ran Ironic, one of the biggest scalability issues was the way it polled the IPMIs serially. So if you had 1,000 nodes, it would take 15 minutes to poll every single IPMI and get responses. I don't know if that's still the case or not, but there's this thing called conductor groups these days. I don't know if you've tried those, where you can assign a block of bare-metal nodes to a specific conductor group, so you can kind of distribute the load. It's still handled at the conductor level? Yeah, it will be at the conductor level, but you have the option. We see some of the big Ironic users, like CERN, using it, where they have a bunch of conductors and they assign 200 bare-metal nodes to one of the conductors, and they still get the HA side of things from that mechanism. There are some ways to do that right now in Ironic. Yeah, and I think that will actually, yeah, doesn't the failover of that work where it moves them to another conductor and then preempts when that conductor returns, something like that? Yeah. All right, I think we're at the top of the hour. I have one more question. Yeah, one more quick one, thanks.
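For anyone curious, the conductor group mechanism mentioned here is wired up roughly as follows in a general Ironic deployment; it isn't something the Bifrost playbooks expose out of the box, and the group and node names below are illustrative.

# Illustrative only. On each conductor, pin it to a group in ironic.conf:
#   [conductor]
#   conductor_group = dfw
#
# Then map nodes to that group so only the matching conductors manage them:
baremetal node set storage01-dfw --conductor-group dfw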
In your example, the storage node that you set up, you mentioned that it had disks for Ceph to use. Are those local disks, are they attached storage, or does it really matter? No, so this is like a, this is a Gen 9. I think it has 12 three-and-a-half-inch slots, and there's a RAID controller. So there are two disks in a RAID 10, and then the rest are in RAID 0s because of the limitations of the RAID controller itself. It's not the ideal Ceph storage node, but it's all integrated into a 2U chassis. Thanks. Yeah. All right. Thanks so much, James, I appreciate your time. It was a great presentation.