Hello, welcome to our talk, "Ironic and Edgy". My name is Dmitry Tantsur. I'm a software engineer at Red Hat and an upstream contributor to the OpenStack Ironic project for more than five years.

My name is Ilya Etingof. I used to work in product security at Red Hat, then I switched teams and I'm now working on the Ironic team within the OpenStack organization. I mostly work on the Redfish implementation and things like that; you will notice that shortly.

In this talk, we're going to explain what the edge effort is and why it is becoming increasingly important and wanted by many OpenStack operators. We will then go on to explain bare metal provisioning technologies and why they are relevant to the edge use case. In the Ironic project, we see many areas to address and improve to make Ironic the best tool, or at least a better tool, for bare metal provisioning at the edge.

It seems that the edge effort is driven by several factors. Just to name a few: the growth of IoT device deployments pushes data collection and processing facilities closer to the IoT swarms, and the emergence of high-quality broadband video delivery services pushes data storage facilities closer to the households. It also seems that some data center operators, probably trying to cut costs and improve margins, are looking at unusual locations for their data centers: places near the Arctic Circle, where electricity is cheaper and the cold climate helps save on the cooling infrastructure.

All these factors call for better automation of the data centers. Besides that, otherwise unrelated trends, like applying machine learning, data mining and artificial intelligence technologies to data center management tasks, also contribute to this desire for better automation.

Once we move pieces of our hardware to the edge of the network, any kind of physical access becomes a problem, a challenging or even impossible thing, because there is no one there to flip a switch, power cycle a machine, or do anything else with the hardware. The only practical way to deal with hardware at the edge is through the network. But the network at this distance becomes lossy, unreliable, unstable and insecure, and any kind of network access becomes challenging; therefore a reliable and sophisticated hardware management protocol becomes more and more important. On top of that, smaller points of presence may impose additional limits on the rack space, cooling and power available for the hardware management harness.

Stretching the control plane across the globe effectively increases the attack surface on the control plane, making the whole cloud less secure, and this again calls for a more secure and modern hardware management protocol.

For a cloud operator, setting up the infrastructure is not necessarily a one-time affair. It's more like a continuous process, because not only do they need to enroll their nodes once, they might also need to phase out the broken ones and enroll new ones, or maybe eventually wipe out everything and hand the whole hardware infrastructure over to somebody else.

It seems that many hardware vendors have finally converged, or are converging, on a single protocol suite for managing all kinds of hardware. The latest development in this area is known as Redfish, basically an HTTP-based protocol, and it seems that many vendors are trying to adopt and implement it to manage all their offerings.
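To give a flavor of what Redfish looks like, here is a minimal sketch of querying a BMC over its REST API. The BMC address and credentials here are made up and individual BMCs differ; the /redfish/v1 entry point and the Systems collection come from the standard itself.

    # Minimal sketch: Redfish is just JSON over HTTPS.
    # bmc.example.com and the credentials are hypothetical.
    import requests

    BMC = "https://bmc.example.com"
    session = requests.Session()
    session.auth = ("admin", "password")
    session.verify = False  # many BMCs ship self-signed certificates

    # The service root enumerates everything this BMC exposes.
    root = session.get(BMC + "/redfish/v1").json()

    # Walk the collection of computer systems the BMC manages.
    systems = session.get(BMC + root["Systems"]["@odata.id"]).json()
    for member in systems["Members"]:
        system = session.get(BMC + member["@odata.id"]).json()
        print(system["Name"], system.get("PowerState"))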
Let's talk about the Ironic project. Ironic is a project under the OpenStack umbrella, one of its many projects, dedicated to bare metal provisioning and life cycle management. It has been an official project since the Kilo release cycle. It can work standalone, and it can work as a plugin for the OpenStack compute project, which is known as Nova.

It covers a wide number of features: not only putting an operating system on a hard drive, but also BIOS management, RAID management, cleaning your hardware between tenants, and a few other cool features.

We have a lively upstream community. For example, in the latest Rocky release we've seen 359 commits from 81 contributors from 24 different companies. We have established relationships with hardware vendors and good support for their hardware; this includes Dell, HP, Fujitsu, Lenovo and Cisco as of the latest release, and we are adding Huawei in the current release. We have good support for a variety of established standards as well as new and modern ones; just to name a few: IPMI, PXE, iPXE, Redfish, SNMP, UEFI. We actually have decent UEFI support, including secure boot for some vendors.

This is how Ironic fits into the whole picture of OpenStack, and as I said, it can also be used completely standalone, without any other OpenStack projects. Ironic works as a backend for the OpenStack compute service, Nova. It pulls images from the OpenStack image service, Glance, uses Cinder to provide boot from volume, and uses the networking service, Neutron, to provide connectivity to virtual networks; we can actually work with Neutron to provide switch management and put nodes on VLANs. And of course we can provide authentication through the OpenStack identity service, Keystone.

In the pre-edge world, this is how your Ironic architecture can look. You have a controller node; it hosts essentially your control plane: you have, for example, the Nova API and the remaining Nova services, and you have the Ironic API there. Then you have dedicated Ironic conductor nodes, or you can host the ironic-conductor service on the controller node, but generally it's better to have a few of them. They run the ironic-conductor service, which manages nodes through drivers. Drivers implement various aspects of hardware management, depending on the vendor and on the technology you want to use, and drivers talk to the bare metal nodes.
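Enrolling a node boils down to telling Ironic which driver to use and how to reach the BMC. A minimal sketch with openstacksdk; the cloud name, BMC address and credentials here are invented, and the fields follow the documented driver_info layout for the ipmi driver:

    # Enroll a bare metal node through the Ironic API using openstacksdk.
    # "edge-site-1" must exist in clouds.yaml; address/credentials are fake.
    import openstack

    conn = openstack.connect(cloud="edge-site-1")

    node = conn.baremetal.create_node(
        name="edge-node-0",
        driver="ipmi",  # or "redfish" if the hardware supports it
        driver_info={
            "ipmi_address": "10.0.0.5",
            "ipmi_username": "admin",
            "ipmi_password": "secret",
        },
        properties={"cpu_arch": "x86_64"},
    )
    # Freshly enrolled nodes start their life cycle in the "enroll" state.
    print(node.id, node.provision_state)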
Usually you have some redundancy, right? Several conductors can manage several nodes for HA purposes, and Ironic works in active-active HA mode.

In a classical deployment, it can look like this. Deployment starts: the Ironic conductor builds a TFTP environment and a PXE environment to boot the nodes, and configures the networking service to provide the DHCP parameters. It uses IPMI, for example, to power on the node. The node boots from the network into a special deploy image, where we have an agent called ironic-python-agent, because it's written in Python, it's an agent, and it's for Ironic. It provides an API endpoint to do various things to the node. For example, our oldest and probably most widely used method is to expose the root device through the iSCSI protocol; the Ironic conductor then connects to it and flashes the image onto this iSCSI share. Then we reboot the node and set up DHCP, or set up GRUB for local boot, for the final instance, and a user can use it. This can go through Nova, so it will look like a normal cloud, or it can be Ironic standalone, which will probably feel more like a traditional provisioning system.

So, edge architectures: what challenges do they bring us? Well, first, PXE is based on UDP; Ilya is going to tell you the details. DHCP as well. Across wide-area networks they can be unreliable, and they can be insecure. IPMI is also based on UDP, and it is both unreliable and insecure.

A short quiz: who here knows about cipher zero in IPMI? Okay, wonderful, I'm going to tell you a cool story. Cipher zero is a standard mode in the IPMI protocol, the protocol for managing bare metal machines out of band, which allows using authentication... without authentication. Yeah, you heard me right: it's an authentication mode where you can provide any username and any password. It was invented long ago; probably back then it was not a concern, but for edge it probably is.

Traditional OpenStack uses AMQP for RPC. It allows for quite reliable RPC connections between components within each OpenStack service, but stretching, for example, RabbitMQ over a wide area network can be a challenge, or impossible. There is some work on stretching Qpid, but there are problems there as well. And of course, low bandwidth between your central location and your edge locations can be a challenge, for example when flashing an image over iSCSI.

So what are we doing about that? We are working on various approaches to federating Ironic, to splitting it more efficiently in a geographically distributed way. Ilya is going to talk about replacing PXE boot with approaches based on virtual media or UEFI HTTP boot, which allow using TCP-based protocols instead of UDP-based ones for transferring images. Ilya is also going to talk about how we plan on avoiding DHCP altogether. We have implemented streaming of images directly to the hard drive; I'm going to talk about that in a bit more detail later. We are actively working on HTTP-based management protocols instead of IPMI, and we also have support for secure boot, as I mentioned.

So, what about federation? There are several approaches: some are implemented, some are still in discussions, some are in early planning. The first is conductor groups, which allow you to group Ironic conductors together with nodes into separate DHCP and PXE domains, so that your DHCP and PXE traffic doesn't have to cross location boundaries. This is implemented and available.
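In configuration terms, it looks roughly like this (a sketch; the group name is made up, the option and flag are per the upstream documentation):

    On each conductor at the edge site, in /etc/ironic/ironic.conf:

        [conductor]
        conductor_group = edge-1

    Then pin nodes to that group, so that only the edge-1 conductors
    (and their local DHCP/PXE services) manage them:

        openstack baremetal node set --conductor-group edge-1 edge-node-0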
There has also been a prototype to replace RabbitMQ with JSON RPC, so that may be an option for edge to avoid stretching AMQP, at the expense of maybe losing a bit of persistence for your RPC traffic.

With conductor groups, for example, your architecture could turn into something like this. You still have the control plane in your central location. You now have two dedicated edge locations, each with one or several ironic-conductor instances, with their own set of drivers and their own nodes. You see that the IPMI traffic and PXE traffic here is isolated within one location, and with JSON RPC you can probably avoid having RabbitMQ shared. We still have a database shared between locations here. Some future ideas include, for example, switching to a per-conductor database, where each conductor or group of conductors only handles its own nodes, only has information about its own bare metal machines, and doesn't share it. That will require quite some coding work on the API level.

By the way, we need input from people who are interested in all this work. We need to know what you think about it: what would work and what would not work for you.

I have a prototype of a slightly different thing, a federating proxy. It is essentially an implementation of the Ironic API that talks to other Ironic API instances as a backend. Each of your edge locations would have a complete Ironic installation with its own API, database, RPC and so on, and this federating proxy would just talk to them through the same HTTP API as the clients do.

And to talk about booting, I'll hand it over to Ilya. Thank you.

Maybe a little bit of trivia: for any computer to boot over the network, there are at least two phases involved. First, the computer needs to find its place on the network, that is, it needs to initialize its network stack. Secondly, the computer needs to pull the image off the network and start executing it. Apparently, this process is neither reliable nor easy.

The problem of booting was approached many years ago. The first and still widely used implementation, which we still use in Ironic, is known as PXE. PXE is a suite based on two quite ancient protocols, developed probably in the 80s. One is BOOTP/DHCP, which is used for network initialization, and the other is TFTP, which is used for image transfer. Both protocols were probably designed with small network interface cards and small installations in mind; therefore they are pretty efficient on resources, but they are also quite unreliable, because both are based on UDP.

Later on, the industry came up with a new, more sophisticated implementation known as iPXE. With iPXE, the second phase of the boot process, image transfer, has been replaced with more reliable protocols: HTTP or iSCSI. However, the first phase, network initialization, is still based on UDP (DHCP).

Later on, especially in the context of cloud deployments, this first, unreliable, UDP-based phase became a problem, and therefore the virtual media boot technology came up. With virtual media boot, there is a small satellite computer sitting alongside the main system, and this computer is called the BMC, the baseboard management controller. This computer is always up, and it has an intimate relationship with the main system: it can power cycle it, for instance, or configure it. This BMC implements a virtual media agent, which can be instructed to pull an image off the network on its own.
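In Redfish terms, that instruction is a single action call. A hedged sketch (the BMC and image URLs are made up, and the manager and virtual media IDs vary by vendor; the InsertMedia action itself is defined by the Redfish schema):

    # Ask the BMC to attach a remote ISO as a virtual CD via Redfish.
    import requests

    BMC = "https://bmc.example.com"
    action = (BMC + "/redfish/v1/Managers/1/VirtualMedia/CD"
                    "/Actions/VirtualMedia.InsertMedia")

    requests.post(
        action,
        json={
            "Image": "http://images.example.com/deploy.iso",
            "Inserted": True,
            "WriteProtected": True,
        },
        auth=("admin", "password"),
        verify=False,  # self-signed BMC certificate
    )
    # Next step: set the system's one-time boot device to the virtual CD
    # and reboot it.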
The BMC then inserts this image into a virtual CD and instructs the main system to boot from that virtual CD, like a local boot. This virtual media technology offers many advantages: for instance, the pulled image can be cached and authenticated, and, most importantly in the edge context, the boot process can be done completely over reliable layer-3 protocols, TCP in particular.

With the Ironic deploy process, there is still one phase which requires UDP. In the case when we need to deploy a computer with a local drive, we need to somehow write the user image onto the local drive to let the computer boot from it later on. For this phase of the deployment, we use a piece of Ironic embedded into a so-called deploy image. This deploy image is booted on the computer at some point, and Ironic runs there: it pulls the user image off the network and writes it down on the local drive. This deploy image still uses DHCP. Virtual media offers a way to get rid of DHCP even here, by using a so-called virtual floppy to pass the network configuration to the deploy image, rather than over the network via DHCP. It's not yet implemented, but there is a spec on that, and upstream is working on this feature.

A few words about writing images and image streaming. As I said, we started with the iSCSI-based approach, which is quite easy to bootstrap and has nearly no requirements on the target hardware. As it became apparent that it's not the most efficient or the most reliable way to deploy anything, we progressed to using the ironic-python-agent on the target machine to download an image from an HTTP location, cache it in memory and put it on disk. Now, with recent work, we have progressed to streaming: essentially, we prepare an image on the conductor side, and the ironic-python-agent side just downloads the image directly to the block device, bypassing any in-memory cache. So we got back to very low memory requirements, but there are still some bandwidth requirements.

Another idea that has been floating in the air, and some contributors actually did work on it, but it kind of got stuck, is distributing images via the BitTorrent protocol. In this case, an image would be seeded from the Ironic conductor, and the remaining nodes at the remote location would help the other nodes receive the image, thus reducing the traffic between the central location and the edge location.

So, to sum it up: we upstream are actively working on enabling edge architectures, on enabling bare metal deployment in these architectures. Our main directions of development right now are various approaches to a federated architecture, reliable boot methods and reliable methods of delivering network parameters, and efficient and fast image delivery to remote locations.

And we need your help. If you have a use case that requires provisioning or managing bare metal in an edge architecture, we want you to talk to us. Please come to our IRC channel, please come to our mailing list. We want to know about your use cases, we want to know about your requirements, and of course, if you want to help us with coding, you'd be very welcome: we have an active community, and we're going to review your code.

I guess that's it. We actually finished quite early, so we have plenty of time for questions.
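While we wait for questions, here is a rough sketch of the image streaming idea described above. This is not ironic-python-agent's actual code; it assumes a raw (unconverted) image, and the URL and device are made up:

    # Stream a raw image straight onto a block device in fixed-size
    # chunks, so memory use stays constant regardless of image size.
    import os
    import requests

    def stream_image(url, device, chunk_mb=4):
        with requests.get(url, stream=True, timeout=60) as resp:
            resp.raise_for_status()
            with open(device, "wb") as disk:
                for chunk in resp.iter_content(chunk_mb * 1024 * 1024):
                    disk.write(chunk)
                disk.flush()
                os.fsync(disk.fileno())  # make sure it actually hit the disk

    stream_image("http://conductor.example.com/images/edge.raw", "/dev/sda")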
So, the question is: what is the status of the current PXE support?

Yeah, it's still very well supported, both PXE and iPXE. Of course, we tend to recommend iPXE as a bit more reliable alternative, and Ironic supports chainloading iPXE: if your machine only supports PXE, for example, we can load the iPXE image over PXE and then proceed with iPXE, which, as I mentioned, is a bit more reliable. At least you only have a short stage where you have to use TFTP, and then your large transfer with the deploy image happens over HTTP. We also use iPXE (it's not exactly an edge topic) to implement booting from iSCSI volumes, integrating with the OpenStack volume service. Anything else?

We have upstream documentation on that. It could actually use some love, this documentation, but it's not bad. A minute of advertisement: I personally developed a library and CLI tool called metalsmith, which is designed to work with Ironic standalone, or with Ironic, Neutron and Glance, but no Nova. It's much easier than using our API directly. So if you're writing Python code or writing scripts, please take a look at this library/scripts; I would love feedback.

Okay, so the question is about multi-architecture support, essentially support for a wide definition of architecture. Yes, we support that. On one level, you can use different drivers per node: say you have only IPMI somewhere and only Redfish somewhere else, so you use different drivers. The deploy images can be different per node as well. The PXE configuration, for example whether you use PXE or iPXE, can be different per architecture: we have a cpu_arch property on Ironic nodes, and you can configure the PXE environment to serve a completely different configuration based on the architecture of the node.

This is not implemented out of the box, but here is what you could do. We have a service called ironic-inspector. It implements bare metal inspection and introspection in band, by booting the ramdisk. You probably know that; I'm just telling it for other people. It has a feature called introspection rules: that's a mini-DSL, which you define through the API, which can run on the data you receive and can do things with nodes. So you can define a rule that says: okay, I received this vendor string, so set the deploy image to this. That can be viable if ironic-inspector is an option for you.
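For example, a rule along these lines (the vendor string and ramdisk URL are invented; the format follows the ironic-inspector introspection rules documentation):

    {
        "description": "Pick a deploy ramdisk for one vendor's machines",
        "conditions": [
            {"op": "eq",
             "field": "data://inventory.system_vendor.manufacturer",
             "value": "Dell Inc."}
        ],
        "actions": [
            {"action": "set-attribute",
             "path": "driver_info/deploy_ramdisk",
             "value": "http://images.example.com/dell-ramdisk.img"}
        ]
    }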
So the question is to what extent Redfish-based out-of-band inspection is implemented or supported in Ironic. It's implemented and merged; it's not yet released, I mean, it's in master. But how usable it is depends a great deal on the hardware, because it depends on what properties of the node are exposed. If you have a pretty rich Redfish implementation in your BMC, which exposes things like CPU clocking and local storage, then all these things can be pulled by Ironic.

The question is: will the world converge on Redfish eventually or not? Well, yes, the intention behind Redfish is to make everyone use Redfish. But on the other hand, Redfish as a standard is designed in a way that everyone can implement their own Redfish. So we may very well end up with a school of Redfish, many redfishes which are all different and therefore each require their own vendor-specific driver, and then we are back to where we were. But still, maybe vendors will converge, use the standard Redfish schemas, and implement at least the basic features of hardware management in the same way, and Ironic could use the same driver for all of them. We don't know.

So the question is about converging the deploy methods, such as iscsi, direct and others, to Redfish. There is some terminology confusion here: in Ironic we distinguish the deploy interface and the boot interface. The deploy interface is how you move the image there and actually put it on the disk, and Redfish so far does not cover that. I know there is hardware that actually does that out of band, but to my best knowledge it's not part of Redfish and definitely not standard. So the deploy interfaces (iscsi, direct, and we actually have an ansible deploy, and now we have a ramdisk deploy, which is just booting a ramdisk without flashing anything to the hard drive; a minute of advertisement) are staying as they are now. As for the boot methods, I hope we will eventually switch to Redfish as the main one. I think PXE will stay for compatibility for years to come, but maybe it will stop being our go-to method.

Anything else? Okay, thank you very much.