So, are we good to start? Great. Thanks for coming, everyone. My name is Dmitry, I'm an engineer at Red Hat and a core team member of Ironic and ironic-inspector, and this wonderful Monday afternoon I'll be talking about ironic-inspector and bare metal inspection: what is happening there, what happened during the Mitaka cycle, and what will probably happen in the Newton cycle. So, for today: I'll talk about bare metal inspection, I'll explain how ironic-inspector works for those who don't know yet, I'll cover a few things that are not as obvious and that not all people are aware of but which are very useful, and, of course, future plans.

So, bare metal inspection, or introspection (we can't agree on which word to use, to be honest): what is it? I have a bare metal machine, and it has a lot of properties. It's not like a VM, which is mostly controlled by you: you choose how many CPUs, how much memory, and so on. A bare metal machine has them, so to say, hard-coded, and some of these properties we actually have to know for things to work: the CPU architecture, the number of CPUs, the memory size, the hard disk size, and the hardware (MAC) addresses of the network cards. This is the bare minimum you have to put into the Ironic database: the first three if you want scheduling to work, the last one if you want deployment to work. These things are fixed, but we can't make them up, so the inspection process involves going to the machine and figuring out these properties, and maybe many more for other use cases. Ironic exposes the inspection process via the "inspect" provisioning verb, and we essentially have two types of inspection.

The first is out-of-band, as we call it. It's vendor-specific: it uses vendor BMC features, for example HP iLO or the Dell DRAC management interface, to figure out all this data. It's essentially one network request, maybe a couple of requests, to their API, and that's all: quick and reliable. The
problem is that it's very vendor-specific, and unfortunately the IPMI protocol doesn't really have support for these things. That is going to change with the Redfish specification, which includes some inspection support and is supposed to become the new IPMI, but it's not there yet.

The second option we have is in-band inspection, which involves actually booting some code on the machine: you take a ramdisk, boot the machine with this ramdisk, it collects information and posts it back, and you process it. It works for nearly everything: definitely for all enterprise hardware, and for the majority of non-enterprise hardware just as well, with minimal requirements. Unfortunately it is, of course, slower (the boot time of a modern machine is several minutes at least) and sometimes not that reliable, because of PXE booting and so on. We essentially have to support both, and this second case is implemented by a project called ironic-inspector, which is what I'll be talking about in this talk.

A quick summary: the ironic-inspector project is under the Bare Metal project umbrella together with Ironic itself. It's a separate service with its own API, its own client library, and so on. Here are a few stats: we have a relatively diverse community, and a lot happened; by the way, these numbers are from the Mitaka cycle, I forgot to write it here, so it's not the overall history.

How it all works, on a high level. We have the Ironic Python Agent (IPA), a generic ramdisk: deployment is done by it as well, so nothing surprising here, it works for both inspection and deployment. We set up a static DHCP server; in most of our recommendations and in the DevStack plugin it's dnsmasq, but we're pretty server-agnostic in this case. It instructs all hosts that are not being deployed to boot IPA for inspection: it provides specific kernel arguments to say, okay, this boot will be for inspection, please inspect the hardware. Then we have our service, which has its own HTTP API, as I already said; it manages access
to the DHCP server, essentially with iptables, and it processes the data itself. And as I mentioned, we have our own client library in addition to what is provided by Ironic.

How does it actually work? We require the node to be created, in most cases (I'll be talking about other cases a bit later); you only have to know the power credentials: the IPMI address, IPMI username and IPMI password, or in the case of vendor technology, like again DRAC or iLO, the iLO address, username and password, and so on. Then you start the in-band inspection: you can use the Ironic API with the "inspect" verb, or you can use the ironic-inspector API directly (the ironic-inspector API is used by Ironic, so it's kind of the same right now). Then what happens: we configure the firewall to ensure that this node can boot from our dnsmasq server (we have some tricky magic there to avoid clashes with Neutron, actually), set the boot device, and issue a power-on request. IPA boots, collects hardware information, and posts it back; we process this information, update the Ironic database with everything we fetched, and optionally we store the whole data from the ramdisk in Swift. That's something not everyone knows, but yes, we can actually store this data.

So, that was the basics. Inspection in Ironic requires these four things to be discovered, and we really do discover them: CPU count, CPU architecture, memory, disk, MAC addresses (five things, actually). But it can be more: when you put a ramdisk on the machine, you can figure out essentially everything that can be figured out from within it. You can even benchmark it, right? Why not do that?
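The flow just described (enroll the node with power credentials, trigger introspection, wait for the result) can be sketched against the inspector's HTTP API. This is a minimal sketch, assuming the service listens on its default port 5050 and runs without Keystone; check the endpoint and response fields against your deployment:

```python
import json
import urllib.request

INSPECTOR_URL = 'http://127.0.0.1:5050'  # ironic-inspector's default port

def introspection_url(base, node_uuid):
    # one introspection resource per node: POST starts it, GET polls status
    return '%s/v1/introspection/%s' % (base.rstrip('/'), node_uuid)

def start_introspection(node_uuid, token=None):
    """Kick off in-band introspection for an already enrolled node."""
    req = urllib.request.Request(introspection_url(INSPECTOR_URL, node_uuid),
                                 method='POST')
    if token:  # only needed when the API runs behind Keystone auth
        req.add_header('X-Auth-Token', token)
    with urllib.request.urlopen(req) as resp:
        return resp.status

def introspection_status(node_uuid, token=None):
    """Poll the run; the JSON body reports whether it finished or errored."""
    req = urllib.request.Request(introspection_url(INSPECTOR_URL, node_uuid))
    if token:
        req.add_header('X-Auth-Token', token)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode())
```

In a real deployment you would normally go through the Ironic "inspect" verb or the client library instead of raw HTTP, but this shows how thin the API surface is.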
So what do we have? The first thing is that the whole process is pluggable. We have plugins on the server side, where the data processing happens, and plugins on the ramdisk side, where the data is collected; usually, if you want to do something smart, you plug into both pieces. On the server side we can write plugins in Python; we call them processing hooks, and there are two hook points: the first runs just after receiving the information, so we can validate it; the second runs when we are making changes to the node itself. Why two parts? Well, between the two parts there is one more tricky piece: we actually have to find the node in the Ironic database by the data. Imagine you boot a lot of nodes: they collect introspection data and post big JSON objects back, but those don't contain a node UUID. So there is a lookup process, which takes the MAC addresses and the IPMI address that we figured out from within the machine and tries to match them. The first hook runs before that, the second after; that's the important difference. The first hook cannot update the node, because we don't know the node yet, but it can do validation, and it can actually help the lookup process. The ramdisk-side plugins we call collectors. By the way, I didn't talk about the example yet: this is a simple example, extracted from our standard plugin set, which just raises an error if the ramdisk reported one during introspection. The real one is a bit more complex, but the idea is like that.

Okay, collectors: collectors are built into the IPA image to fetch more information, and unfortunately adding a collector involves rebuilding your image. What can be done is enabling and disabling collectors via the kernel command line once they are built in. And here is an example, a Python function: it updates the data record with something that's not a real thing, actually; I just made it up when building this presentation, so if you try it and it doesn't work, blame me. It's just an example.

One of the plugins that we have, which people are usually not aware of: we have an extra-hardware plugin, on both the ramdisk side and the server side. It uses the "hardware" Python library, and it collects an enormous amount of information; I think people counted around 1,000 facts per node. We can then store it all in Swift and do something with it, and that's what people actually do. This is how you enable it: you enable a collector on the ramdisk side and a processing hook on the server side.
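The error-checking hook just described, with its two hook points, can be sketched in plain Python. The class shape and behavior follow the description above, but the names are illustrative rather than the exact ironic-inspector plugin interface, which differs in details:

```python
# Illustrative sketch of a server-side processing hook with the two hook
# points described above; not the real ironic-inspector base class.

class ErrorCheckingHook:
    """Fail introspection early if the ramdisk reported an error."""

    def before_processing(self, introspection_data):
        # Runs right after the ramdisk POSTs its data, before node lookup,
        # so it can only validate, not touch any node.
        error = introspection_data.get('error')
        if error:
            raise RuntimeError('Ramdisk reported an error: %s' % error)

    def before_update(self, introspection_data, node_info):
        # Runs after the node was found in the Ironic database; here we
        # may patch the node, e.g. record the discovered BMC address.
        bmc = introspection_data.get('ipmi_address')
        if bmc:
            node_info.setdefault('driver_info', {})['ipmi_address'] = bmc
        return node_info
```

The real plugin registers itself through a setuptools entry point so the service can load it by name from the configuration.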
This thing can also run benchmarks and add the results to the data, and you can then be smart about CPU speed, memory throughput, disk operations. Again, we require one more configuration option to enable that, because the benchmarking usually takes two to three minutes, and it grows roughly linearly with the number of CPUs and disks you have; but it's a very helpful thing you can use.

The next thing that people don't quite know about, and it's very cool: everything I was talking about before is extending introspection by an operator, meaning you install some plugins and configure your ironic-inspector service to use them. But what about the user? What if we give a user of our API the ability to extend the process? Here is how: we have an API for things called introspection rules. You essentially define small snippets in a JSON-based domain-specific language, like this one, which will be run on every introspection. So again, the difference is that this is user-driven: rules can be created via the API, deleted, edited, and so on, so no operator intervention or installation is required. A very simple example (and I know people actually use something like this): if you find out that the memory is too low, fail introspection. Why not?

A bit more complex example, again a bit made up, but it can still show you something: we can inspect the system or board manufacturer, or, for example, figure out whether all disks are rotational or not, and set some capability on the Ironic node. Once the capability is set, it can be used by Nova for scheduling. This capability discovery is a big topic in ironic-inspector; we're really interested in having more of that, and TripleO is actually using it very heavily for a process called profile matching, as we call it: it's essentially creating these rules and adding a "profile" capability to a node, which is then used for scheduling. If you look in the TripleO documentation, it's actually there, with examples of such rules.
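The "memory too low" rule above could look roughly like this in the JSON-based DSL. The exact field path depends on what your ramdisk reports, so treat "memory_mb" as an example, and the small evaluator only mimics how the service checks conditions:

```python
import operator

# toy mapping of a few DSL operators; the real service supports more
OPS = {'lt': operator.lt, 'ge': operator.ge, 'eq': operator.eq}

# a rule as you would POST it to the introspection rules API
rule = {
    'description': 'Fail introspection when a node has too little RAM',
    'conditions': [
        # "lt": the reported value must be below our threshold to match
        {'op': 'lt', 'field': 'memory_mb', 'value': 4096},
    ],
    'actions': [
        {'action': 'fail', 'message': 'node has less than 4 GiB of RAM'},
    ],
}

def conditions_match(rule, data):
    """Toy evaluation of a rule's conditions against introspection data."""
    return all(OPS[c['op']](data.get(c['field']), c['value'])
               for c in rule['conditions'])
```

When all conditions match, the service runs the actions; besides "fail" there are actions for setting node attributes and capabilities, which is what the profile-matching workflow relies on.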
A lot of people are asking me: okay, introspection is good, but what if I don't want to create a node before introspection? This is something that people call discovery. I was resisting it for quite some time, due to numerous reasons, but here it is: in the Mitaka cycle we actually added it. The difference is that you don't have to create a node record; it will be created for you, so you combine the enrollment process with the introspection process. How it works, on a very high level: you power on the node, maybe manually (just go to the server and press the button), maybe via some CMDB, maybe via ipmitool directly. The next steps are essentially the same: IPA is booted, data is collected, data is sent back to ironic-inspector. Then ironic-inspector tries to find the node, and if that fails, it creates one. That's not on by default; you actually have to enable that behavior. And then, for example, you can use introspection rules to populate the power credentials, if you know the defaults. For example, I know the default credentials for Dell machines, and probably a lot of people here know them, right? If a rule detects that the manufacturer contains "Dell", and the auto-discovered flag is set (the discovery process always sets it, so we don't touch already existing nodes that were not discovered), then we can set the Ironic driver to the most specific one, take the BMC address from the inventory and set it as the DRAC address, and set the default credentials. With this introspection rule in place, you can actually just use the node for deployment right after discovery. For more generic cases, of course, you might want to go to the node manually to populate the IPMI credentials, or use your CMDB, for example, to fetch them.
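Such an auto-enrollment rule might look roughly like this. The field paths, the driver name, and the credential value are illustrative placeholders (and support for format strings in values may depend on your version), so don't copy this verbatim:

```python
# Hedged sketch of an auto-enrollment rule for discovered Dell nodes;
# field paths and the driver name are illustrative, check your deployment.
dell_rule = {
    'description': 'Set up a DRAC driver for discovered Dell nodes',
    'conditions': [
        # only touch nodes that the discovery process itself created
        {'op': 'eq', 'field': 'data://auto_discovered', 'value': True},
        {'op': 'contains',
         'field': 'data://inventory/system_vendor/manufacturer',
         'value': 'Dell'},
    ],
    'actions': [
        {'action': 'set-attribute', 'path': '/driver', 'value': 'pxe_drac'},
        # take the BMC address discovered in-band and store it for DRAC
        {'action': 'set-attribute', 'path': '/driver_info/drac_host',
         'value': '{data[inventory][bmc_address]}'},
        # placeholder default credential, deployment-specific in reality
        {'action': 'set-attribute', 'path': '/driver_info/drac_username',
         'value': 'root'},
    ],
}
```

With a rule like this in place, a discovered node comes out of introspection with a driver and credentials already set, which is what makes "deploy right after discovery" possible.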
There's another interesting topic, which we don't quite support yet, and it brings me to future plans: CMDB integration. I know there are a bunch of folks who are really interested in being able, after discovery or after introspection, to go to a CMDB with, for example, the IPMI address or the MAC addresses we detected, and fetch all the remaining data from it, combining the two into one whole process. It's especially useful for discovery: you come back with discovered data, you need the IPMI credentials and everything like that, so you go to the CMDB, you combine what was discovered in-band with what you have in the CMDB, and as a result you have a node ready for deployment, essentially.

Next: ironic-inspector is not HA-ready at the current moment. It can run in only one instance, at least safely: if you run several ironic-inspector instances on several machines, their DHCP setups will clash, and so on. So, a big topic for Newton: we want to become HA, at least in the sense that you can have several instances of ironic-inspector running in your cloud. As I said, we also have a big interest in capability discovery using introspection rules. And around introspection rules, there are even talks about making them a full-featured programming language, which a lot of people hate, but maybe. That's it; it was not really short, but I still have plenty of time for questions, if you have any. Thank you. Yes, please.

[Question] What hardware has given you the most problems?

Every hardware! Well, our big problems are with hardware that doesn't have reliable IPMI access or something like that. For some hardware, the IPMI address cannot be discovered from inside the machine; that also happens, so sometimes we have to apply workarounds. And some hardware doesn't have a unique IPMI address, so we can't use it for lookup; there's actually a lot of hardware of this kind, and we have workarounds in place for this particular case. IPMI bridging, for example, is one thing we have to somehow work around. Any more questions? Yes, please.

[Question] For the combination of discovery with introspection: how do you match when you have heterogeneous hardware? You look for what you've discovered in Ironic to see if it's already enrolled, but if they're all the same, how can you differentiate between what you've discovered and whether it's already been
enrolled, or whether to create a new Ironic record for it?

So, two things, the same two things that we use for lookup. First, MAC addresses: the Ironic node should have its MAC addresses in the ports database. The second thing is the IPMI address, if it can be reliably detected; with the exception of some cases, like I mentioned above, it should be unique. Next question?

[Question] Do you have any plugins developed already that would allow you to pull hardware out of your OpenStack cluster, run it through a bunch of burn-in tests, and then reintegrate it back in? Say you wanted to run your cluster on repurposed hardware and continually test for performance, or for any sort of failures that you might be able to detect through memory tests or other hardware burn-in tests.

Great question, thank you; that's one of the topics I probably should have put on the future plans. Ironic has a cleaning process; despite its name, it's just a generic set of operations that are run before deployments. So my personal plan, hopefully for Newton, is maybe to have inspection running in this cleaning process: when you decommission the node, it runs through inspection, maybe compares the information with what we have in Ironic, maybe does benchmarks. And if you see the benchmarks are too low, you can create an introspection rule, right, to fail introspection when these operations are too slow. So, we don't have it right now, but it's not too hard to actually make it happen; someone just has to do it.

[Question] Okay, this is a follow-on to the previous question. When you have a bare metal machine and you do some kind of work on it (it could be just repair work, but maybe they threw in some new drives, or added another NIC), how is that handled by Ironic or ironic-inspector, and what's the best practice for doing that?

Sorry, I didn't quite get that.

[Question] Particularly hardware upgrades, like you add something: so you've already got this node in Ironic, but
then you add, let's say, a bunch more RAM, or you add another disk, or another NIC. How do you handle that in Ironic? Because you've already got it in your database, indexed by its primary NIC or something; how do you manage that change?

Yeah, it's not happening automatically, so we don't have a real-time update for that, but you can rerun introspection on the node, in the same fashion as in the previous question, between deployments for example. So yes, the question was whether inspection can update existing records: yes, it will update the records if you rerun it on the same node.

[Question] So I've got a specific problem I'm trying to solve: I've got hundreds of machines and I'm trying to figure out which switch ports they're connected to, and the only way to figure that out is to turn a machine on. Can I use Ironic sort of outside the scope of OpenStack, where I can fire it up and maybe bond a couple of ports so I can look at it from the switch side? Instead of using it to stage my OpenStack bare metal, can I use it just for the inspection?

Well, the inspector is pretty bound to Ironic, but you can use Ironic and the inspector standalone, independent of the whole remainder of the stack; that's true even without Keystone authentication, so Ironic plus inspector are pretty much independent. As to what you're asking about, switch port discovery: yes, it's a big topic in Ironic right now. We're working on tenant separation, and for that task specifically we need to be able to detect which switch ports a node is connected to. It's not there yet; I hope to land it in the Newton cycle. I think there is one spec proposed, it needs to be updated, but yeah, we just have to wire these things in; hopefully we can do it pretty soon. Anything else? A couple more? Yeah, I think we have time.

[Question] Okay, can I put my own tests in Ironic? Can I extend it? You were talking about running a
benchmark or whatever, and this goes back to the switch thing, but can I run my own inspector function?

Yeah, that's these parts: for example, this one is running on the ramdisk, right, and it's doing a pretty simple thing, but you can actually do more. The benchmarks are implemented the same way, as a separate plugin, so we can run benchmarks here. This part goes to the ramdisk: you build this Python snippet into the IPA ramdisk, and it will do everything you want. Yes, please.

[Question] Do you have anything that helps you deal with pools of MAC addresses that can move around and be dynamic on your hosts? We use Cisco UCS, right, so we have MAC IDs going all over the place.

Yeah, we don't quite deal with that right now; you should probably ask Cisco for more details on how it works.

[Question] I've seen it in some tools; this kind of goes back to Foreman, right: MAC IDs get caught up in your fact tables, and you have to delete them manually if you recycle a MAC ID or something.

Yeah, I can imagine; Ironic doesn't deal with it either, I think: we have a fixed port database. It's an interesting topic to consider; I know some folks were asking what to do with it, and it's not only about MAC addresses, actually: other resources can be allocated on demand in such systems. So it's not there yet. (Please use the microphone.)

[Question] One of the options for the inspector is what to do when it reruns on the same node and recognizes the MACs have changed, so that might answer your question.

Yeah, thanks for the reminder: we actually have a lot of configuration options. One controls whether to overwrite information: we can prevent overwriting, so it only adds new information but never removes existing data. And we have several options for how to deal with MAC addresses: whether to remove ports that we didn't find, and which ports to add at all (all ports, only ports that got IP addresses, or only the ports used for PXE booting). All of this can be tuned in the configuration file of the inspector.
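For reference, these port-handling knobs live in the inspector's configuration file; this is a sketch with option names as I recall them from this era, so verify against your version's sample configuration:

```ini
[processing]
# which discovered ports to add to ironic: all | active | pxe
add_ports = pxe
# which pre-existing ports to keep on re-inspection: all | present | added
keep_ports = added
# set to false to only add new facts and never overwrite existing ones
overwrite_existing = true
```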
Thank you. Yes, please.

[Question] My question here is: do you actually support multi-NIC setups or NIC bonding, etc.? And also, how do you deal with multi-tenant networks, to make sure of the network isolation? For VMs we can use SDN; is there any way out there to do the same thing for bare metal?

Great questions. I think I'll start with the second one. Well, for the first one, about NIC bonding: we don't do it. For the second question, it's a bit more interesting: tenant separation is now ongoing work, and Ironic and the inspector want to support it, but with one problem: to be able to switch networks, you have to know the MAC address, and knowing the MAC address is not a requirement for the inspector, so without it the network switch won't happen. So, two options here. First, you manually switch the node to the provisioning network before running inspection. Second, if you have, for example, out-of-band inspection support in your hardware, but you need some additional features from the inspector, you start with out-of-band inspection just to figure out the MAC addresses, then you probably fill in the switch configuration manually, and then you can use the inspector. This support has not landed yet; I have a spec up for it, so again, Newton is the goal for this work. So yeah, the answer to both questions is: not yet, sorry.

Yeah, and what Jimmy is saying: we have a cross-project session on the tenant switching topic tomorrow, so you can go ask us some questions there. Anything else? Okay, thank you very much.