Hey, it's all live now. Okay, cool. Good morning, good afternoon, good evening. Welcome to a very special edition of Ask an OpenShift Admin. I'm joined by some fellow Red Hatters today. I'm technically not supposed to be here — I'm supposed to be at a doctor's appointment, but they've been canceling on me all morning, so at some point I'm going to get a phone call and I'm going to disappear. So Andrew, what do we have cooking today, buddy?

Yeah, well, good morning or good afternoon, everyone. Apologies for the technical difficulties; it turns out Andrew's still learning how this works, and with Chris maybe not being here the whole time, I figured I should be in the driver's seat. So apologies upfront for the technical issues. Thank you, everyone, and welcome to this week's stream of Ask an OpenShift Administrator. This is one of our office hours series of live streams, which means that we are here for you: we are here to answer your questions, to chat about whatever is top of mind for what you have going on inside of your environment.

This week I am very happy to be joined by one of my peers, one of Chris's peers, Rhys Oxenham, as well as, on the product management team, one of our peers, Ramón. I'm not going to try your last name, because I know I'm going to butcher it and I don't want to embarrass myself. It's Acedo. Okay, Acedo.

So we are joined by Rhys and Ramón to talk about bare metal installs. And it's not just any bare metal install, because we've talked about bare metal as an installation method before, right — the non-integrated one. What we haven't talked about before is bare metal installer-provisioned infrastructure (IPI), which is what we want to focus on today. So Rhys, I'll let you introduce yourself, and then Ramón, if you don't mind following.

Yeah, of course. Good morning, good afternoon, good evening, folks.
My name is Rhys Oxenham. I am the director of what we call the field product management team, so we work very closely with lots of customers and partners worldwide to help them be successful with our technologies. My team and I work very closely on all things core-platform related, so we do a lot of work around bare metal installations, networking, storage integrations, virtualization, and those sorts of things.

Hi everyone, Ramón Acedo Rodríguez, and I'm the product manager for OpenShift on bare metal, which involves many things, but essentially what we are doing is trying to make bare metal behave as a cloud provider — trying to automate, as we will probably discuss today, all the bare metal provisioning and management as if you were dealing with any other public or private cloud provider.

Yeah, so I know we have a bunch of exciting and interesting things planned, not the least of which — thank you very much, Rhys — is a demo of what all of this looks like. And I know I want to talk about OpenShift bare metal, the when, the why, and all that other stuff, but I'm going to hold off on that, because as is our tradition with the Ask an OpenShift Admin hour, we start with our top-of-mind topics. If you're new to the stream, this is essentially a set of things that Chris and I think are interesting to you — things that have happened in the last week or two since the previous stream that might be relevant.

The first thing that I want to talk about, as soon as I find my notes list, is that VMworld is this week. I believe the generic VMworld pass is free — anybody can go and join. There is a Tech+ pass, which I think you have to pay for, but anybody can join VMworld and look at all the sessions they have going on. And importantly, I want to share the OpenShift-centric sessions.
I'm going to quickly share my screen so you all can see what I see. If you go to the link that Chris just helpfully pasted in there, you'll end up at this page. And, shameless self-promotion, one of these sessions has me in it, down here: integration considerations for OpenShift and VMware. We did a panel session along with some of our IBM peers and VMware peers. And Dean Lewis, down here, writes a great blog — I think I've mentioned his blog before — and he published about doing the vRealize Automation integration with OpenShift. So if you happen to have a few minutes, if you want to put that on in the background and listen to the various sessions, there's a ton of information coming out of there this week.

The second thing I wanted to talk about is, of course, KubeCon. I hope that nobody is surprised to realize that KubeCon is next week. I say that because I was surprised to realize that KubeCon is next week — it has snuck up on me. You know, it's not like we're busy or anything. But the good news is Red Hat, as you would expect, has a pretty substantial presence there, not the least of which includes Mr. Short.

Yes, I will be running around like a madman as usual. Actually, I will be there for a lot of day-zero events — I think I'm participating in three right now, and that's my cap — and then I will also be hosting the Cloud Native TV wrap-up show on day zero as well, that Tuesday next week.

Yeah, I know for our team basically everything has been getting dumped on Christian Hernandez, because he's local, and everybody knows that he's going to be there. So it's: oh, Christian will be there — Christian, I need you to take care of this. I've been trying to help him, and I just feel bad. Yeah.
So I see that Chris posted into the chat the link to our Red Hat landing page for KubeCon. If you browse through here, you can see that this is a very big, very in-depth listing of everything that we have going on at KubeCon next week. As a result, I will also remind you that we won't be streaming next week — or at least, I know we will be streaming other things, but this show will not be streaming next week. So take that time; I would highly recommend that folks invest it in listening to what's going on at KubeCon. Remember, OpenShift is Kubernetes, and the things that happen in Kubernetes eventually happen in OpenShift as well, as we highlighted during the last stream with the Kubernetes API deprecations.

Yes — I should really remember these things. I should probably mention the three events I'm involved with. Obviously OpenShift Commons — I will be around that, and I think we're intending to live stream it. It's very weird; things are hard this year, I'll just leave it at that — it's logistics, and it's LA. I will also be part of GitOpsCon, which is happening on that day as well. The Commons is a whole-day event starting in the morning, whereas GitOpsCon is a half-day event. And there are a lot of other events that we're part of as well, like the supply chain security one, I think.

Well, I know Christian is intimately involved with GitOpsCon, so I expect a sponsorship check any day now for promoting it. He's doing the keynote — I forgot about that. Yeah.
Yeah, it's going to be interesting. So yes, tons and tons of stuff going on with KubeCon next week. Definitely check out everything going on there, not just related to Red Hat but related to Kubernetes in general.

There was another thing that I wanted to bring up — oh, tomorrow. Tomorrow is the OpenShift 4.9 What's New live stream. As you can probably predict, that means that 4.9 is not too far away, and we — by we I mean the product management team — will be spending time talking about all of the new features that are coming in with it. It will be live streamed here tomorrow, both on YouTube and on Twitch. I know I will be here helping to answer questions and to shuttle questions back and forth to the product management team. So please don't hesitate, if you're watching, to ask questions on whatever platform you're on; we'll make sure to get those answered as best as we possibly can.

And I know 4.9, 4.10, and 4.11 seem to be busier releases, at least in my world — I was on a meeting at 7 a.m. this morning talking about stuff that's happening in 4.10 and 4.11. So there's a lot going on, and these sessions are really helpful for keeping an eye on that.

Oh yes, ourhope9 — yeah, we did do an Astra presentation last week, so that was a fun one to work on.
Yeah, so the Kubernetes deprecations are a 1.22 thing, and more are coming, so they will arrive in OpenShift as we update Kubernetes under the hood.

Yeah, and actually, in that same vein, I deployed my usual cluster this morning, and it turns out that I don't have anything in here that will break, but what you'll start seeing — I think it starts with 4.8.13, maybe 4.8.14 — is a banner in here that says the cluster is not eligible to update to 4.9 because certain deprecated APIs are in use. And I know, because there are now daily emails that go to the product management team — hey, these operators still need this work done for 4.9 compatibility — that everybody is working very diligently to make that as seamless as possible.

So the last thing that I have to talk about today, and it is very serendipitous that Rhys is here, is the Poison Pill Operator. I happened to see this morning that the docs team made an announcement that they have completed documenting a feature that was released with 4.8 but was basically unknown. The Poison Pill Operator is effectively a way of determining that a node is no longer accessible and then taking action based on that. So Rhys, I'm going to let you take over and talk about that, because I know you're very familiar with it.

Yeah, sure, not a problem. I just want to back up a little bit and talk about why these things exist. Typically, if you're running applications on top of a Kubernetes or OpenShift cluster, you may run certain types of applications where you need to invoke some kind of remediation if, for example, the host or the node that was running that application goes away — and by goes away, I mean it crashes.
Someone pulls the power out of the back, or someone yanks a network cable out, or something like that. Kubernetes will typically mark that node as NotReady after a certain amount of time, once it hasn't heard from it for a little while. But in some cases, some applications might have a reliance on, for example, a storage volume, and you need some kind of external remediation to take place to recover that workload.

Now, for a long time, you as a Kubernetes administrator could go in there and recover that workload yourself by deleting the node. That essentially tells the Kubernetes cluster it can relocate the workload onto another surviving node. Over time we've added additional automated remediation methods. If you are using bare metal IPI — our integration, which is the topic this conversation will center around — there was what we call an external remediation capability. What that would do is monitor all of your nodes, and if one of them went offline for a certain amount of time, which was configurable, it would essentially fence that machine: it would connect out to the out-of-band management interface, be that iLO, iDRAC, or whatever it might be, and forcefully reboot that machine. What that would lead to is a guaranteed safe relocation of that workload, because we know that that machine has been turned off. This is akin to traditional high-availability concepts — STONITH, shoot the other node in the head, to recover it. This gave a predictable and rapid recovery of workloads on bare metal clusters.

Now, what we are driving towards with the Poison Pill Operator is kind of an evolution of that, where it can be run on any type of cluster.
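The fencing flow Rhys describes is driven by a MachineHealthCheck resource watching the nodes. A minimal sketch — the names and thresholds here are illustrative, and the `external-baremetal` annotation is the mechanism bare metal IPI clusters of roughly this era used to request BMC power-fencing instead of plain machine deletion; exact fields vary by OpenShift version:

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: workers-remediation          # illustrative name
  namespace: openshift-machine-api
  annotations:
    # Requests power-fencing through the out-of-band BMC on bare metal IPI
    machine.openshift.io/remediation-strategy: external-baremetal
spec:
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-role: worker
  unhealthyConditions:
  - type: Ready
    status: Unknown
    timeout: 300s     # how long a node may be unreachable before remediation
  maxUnhealthy: 40%   # stop remediating if too many nodes look unhealthy at once
```

The Poison Pill Operator removes the need for this kind of out-of-band access entirely.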
It doesn't need any external access to out-of-band management platforms or APIs — it doesn't have to connect into anything to forcefully reboot a machine. It comes down to the simple premise that in a failure scenario, one of two things has typically happened. The first one poison pill tries to consider: if this machine has had its power taken out, or the data center has caught fire, or it's simply no longer accessible, there's a good chance that it's gone and it's not coming back. But let's say, for example, the machine is actually still running — it's just not contactable. Then poison pill steps in and the node fences itself: it turns itself off. So in both of those situations, the rest of the surviving nodes know that either it's definitely gone, or it has made sure itself that it is gone. That's why it's called a poison pill: essentially, that node is taking the poison pill and sacrificing itself for the good of the rest of the cluster.

So this Poison Pill Operator — forgive me, I can't remember exactly when it landed; I think we had some delays making it available because of the packaging of the operator and putting it into OLM — is, as I said, something you can deploy on top of your cluster requiring no external integrations. It can be run on any type of cluster, and it's very simple to set up.

And I want to point out that this is particularly — I won't say important, but relevant — if you're using persistent volumes, because if the node goes down, if it's marked unreachable, any PVCs attached don't get released.
So it'll hold up rescheduling those pods. The poison pill basically gives you a way of forcefully removing all of that and allowing all of that workload to be rescheduled, correct?

And that's what I was saying: as an administrator, before we had these remediation capabilities, you would run into that situation where, to your point, the pod that was holding on to that persistent volume claim would block the Kubernetes scheduler from moving that workload to another machine and reassigning that PVC. So you as an administrator could go in there and delete the node, and by deleting the node object, you're essentially telling the Kubernetes cluster: I have taken manual action, I know that that machine has gone away, I know it's safe for that workload to be rescheduled. Kubernetes will then go ahead and do that — it's assuming you've taken that action; otherwise you might end up with some data corruption or data consistency problems. With both the external remediation that we can do with bare metal IPI and the Metal³ integration, and with poison pill, it takes care of both of those: making sure that the machine has gone away, but then going the further step of deleting the node automatically, so that the scheduler can reschedule those workloads.

And ourhope9, I see your question: will the Poison Pill Operator be able to communicate with the vSphere API?
No, because it relies on the other functions of OpenShift. When, as Rhys said, it deletes the node, it literally just uses the Machine API to delete the node and then relies on the MachineSet to create a new one, so it doesn't have to do any direct integration.

Sorry, just one more thing: it deletes the node, but it's assuming that that machine — in the VMware example, that virtual machine — is going to reboot. It will reboot and re-register itself when the kubelet comes up. So it doesn't even need to do anything with the MachineSet; the MachineSet replica count will actually stay as it was. All we're doing is telling Kubernetes that that machine has definitely gone away and it can reschedule the workloads. Then after a few minutes — if it's bare metal it might take longer, if it's a VM it'll be a little quicker — that machine will re-register itself inside of the cluster, and then pods can rebalance themselves as and when required.

That's a good point. I think it's the machine health check that will remove the node, whereas poison pill just removes it, reboots it, and lets it come back, right?

So you touched on a couple of things in there, Rhys, that are not really tangential — they're very relevant to what we are talking about today. So, bare metal. But first I want to start by acknowledging that Ben has joined us. Hey Ben, sorry for taking a while to get to you.

No, that's fine, I was running late — I just had a customer call. Yeah, no worries. So first, Ben, if you don't mind introducing yourself for all the folks who are watching.

Sure.
My name is Benjamin Schmaus, and I am a product manager here at Red Hat, specifically on things like bare metal. So that's kind of why I joined today.

Yeah, and Ben is one of those super smart technical people that I use as an escalation point. So just remember, between the three folks who have joined us today: Andrew knows nothing, and all the hard questions get passed along.

So the first thing that I want to highlight, from Andrew's perspective, is that we overload the term bare metal. We use bare metal to refer to both an installation platform, i.e. physical servers, as well as an installation method — which is now, I think we've gotten better about this in the docs, the platform-agnostic or install-to-any-platform mechanism. I tend to avoid the term bare metal UPI, even though I know quite a few folks use it, because UPI still implies something about whether a cloud provider is involved — and with bare metal IPI there absolutely is a cloud provider involved, which I think is surprising to some folks.

So with that out of the way, I wanted to ask, Ramón, from your perspective, I want to cover two things. One: where does bare metal fit in? When should we be considering using a bare metal cluster, or nodes that are deployed to physical servers? And two: how does that work? I think you're one of the main product managers for things like the Metal³ project, which Rhys just mentioned a moment ago. So, kind of going from there.

Sure. So there are many reasons, as we've been learning from customers, why they want to have Kubernetes on bare metal. Reasons like: you have workloads that require access to, say, a GPU, right?
And it needs to be a GPU passed through to the workload to do whatever it needs to do — so this is direct access to the hardware. Similarly, sometimes they need to push large amounts of data through the network, and many servers these days come with SR-IOV-enabled NICs, right? It's the same idea: you need access to the hardware in order for your workloads to use those NICs. So those are two examples very directly related to running workloads directly on the hardware, with no layers in between.

Then, layers in between: well, when you have OpenShift or Kubernetes on top of another platform, all of a sudden you have to manage two platforms — at least the bare metal itself plus the platform on top. Usually that's not a problem; in fact, many customers come from using, on-prem, say OpenStack, or AWS or any other public cloud provider, and then they start with Kubernetes, they install OpenShift, and they've made the transition into this world through the multiple platforms they are dealing with. But then — and this is direct feedback we have from large customers dealing with very large-scale clusters for different purposes — they say: look, if I can get rid of one of the platforms that I need to manage, and I am only working with Kubernetes, working with OpenShift, then my life is going to be so much easier. So there's that incentive as well.

On the other hand, and this may seem contradictory, some customers have told me: look, I'm not in the business of managing hardware, right?
I'm in the business of managing applications — and managing hardware, servers, network equipment of any kind, is difficult, to say the least, many times, in comparison to managing software. Perhaps not for everyone, but in comparison. So we need to provide solutions for these requirements, these reasons that different users have for using bare metal. Using bare metal as a platform, which is what we do, needs to help with managing the bare metal itself. Once you are working with OpenShift on bare metal, we can help with managing that bare metal, and here is where what you mentioned before, Metal³, comes in — or rather the Bare Metal Operator, which uses Metal³ as the layer that manages the different types of hardware.

So with all of this we provide a solution where the management of the hardware is probably less complex than if you did everything yourself. And do-it-yourself involves: I need a DHCP server, I need a PXE server, I need to manage images, to install and configure the operating systems. All of this is something that bare metal IPI takes care of, right — these are things that alleviate, a little, the problem of having to manage the physical infrastructure.

So yeah, I want to point out that, just like we have a Machine API provider for, say, vSphere — ourhope9, you were just talking about that — where I can, through the MachineSet mechanism, which uses the Machine API, basically say, hey VMware, provision me a new virtual machine that I'm going to use for whatever, we can do the same thing in this instance with physical servers.

That's right.
Yeah, and that's the magic that we have thanks to Metal³, which I would say is the core technology behind this. And Metal³ uses another technology with years of experience in managing hardware, which is Ironic, which comes from the OpenStack project. Ironic over the years has specialized in managing hardware. You may think it's simple, but it's a lot of operations, a lot of, say, polish that we've been applying to it over the years, so that in this business of managing hardware we have gotten very, very good, I would say. So: putting these two technologies together, wrapping them with the Bare Metal Operator, and using this on day one to do the installation via IPI and on day two to do regular maintenance of your hardware, has made bare metal behave, as you were saying, like VMware or any other cloud provider that uses the Machine API.

In fact, when you deploy with IPI, one of the things that we do is deploy the Machine API, and then the servers — the data structure is BareMetalHost. This BareMetalHost in reality is a Machine, which in turn is a Node in the eyes of OpenShift, right? And the operations that you do on a Node are translated into operations that end up being done on the physical node, thanks to this operator. So maybe it looks simple, but there's a lot of magic — well, it's not magic, it's a lot of very smart engineers putting their brains into this.
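The BareMetalHost → Machine → Node chain Ramón describes starts from a custom resource plus a credentials Secret along these lines (the name, MAC, address, and credentials are all illustrative):

```yaml
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker-2                        # illustrative host name
  namespace: openshift-machine-api
spec:
  online: true
  bootMACAddress: 52:54:00:ab:cd:ef     # how the PXE-booting server is identified
  bmc:
    address: ipmi://10.0.0.12:623       # out-of-band management endpoint
    credentialsName: worker-2-bmc-secret
---
apiVersion: v1
kind: Secret
metadata:
  name: worker-2-bmc-secret
  namespace: openshift-machine-api
type: Opaque
stringData:
  username: admin                       # illustrative BMC credentials
  password: changeme
```

The Bare Metal Operator reconciles this CR, and once the host is provisioned it is linked to a Machine, which registers as a Node.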
So, Rhys — I know, or I think, you have a demo staged, and I don't know if you or Ben are able to answer: when you look at the bare metal install process — my favorite is usually to do something like an openshift-install explain — there are a lot of things that go into it. And I giggle a little, Ramón, when you say, oh yeah, managing hardware is simple. No. As a human managing one piece of hardware, it's relatively simple, if time consuming — hardware takes a long time to reboot compared to VMs — but when we're talking about multiple servers, if not dozens or hundreds, it's not easy at all.

It's not easy at all, and we've had to put a lot into the OpenShift bare metal project and all the Metal³ integrations to bridge that gap. Realistically, if you look at some of the other platforms, be that a public cloud or on-prem virtualization, there is an API that we can call. We can say: hey, create me a virtual machine; it needs to have x amount of memory, disk, this NIC layout; and it goes away and does it. It also abstracts all of the underlying infrastructure away — it's got all of the networking and storage, and it does all of that for us. So the onboarding of a new virtualization or public cloud platform is relatively straightforward. I'm not going to say that our engineers who go and do all that integration have an easy job — I'm not saying that for one minute — but when we have to do this for bare metal, it becomes a little bit more of a challenge.

So if you were a customer of OpenShift before we had done all of this bare metal integration work:
yes, we had all of the capabilities to dynamically scale our infrastructure, to make use or reuse of underlying storage and networking efficiencies, and to push that all the way through into the OpenShift layer. We wanted to provide a similar capability for bare metal, but there is no API for bare metal. I can't just call out to an arbitrary API somewhere and say, hey, give me a bare metal machine with this config, and have it go and do that.

That's where the Packet folks might disagree with you, I guess. Sure, yeah, right.

But one of the differences is that, say you are AWS: somebody is managing all this hardware somewhere, right? Probably thousands of engineers, but it's hardware that they know, hardware that they are familiar with, hardware that complies with a series of automations that they've been putting in place over the years. Whereas in what we are doing, we need to deal with customer X, Y, or Z's type of hardware. That's usually common hardware — well, not all the time, but usually it's hardware that you would know — but each vendor's hardware has different tweaks, different areas that you need to look after, so that you can manage it in a standardized way.

So this is what we've done with Metal³ and its integration of Ironic. Ironic has drivers for iDRAC for Dell, iLO for HPE, iRMC for Fujitsu — you name it. Redfish, especially; we probably need to talk a little about Redfish, which is a standard — not an implementation, but a specification of how to manage hardware through APIs. Redfish, in fact, is helping us a lot. It's not perfect just yet — its implementations, at least, are not perfect — but it has helped us a lot to treat hardware in a very similar way, regardless of which vendor you have chosen, right?
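To give a sense of how those Ironic drivers surface in Metal³: the `bmc.address` URL on a BareMetalHost selects the driver. A few representative formats (the hosts and paths are illustrative):

```yaml
# IPMI, the traditional protocol (what Rhys uses in the demo)
address: ipmi://10.0.0.12:623
# Redfish, the vendor-neutral specification discussed above
address: redfish://10.0.0.13/redfish/v1/Systems/1
# Redfish using virtual media instead of PXE for booting
address: redfish-virtualmedia://10.0.0.13/redfish/v1/Systems/1
# Vendor-specific drivers
address: idrac://10.0.0.14      # Dell iDRAC
address: ilo4://10.0.0.15       # HPE iLO 4
address: irmc://10.0.0.16       # Fujitsu iRMC
```

The rest of the BareMetalHost spec stays the same regardless of which scheme is used, which is exactly the standardization Ramón describes.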
But all of this is to say that all these difficulties in managing different types of hardware — which can be anything your customer may have — are solved, to a degree. We work on this every day, thanks to this technology. You give us a Dell node, we know how to manage it; you give us an HPE node, we know how to manage that, right?

I can tell you, it's light years better than what I used to do, which was using a serial concentrator, telnetting in, and accessing the serial console for Sun and then Oracle servers. It was absolutely miserable to try to automate and get that stuff done, compared to the pretty impressive and amazing things that we have available today.

It's true. And by the time you have automated something for one specific piece of infrastructure, you get something new, and you need to change the scripts, the way you address the BMCs, and so on. So having this abstraction layer, with the years of experience baked into it, has been key for us to be able to publish and release all this software for managing bare metal so quickly.

So Rhys, I want to give you a moment to stage the demo that you wanted to show, and while you're doing that — sharing screens and everything — I'm going to answer, apologies for butchering names, Bhagyesh's question here. Bhagyesh asks: in simple terms, Kubernetes is simpler than OpenShift, but what points would you use to convince a customer to migrate to OpenShift from Kubernetes? And I think we all have different opinions or perspectives on that. When Andrew answers that question, I usually break it down to two things. One is testing and integration — and most importantly, support — of all of the components that make up OpenShift. OpenShift is Kubernetes.
It has a creamy Kubernetes center — that is the foundation — and then we build upon it; we add a bunch of value to that. So think Red Hat Enterprise Linux CoreOS, the Machine Config Operator and all the things to manage that underlying operating system, a registry, the admin GUI — all of that stuff that makes OpenShift OpenShift and not just Kubernetes. And, importantly, taking all of that and testing it together, making sure that it's all going to work together, making sure that when you upgrade from version to version you have — I'm not going to say a perfect, 100% chance of it always working every time, but a very, very good chance that it will. And if it doesn't, you can pick up the phone and call somebody.

The other point is usually related to all of that, which is — and this is really Red Hat's perspective — that to go into production with Kubernetes, you need more than just Kubernetes. You need things like lifecycle management of each one of those components, including the underlying operating system; you need a logging solution; you need a metrics solution. You need all of those things, which OpenShift's goal is to provide.

I saw Chris dropped out — he waved goodbye — so thank you for joining us, Chris. We'll see you next time. So with that, I don't know if you all have anything to add to my slight diatribe there; you're more than welcome to. Or Rhys, you're up — I would love to see what you've built.

Sure. Well, I would say don't get too excited about what I've built. I have seen many of your demos and I always get excited for them — you do a great job. Oh, I appreciate that. So I'm assuming that everyone can see my screen, and that the text is big enough. Yeah — please let us know if we need to make it any larger for anybody who's watching.
Sure. So what you will see — I was going to say this is a bare metal cluster; it is a bare metal cluster, it just happens to be a virtualized bare metal cluster. We follow the exact same workflows, but we have some automation around essentially simulating bare metal clusters inside of our environment, for demonstration purposes and for testing various functionality. So everything you're going to see here follows exactly the same code path as any real bare metal cluster does, but behind the scenes I'm actually just using virtual machines for it.

The big thing you will notice is that on the left-hand side of the overview page, you'll see the provider is BareMetal. So this is utilizing all of the bare metal capabilities and functionality that Ramón was just explaining. Behind the scenes we do have Metal³ providing OpenShift with an API to aggregate, control, and automate physical hardware.

Now, one of the things you're going to see on the left-hand side, if you go into Compute, is a new entry called Bare Metal Hosts, and in here it will show us all of the various nodes that we have associated to our machines — three masters and two workers in my environment. In here you have, for example, a management address, because typically, if you're going to do an automated bare metal installation with IPI, you need to provide some additional information to the installer, so it knows how to go out there, power these machines on, and set them to boot up. In this environment I'm using more traditional IPMI with PXE, but as Ramón was saying, there is the possibility to use Redfish as well.

Now, what I will show you is
So this describes how, or rather gives all of the configuration information to, the OpenShift installer so that it can run, and there are a few very important details here. First of all, if I go down to platform, it is baremetal. So we are specifically saying, when you do this OpenShift installation, utilize the bare metal platform, which, you know, uses Metal³ behind the scenes to drive this installation. At the top I've specified how many machines I want, so three masters, and two workers. And then we have to specify all of the individual hosts, we have to tell it which machines to use. So we have to specify the BMC, this can of course, as I said, be Redfish, but in my environment I'm just using IPMI, a username, a password, and a unique identifier, which in this case is going to be its MAC address. So when these machines power up they will attempt to DHCP, they'll PXE boot, and they will be identified by their MAC address.

If you don't mind me interrupting you there, so it's possible for the installer to... I think it deploys a VM on that host, and then basically it'll host the DHCP and PXE services you need to load those nodes initially? Correct. So in a very similar way to how most IPI installations work, when you run openshift-install it provisions a bootstrap machine. Now this bootstrap machine will essentially help bootstrap the cluster, and then it will hand over control and responsibility to the resulting cluster at the end. But the way that we set this up is that the bootstrap machine temporarily runs, as Ramon said, OpenStack Ironic behind the scenes, and that's what we use to drive the provisioning of the bare metal machines. So we spin up a really, really small OpenStack Ironic instance on top of that bootstrap VM. That goes ahead and starts the bootstrap process off, PXE boots the machine.
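For readers following along, the install-config Rhys walks through has roughly this shape. This is a minimal sketch, not his actual file: the domain, cluster name, addresses, MAC, and credentials are all placeholders, a real file lists every host, and you supply your own pull secret and SSH key. The commented image lines illustrate the local image caching option mentioned later in the stream; they are optional.

```yaml
apiVersion: v1
baseDomain: example.com
metadata:
  name: ocp4-bm
controlPlane:
  name: master
  replicas: 3
compute:
  - name: worker
    replicas: 2
platform:
  baremetal:
    apiVIP: 192.168.111.5        # VIPs for API and ingress, as in any IPI install
    ingressVIP: 192.168.111.4
    # Optional: point at locally cached/mirrored RHCOS images (this is also how
    # disconnected installs reference their mirror) instead of downloading each time.
    # bootstrapOSImage: http://mirror.example.com/rhcos-qemu.x86_64.qcow2.gz?sha256=<digest>
    # clusterOSImage: http://mirror.example.com/rhcos-openstack.x86_64.qcow2.gz?sha256=<digest>
    hosts:
      - name: master-0
        role: master
        bootMACAddress: 52:54:00:aa:bb:01   # identifies the host when it DHCPs and PXE boots
        bmc:
          address: ipmi://192.168.111.101   # a Redfish BMC would use redfish://... here
          username: admin
          password: secret
      # ...master-1, master-2, worker-0 and worker-1 follow the same pattern
pullSecret: '<your pull secret>'
sshKey: '<your ssh public key>'
```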
So as you said, it starts up a DHCP server there, it lays down the image, helps get the cluster running. Then that bootstrap machine steps back and lets the cluster itself continue the rest of the process. So that initial bootstrap process is really only responsible for the initial deployment of the masters. OpenStack Ironic then gets deployed... so I've got a rogue cat here trying to push my screen around... OpenStack Ironic will then get relaunched on the resulting cluster, and it will then go ahead and deploy the additional workers. Which is why, on here, you will see the three masters are considered externally provisioned, because they were deployed by the bootstrap VM, and these two workers are provisioned, because this cluster has provisioned them in the latter part of the installation.

Which, to be clear, is how all IPI works, right? With the cloud providers, the installer, openshift-install, is the one that provisions those resources into, you know, AWS or vCenter or wherever it happens to be. So there are no machine sets yet for control plane nodes? Correct, correct. And yeah, that's why you'll see, you know, we have a machine set here just for the workers, and we just have two of two.

So, Rhys, while I have you paused for a second, I'm going to rewind you slightly to the Poison Pill operator. Will it work with UPI and IPI, or is that IPI only? It will work on any type of cluster.
It can be UPI or IPI. Typically with bare metal we have the external remediation through Metal³, so you can set up a machine health check which will monitor the machines, and then when a machine has been unhealthy for over a certain period of time, one of the other machines will contact, you know, the IPMI address of the failed node and will force it off, and so it can short-circuit that remediation process. So this is really great if you want high availability, and one of the really good use cases for this is running OpenShift Virtualization, or CNV. Let's say, for example, you have a virtual machine and it's critical, for whatever reason there's one of it, and you need to know that it will get restarted very, very quickly. You essentially have two choices going forward: you have the external remediation if you're running bare metal IPI, and if you're using UPI you now have another option with the Poison Pill. In that case, going back to what I was saying earlier, either that machine has gone away and it's definitely gone away, or it has consumed the poison pill. So in both of those circumstances, after a defined amount of time, you know that, again, it's either gone because it's caught fire, you know, the power's gone, or it has consumed that poison pill, so it can relocate that VM. Let's hope there's no fire.

Absolutely right. ZP, to quickly answer your question, how are entitlements calculated? So for entitlements, you only need to entitle compute nodes, the ones where you're running your applications. Control plane nodes, unless you have marked them schedulable and have your applications running there, you don't need to entitle, and the same thing with infra nodes. And I'm digging for the link for the subscription guide as well. our hope nine,
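The machine health check Rhys mentions is, on OpenShift, a MachineHealthCheck resource; a sketch is below. The selector label and timeouts are illustrative choices, not values from the demo. When a matching machine fails its conditions for the timeout period, the machine API remediates it, which on bare metal IPI means Metal³ can power-cycle the host over IPMI or Redfish.

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: worker-healthcheck
  namespace: openshift-machine-api
spec:
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-role: worker
  unhealthyConditions:
    # A node whose Ready condition is False or Unknown for 5 minutes triggers remediation
    - type: Ready
      status: "False"
      timeout: 300s
    - type: Ready
      status: "Unknown"
      timeout: 300s
  maxUnhealthy: 1   # pause remediation if more than one worker is unhealthy at once
```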
Thank you for posting that link. Okay, so I think I've pretty much shown everything I wanted to in this. It's very similar to any other IPI configuration, you know, specifying the VIPs that you want to use, but the big change here is when you specify the hosts, you're actually telling it what to use and how to contact these things behind the scenes.

Now, what you will find on a bare metal cluster is there is a pod in openshift-machine-api, and this is where you have the Metal³ pod. And inside of here, this is where all of the cluster management, the bare metal management, actually happens. Inside here you will find various different OpenStack Ironic pods, and these are the ones that actually do the managing. You know, when OpenShift has to say, restart this machine from a remediation perspective, or I want to add a new machine, it has to go via OpenStack Ironic to do it. And so we just provide this as an interface, essentially translating all of the well-known, well-loved OpenShift features into, you know, the ability to do the same on bare metal as well.

And just to prove that OpenStack is actually running behind the scenes, I've got an OpenStack client pod here. This is literally just a pod running on top of my OpenShift cluster that has the OpenStack command line tools installed, and I've generated a config here. I just grab the password, which is just a secret, and it's always running on a very specific IP address. I paste in the password there, and I can do `openstack baremetal node list`, and you will see that all of my machines are actually just OpenStack Ironic instances.

I like that you're highlighting this, because it's a reminder that this isn't net new. This isn't something that we just created, you know... when was bare metal IPI added, 4.5, 4.6? You know, Ironic is, like, a decade old at this point. Yeah, exactly.
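What Rhys does on screen looks roughly like this session. It's an illustrative sketch rather than exact output: pod names vary per cluster, and the `openstack` CLI needs the Ironic endpoint and credentials configured (which he pulls from a secret) before the listing works.

```shell
# The Metal3/Ironic pods live in the machine API namespace
oc get pods -n openshift-machine-api

# From a pod with the OpenStack command line tools installed and the Ironic
# endpoint/credentials configured, the cluster's hosts appear as plain Ironic nodes:
openstack baremetal node list
```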
And so we have a lot of experience in managing bare metal. We haven't reinvented the wheel here. We knew that we needed a component to handle, you know, just the basics, like turning the machine on and off, setting it to PXE boot, cleaning the instance so that it can be, you know, reused should we want to. And Ironic has all of these various different drivers, these interfaces, these features, so it makes sense to reuse those capabilities. And Ironic here is standalone, it doesn't have any of the other OpenStack components, we just use it for the bare metal management piece. So yeah, it's absolutely reuse of components, and it works really, really well for this use case. No big scary things like underclouds and overclouds like you might be familiar with if you're an OpenStack customer? No, indeed.

So yeah, we have all of this. Obviously, power status is gathered over IPMI, and then that's exposed through Metal³ into the OpenShift UI. So if we go back down to Bare Metal Hosts, and we were to, say, go onto... let's just take our worker-1, we have the power state here, we have, you know, various different information about it. It collects insights and information about it. It also gets additional information because we've inspected this machine before, so we know all about its network interfaces.

So let me dig on this, because I don't think I've ever noticed that there's a Bare Metal Hosts item that gets added to the menu over there with bare metal IPI. So this effectively gives us insight into the hardware that we're running on? Absolutely. And again, the big driver behind this is to provide an equivalence, you know, for customers that have experience running on top of any other infrastructure: with AWS, VMware, or OpenStack we can get those insights, and we wanted to provide that same experience with bare metal. That's really where Metal³ and the OpenStack... sorry, OpenShift IPI...
It's really bridging that gap. Yeah, this is... without even realizing it, right, at least for me, it completes the whole story of, you know, with OpenShift I manage the applications, I manage the Kubernetes, I manage the underlying operating system, and I manage the underlying nodes, kind of regardless of where those nodes are deployed. That's super interesting.

So what I will show you is, well, let's say I've got this cluster now and I want to add more capacity to it, I want to add more nodes. Well, if you are using a typical provider such as, again, vSphere, AWS, OpenStack, or whatever it might be, you'd simply go in here, you'd edit your machines and you would, you know, add another one, and it would automatically spawn that. Now, this works in exactly the same way inside of bare metal. However, if you look at my bare metal hosts, I'm using them all, they're all provisioned.

So I have an additional machine that I want to add into this cluster. What I have here is just a file with two resource definitions. The first one is a secret, and this is essentially a username and password for the BMC of my node. And then I have a new BareMetalHost object. I give it a name, I give it the MAC address of that particular machine, I give it the IPMI address, the credentials are pulled from the secret I've defined, and root device hints, which again just say, deploy on this disk. Which of course is useful if you have real bare metal machines and they have ten disks, and you want to make sure it deploys onto this particular disk. So what I'll do is I'll just `oc apply` this BMH, and I need to put it into openshift-machine-api. So I have now defined this additional worker, and you'll see immediately in the UI that I have this additional machine. It's registering, so it is going to go ahead and attempt to contact that machine, and it's going to populate it into OpenStack Ironic.
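The two-resource file Rhys applies looks roughly like this. It's a sketch: the names, MAC address, BMC address, and base64 credentials are placeholders, not the demo's actual values.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: worker-3-bmc-secret
  namespace: openshift-machine-api
type: Opaque
data:
  username: YWRtaW4=      # base64("admin")
  password: cGFzc3dvcmQ=  # base64("password")
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker-3
  namespace: openshift-machine-api
spec:
  online: true
  bootMACAddress: 52:54:00:aa:bb:06
  bmc:
    address: ipmi://192.168.111.106   # or e.g. redfish-virtualmedia://<bmc-ip>/redfish/v1/Systems/1
    credentialsName: worker-3-bmc-secret
  rootDeviceHints:
    deviceName: /dev/sda              # with many disks, pin the install to this one
```

Applying it with something like `oc apply -f bmh.yaml -n openshift-machine-api` registers the host; once inspection finishes and it reports available, scaling the worker MachineSet (for example `oc scale machineset <name> -n openshift-machine-api --replicas=3`) is what actually provisions it into the cluster.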
Look, you can already see it's going into the inspecting state. So it's going to power on this physical machine, it's going to collect information about it, and it will sit there waiting for us to consume it by increasing a machine set, or otherwise getting it ready to actually be used. Now, this obviously takes a few minutes to do. It is a VM behind the scenes, as I explained earlier, we're kind of faking bare metal here just for convenience, but it would follow the exact same process if this was real metal.

So, I assume that inspecting is it querying the IPMI or the Redfish interface? Correct, yeah. If I go into my bare metal node list again, you'll see that we now have this worker-3, so it's populated inside of Ironic. It hasn't yet provisioned it, but it's powering on, because its state is inspect wait. So we're just waiting for that machine to actually do something now. Hopefully... let me see if I can bring that up here. I forget that your workstation is a monster. Yeah, I'm actually connecting to machines in the data center here, but it is a pretty good machine, right?

So this is my worker-3, the one I'm just provisioning, and you'll see that it's booting up into the ironic-python-agent environment, at which point it will essentially get ready so that it can be consumed, and it will stay there, just waiting for us to use it. So if I check here now it should be... okay, it's still inspecting. I said this might take a few minutes to do, but all I really want to do is show you scaling this machine set and it eventually becoming part of the cluster.

So I'm looking through the questions here. our hope nine, thank you for pointing out that you can get a developer subscription. So if you have a Red Hat developer account, that also entitles you to the same 16 cores of OpenShift that you would get with, you know...
I think you get 16 RHEL licenses and all that other stuff. I haven't checked recently; there was an issue on the back end where, if you log into the console and go to associate that developer license with your OpenShift cluster, there was an error. I haven't checked on that in the last few weeks, hopefully they've got that remediated. I'll follow up on it.

And then, oh, master light was asking about using Fedora, because I guess they noticed that you're using Fedora, Rhys. So I switch back and forth between macOS and Fedora, but there are a lot of folks in Red Hat who use Fedora, as well as what we call the RHEL CSB, the corporate standard build. So what you'll see is a lot of Red Hat folks, we run our own operating system, you know, kind of consistently and frequently. Yeah, I think I've been using Fedora since at least 2006, so it's just second nature to me to use Fedora. And this is also my own personal workstation. Yeah, so right now I'm on my Mac, because that's where all of my streaming stuff is attached, but right over here I have a Fedora workstation that I use at least 50% of the time.

ZP asked, are there any additional CRDs that need to be installed, or is it out of the box support? So you have to deploy bare metal IPI. All of the resources you need are automatically deployed and configured when you select bare metal as the platform in your install-config; you don't have to define any additional resource definitions to do this. I applied this BMH using the CLI, but you could just as easily go in here and use the dialog, and it'll ask me for the details, you know, I could put the MAC address in and how I wanted to do it. You can do it via the UI, that's absolutely fine.
But to do this, the platform has to have been deployed with bare metal IPI. You can't take a bare metal UPI environment and turn it into an IPI environment. That sounds like some experimentation that Ben might have done at some point. Essentially.

So I'll point out that, like with vSphere UPI or... no, not RHV... or OpenStack, or basically all of the other UPIs except for RHV, when you deploy a cluster the cloud provider is still there. So if you do vSphere UPI, it's still the vSphere cloud provider, so after deployment you can add in machine sets and you can dynamically, you know, scale nodes just like you would if it had been deployed with IPI. RHV, for reasons of their own, decided to use platform equals none, a non-integrated install. But bare metal UPI, and the reason why up front I said bare metal UPI doesn't really exist, is because there is no cloud provider with bare metal UPI, so you wouldn't get any of this.

The reason why that's important is you can't mix cloud providers, and that's a Kubernetes limitation. So, technically, you could add a node... let's say I have a non-integrated install on physical servers, right, cloud provider equals none, and I wanted to add some nodes that have a cloud provider of bare metal. They would probably join the cluster, but then they would be evicted after, I think, no more than five minutes, and basically kicked out by the underlying mechanisms. So you can't mix those, it's an upstream Kubernetes thing. I've got the link around here somewhere, because it comes up all the time.
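One hedged way to see this from the CLI: each node records in its spec a `providerID` identifying which platform integration registered it, and Kubernetes expects this to be consistent across the cluster. Something like the following would show it (illustrative; on a bare metal IPI cluster these IDs use a `baremetalhost://` scheme, while on platform none the field is typically absent):

```shell
# List each node alongside the provider that owns it; the machine controllers
# reject nodes whose provider doesn't match the cluster's platform.
oc get nodes -o custom-columns=NAME:.metadata.name,PROVIDER:.spec.providerID
```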
I've got the link around here somewhere Because it comes up all the time Whilst you're doing that andrew what I just wanted to do because this might take a few minutes I know we're running up against the clock here Now that this machine has you know past the inspection stage it's going into an available state so this is kind of sitting in a pool waiting for me to assign it It's it's no it's not in my nodes. I still have my My three masters and two workers what I'd need to do is go into my machine sets I can scale this and say three And what that will now do Is if I go into my bare metal nodes, it'll go into provisioning state So at this point it's going to write the coro s disk image to the physical disk And it's going to reboot that machine and it will join the cluster just like any other new node word And I would you know dynamically scaled my environment with with plenty more more more resource If I look at this, it'll probably now show that it has an instance It does instance and it is going to a deploying state and again We're literally using open stack ironic to drive the deployment here because it has well mature ways of doing that And we're literally just using a very small component of open stack to to do this and Just to reiterate and I think you already said this of While those nodes are in that kind of waiting state You can use that with the machine autoscaler and and all of the other functionality Absolutely you can yeah Lsi, how do you manage to always use the latest coro s pixie image? So you you don't um with ipi so with Open shifts and ipi whether it's you know review ipi vcr ipi Basically when you deploy the cluster it'll upload a template virtual machine Any new nodes when they are created by machine sets get created from that template And the first thing that happens when it boots is it'll it connects to machine config operator and machine config operator says Go and download this new coro s image and lay it out. Yes. 
Same thing happens here. Basically, it boots up into CoreOS, it connects to the Machine Config Operator, and it says, hey... you know, it's rpm-ostree, so go and apply this image, and it does exactly that. So the good news with CoreOS: so long as you are on the current or an older version, that's the behavior; if you're on a newer version it gets weird.

Yeah, and just to add to that: in the install-config I point to a very specific image that I want to use, and I only do that because I've got it cached, it's just to speed up the deployment process. By default it would just reach out to wherever the installer version's manifest says to go download it. And to Andrew's point, that image will be essentially persisted by OpenStack Ironic, so it'll always deploy that CoreOS image for the life of the cluster, but again, when a node comes up, it will be instructed to download the latest version that's available. So it'll only use that slightly older version for a little while, and it'll always be pushed up to the latest version very, very quickly. Yeah, almost right away.

Go ahead, Ben. Can we back up to that install-config, just so you can clarify? You've mentioned that you've got it set up that way because you're doing it for caching purposes, but customers using this in a disconnected environment would also have a similar string in there, pointing to their local cached copy of the RHCOS image, correct? Correct, absolutely correct. Yep. That actually... you bring up something really important, Ben, and we just got asked this earlier this week: bare metal IPI, like all of the other installs, works disconnected too. Absolutely, yeah. Yes, it does.

So does this function... is this part of, like, the zero touch provisioning stuff that we're starting to see come with ACM? Is all of that related? It is, to a degree, because zero touch provisioning uses IPI to deploy the initial cluster, and then obviously the problem that it solves is very specific, right?
It's provisioning without any touch; I guess that's why we are calling it zero touch provisioning. But it streamlines the whole workflow, from deploying your first cluster, which will be your, say, hub cluster, right, the cluster of clusters that will help you to deploy the rest of the nodes with ZTP. So it's all related, in reality.

And then one last question that I hope doesn't get me in trouble: do we have any intention of being able to use this for single node OpenShift? So single node OpenShift is a very specific use case, and no, it's not going to get you in trouble, Andrew. But first off, we hear about single node OpenShift many times because customers don't even have space in the data center for more nodes. Customers are saying, look, I cannot even install OpenShift in a cluster of three nodes, right, which we support as well with IPI, what many people call a compact cluster. So you need a provisioner node to deploy OpenShift with IPI, right, the node that Rhys was showing us and described for us. If you have SNO, single node OpenShift, and you need a node to provision SNO, you're kind of defeating the purpose, right? That doesn't mean that we're not looking into it, because we are actively looking into that. The bootstrap machine that bootstraps the entire cluster with IPI, this is something that we want to get rid of, right, we want to do an installation that doesn't depend on a provisioner node. Having said that, with the current IPI workflow you can reuse the provisioner node as a worker.
So say that you have five nodes; you're not going to end up with a four-node cluster just because you need the provisioning node, so that's there. But with single node OpenShift, it's a very specific use case that we need to look into very, you know, specifically, to solve the problems that the customers are telling us they want to solve with it. Okay.

We'll reset. I bought you enough time that the node made it into the Ready status. Yep, the machine came up and we have now dynamically scaled our cluster, we have additional capacity.

And I think the last thing, because we're just a couple minutes over time here... the last thing I wanted to highlight is, like, you can use, you know, Redfish or the other mechanisms. It doesn't have to be IPMI with this private network for PXE and all of that. I think in that case it'll attach an ISO directly to the node using the Redfish, you know, API, and it'll boot that way. So even if your network is less sophisticated, you know, your network team won't give you a dedicated VLAN or something for this provisioning network, you can still absolutely, you know, take advantage of all of this.

Sorry, was there anything to...? No, I wanted to say that this is important, the point that you just made, because this is what we call Redfish using virtual media. With virtual media, we are mapping that ISO remotely through the BMC and through Redfish, and as you said, we don't need to ask for a VLAN for provisioning, or even a network for provisioning, nothing. All we need is access to the nodes from wherever we are doing the provisioning, and that can be, you know, multiple hops away, L3, as opposed to having to be L2-adjacent as with PXE. So that could be important for many scenarios. And then the other reason to use Redfish is that it's, as I was saying before, what we see as the standardized management API specification, with implementations by, I would say, all vendors,
to manage hardware in a more consistent way. Standards are always great, although I always think of the xkcd about, you know, there are 14 standards, but none of them covers what we need, and soon there are 15 standards. Yeah.

So we're a few minutes over, and I want to be respectful of everybody's time. Thank you very much, Rhys, Ben, and Ramon, for joining today. It's been an absolute pleasure having you all on, and I know I've learned several things. To our audience, thank you so much for joining today. I really appreciate the conversation and the questions. If you have anything that you weren't able to get out here during the stream, if there's anything else that you would like to ask, please don't hesitate to reach out to me. You can contact me on social media, at practical andrew on Twitter, just like you've seen me chatting across the various platforms, or you can always reach out to me via email, andrew.sullivan@redhat.com. I won't have you all toss out your contact information necessarily, but I'm more than happy... the worst thing that happens, and I've heard this from a number of you, the worst thing that happens is, well, I don't know the answer, but let's track down the right people who do and we'll get those answers. So don't ever hesitate to reach out.

With that being said, please tune in tomorrow. Just a reminder, tomorrow we will have the What's New in OpenShift 4.9 live stream happening. That is at, I want to say, 10 a.m. Let me bring up my calendar real quick here, if I can type and talk, I don't know. I know, all of you Red Hatters... Steve has this really crazy talent where, I've seen him present holding a microphone in one hand, typing with the other hand, and talking completely separately, and it just blows my mind. Like, I can't type and talk at the same time, because my fingers say what my mouth is saying, or vice versa. So yes, tomorrow at 10 a.m.
Eastern time, apologies for not converting time zones, is the What's New in OpenShift 4.9 session. Next week we will not have a stream here; please do be sure to tune into all of the KubeCon stuff that's going on, and I will go ahead and re-paste that link in the chat here. And last but not least, I hope everybody has a great rest of their week, and please stay safe out there. Thanks, folks. Thanks, everyone.