Good morning, good afternoon, good evening, and welcome to another episode of the OpenShift Administrator Office Hours. I'm Chris Short, executive producer of OpenShift.tv and principal technical marketing manager here at Red Hat, joined by my teammate Andrew Sullivan. Welcome, Andrew. You just got off a customer call, how's it going? I am happy to be here, as always. Yeah, it's been a crowded morning. Yeah, my crowdedness is beginning now, so I appreciate you joining. So apologies to all of the people who are watching and listening for my tardiness; I was on a call that unfortunately ran a couple of minutes late, so I do appreciate you sticking around and waiting for us to start. It means a lot to me, and it makes me happy on this cold, rainy, maybe snowy Wednesday here in North Carolina. It is actually very sunny and cold here today. I woke up and it was 16 degrees Fahrenheit, which is several degrees below zero Celsius. But it has now warmed up to a beautiful negative six Celsius, which is 21 Fahrenheit. Yeah, good stuff. Well, you know, I'm in the Raleigh, North Carolina area, which, despite the fact that we get one or two snows every year, is always completely and utterly unprepared for it. And, yeah, you used to live in this area as well. It's already causing massive amounts of panic. I haven't been out of the house in like a month and a half. Oh, wow. Thank you, COVID. Aside from taking the dog outside and going for a walk and things like that. So I can only imagine the grocery stores right now are completely out of bread and milk, because, you know, milk sandwiches are important. Everybody needs a milk sandwich, yeah. So, all right. Due to my tardiness, I won't waste any more time with the small talk that you and I can do at any time; we do work on the same team, after all. Yes, indeed.
So I don't think there were any follow-ups from last week. I don't recall any; there probably was something that I am forgetting. I'm trying to get better about writing notes to myself to actually cover those things. It was funny, I did an ask-me-anything session for our field yesterday and came away with almost a dozen questions where it was: wow, I don't know the answer to that, I'll have to follow up, let me get back to you on that one. Always fun. Which is also interesting, because Christian, who is supposed to be our guest today, is in today's version of the same thing I did yesterday, so we'll see how it goes for him. I did just see a Slack pop up right underneath your name here from Christian that said "Andrew was right." So, new things this week. And I specifically wanted to call this out in case, like me, you filter emails from Red Hat, which I know sounds strange as an employee, but we have a bunch of distro lists, and one of the noisiest ones is the errata and security one, because every time we release an erratum I get an email about it, even if it's a product I have no interest in. Very importantly, we announced a CVE this week. So I'm going to, you know... okay, I'm ready for this one. Oh yeah, I've already got it in my pocket, so I'm saved there. But yeah, I'm going to share my screen as well. This is a gnarly one, folks. You're going to want to update sudo, at the very least, like right now. I know this is the OpenShift Administrator Office Hours, and at first glance you think, well, sudo? OpenShift? CoreOS? CoreOS does have sudo in it. And yes, even though everything is deployed as a container, we still rely on the security mechanisms of the underlying OS to be intact. Right.
If you deploy a FIPS cluster, it relies on, for example, those FIPS encryption libraries at the OS level to provide that functionality. So if you scroll down in this security advisory, down here to the potentially impacted products, you'll see Red Hat OpenShift Container Platform 4.6, et cetera. What I would say is, and I don't have any additional information beyond what's inside of here, I just wanted to make sure that our audience is aware of this and is paying attention. So a couple of things. Make sure that your container images are up to date: if you're using UBI, if you're using a RHEL image, make sure that you update those regularly and continuously. And keep an eye on the OpenShift update channels. I strongly suspect that we will see an update for OpenShift, which will include an update for CoreOS, to address this, but I do not know the timeframe for that. I do not either. I know that there's a 4.6.13 release in the fast channel right now. I don't know if that includes the sudo patch or not. It does not. So, as soon as I can remember where I'm going here: if you go to the Cluster Settings tab inside of your cluster, you can click on the fast channel, and it will come up with this 4.6.13. There are release notes; this page is the release notes for 4.6.12, however, if I scroll down, there is a 4.6.13 bug and security update, which you'll notice was released two days ago, so before the CVE was published. And I took a cursory look through this earlier this morning; I did not see anything related to sudo inside of here. Typically we release these updates on a bi-weekly cadence, every other week, so that would mean that 4.6.14 would ship, Andrew's guess is, probably a week from this Monday, a week and a half from today. Right.
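As an aside on checking individual hosts and container images: the sudo CVE being discussed here appears to be CVE-2021-3156 ("Baron Samedit"), and Qualys' advisory for it notes that running `sudoedit -s /` as a non-root user distinguishes vulnerable from patched builds by the error prefix. A rough sketch, where the little classifier wrapper is our own, not something from the advisory:

```shell
# Hypothetical helper around the check from Qualys' CVE-2021-3156 advisory:
# on vulnerable sudo builds, `sudoedit -s /` (run as a non-root user) errors
# with a message starting "sudoedit:", while patched builds reply with a
# message starting "usage:".
classify_sudoedit_output() {
  case "$1" in
    sudoedit:*) echo "possibly-vulnerable" ;;
    usage:*)    echo "patched" ;;
    *)          echo "unknown" ;;
  esac
}

# On a real host or inside a container image you would run:
#   classify_sudoedit_output "$(sudoedit -s / 2>&1)"
# Demonstrated here against captured sample strings:
classify_sudoedit_output "sudoedit: /: not a regular file"
classify_sudoedit_output "usage: sudoedit [-AknS] [-C num] ..."
```

Either way, the authoritative fix is updating the sudo package, not passing this heuristic.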
And there are nightly releases; I have not looked at those to see if it's been addressed in there. I'm not sure anybody would want to run a nightly release. No, probably not. There are also, and I scrolled past it up here, for RHEL nodes, some workarounds and mitigating factors that you can apply. I haven't evaluated those to determine whether or not they are suitable for CoreOS. But at a minimum, please be aware, keep an eye on the information, and of course keep an eye out for an update to OpenShift, which will include an update to CoreOS, to address the security vulnerability. Well, thank you for covering that. The biggest thing right now: if you have anything public facing with sudo in it, make sure it is patched and updated, including container images. So please make sure you do that. Yes, post haste, as they would say. Very, very important. So, cool. I've got two other things that have popped up, strangely, more than once in the last week, and these are very random things sometimes. Monday, there was an internal thread basically asking: how does a CoreOS node get its hostname? Oh, good. And yesterday, a second thread came up asking: why is my CoreOS node getting the wrong hostname? In particular, they were using DHCP, and the DHCP server was handing out hostnames. When it does DHCP, does CoreOS even look at that? Yes. So I've done a fair amount of research; I'm waiting on some responses from the engineering side to see if I'm right or not. I mean, I know CoreOS responds to DHCP; I didn't know if it picked up the hostname or not. Yeah. So the core of it... and actually, I forgot to grab the link to the code where I had this up. Let me stop sharing and see if I can grab that, because it's in the same browser history. Yeah.
So the core of what we're talking about here: there's a service inside of CoreOS that runs on startup and basically uses hostnamectl to set the hostname, and it pulls that hostname from /proc/sys/kernel/hostname. Nice. So the question is, how does that value get assigned or changed? Yeah. I will say I have only done in-depth research on vSphere, but effectively, it is pulled from the virtual machine name. And there's a bit of a hierarchy here. If I'm doing IPI, and IPI deploys machine sets, new virtual machines, what's going to happen is the machine API will create a new VM with a name like cluster-randomness-worker-12. That is the name of the VM in vCenter. When the VM powers on, VMware Tools is used to determine that name, and that feeds the first hostname that it uses. Interesting. However, that can be overridden in a number of different ways. One would be if I have something like Ignition that runs and puts something into /etc/hostname. Another would be if I provide ip= and then an IP address, basically setting the IP address on the kernel command line; not applicable for IPI, but definitely applicable for UPI in the case of vSphere. Another would be if I have DHCP handing out my IP address. So the host says: hey, DHCP, give me an IP. And the server replies, I think it's DHCP option 12: here's your IP, and your hostname is this. That will override the setting as well. And then, last but not least, is reverse DNS. For example, if I am doing DHCP and not doing dynamic DNS updates: CoreOS, like RHEL and most of the other Linuxes, if it has a hostname when it sends the DHCP request, will also send its hostname with it for DDNS to do an update.
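For reference on the ip= option just mentioned: dracut's syntax puts the static hostname, when one is supplied, in the fifth colon-separated field. A rough sketch (our own helper, not something shipped with CoreOS) of pulling it out of a kernel command line:

```shell
# dracut's ip= kernel argument is roughly:
#   ip=<client-ip>:<peer>:<gateway>:<netmask>:<hostname>:<interface>:<autoconf>
# so the static hostname, when present, is the fifth colon-separated field.
hostname_from_cmdline() {
  for arg in $1; do
    case "$arg" in
      ip=*) printf '%s\n' "${arg#ip=}" | cut -d: -f5; return ;;
    esac
  done
}

# On a live node you would pass it "$(cat /proc/cmdline)"; sample values here:
hostname_from_cmdline "rd.neednet=1 ip=10.0.0.5::10.0.0.1:255.255.255.0:worker-0:ens192:none"
# -> worker-0
```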
If the DNS server ignores that update, if there's something already existing there, et cetera, so that the reverse DNS is something different, then that name will take hold; it will override it. So why is this important? A couple of different reasons. The biggest one, particularly with IPI, is that it determines which CSRs to auto-approve, based on that hostname. When the node is created, the machine API creates the node, gives it the hostname, it comes up, pulls down its Ignition config, and goes through the initial configuration. Then, when it comes time to join the cluster, there's an operator that says: there's a CSR for a node named XYZ, and I created a node named XYZ; those two match, so I'm going to approve the CSR. Nice. If they don't match, then it's going to say: I don't know who this is, I'm not approving that. Madness, chaos, what is this? Yeah. Which makes sense, right? Sure. You don't want random nodes joining your cluster. Yeah, that's really bad. So you can go in and manually approve those CSRs; that would bring the node in, that would get it up and running. The long-term ramifications of that are not known to me. Every time there's a certificate refresh, I don't know if it's going to require manual approval; remember, OpenShift will automatically renew certificates for nodes that it knows, but because that one has a mismatch, does that still apply? I don't know. And of course there's node autoscaling, where the whole point is that I want it to happen by itself, I don't want to have to go in and tell it to do it; that would now require manual intervention. Right. At 3 a.m., when everything's going haywire and it's trying to scale up from one node to 400 nodes, now you've got to get out of bed, come into the office, and approve all of those things. Oh boy.
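To make the manual path concrete: the usual workflow is `oc get csr` to list requests and `oc adm certificate approve <name>` for each pending one (the cluster commands are shown as comments, not run here); the little filter for pulling pending request names out of the listing is our own sketch:

```shell
# On a live cluster you would run (not executed here):
#   oc get csr
#   oc adm certificate approve <csr-name>       # one at a time, after review
# A rough helper that picks the Pending request names out of `oc get csr`
# tabular output (last column is the CONDITION):
pending_csrs() {
  awk '$NF == "Pending" { print $1 }'
}

# Demonstrated against captured sample output:
printf '%s\n' \
  "csr-8b2mt 15m kubernetes.io/kubelet-serving system:node:worker-0 Pending" \
  "csr-9xk4p 64m kubernetes.io/kubelet-serving system:node:worker-1 Approved,Issued" |
  pending_csrs
# -> csr-8b2mt
```

On a cluster that would chain as `oc get csr --no-headers | pending_csrs | xargs oc adm certificate approve`; review the list first, since blanket approval is exactly the "random nodes joining" risk described above.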
All of that, instead of letting OpenShift do its job. So, again, the long-term ramifications are not quite clear to me, and I wish I could find this thread I was looking for. Oh, here we go. Wait for it. I know. Yeah, I've still got to find the right link. Got to do some carnival music. It might help. Do-do-do-do-do-do-do. Here we go. All right, thanks. I couldn't keep that going too long. Ran out of notes, did you? Yeah. Okay. So I will paste this in here, and find our Twitch chat and paste the link in there as well. As we can see here, or maybe you can see, this is probably a little small: I'm in the machine-config-operator GitHub repo, specifically the 4.6 release branch, underneath templates/common/_base/units. The way this works is that the Machine Config Operator includes a number of files, a number of things, based on the infrastructure and the deployment type that you're using. Common and base effectively mean it's going to be included on every node, regardless of infrastructure. Real quickly, if we go to, for example, worker, and select, you know, I'll do this one: if I'm deploying a worker to vSphere, it's going to include these, for instance this mDNS unit, and it basically determines, if this is an IPI deployment, then output this data as part of that. So this is how we take specific actions during the install process, or during node stand-up, based on the infrastructure and the other things that are happening. So let's jump back to where I was before. Every machine that is deployed ends up with a unit, a service, that runs, that effectively sources this set-valid-hostname.sh script and then runs two functions: wait_localhost, and then set_valid_hostname.
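As a rough paraphrase of the wait step (ours, simplified; the real set-valid-hostname script in the machine-config-operator repo is the authoritative version), the idea is to poll the kernel hostname until it is something other than localhost:

```shell
# Simplified sketch of the "wait until the hostname is valid" idea. Reads
# from an arbitrary file so it can be demonstrated without a CoreOS node;
# the real script reads /proc/sys/kernel/hostname and its retry behavior
# differs in the details.
wait_localhost() {
  src=${1:-/proc/sys/kernel/hostname}
  tries=0
  while [ "$tries" -lt 10 ]; do
    name=$(cat "$src")
    case "$name" in
      localhost|localhost.localdomain|"") sleep 1; tries=$((tries + 1)) ;;
      *) printf '%s\n' "$name"; return 0 ;;
    esac
  done
  return 1
}

# Demonstration with a stand-in file instead of /proc:
printf 'worker-3.example.com\n' > /tmp/fake-kernel-hostname
wait_localhost /tmp/fake-kernel-hostname
# -> worker-3.example.com
```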
So that set-valid-hostname.sh is here. I'll open a new tab for that, and I will paste it into chat. And I see some questions coming in. There are a couple of questions, yeah. I'll finish prattling on about this and then we'll address those. So this wait_localhost function, you can see all it's doing is waiting until /proc/sys/kernel/hostname contains something valid, i.e. something that is not localhost, and then returning that value. So the question of how this gets populated: whatever populates that, or the last thing to populate it, becomes what ultimately sets the hostname returned for this particular host. Again, that could be something early on through VMware Tools setting the hostname, it could be DHCP, it could be reverse DNS; there are a number of different ways it can be determined. Interesting, okay, cool. Awesome. All right, are you ready for questions? Yes, please. Okay, first question: do we have any plans to develop a PowerShell module to administer OpenShift? You're a PowerShell guy. Yeah, I wish. So I am a PowerShell guy; I've been a PowerShell guy since the very first versions of PowerCLI with VMware. I was one of the co-creators of NetApp's PowerShell cmdlets and one of the advocates over there for them. So I am a PowerShell guy, and unfortunately, as far as I know, there is no intention of doing that. I have done a little bit of research, in all my copious amounts of spare time: could I take the Kubernetes API, the OpenShift API, and effectively auto-generate some PowerShell cmdlets and modules based off of that? But I haven't actually tested or tried that. I think there are some community-based Kubernetes modules for PowerShell; again, I haven't had the time or the opportunity to check those out and test those.
But I believe there are some community modules for PowerShell and Kubernetes, and remember, OpenShift is Kubernetes, so they should, quote-unquote, just work. One of the things I daydream about in my copious free time is having an opportunity to test those and showcase some of the capabilities there. Which I think is particularly cool if you're a PowerShell fan: most of it is going to be .NET Core, which means it also runs on Linux. Which is nice. So, no real plans that you know of, probably not going to happen, but entirely possible to make. Yeah, I'll poke around and see if I can find a Jira issue on that, like an RFE. So: did we finish the disconnected OLM episode from a couple weeks ago, or is that the one where we hit the bug that we couldn't fix? We did finish, so yes and no. Update Manager was where we found that it was not fully released yet, but OLM and the rest of it were functional and should work as expected, as far as I know, and I tested it in my lab. I think we do have a live stream with Christian, and maybe myself, where we spent the entire hour or two covering that, so I'll dig that up as well. You'll notice I'm taking notes so I don't forget these things; they'll be included in the show notes. If you didn't see, last Friday I think we published the show notes blog post on openshift.com/blog, so any of the links and other things that we used last week, you can find inside of that blog post, and this week will be the same. I haven't talked with Alex; I don't know if we'll publish those on Thursdays or Fridays, or when that'll be, so just keep an eye out. It's a very cool thing you're doing, and I greatly appreciate it, as it probably helps more people, and as we do them more and more, it will help even more people as we go. So look for a follow-up on the OpenShift blog, which I just linked to. Next question, from...
I'm going to try to say this one, because Rapscallion Reeves just rolls right off the tongue: is there a way to control which node gets deployed onto which oVirt host? Say I have three smaller machines and three larger machines; can the masters run on the small ones and the workers run on the big ones? Yeah. So, real quick, I see it was KillerGoley who was asking about disconnected OLM. If there are things that you would like to see, or that are missing, please reach out, just let me know, Andrew.Sullivan at redhat.com, and we'll be sure to specifically cover that next week. Yeah, if there's something specific missing, we'll cover it. I also thought I saw somebody ask if they can ask about OpenShift, but not related to the stream, and yes, of course, you can always ask us anything at any time. This is an office hours, so feel free to ask any questions, as always. I need to have an office-hours chat command; I'll make something up real quick. So, controlling node placement. This applies, as far as I know, to all of the IPI deployments, and essentially there is no specific mechanism to control where in the cluster a particular virtual machine lands, whether it's RHV, vSphere, OpenStack, et cetera. But you can go in after the fact and apply those rules: create an affinity group for the hosts and an affinity group for the virtual machines, and manage them that way. Actually, now that I think about it, I wonder if you could assign a group to the template that the machine set uses, so that any time a new machine from that template is created, it automatically inherits that rule and therefore those placement options. That's an interesting one that I haven't tested and that may be worth researching. So I'm going to bring up my RHV Manager instance here.
Someone said "daemon set?"; maybe, but probably not for this. So if we come to our, and now I have to remember how to do this because it happens so infrequently... so this is my Red Hat Virtualization Manager environment. You can see I just went to the cluster, my cluster name here, and I'm looking at affinity groups. I can create an affinity group and assign VMs to hosts, and I can create these rules on virtual machines at the same time; I'm just going to edit one of these virtual machines. With Red Hat Virtualization 4.4, I can specify affinity groups and labels directly in the machine definition. So what I'm thinking out loud, having not tested this at all, is: I wonder if you could, for the template that's used with IPI, specify this information so that any VMs created from it automatically inherit it. For the master nodes, which aren't dynamically provisioned, you would go in and assign this information as a day-two operation through RHV Manager, to pin them to specific hosts; and for the worker nodes, each machine set has a template that specifies whatever that affinity information is. If you happen to try that out, please let me know whether or not it works. Yeah, that'd be interesting. I would be very interested in that, and we could do a blog or something on openshift.com about how to do that. Yeah. Okay, so the next question is from Islam: what is the way to calculate sizing, based on knowing how many pods we are creating? And I mentioned to him, right, it's very dependent upon the needs of those pods, but if you know, say, that you have 500 pods, is there a magic number for the number of worker nodes, or something like that? Yeah. So that is the topic of today's session, and I know that we're like 30 minutes in. We're 30 minutes in, yeah. We're just now getting here.
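Before the deep dive, the raw arithmetic for a fully characterized workload can be sketched in a few lines of shell. All of the numbers here (500 pods at 1 CPU and 2 GiB each, 32-CPU / 128-GiB workers) are assumptions for illustration, and the extra spare node reflects the N+1 availability discussion that follows:

```shell
# Back-of-the-envelope worker count for a characterized workload.
# All inputs are assumed example values, not recommendations.
pods=500
pod_cpu=1  pod_mem_gib=2       # per-pod requirements
node_cpu=32 node_mem_gib=128   # candidate worker node size

need_cpu=$((pods * pod_cpu))        # total CPUs the workload needs
need_mem=$((pods * pod_mem_gib))    # total memory in GiB

# Ceiling division: nodes needed by CPU and by memory; take the larger.
by_cpu=$(( (need_cpu + node_cpu - 1) / node_cpu ))
by_mem=$(( (need_mem + node_mem_gib - 1) / node_mem_gib ))
nodes=$(( by_cpu > by_mem ? by_cpu : by_mem ))

echo "workers needed: $nodes, plus one spare for N+1: $((nodes + 1))"
# -> workers needed: 16, plus one spare for N+1: 17
```

This deliberately ignores the per-node system overhead and headroom percentages discussed later in the episode; those get added on top.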
So this one may span more than one episode, but a lot of this came out of some work I started last year, I don't know, October or November, around creating a sizing white paper and all the things to take into account, which has now also resulted in me doing a presentation for IBM Fast Start around the same topic. Oh, lucky you. Yeah, it's a fireside chat; I think I was asking you about putting a fireplace in behind me like you've got. Yeah. So it really comes down to a couple of different things, and by a couple I mean it varies based on what you're doing. First, I want to explain two terms: characterized and uncharacterized. A characterized workload is one that we know and understand, and whose resource requirements we know. For example, I have a Java application and I know it's only going to use at most one CPU, one core, or 1,000 millicores, and the JVM heap max is set to two gigabytes, so I can authoritatively say this pod needs one CPU and two gigabytes of RAM. With that, it's relatively straightforward to calculate how much CPU and memory I'm going to need for my workload: I've got 500 pods, that's 500 CPUs and one terabyte of RAM. So the next step is, how do I translate that into nodes? And this is where it gets a little more complex; it's going to get real complex. The first thing we need to understand is the maximum number of nodes that I can have, and the supported configurations of those nodes. In the documentation here, and I will link this page, we have our tested cluster maximums. Be careful when you browse to this page, because there are actually two sections: there's one up here with tested maximums for major releases, where you see the 4.x tested maximum is 2,000 nodes, but when we dig down into minor versions, you can see that the 4.6 tested maximum is actually 500, so just be aware of that. I don't know, or don't recall, whether 2,000
is the supported maximum, not just the tested maximum, or whether 500 is the supported maximum; I haven't read this page in enough detail, or asked that question, so we may need to double-check on that. But we want to look, importantly, at the maximum number of pods per node, and then at whether there are any size restrictions or limits. For example, continuing on down the page, you can see the AWS instance sizes that we test with: how much CPU, how much RAM, and so forth. This is not the list of the only supported instance types; these are just the ones we test with. Essentially, what I'm trying to discover here is whether there is anything that would artificially limit or change the number of nodes, or the number of pods, in my cluster. If I've got a pod that needs half a terabyte of RAM, that can pretty dramatically change how I size my nodes and how I interact with my cluster. So let's assume my first example: one CPU, 2 GB of RAM per pod. It's pretty straightforward; 500 pods easily fits within a reasonable node size, even though we wouldn't want just one node, for availability purposes, et cetera. So now we can do a kind of mental exercise. What happens if I have two nodes? Effectively I will have two nodes, each one equally sized, so 256 CPUs and 500 GB of RAM each, from an application perspective. Now, what happens if one of those nodes needs to go down? We patch something, we update something, we change the config and MCO, the Machine Config Operator, needs to reboot it. Now that one remaining node has to host all of the workload, so I really have to have two nodes where each one is capable of hosting the entire workload at any one point in time. So let's expand it out: three nodes, four, five, eight, ten, twelve. Effectively, what you're trying to do here is figure out the right balance of distributing the workload across the nodes in your cluster, for maximum performance and maximum availability and
maximum flexibility. Flexibility here is an interesting one, and quite subjective. Flexibility could mean: I'm only ever going to take one node down for updates at a time, so the other nodes only need enough spare capacity to accommodate that. It could also be failure domain. Maybe I'm running in a physical data center, on premises; I'll pick on RHV: I've got four massive RHV hosts, each one with, I don't know, 8 terabytes of RAM and 500 CPUs, and I could easily fit 30 of my OpenShift nodes onto those four hosts. Okay, but what's the failure domain? Because if I have one physical node hosting 10 virtual nodes, I haven't solved that problem; I have to be able to accommodate that amount of infrastructure failing at any point in time. So we have to be aware of those things; we have to work with the underlying infrastructure, the underlying service provider if you will, to understand what's happening there and be able to accommodate it at the infrastructure level. From an application perspective, we also want to be aware of what those failure domains are; if the application is architected so that a single pod is a single point of failure, that could be bad, and then none of this planning around failure domains is going to be particularly useful. So basically, in a nutshell, over the last six minutes we've talked about workload sizing, but workload sizing is only one component of node sizing. And I see the chat scrolling; I'm not looking at it, I'm answering these long questions. He's got follow-ups, and there are more questions coming in, so feel free to tell me when you want the next one. So node sizing also has to accommodate not just the workload, but the other things that are happening. What are those other things? The kubelet itself, the other kinds of services, so think things like CSI, or monitoring agents, so
maybe you're deploying Datadog or something like that in there, and the OpenShift metrics service; all of these things are going to consume additional resources on the host. By default, OpenShift only reserves, I think it's one half of one CPU. If we scroll down here, and I'll post this link as soon as I make sure it's the right one, yeah, here, this is the link I just posted a minute ago: as of 4.6, half a CPU, 500 millicores, is reserved for the system, compared to 3.11 and previous versions. So if you expect those system-level OpenShift functions or services to consume more than half a CPU, you need to take that into account. In particular, metrics, Prometheus, can be a huge consumer of CPU and memory on the host. When does that happen? The more pods, the more containers we have running on that host, the more CPU and memory has to be put in by Prometheus to collect all of those metrics and then serve them back up to the metrics service. So it becomes a bit self-fulfilling, or, what did we used to call it, a traffic trombone, if you've ever heard that term on the networking side: the more pods I put on the host, the more non-application resources I need on the node to accommodate the other things that are happening. Don't discount things like network and storage traffic, either, especially if you're using iSCSI PVCs and other things that are known to consume CPU resources at high throughput. If I've got 40 gig going into my servers, and all of these pods with a bunch of iSCSI PVCs are pushing 30 gigabits of traffic, that's a lot of CPU going into processing those packets and doing the things it needs to do. We just need to be aware of that, plan for it, and accommodate all of that type of traffic. And of course, it's okay if you don't get it right the first time; there's nothing wrong with that. That's the beauty of Kubernetes: we can add nodes at any point in time, so we
can kind of temporarily scale out with bigger nodes, and then go back and remove smaller nodes, consolidating back down to the failure domain, back to the number of nodes, that we prefer. Okay, so, all right, cool. I tried to explain to Islam how to do the math, essentially, for his worker nodes, where he wants to put the workloads: those worker nodes have a baseline of system requirements, and then your workloads have their requirements, plus you probably want a little breathing room just so stuff doesn't blow up when, say, one pod is being removed while another one is being added. I said 20%, for example; add that all together and you have your node sizing. Exactly. I just want to add to that excess you mentioned, the 20% of extra overhead: that number depends, in Andrew's opinion, on two things. One, burst capacity for things like node failure, and two, burst capacity for things like the Slashdot effect, or the Reddit effect, or a Super Bowl ad, or whatever, where you get a huge burst of traffic and need to accommodate it. How to actually determine that number is based on, again from my perspective, your ability to react to that scenario. What do I mean by that? If, when you need to react, autoscaling takes effect, and it takes four minutes to get a new node up, operational, joined to the cluster, and ready to accept workload, then you need enough capacity to accommodate four minutes of burst. If it takes you three hours, you might need more capacity; if it takes you three days, more still. We used to deal with this all the time when I was a storage admin: set your alert thresholds at 95%? Well, if I'm growing at X bytes and it takes me six months to get new hard drives in, I'm going to have an issue in four days based on my alert threshold; that's not going to work. I have to balance how quickly I can add capacity and work backwards from there to determine what my alarm threshold
should be. Cool. So, next question: how about sizing the three-node cluster, the compact clusters? Right, those went GA with bare metal in 4.5, I think. Yeah, and to be clear, that's the bare metal installation method, not just physical servers. Yeah. So the minimums for compact clusters are effectively the combination of the control plane and worker node minimums. The bare minimum for a control plane node is four CPUs and 16 gigabytes of RAM; if we go to the installing docs, under bare metal, I think it's in here, yes: control plane, four CPUs, 16 gigabytes of RAM, 128, or 120, gigabytes of storage; a compute node, two CPUs and 8 gigabytes of RAM. So the bare minimum is those two added together: six CPUs, 24 gigabytes of RAM, and probably a 200-gigabyte storage drive, for compact nodes. But note that that doesn't include any workload, so however much application capacity you need, add it on top of that. And on that two CPUs and 8 gigabytes of RAM: I think it's safe to always build on top of that, because you're going to have the metrics service and those other things that are deployed in there consuming resources as well, co-located, hosted on those nodes. If you don't have dedicated infrastructure nodes to host Prometheus and all that other stuff, you have to accommodate that capacity here. Cool, makes sense. And thank you for that. JP Dade says he got his 4.6.9 problems figured out; it was a CSR issue. It looks like one of the nodes wasn't joined, or didn't have its certificates approved, so that's good. CSRs are half of the bane of my existence. Yes. JP Dade says: when in doubt, do an oc get csr. Good point; that is a very common troubleshooting step that I use. Is everything issued? Right. So, on the three-node compact cluster, you mentioned the bare metal installation method, and then the follow-up question from InceptionX was: you mean you can do this on vSphere? Yes, you can. So
Andrew has issues, that I know my team is well aware of, we've raised these: we overload terms when we talk about this. Totally, yeah. So IPI, installer-provisioned infrastructure, is also called full stack automation; UPI is also called user-provisioned infrastructure, or pre-existing infrastructure. Those are fine, those are great, we understand that there are those installation methods for all the various platforms. Bare metal is where it gets confusing. So I tend to use, when you see me, especially in written communication, I will refer to what the documentation calls, and let me scroll up here, so what the installation documentation calls a bare metal install, including basically all of these, this entire subset, installing on bare metal. I call these non-integrated installs. You can use this bare metal, or non-integrated, install method to install onto basically anything. What it means is that there is no integration between OpenShift and Kubernetes and the underlying platform. That's the takeaway from this, right? Like, that's the big gotcha. So if you're deploying to vSphere and you use the bare metal method, or platform equals none in the install config, then essentially it's saying: I don't know that this is vSphere, I don't care that it's vSphere, I have no integration with vSphere whatsoever. So you can't use things like the dynamic storage provisioner, you can't use things like NSX, all that other stuff. It is infrastructure agnostic, right? So this is the installation method that you want to use with physical servers that are not IPI bare metal. It is also the installation method that you want to use when you are doing a mixed infrastructure deployment: some nodes are virtual machines in vSphere, some nodes are physical servers. I can't mix those infrastructure types otherwise, and that is a Kubernetes limitation, not an OpenShift limitation. Hilariously, because this comes up, I have this GitHub issue bookmarked, I love it. I just posted the GitHub issue into the chat, so that's the GitHub issue for Kubernetes that
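For reference, the "platform equals none" setting described above lives in the install-config.yaml. A minimal sketch of what that might look like for a compact, non-integrated cluster (domain and cluster name are placeholders, and the pull secret and SSH key are elided):

```yaml
apiVersion: v1
baseDomain: example.com          # placeholder domain
metadata:
  name: demo-cluster             # placeholder cluster name
compute:
- name: worker
  replicas: 0                    # compact cluster: workloads run on the control plane
controlPlane:
  name: master
  replicas: 3
platform:
  none: {}                       # non-integrated: no awareness of the underlying infra
pullSecret: '...'
sshKey: '...'
```

The `platform: none: {}` stanza is the "I don't know, I don't care what this hardware is" switch: no cloud provider integration, no dynamic storage provisioner, no NSX.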
prevents us from mixing infrastructure types in a cluster. Damn, it's still open, too, and it has been for a while: lifecycle frozen, milestone 1.19. Okay, 1.21 is being worked on right now; share this one out and get some more eyeballs on it. Alright, so I see a question: is there virtual RAM, do we support swap space? So this is a, yes, so generally Kubernetes always recommends that you disable swap space. If you've ever installed a cluster with, like, kubeadm or something like that, it'll say swap space is not disabled, and it will force you to do that before you continue, right? So why is this important? Because, yes, technically you can turn on swap, you can turn on all of those other things. OpenShift Virtualization has brought this to light: do I want to turn on things like kernel same-page merging, KSM, to help consolidate and get more overcommitment of those resources? So this is a choice that you have to make, but I can tell you why it is strongly discouraged in the Kubernetes community, and that's because Kubernetes doesn't know when those resources are being overcommitted. So, for example, my host has 16 gigabytes of RAM and it's struggling, it's hurting, it's swapping out, it's sending memory pages to swap, and application performance is just suffering. But with it using that swap space, Kubernetes isn't aware of it, so it just looks at it and says, your memory pressure looks fine, you're at like 80% or 85% memory pressure. So it'll keep assigning workloads to it, which just exacerbates the situation: it keeps getting new pods, and it's masking this underlying resource contention issue. So you want to be very careful any time you use something like swap or other resource overcommitment technologies on your nodes. It may not ever result in anything bad happening, but it could also result in very bad things happening. So I'll take this a step further and say that overcommitting at your hypervisor level is equally dangerous, for the same reasons, basically. If the hypervisor, you've got your hypervisor node that's
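The "Kubernetes can't see swap" point above can be illustrated with toy numbers (all invented for this sketch):

```python
# Toy illustration of why swap masks memory pressure from the scheduler,
# per the discussion above. All numbers are invented.

ram_gb = 16.0
ram_used_gb = 13.0     # what the node reports as used RAM
swap_used_gb = 6.0     # pages already pushed to disk; apps are thrashing

# What the scheduler effectively sees: RAM utilization only.
apparent_pressure = ram_used_gb / ram_gb                 # ~81%, "looks fine"
# What the node is actually trying to hold in memory.
real_demand = (ram_used_gb + swap_used_gb) / ram_gb      # ~119%, overcommitted

print(f"apparent: {apparent_pressure:.0%}, real demand: {real_demand:.0%}")
# The node appears to have ~19% free, so more pods keep landing on it,
# which is exactly the masked-contention spiral described above.
```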
running your Kubernetes nodes, and the hypervisor is way overcommitted, it's swapping, or, vSphere has CPU ready time, it's waiting for time and can't get time on the CPU. So the vSphere host, or the hypervisor, is really hurting, and Kubernetes doesn't know that. So it's saying, I need to autoscale, because the application, maybe whatever metrics you've got set up on the application are saying everything's slow, I need to scale up, I need to make my application perform better. So you end up with this kind of whirlpool, right, this circling drain: the application needs to scale up, it's adding more resources, the underlying hypervisors can't accommodate those, it's just making everything worse, and it just leads to disaster, badness. So the recommendation I always make is, if you must use overcommitment, make that overcommitment happen as close as possible to the application, right? So what that means is, whichever scheduler is closest to the application, in this instance Kubernetes, OpenShift, let it handle that overcommitment. Don't do it at both the hypervisor and Kubernetes, don't do it at, you know, multiple layers, so on and so forth, right? Cool. So we are approaching the top of the hour. Let's see, so somebody dropped that the OCP vSphere UPI automation project is easier to use than the IPI way; that's what our friend asking about the three node cluster size says. Cool. Yeah, so, as far as CSR approvals go, is it kind of just a blanket rule? I believe so. Killer Goalie says, I believe UPI requires manual CSR approval, IPI will approve them automatically. That is correct, yeah. Okay, just wanted to confirm that. Yeah, there are scenarios, so for example in OpenShift, I think it was 4.3 or 4.4, we introduced automatic certificate renewal. So if you used an early 4.x version, you remember, you deploy the cluster and then within 24 hours it would rotate the certificate, and if you shut down the cluster within that 24 hour period, before it rotated the certificate, and then you turned it back on after that period of time,
everything would just be in chaos, and it was this long, complicated process of going in and reissuing and re-approving certificates and getting everything back up and running. So we fixed that: it now does automatic certificate rotation and approval and all those other things, except when the cluster's been down for a very long time, I'm talking weeks. Sometimes you will need to go in and just re-approve those CSRs, and re-approving those CSRs, basically to get the nodes joined again, resets that whole process, and then it'll handle itself. So it's much, much easier than before. Yeah, it used to be a bear, and now it's a little bit easier, you're right. There's a lot of chat, so sorry if I missed something. There's one question, did we answer the one about scale up versus scale out? We didn't answer that yet, right? Okay: would you say it's better to scale up worker nodes or scale out, vertically versus horizontally? Scale up makes more sense to me, but scale out means I don't have any configuration changes, right, so it's just adding another node, for example. And, you know, my answer to that was kind of like, it really depends on what you're doing, right? Like, what is faster in your instance, right? If you're on AWS, changing, you know, like, RAM is pretty interesting, yeah, and just quick sometimes, right? But there is, your system has to be able to acknowledge the increase in memory and put that into play, so it really depends on the infrastructure at that point, right? Yeah, well, I'd say both infrastructure and application, and this is true, and it is a strategy that can change over time. So maybe initially it's scale out: if you've only got two, three, four nodes, scaling up increases that failure domain, where effectively I also have to keep additional extra capacity on the other nodes to accommodate, you know, node failure, for that burstiness, or that burst of new workload in the event of a node failure. Awesome. So initially maybe it makes sense to do scale out instead of
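One way to put numbers on the failure-domain point in the scale up versus scale out discussion, as a back-of-the-envelope sketch rather than an official formula: with n equally sized nodes, the surviving n - 1 must absorb a failed node's workload, so each node can only be safely filled to (n - 1) / n of its capacity.

```python
# Back-of-the-envelope failure-domain math for scale up vs. scale out:
# if one of n equally sized nodes fails, the remaining n - 1 must absorb
# its workload, so each node's safe utilization ceiling is (n - 1) / n.

def safe_utilization(n_nodes: int) -> float:
    """Max fraction of each node usable while still surviving one node failure."""
    return (n_nodes - 1) / n_nodes

for n in (3, 6, 12):
    print(f"{n} nodes -> fill each to at most {safe_utilization(n):.0%}")
# Fewer, bigger nodes (scaling up) means each node holds more in reserve,
# which is the "scaling up increases the failure domain" cost noted above.
```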
scale up. On the other hand, you know, if your application has fundamentally changed, you know, hey, we thought that the largest pods we were going to have to accommodate were, and I know they sound an awful lot like VM sizes, because sometimes they are an awful lot like VM sizes, you know, two CPUs and eight gigabytes of RAM, but really, after running for a couple of months, the app guys figured out that we really need 16 gigabytes of RAM. You know, that can dramatically change your strategy of, hey, I still want to keep X number of instances of the application per node, so now I need to scale up, scale vertically, to keep my ratio in check. So that's one thing that I have not discussed at all, and this is a concept that was introduced to me in the storage world, and it's called stranded resources. Oh. So with storage, we hear of stranded resources when I have an IOPS to gigabytes mismatch, right? Spinning media is a really good example of this. I can have 10 terabytes on a single hard drive, but that hard drive can only deliver 100 IOPS. So if I only need 1 terabyte of storage and it's consuming all 100 IOPS, I now have 9 terabytes of capacity that's basically inaccessible. I can't use it, I don't have the IOPS to deliver it, right? And flash media, SSD and especially NVMe, almost have the inverse problem, and this is why, you know, particularly storage vendors that do deduplication, compression, and stuff like that see a big benefit from flash media, because it concentrates IOPS onto that media, and that media has much higher IOPS per gigabyte density, so packing more of those in is beneficial for the media. So the same thing is true with virtualization, with Kubernetes, with OpenShift, right? I need to understand, from an application perspective, what my CPU to RAM ratio is, so that I can then accommodate that in my node sizing. So let's look at the example I just used, right, two CPUs to 8 gigabytes of RAM. So if I need, you know, when
I'm creating my nodes, if I deploy a virtual machine, that's, so that's what I'm going in with for CPU to RAM. So if I deploy a virtual machine that has 8 CPUs and maybe 48 gigabytes of RAM, right, that ratio is off, right? I'm going, so, 4 to 1, or 1 to 4: 8 CPUs, 48 gigabytes of RAM. So 8 times 4 is 32, right? So I would have an 8 CPU, 32 gigabytes of RAM virtual machine, virtual node, in my OpenShift cluster to be able to effectively accommodate that workload. With 48 gigabytes of RAM, I've consumed all of my CPU, but now I have an extra 16 gigs of RAM that's basically inaccessible as a result of that. So you want to be cognizant of those ratios and keep them balanced so that you don't strand resources accidentally. So JP Dade asks, wouldn't you spread the data across the nodes, or is that more of an HDFS thing, which, I believe, like, each node has to have its own set of data, or access to the same dataset, right? So I think it's going to depend on the storage type, right, for one. So if we're talking OCS, OCS does distribute data across the nodes, right. And I'll, I have the link up here, I know everybody's still looking at my shared screen, so you get a lovely picture of Chris and I, Chris talking and me not paying attention. I call that a Wednesday. So we did talk about storage, or sizing storage for your nodes, so I'll include a link, I'll post it here in the chat real quick, but I'll also include a link to that in the show notes. So if you want to go back and listen to that episode where we talked about sizing the disks that are used by OpenShift nodes, maybe that'll help cover that. Awesome. Cool. So we are at the top of the hour; we have another show coming up here in 30 seconds, an OpenShift Commons briefing. It's going to include the team at Kong, or wait, nope, yes, Kong. If you're familiar with the folks at Kong, they have the big gorilla logo. So, yeah, we're going to be jumping to that here in a few seconds. So thank you all for joining, thank you for your questions. Andrew, check through the Discord chat and
see if you missed anything, and please, I don't know if you've got a thing for Discord, please feel free to join the Discord and ask questions at any time. Also, please, if you have a question that we didn't get answered today, follow up on social media: practicalandrew on Twitter, or andrew.sullivan at redhat.com. Definitely don't hesitate, don't think twice about sending us a message. Yeah, and if you join the Discord, you can ask any time and someone will come along and get you an answer. It's pretty cool. Yeah. Alright, well, thank you everybody, see you here in a few seconds.
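As a footnote to the stranded-resources discussion earlier in the episode: the storage example (a 10 TB drive that can only deliver 100 IOPS) and the CPU-to-RAM example (a 1:4 ratio on an 8 CPU, 48 GB node) boil down to the same arithmetic. A sketch with the episode's numbers:

```python
# Stranded-resource arithmetic from the episode: capacity you can never
# reach because a paired resource (IOPS, or CPU) runs out first.

def stranded_storage_tb(drive_tb: int, drive_iops: int, iops_per_tb: int) -> int:
    """TB you can never use because the drive runs out of IOPS first."""
    usable_tb = drive_iops // iops_per_tb   # 100 IOPS / 100 IOPS-per-TB = 1 TB
    return max(0, drive_tb - usable_tb)

def stranded_ram_gb(node_cpus: int, node_ram_gb: int, gb_per_cpu: int = 4) -> int:
    """GB of RAM you can never schedule because the node runs out of CPU first."""
    usable_gb = node_cpus * gb_per_cpu      # 1:4 CPU-to-RAM workload ratio
    return max(0, node_ram_gb - usable_gb)

print(stranded_storage_tb(10, 100, 100))  # 9 TB stranded behind 100 IOPS
print(stranded_ram_gb(8, 48))             # 16 GB stranded on the 8 CPU node
print(stranded_ram_gb(8, 32))             # 0 -- the ratio is balanced
```

Keeping node CPU:RAM shapes aligned with the dominant workload ratio is what prevents the 16 GB of inaccessible RAM in the 8 CPU / 48 GB example.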