Let's get into the big topic of open source, something that we actually have in front of us. This is so awesome. We are an open culture. It's exactly what he said: it's that process that a developer, or let's say the Kubernetes ecosystem, really brings. Welcome to this week's Ask an OpenShift Administrator livestream. And I hear something coming from my computer, so I'm going to have to mute it real quick here. I don't know where that came from, but hopefully nobody else heard something playing in the background. There isn't an echo of us here, Johnny? I don't hear it, so if anything played, it didn't come through on my end. Good, I'm glad it didn't impact us at all.

Well, happy Wednesday. It's another great Red Hat day here. Unfortunately, here in Raleigh it is gray and disgusting, so if I look more washed out than normal, it's because most of the light in my room is coming from the monitor in front of me instead of from the windows. It's sunny here, in the low 40s, and it's supposed to get to a balmy 66 today, so it's going to be a cool one. Well, anything below 70 is cool in Texas, right? That's pretty much winter.

Hello to everyone who is joining us. Hello, Khalid, thank you for the welcome, and OurHope9 as well. This week we're going to be talking about a couple of OpenShift 4.10 topics. We mentioned them briefly last week, and they were covered during the What's New session by the product management team, but we wanted to do a bit more of a deep dive because they can be a little more complex. I've done some testing in my lab, so hopefully we can illustrate some of these points. That said, I will in full disclosure say that Andrew is not a networking expert. I've been many things in my life; a network administrator is not one of them. I have kind of an architect's-level understanding of the deep-down networking concepts. Don't let that stop you from asking hard questions, though, because I would very much love to take those back to engineering and the super smart folks working on these things, and maybe even bring them onto the livestream in the future so we can talk about all of that.

So if you're curious or interested: we're going to be focusing on NMState. NMState is a desired-state configuration engine — that's the word I'm going to use — for Linux networking, and in particular we use it with CoreOS and OpenShift. We'll also be talking about MetalLB; with OpenShift 4.10, MetalLB adds an L3, or layer 3, mode, which means we're working with BGP to do that load balancing. So it's going to be an interesting day, and we've got a number of other topics we can cover during the top-of-mind section. Johnny, we were chatting beforehand that with one of these we're going to need to be careful not to eat up an entire show. Oh man, yeah, for sure.

All right, my normal introduction here: this is one of the office hours livestreams on Red Hat livestreaming, which means that we are here for you, our audience. Anything that is on your mind, any questions you have about anything OpenShift related, we will do our best to answer. So please don't hesitate to submit those.
So whatever platform you're on, use whatever chat window you have and we will do our best to answer. Sometimes — I'll say a lot of times — we don't know the answer, in which case we take it back, we go find the SMEs, whether that's product management, engineering, or anybody else inside Red Hat, and we'll get those answers and publish them either through blog posts or here on the livestream, as the case may be.

Kamar asks: when is OpenShift 4.10 GA? Soon, I will say that. If everything goes according to plan, very soon. Internally at Red Hat, folks can find the specific dates that engineering publishes, but I always tell people those dates are subject to change at any moment; literally at any point in time the QE/QA process can find a blocking bug that pushes a release date out by anywhere from a day to a month, and it has happened before. So assuming no major issues are found, we should see 4.10 drop soon.

Thank you for asking that, and karate chop, thank you for your very kind words; we appreciate it. Although Johnny and I are just the tip of the proverbial iceberg, if you will — we're the faces that you see, but there's an organization of thousands of people behind us that help with all of this: our individual teams (my extended team has something like 30 folks on it now), product management, which is another 50 or 70 folks, and engineering, which is hundreds and hundreds of folks. It's not just us. And of course Stephanie is here to help us with all the amazing things that she does. I don't think I mentioned her last week, but she's a tremendous help. Any time you see OpenShift or Red Hat OpenShift responding on whatever platform you're on, that's Stephanie in the background, keeping us honest, posting links, and keeping us focused, if you will. That's right. What is it, adult-onset ADHD? That's right.

OurHope9 notes a caveat for non-US viewers this weekend — oh yes, thank you, that was actually one of the notes I had for today. Daylight saving time takes effect this weekend here in the US, which means we spring forward one hour, so for those of you in Europe we will be one hour earlier in your day. We always go at 11 a.m. Eastern, in whatever time zone that happens to be; Johnny's in Central time, for example.

Cardi chop asks, if time permits at the end, to hear about my lab setup. I'll be showing a little of that today — most of the stuff here in my lab, actually all of it today, is nested, so I can talk a little about that. My lab is pretty small and pretty humble and usually a jumbled mess of things. As it should be; as any lab is.

All right, so please drop any comments you have in chat. Restream puts them everywhere, so wherever you're at, we will see them and do our best to respond.
So let's talk about our top-of-mind topics. The first thing I want to talk about is something we've covered a couple of times with deep dives — we've done the disconnected deep dive and a few other topics around creating, managing, and updating disconnected clusters. In the last week or so I've seen a number of questions, probably four or five, in chat, email, and various other places, asking: how do I convert a disconnected cluster back to connected, or how do I disconnect a connected cluster?

If we look in the documentation here — I'm going to share my window with the documentation — we have a process that describes exactly what you need to do. This one is actually connected to disconnected. The short version is that it mimics the disconnected install process: you create a mirror registry with all of the images you need, mirror the images, and then configure the cluster for the mirrored registry. The first thing it has you do is set the global pull secret with the credentials for your mirror registry, then add the certificates for that mirror registry, and then create an ImageContentSourcePolicy that redirects everything over to the mirror. If you've ever done a disconnected or mirrored install, this ImageContentSourcePolicy is exactly the same thing; in fact, one of the commands further up, the one that mirrors the images, spits out the ICSP that you need.

I bring this up because — and I scrolled right past it on purpose — you'll notice that this process is Tech Preview. The process of converting is not a supported process. It's technically possible, and I've done it a number of times, but it's not one the support team is able to help with. So if you're trying to disconnect a cluster and something's not working, the support team effectively can't help with that.

If we take this process and reverse it, that is how we connect a disconnected cluster. Usually the biggest thing is updating your pull secret — if you deployed disconnected and used just a private pull secret without your Red Hat pull secret, update the pull secret — and then remove the ImageContentSourcePolicy. That points everything back at the Red Hat registries and it should behave as normal. The only other thing, and I don't see it on this list, is if you have changed the update server. Let me switch over to this tab: here in cluster settings, you can change where the cluster looks to pull updates from. Several folks have asked, even for connected clusters, whether they can point that at something that isn't valid so they can prevent a random administrator from going in and just applying updates — controlling when the cluster sees an update as available. Upgrade path, yep, thank you. So if you've changed that to point at, say, the disconnected update service, you'd just want to change it back to the default, pointing at the Red Hat hosted update service. Okay, so that covers that one.
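For reference, the ImageContentSourcePolicy that the mirroring command generates looks roughly like this — a sketch with a made-up registry hostname, so treat the mirror values as placeholders. Going back to connected is essentially deleting this object and restoring your Red Hat pull secret.

```yaml
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: mirror-registry
spec:
  repositoryDigestMirrors:
    # Pull release payload images from the local mirror instead of Quay
    - source: quay.io/openshift-release-dev/ocp-release
      mirrors:
        - mirror.example.com:5000/ocp/release
    - source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
      mirrors:
        - mirror.example.com:5000/ocp/release
```

Removing it (for example, oc delete imagecontentsourcepolicy mirror-registry) and updating the global pull secret is, in broad strokes, the reverse of the documented connected-to-disconnected flow.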
If there are any questions on that — I know we talk about disconnected frequently. I'll say this one is near and dear to my heart because, again, I do a lot of cluster deploys, and I've found it expedites things pretty significantly to have everything locally mirrored instead of pulling across the internet.

So, Johnny, let's talk about security for a moment. There was a CVE released earlier this week, lovingly referred to as Dirty Pipe — and I use that facetiously. Let me open the link we have for the CVE, but I'll let you talk about it because I only have ancillary familiarity. Okay, yeah, and I might have just a touch more than that. Essentially the Dirty Pipe exploit is named along the lines of Dirty Cow from a couple of years ago, which was another kernel-level exploit — I think it's either the same group or the same researcher that found this bug, so that's where the name lineage comes from. The way it's listed, it affects the RHEL 8 kernel, the RHEL 8 real-time kernel, and Red Hat Virtualization. I say it affects those products — that's what it's listed against — but it doesn't actually affect them, because in RHEL 8 the flag that's being used for the exploit is not part of the RHEL kernel. There is another underlying flag that could potentially be used as an exploit, and that hasn't been completely vetted out yet — whether it can or cannot be used — which is why it's still tracked as an issue. So right now it's listed against those products, but it's not really affecting anything you might have on your systems. Anything older, like RHEL 7 or RHEL 6, something with an older 4.x or even 3.x kernel, is unaffected; it's really anything with a newer kernel base, in the 5.x kernel range.

Yeah, and I seem to recall — I should do some more research on this — no thank you, Siri, I don't know why you're talking to me, that was strange — what is it, "affected but not vulnerable," or something like that, with RHEL? Yep. If you scroll up a little and look right there, you can see the flag in question. Essentially what happens is they're overloading a pipe with this flag — they're putting a lot of data into it, and that's how they're able to take over. So it's part of our kernel base, but it doesn't actually affect anything we have going. And that last line there talks about the other flag that could be used; they're still investigating whether there's an attack vector on that, if there is one. Got it — that's probably why it's still flagged for us.

So short term, it sounds like we should keep an eye on the CVE page and watch for any errata for the fix, but it shouldn't be cause for significant alarm. I don't want to undermine or understate the severity of the issue, but RHEL 8, with the known vulnerability, is affected but not vulnerable. OurHope9 — yeah, not necessarily; it would essentially require local access to the node. Okay, I'm learning something new. So the Dirty Pipe name kind of stuck out to me, right?
And I was like, this is so weird. I think it's either the same person or the same group of researchers that found Dirty Cow, so that's where the name is coming from. I just clicked on the cm4all.com link here: "similar to Dirty Cow, but easier to exploit." That's reassuring. And I saw this one on Slashdot as well — Slashdot was talking about how it's one of the more significant Linux kernel vulnerabilities in quite some time.

So, as always, I'll remind folks that with OpenShift and CVEs, more than likely you'll only see RHEL 8 listed here. CoreOS does not explicitly get called out, and the reason is that Red Hat Enterprise Linux CoreOS is not a distinct product or offering from Red Hat — it is only available through OpenShift. We may see OpenShift 4 get added to this list; I don't know how they handle that behind the scenes. After the last one of these I pinged the security team, and there's a bunch of work in the back-end systems that needs to happen so they can track CoreOS errata separately from RHEL errata — that's beyond my understanding. For now, if you have questions, reach out to your account team, or reach out to me and Johnny and we'll do our best to connect you with any resources we have. Use the CVE here as your starting point; there should also be a Red Hat Security Bulletin, an RHSB, for it.

YNS says they've never understood how MetalLB layer 2 works — we can talk about that in a minute, even though I've only staged the L3 setup with BGP; we can still talk about how layer 2 works.

What else do we have? Let's talk about something that, Johnny, you and I both immediately launched into when we were chatting earlier — this is the one where we could spend a whole show talking about cluster architecture. In particular, I've been working with a partner on what happens, or how you should architect your OpenShift deployment, if you have a small hypervisor cluster — think deploying onto maybe three or four hypervisor nodes. How should I architect my OpenShift cluster to accommodate that?

First I'll address two nodes and why I skipped over it. Two nodes — much like having a cluster spread across two sites — does not give a meaningful high-availability improvement for OpenShift. The reason is that OpenShift has a three-node quorum with etcd, so you're always going to have one hypervisor node with an imbalance, and if you lose that node, it's a single point of failure. So I'm going to skip over two nodes and assume we always want the hypervisor cluster to have a minimum of three nodes. That way we can have anti-affinity rules and keep our control plane nodes on separate hosts for full availability and resiliency in the event of a single node failure. So what are some of the impacts? What are some of the considerations?
What are some of the thoughts we should have as OpenShift administrators when we're knowingly deploying into a relatively small hypervisor cluster? In my opinion, there are a couple of things to take into account, starting at the infrastructure node layer — think the router, or ingress controller rather, and all of the other services associated with it. In general, I've always suggested that infra nodes are a good idea because you're taking workload off of your compute nodes and putting it onto nodes that you don't have to entitle, so more of the cores you're paying for are devoted to your actual workload. But we want a minimum of two infra nodes, and the reason is ingress controller availability. The ingress controller will normally have at least two instances — you can scale it down to one, but that means if that node is lost, you have lost access to all of your applications. So a minimum of two infrastructure nodes; with three or four hypervisor nodes, it may make sense to match the number of hypervisor nodes you have. That gives you some flexibility to move or recover those pods in the event of a failure, and I think it also ties into the next section, which is compute nodes. Johnny, I'll pause there — any thoughts?

Yeah, if you deploy anything like cluster logging, you'll understand right away why you want infrastructure nodes. Cluster logging can eat up a very considerable amount of resources on your application nodes, so you'll want to put it on infra nodes. The other thing is, like you said, they don't need to be entitled — instead of paying for a subscription plus the cloud cost, the cloud cost is all you're really paying for. It's a good idea, it's easy to implement, and it's worth the trouble. Yeah, and logging has, I think, three Elasticsearch instances, so that's another case where you'd want three infra nodes if you've deployed the logging operator or the logging service. OurHope9 — exactly; I think I started saying that at the same moment you sent it. Great minds think alike.

So let's talk about compute nodes. This is where things get interesting and a little more opinionated. My initial thought was: with, let's say, three or four hypervisor nodes — I'll use those interchangeably, just a small number of hypervisors — should we have one compute node per hypervisor node? So I've got four compute nodes, and maybe they start off as, I don't know, 10 percent of the physical resources each. That kind of makes sense, in that I can evenly distribute my compute nodes across the physical resources that are available. (Christian — it is not out yet.) I can evenly distribute my virtual resources across my physical resources, so that when Kubernetes, when OpenShift, schedules a pod to a node, it's not contending with other things that are on there. And that last point leads into: where do we go from there? What happens when I need to scale up my cluster?
I need more resources in OpenShift — do I deploy more compute nodes? Maybe instead of four virtual machines with, I don't know, 8 CPUs and 32 or 64 gigabytes of RAM each for my compute nodes, I now have eight of those, two per hypervisor node. There can be positives and negatives to that. I think it's particularly useful if you're hitting pod-count limitations on your worker nodes. But the issue becomes that I'm now running those CoreOS instances twice, without meaning to be redundant: if I have the logging service deployed, I've got two instances of Fluentd forwarding logs instead of one, I've got two instances of CoreOS that need to be rebooted and updated, and all that other stuff. So it makes things a little less efficient.

The alternative is to vertically scale my compute nodes: I've got four hypervisor nodes, I've got four compute nodes, and those compute nodes vertically scale until I basically hit a pod-count limitation, at which point I can add compute nodes to scale up the pod count. Okay, so what happens if I lose a node at that point — in particular, if those OpenShift compute nodes are a substantial amount of the physical resources? I'm doing this vertical scaling thing, each of my OpenShift compute nodes is now 30 or 40 percent of the physical resources, and one of those hypervisor nodes fails. Now I'm potentially trying to fit two of those onto a single host if I let the hypervisor recover my lost compute node. Is that a good idea? Maybe, maybe not. If I then lose another node, OpenShift would technically still be running, because theoretically I've only lost one etcd VM, so the control plane would still be up, but my physical compute capacity is reduced by 50 percent — two hypervisors have failed out of four. If the hypervisor then restarts all of OpenShift's compute nodes onto the surviving hosts, OpenShift doesn't know that capacity is lost. We could end up in a race condition where OpenShift is trying to schedule a bunch of pods onto those nodes that have just recovered, which drives the hypervisor — which is probably also trying to recover a bunch of other stuff — to be even more overworked.

So where I'm going with all of this: maybe we should consider, in that instance, not using virtual machine high availability for the compute nodes. Let those resources fail inside of the OpenShift cluster and let the OpenShift scheduler make the right decisions about how to recover the workload. If I go from four compute nodes down to three or two, then — assuming we've defined things like pod priorities and resource requests and limits — the OpenShift scheduler can recover things in the right order, get the right applications up and running again, and keep everything functional. Thank you, Christian, your money is in the mail; I appreciate you agreeing with me.
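To make that last point concrete, here is a minimal sketch of the kind of thing I mean — a PriorityClass and a deployment that requests resources and references it. The names and numbers are made up; the idea is just that when capacity shrinks, the scheduler has enough information to bring the important workloads back first.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: business-critical
value: 1000000            # higher value = scheduled (and preempted last) first
globalDefault: false
description: "Workloads that must come back first after a capacity loss"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: storefront
spec:
  replicas: 3
  selector:
    matchLabels:
      app: storefront
  template:
    metadata:
      labels:
        app: storefront
    spec:
      priorityClassName: business-critical
      containers:
        - name: storefront
          image: quay.io/example/storefront:latest   # placeholder image
          resources:
            requests:        # lets the scheduler place pods against real capacity
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 1Gi
```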
Now, in a larger hypervisor cluster — if we're talking maybe 8, 12, 16, 32 nodes — maybe that doesn't make sense, because there is enough excess capacity in the hypervisor cluster as a whole to recover our OpenShift compute nodes inside of it. So anyway, architecture thoughts for the day. Johnny, I know you've been chatting in there; anything to add? No, I think you make a bunch of good points, and like I was telling you in the pre-show, I never even considered scaling out with an even distribution of the workload — it's always just been scale up to meet the demand. I probably should have thought about that distribution, but I just never did, especially in cloud, where it's just, keep scaling, keep going. Yeah, on a hyperscaler it makes perfect sense: scale out horizontally all day long, world's your oyster. But on-prem, and especially in those small hypervisor-count deployments, it takes a little more thought, I think, and identifying which scenarios you want to protect yourself from. For years, Johnny, you and I were both VMware administrators once upon a time, and for years it was always: everything gets VMware HA turned on, all of the time, no exceptions — maybe we'll set a restart priority so the critical stuff comes back up first. But when there's a whole secondary scheduler, Kubernetes, OpenShift, involved in that recovery, it's something that needs to be accounted for when we make those recovery plans.

So if you have any thoughts or comments, please don't be afraid to post them in chat; we'll certainly continue that discussion as needed. You're also welcome to reach out on social media — Johnny's information is up on the screen now, JROC TX1 on social media, or Johnny, no H, at redhat.com — and you can reach out to me, andrew.sullivan at redhat.com, or Practical Andrew on Twitter. We're both on Reddit as well; Johnny's been pretty good about putting up Reddit posts for each of our livestreams, so you can respond in the comments there and we're happy to have a conversation.

All right, in the interest of time, one more very quick top-of-mind topic, and that is topics for the future. Just a quick recap: last week I reworked our schedule — we were supposed to have the Microsoft guys on, and they were very generous in allowing us to reschedule, so in two weeks, March 23rd, we will be talking about Microsoft. Next week we'll be joined by Ortwin Schneider to talk about Service Mesh. So that's the next two weeks. The What's Next presentation — the roadmap presentation — is coming up in early April, and whatever day that happens it overlaps with our show, so this stream will not be airing, but Johnny and I will be there to help answer questions and funnel things over to the PM team. Keep an eye out for that. And as always, if you're interested, don't be afraid to subscribe on whatever platform you happen to be on so you can be notified.
I also always encourage folks to go check out red.ht slash livestreaming — that takes you to our landing page, and it has the embedded Google calendar with all of the streams. We update those when we have to cancel or anything like that, so it can be useful. I usually copy the ones I know I want to attend onto my personal calendar so I get reminders.

All right, let's talk about some networking. "When you set up HAProxy on the helper node for an OpenShift installation, is that the exact one that is in later steps transferred to the control plane, or is the control plane's instance of it created separately?" — and I apologize if I mangle anybody's name; hello, Mike, nice of you to join us. So the HAProxy instance that's deployed to the helper node is an external load balancer that's used for the API, api-int, and *.apps ingress endpoints. In a UPI or non-integrated installation, we need that external load balancer to send traffic from outside — when you browse to the console, or when you access the API from your terminal — over to the right host inside of the cluster. The HAProxy instance that is deployed as a part of the ingress controller is different: the ingress controller accepts the traffic coming from outside, maybe from that external load balancer, and identifies which pods inside the cluster need to receive it. When you create your service, or whatever it happens to be, the ingress controller is the one that determines how to get to it. One thing to note is that if you're doing an IPI or Assisted Installer install, it will deploy a "load balancer" — if you can see me, I'm using air quotes — in that it isn't a real load balancer; it's a virtual IP address managed by keepalived. When I define the ingress VIP and assign it an IP address, keepalived on the worker nodes, the compute nodes, manages that virtual IP and ensures that it is always on a compute node that has an ingress controller. That way the traffic coming from outside lands on one of those nodes, and HAProxy for the ingress controller handles it from there. Let me catch up on chat real quick; hopefully that answered your question, and if not, please don't hesitate to ask clarifying questions.

Johnny, I see you're looking up some kind of ACS — or a KCS, rather. Yeah, so Fahad was asking — and I apologize if I'm mispronouncing the name — about which services can go on the infrastructure nodes, and there's a KCS that actually lists them out; I'm going to keep digging. Essentially it's things like Quay — oh, what's that? It's in the subscription guide. Okay, awesome, so I'll find that and post it, but yeah, it's Quay, ACS, ACM; things like Red Hat Service Mesh are different, because it's an application workload, or it's at the application-workload level — that's why it would typically not go on an infra node and should go on your app nodes.
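If you want to try the infra node pattern being discussed, the usual shape of it is to label (and optionally taint) the nodes and then point the infrastructure components at them. Here is a minimal sketch for the default ingress controller — the node names, and whether you taint at all, are up to you, and you'd do something similar for monitoring, logging, and the registry.

```yaml
# Assumes the target nodes have already been labeled, e.g.:
#   oc label node <node-name> node-role.kubernetes.io/infra=""
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  replicas: 2                              # keep at least two for availability
  nodePlacement:
    nodeSelector:
      matchLabels:
        node-role.kubernetes.io/infra: ""
    tolerations:                           # only needed if you also taint the infra nodes
      - key: node-role.kubernetes.io/infra
        operator: Exists
        effect: NoSchedule
```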
Yeah, if we can share my screen here — I posted the link into chat on Twitch — this is the subscription guide. As we scroll all the way down, and I'm silently cursing Christian's name because he handed this thing off to me, we have this infrastructure node section, and it tells you about all the stuff that is considered infrastructure workload. Everything listed up here is either OpenShift or Red Hat, but it's not just those things. For example, if you're using, gosh, Turbonomic or something like that, and they have a monitoring agent that gets deployed, that doesn't invalidate the node's status as an infra node. If you're deploying CNI or CSI drivers and the components they use, that doesn't invalidate the infrastructure node either. It's when you deploy your applications to it — what we call customer workload, or customer application workload — that it invalidates the infra node status. Hopefully that helps. No, that's perfect, thank you, that's exactly what I was looking for. Yeah, shakes fist — that's right, Christian.

So this subscription guide is the user-friendly version of the official, legal Red Hat subscription guide. We basically go through regularly and try to update it as the Red Hat legal policy around OpenShift subscriptions and all of that evolves, to help folks understand how to work with it. It's my quarterly cat-herding exercise, as Christian knows so well, because we have to reach into all of those places inside of Red Hat and basically say, hey, are there any changes, is there anything you want to update in here. It also mentions third-party apps again in a more general way, plus additional improved usage of the infrastructure node. Yeah, there's a lot of stuff in here, and again, if it's confusing at all, if there's anything you need help with, don't hesitate to reach out to your account team, or reach out to Johnny and me and we'll do our best to figure it all out. And yes, OurHope9, Trident is okay to run on the infra nodes. Yeah, and just to back up what Andrew's saying — if you're asking, somebody else is also asking — so don't hesitate to reach out to us and let us go dig in on it for you, because we want to make sure everybody has the information they need to be successful. Yep.

I will also say that I just recently learned there are actually two subscription guides out there — what happened here — and one of them is the old one. The difference is that this one has the table of contents at the top, and you'll notice it says last updated February 8th. The other one has the navigation on the right-hand side — or left-hand side, rather, my military left — and it has a last-updated date of last July or something like that. That one is not valid; it has been superseded by this one, which has had at least two, if not three, pretty significant updates since then. So keep an eye on that last-updated date — the current release is February 8th. We're trying to get the old one taken down; we just have to find the right folks inside Red Hat, because strangely I don't know who manages the web properties. And before we move on, there was another question
from G. Kamar, and it's really for us, or for Christian: is there a preferred way of managing secrets? Sealed Secrets is a community-based operator — is that Red Hat's preferred way? I can speak personally for our team: we are using HashiCorp Vault as part of our secrets management. Sealed Secrets is really good; there's another group within Red Hat public sector that's using SOPS for some of their secrets management, but I know on the patterns side we're using Vault and it seems to be pretty effective. I don't know if Andrew has an opinion, or if Christian wants to chime in in chat. I have no opinion — Christian and y'all are far more knowledgeable in that area than I am. And Christian's answer is: the preferred way is the one that's supported. So yes, I completely agree. Sure you're not a product manager, Christian? That's a very product-manager answer. Yep, exactly. And there's a community and an enterprise version of Vault, and there's a lot of Vault support within the Red Hat Community of Practice as well — they have the Vault operators — so there's a lot of good stuff going on with Vault.

All right, so please don't stop asking questions, but I'm going to go ahead and move on and talk about some MetalLB. Let's look at my cluster here. If you saw it a few minutes ago, the cluster I deployed in preparation for today is running OpenShift 4.10.2. We talked about it last week: anybody can go pull the pre-release bits — you can get those off of the mirror, and you can also get them off of console.redhat.com/openshift to deploy a pre-release cluster. One thing to note, for whoever was asking about when 4.10 will be released: you'll notice we're out of the RC stage, but we're not going to GA with 4.10.0, and as far as I know it more than likely won't be 4.10.2 either. That's just an artifact of how they do patching and versioning; it doesn't mean it will move from the fast to the stable upgrade channel any faster or anything like that — that's entirely dependent on the post-release metrics they have around update success. But be aware that you can pull those bits, deploy them today, and take a look.

Some things — I will say, in my experience, not all of the operators are quite ready yet, and I say that because when I tried to deploy MetalLB in prep for this, the MetalLB Operator failed; it was trying to pull an image that was not available yet when I went to install it. The result is that I had to deploy using the upstream bits. If we go to GitHub, there's the metallb-operator repository — there's an OpenShift fork and the upstream MetalLB one — and effectively what I did was pull that in and follow the documentation in the repo for how to deploy it. I think I actually deployed the MetalLB one, not the OpenShift one — it was this one; I'll paste it into chat. Just be aware that this is only temporary until it really does go GA. For now I had to deploy the upstream MetalLB Operator, which is also created and maintained by Red Hat, so it should behave exactly the same; I'm just one version ahead of what ships in the product. With the MetalLB Operator deployed, it added a number of custom resource definitions for the MetalLB side of things — we now have all of these different CRDs — and of course we still need to deploy MetalLB itself.
Remember, an operator is what manages the deployment of operands — of an application — so deploying the MetalLB Operator doesn't mean the MetalLB bits and pieces are deployed yet; it just means we're ready to deploy them. So I created a MetalLB resource, and it's super simple — there's nothing inside of it, you literally give it a name and click create. I'll create one so you can see: all I had to provide was what I wanted to name it, and once the operator sees that, yes, you do want to deploy a MetalLB, it goes through and deploys everything for us. You'll notice I'm in the metallb-system project here, and if we come up to pods, we have all of our pods running.

The other part of that — I clicked away, so now I have to do this again — is that there are a couple of other custom resource definitions: there's AddressPool, there's BGPPeer, and there's BFDProfile. An address pool represents the pool of IPs that we want to be able to use for our services when they're created. I gave it the set of addresses I wanted it to use, and because I'm using L3 — I'm using BGP mode — we want a protocol of bgp. Let's look at the documentation: this is the 4.9 documentation, and remember, with 4.9, MetalLB layer 2 mode is supported; 4.10 adds layer 3, so it'll be both layer 2 and layer 3 mode with 4.10. With layer 2 mode, when I create an address pool, you'll see that the protocol is layer2 and I specify the range of IP addresses I want it to use.

One thing that's important — let's talk about BGP for a moment. Again, I'll caveat this by saying Andrew is not a networking engineer or a networking expert by any stretch of the imagination, so for any network-savvy folks, please forgive my layman's definition of how this works. This is going to be amazing. Hopefully not confusing — my bar for success here is "not confusing" — so this is going to be great, let's do it. Thanks, Christian, for the popcorn emoji. BGP works off of the concept of autonomous systems, or ASes. Between different entities, different routing groups, we define separate AS identifiers, and what they do is publish routes — they publish how to get to a specific network based on the networks that are, I'm going to use the term "behind," but really accessible from inside, that autonomous system. So let's say my house is autonomous system 65500 and Johnny's is 65501. Inside my network, my internal router knows my networks — say I'm using 192.168.0.0/20 and Johnny's using 10.0.0.0/20 — so inside my house everything routes fine; my routers understand how to connect everything together. But if I want to get to Johnny's house, it needs to understand how to route from my autonomous system and what networks are behind his autonomous system. That's what BGP does: it communicates, hey, I'm over here, I have these routes, I have these networks that you can access by coming to me. There's a lot more complexity around things like traversal — let's say Christian, who's in California, also wants to join in our fun, and I have no direct connection to the networks in Christian's house, so I would have to traverse Johnny's autonomous system to get there.
There's BGP magic that happens inside of there, and how that's set up is beyond, one, my understanding, and two, the scope of what I'm trying to show here. The core of all of this is that BGP — much like RIP, much like OSPF, all the others — allows us to create networks and then have the routing protocol advertise, hey, this is how you get to this network, without each host having to understand how to do that. A specific host, say my desktop, if I need to get to Johnny's house, just says "go to this IP address" and relies on the networking devices to figure out how to get there.

So here's where I'm going to start this up. With BGP, what I've done is use a set of IP addresses that do not exist in my network. I actually have a bunch of different networks in my lab-slash-house — I do have some 172.30 subnets, some 10.0 subnets, and some 192.168 subnets — but this 30.100 range doesn't exist anywhere except right here, and I did that on purpose; I have not defined it anywhere else. Our protocol is bgp. The next step: if we look at our resources, I defined an address pool, so that when you create a service of type LoadBalancer — which is what MetalLB is going to service, no pun intended — one of the IPs from that pool gets assigned to it, and when that IP address is assigned to the service, it needs to be advertised to the BGP peer. In this instance I have one peer created, and if we look at it, all I'm doing is saying here is the upstream router — in my lab that happens to be the gateway router — and here are the ASNs I want you to use. ASN is autonomous system number, and I'm saying that the peer, the upstream, is 65512 and this instance inside of my cluster is 65514. That's all the configuration I did inside of OpenShift for MetalLB; I'll show a YAML sketch of those resources in a moment.

So let's look at what happened upstream at my router. I'm putting this in a separate window so y'all don't have to watch me log in — I think I can bring this in here, and you should see it. Here is my router: I'm using OPNsense, I installed the FRR plugin, I enabled BGP, and I set my autonomous system number. That's what I did inside my lab, and what we want to look at is what happens to the routing when we deploy services. If I click on Routing, you can see I already have one of my IP addresses in there — I already have a service created. If I come back over here and go to Networking, I have this hello-kubernetes service, and you see it has an external address of 100.10; if I browse to that, it takes me directly to the Hello Kubernetes application running inside the cluster. So what happens when I delete the service? I'm actually going to do this from the CLI because it was deployed with Helm — forgive my typing, I'm just doing a helm uninstall — and what we should see in just a moment: my application is now gone, no service; if I come up to pods, I've got my simple deployments but no hello-kubernetes running anymore. And if I come back over to my router and refresh, you'll notice the routes have gone away — it no longer knows how to get to that particular service on my nodes. So let's create a new service — oops, that's not what I wanted.
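To put some YAML behind what I just described, the three objects look roughly like this. This is a sketch: the address range, the peer address, and the exact apiVersion are placeholders and can vary between the upstream operator and the one that ships with OpenShift; the ASNs are the ones from my lab description.

```yaml
apiVersion: metallb.io/v1beta1
kind: MetalLB
metadata:
  name: metallb
  namespace: metallb-system
spec: {}                       # the operator fills in the controller and speakers
---
apiVersion: metallb.io/v1beta1
kind: AddressPool
metadata:
  name: bgp-pool
  namespace: metallb-system
spec:
  protocol: bgp                # layer2 is the mode supported back in 4.9
  addresses:
    - 172.30.100.0/24          # a range that exists nowhere else in my network (placeholder)
---
apiVersion: metallb.io/v1beta1
kind: BGPPeer
metadata:
  name: lab-gateway
  namespace: metallb-system
spec:
  peerAddress: 192.168.14.1    # placeholder for the upstream/gateway router
  peerASN: 65512               # the router's autonomous system number
  myASN: 65514                 # the AS the cluster speakers advertise from
```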
Yeah, Christian, you're right — BGP takes a while to propagate, because it's got to reach out to all the neighbors, and all the neighbors have to update their neighbors; it's this cascading effect, and it can take a long time. In my lab it's only one router, so it only has to update itself, but yes, in an enterprise it can take a little bit of time to propagate.

So all I'm doing here, you see, is creating a new service, and I want to specify a type of LoadBalancer — if I'm remembering the syntax correctly — which means that MetalLB will be the one to service that service. We'll hit create, and you can see that's exactly what happened: we now have this external load balancer, and it reused the IP address — you can see it's the 100.10 address — and this time if I browse to that IP I get my "simple service." Simple service is just a thing I created to return a JSON response showing who is requesting it; you'll notice the request is coming from inside of the cluster, along with which server, which pod, is servicing it — I keep saying "service servicing," which sounds like a weird phrase. So what you saw me do there was destroy one service that was using a MetalLB load balancer and then create a new service that uses a MetalLB load balancer, and now if I come back over to the router, we'll see our three routes get created again.

Let's talk a little about what we're seeing here. First, you'll notice it is creating a /32 for the same IP toward each one of my nodes in the cluster — .160, .161, and .162. That is BGP saying: I know how to get to this address, and it's going to be via one of these particular nodes. Let me see if I can stretch this so you can see the columns — all three are valid, but it's going to prefer this particular link, this particular node, first, and that can be changed. From what I understand, it will send traffic to that node first, and then kube-proxy, the internal mechanism, distributes the traffic across the pods. You can also set it to "local" — there's a policy, I don't have the docs page up, where it will only advertise these routes from nodes that are hosting one of the pods that's a valid target for the service. So that's how it works in a nutshell — and I can't see chat, so I'm assuming you'll interrupt me at any point here, Johnny.

There are a couple of other things we may want to look at. First, let me deploy my other service again — the same hello-kubernetes service I had before — just so we can see what it looks like when there are two of them. If I come up to deployments, we've got hello-kubernetes back, and under services we've got the hello-kubernetes service back; you can see this time it was assigned the IP address 100.11, and just like before I get my hello-world application. If I refresh, you'll notice the pod name changes periodically, as well as the node name it's running on, so it is load balancing across the pods running in there, and you can see each service gets its own IP address — I'll drop a YAML sketch of that service in a second. So what are some other things we can do with this? Let me bring up the MetalLB documentation as soon as I find the right tab, and let's add it into the stream chat.
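For anyone following along, the service I clicked together in the console is equivalent to something like this — a sketch, with the names, ports, and label borrowed from my demo rather than anything you need to match.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: hello-kubernetes
spec:
  type: LoadBalancer               # tells MetalLB to assign an IP from the address pool
  selector:
    app: hello-kubernetes
  ports:
    - port: 80
      targetPort: 8080             # placeholder container port
  # externalTrafficPolicy: Local   # only advertise from nodes that host a ready pod
```

That commented-out externalTrafficPolicy line is the "local" policy I mentioned: with it set, only the nodes actually hosting a pod for the service advertise the route, instead of every speaker node.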
In our configuration, if we come down here to BGP in the docs, what I want to talk about is the advertisement configuration. The first thing I want to say is that BGP is big and complex and a little bit scary, if you're asking Andrew. The reason I say that is that BGP is what the internet runs on — literally. If you were to query an internet router and look at its BGP routing table, it's something like 250,000 entries and tens of megabytes in size on how to get places, and there are some very public instances where screwing up BGP routes has broken the internet — literally broken the internet. Facebook has done it recently, I want to say Cloudflare has done it recently, and there was a very public one back when a country blackholed all of YouTube — do you remember that? They wanted to block YouTube in the country, they updated the routing table and propagated it outward instead of inward, and they ended up creating a black hole for all YouTube traffic. Where I'm going with all of that is: we should not be doing this in a vacuum. We have to work with our network architect and network administrator peers to make sure this works successfully.

Remember that each router has a finite set of resources, and one of the things I want to talk about here is creating /32s. Over here I've got a /32 toward each node in my network that is a speaker for MetalLB. If I create a hundred routes — excuse me, a hundred services, or a thousand services — in my cluster, I'm going to have three /32s for each one of those. So it's entirely possible you could basically exhaust the resources on your routers in an enterprise situation, and of course that would be bad. To work around this, MetalLB has the advertisement configuration, and effectively what you can do is set an aggregation length. What that means is that instead of creating a /32 for each one of these — if I refresh, there should now be six, one for .11 and one for .10 toward each node — it will create a /24 that covers all of them, so there will be one /24 entry toward each node that is acting as a speaker for MetalLB. I'll show what that looks like in YAML in a minute.

I'm realizing now that I may have skipped what a speaker is. Let's go back to our pods and look at the metallb-system namespace. This is the MetalLB deployment: you see I've got the operator controller — when I deployed the operator, this is what it deploys; it's what manages the custom resource definitions and so on. I've got the controller, which is what actually does things like talking to each one of the speakers to request that they apply configuration. If we look at the logs for the controller, we can see it allocating the IP and updating the service object, and then it reaches out to each of our speakers, and the speakers are what actually handle that traffic — you can see the responders being created and "service announced" on the IPs for our 100.10 and 100.11 load balancers. The speaker pods are deployed to each node and actually manage that traffic; if I were to look at the expanded view, there would be one of these on each node — it's a daemon set. So that's why the speakers are important: for every node that is hosting a speaker, it will create an entry over here in our routing table.
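Here's roughly what that aggregation setting looks like on the address pool, as I understand the 4.10-era CRD — a sketch only; the exact field names can differ slightly between operator versions, so check the docs for the release you're on.

```yaml
apiVersion: metallb.io/v1beta1
kind: AddressPool
metadata:
  name: bgp-pool
  namespace: metallb-system
spec:
  protocol: bgp
  addresses:
    - 172.30.100.0/24
  bgpAdvertisements:
    - aggregationLength: 24    # advertise one /24 instead of a /32 per service IP
      # communities:
      #   - 65535:65282        # the well-known "no-advertise" community
```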
You can use node selectors to limit that to specific nodes, which you would definitely want to do in larger clusters to prevent these routes from getting totally out of control. Just be aware that those nodes are effectively ingress points for your cluster, so you would treat them just like you do an infra node that is running an ingress controller — the speaker is going to be sharing network throughput and everything else with the application running there and anything else on that node, so you may not want it on the same node as an ingress controller. Demon set or daemon set — I switch back and forth; Christian, remember, I use all the words incorrectly all the time just to make sure everybody disagrees with me at some point. There's kubectl, kube-control, kube-cuddle... I embrace hostility; it's a matter of opinion, right?

So, yes, 192 — my nodes are on 192.168.14. If we come down here to Compute, Nodes — is it going to show me on here? I rarely look at this interface — yeah, there we go: internal IP 14.162, and the other two are .160 and .161. So that's MetalLB's BGP mode in a nutshell. Again, work closely with your networking peers; they should be able to feed us all of this information. They're going to tell us specifically which ASN to use — and unless the peer ASN and local ASN are mirrored and match on the other side, the BGP session is not going to successfully establish, and it's not going to work. If they want to do aggregation, we'd want to work with them to determine the appropriate subnetting for that aggregation, and they should also assign us the IP pool we're advertising from.

The communities are an interesting thing. There are a few common well-known communities — no-advertise, no... gosh, what are they now — anyway, they control what we are requesting the upstream, next-hop router do with any routes that we advertise. No-advertise would mean, hey, don't advertise these beyond you, so it wouldn't work for any traffic that has to traverse to a different router or a different AS; or you can allow advertising externally, for example, and then it'll be broadcast everywhere it needs to go in order to get there. The other thing I want to point out here — limiting peers to certain nodes — is what I was talking about with the node selectors. And the last piece I wanted to touch on quickly is BFD. Effectively, BFD, as it says here, provides quicker path failure detection than BGP alone. Again, work with your network peers to determine the correct settings for the BFD profile; the goal being that if the upstream, next-hop router fails for whatever reason, we can detect it and recover from it, so the routes can be re-created, re-propagated, and access regained. So I'll pause there — and what was I going to show? I don't remember.
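Since BFD came up, here's roughly what a BFD profile and its hookup to a peer look like — a sketch with made-up interval values; the real numbers are exactly the kind of thing to get from your network team.

```yaml
apiVersion: metallb.io/v1beta1
kind: BFDProfile
metadata:
  name: fast-failover
  namespace: metallb-system
spec:
  receiveInterval: 300          # milliseconds; placeholder values
  transmitInterval: 300
  detectMultiplier: 3           # declare the peer down after 3 missed intervals
---
apiVersion: metallb.io/v1beta1
kind: BGPPeer
metadata:
  name: lab-gateway
  namespace: metallb-system
spec:
  peerAddress: 192.168.14.1     # same placeholder gateway as before
  peerASN: 65512
  myASN: 65514
  bfdProfile: fast-failover     # reference the profile by name
```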
So the last thing I wanted to talk about on this is just a quick highlight: with OpenShift 4.10, the External DNS Operator is in Tech Preview, and it only works with the hyperscalers. What this will do eventually — because the upstream external-dns project has alpha-level support for updating many other types of DNS servers — is let us create annotations on our services (sorry for all the scrolling, folks) that say: I not only want to publish this service using MetalLB, or whatever my external load balancer is, so that it's externally accessible directly, I also want to create a DNS entry for it. So you can basically publish something outside of the OpenShift ingress controller mechanism. But again, with 4.10 it's only Tech Preview, and it only works on Amazon, Azure, and Google; it doesn't work internally yet. The upstream functionality, though — here's the Kubernetes external-dns project, and if we look at it, there's a whole bunch of different DNS providers supported. For something like bind, RFC 2136 — I don't want to look up the actual RFC, thank you — RFC 2136 would be how it goes and updates that. And if I click down here on the actual documentation link, it tells us precisely how to configure it; here's an example with the external-dns annotation and a hostname like service.example.org. What happens is, when I publish that, it would then create a DNS entry pointing to whatever IP address is assigned to the service. That's going to be such an awesome feature when it releases — I know it's been a long time coming, so it's going to be awesome. I can see it either way, good or bad; the OpenShift ingress controller adds a lot of value in my opinion, most especially certificates and all of that other stuff you don't have to manage yourself. Everything old is new again — my children are wearing 1990s clothing, and I distinctly remember the 90s, so it's funny. What is "qwerty123"? It's a very terrible password — I don't know where that came from; a small tangent about secure passwords, so we're all posting our most secure insecure passwords.

All right, I will take a brief tangent, because somebody had asked what my lab looks like, so let me log into vCenter here. My lab is constantly changing, although right now it's been the most stable it has been in a long time. My lab consists of a grand total of two nodes, and both of them are actually desktops. I chose regular off-the-shelf desktops because they were, one, affordable, and two, customizable, which is what I've done almost to an extreme. I would love to simplify, so that I can still do everything I want to do, and upgrade to something like — I think Christian has an R720 or an R730 in his home lab — I just haven't been able to get there. So I have a couple of things inside my lab. The first is a TrueNAS, formerly FreeNAS, virtual machine, because all of my storage is attached to it. In my case I use M.2 SSDs because they're small and I can easily cram a bunch of them into the case, and from there I provide all of my shared storage. If we look at my storage over here, I have an iSCSI LUN that comes off of that and an NFS export that comes off of that. That virtual machine is the only one that uses the local storage on this host, so when I go turn on my lab, I turn on the host that has this virtual machine on it; it has PCI passthrough for all of those storage devices, and it creates the shared storage pool that everything else lives on. From there I have a helper node — this originated with Christian's helper node.
Everything old is new again: my children are wearing 1990s clothing, and I distinctly remember the 90s, so it's funny. What is qwerty123? It's a very terrible password; I don't know where that came from. There's a small tangent about secure passwords, so we're all putting in our most secure unsecured passwords. All right, I will take a brief tangent, because somebody had asked about what my lab looks like, so let me log into vCenter here. My lab is constantly changing, although right now it seems like it's been the most stable that it has been in a long time. My lab consists of a grand total of two nodes, and both of them are actually desktops. I chose to use regular off-the-shelf desktops because they were, one, very affordable and, two, customizable, which is what I've done, kind of to an extreme almost. I would love to simplify it so that I can still do all of the things that I want to do and upgrade to something like, I think Christian has an R720 or an R730 in his home lab, but I just haven't been able to get there. So I have a couple of things inside of my lab. The first one is a TrueNAS (formerly FreeNAS) virtual machine, because all of my storage is attached to it. In my case I use M.2 SSDs, because they're small and I can easily cram a bunch of them into the case, and from there I provide all of my shared storage. So if we look at my storage over here, I have an iSCSI LUN that comes off of that, and I have an NFS export that comes off of that. That virtual machine is the only one that uses the storage on this host, so when I go to turn on my lab, I turn on the host that has this virtual machine on it; it has PCI passthrough for all of those storage devices, and then it creates the shared storage pool that everything else sits on. From there I have a helper node. This originated with Christian's helper node; I've taken it and expanded it. Christian's automation creates a helper node instance with all the services (DHCP, DNS, load balancer, all that other stuff) for one cluster. I have mine configured for ten different clusters that I can stand up in different configurations, in different ways, and all that other stuff. Same core services: bind, HAProxy, and, if I'm remembering right, the ISC DHCP server, all that other stuff inside of there. I just customize the configuration. Aside from that, you'll see things like this bastion VM, which I use to mock or mimic a disconnected environment when I need that. And vCenter: interestingly, I don't turn on vCenter very often unless I'm doing something like a VMware IPI installation. So you'll notice there's a three-node compact cluster, but I actually deployed it using the IPI method, so that's completely unsupported (you can't do a platform-integrated compact cluster), but it works, and it helped to illustrate my purpose without having to jump through a bunch of hoops. So I turn on vCenter for things like that, but most of the time I just connect directly to my ESXi host and turn things on. This VM I was using the other day for... what was I using it for? Oh, that was the one that we used for oc-mirror, the local mirror, and all that other stuff. And this is my usual demo cluster; this is the one that I use for most things. It's a non-integrated cluster that's deployed onto vSphere, and this one I recreate on a regular basis. It's non-integrated specifically so that I can talk about the infrastructure integrations and things like that, but I also do things like nested virtualization in here, so if I want to test with OpenShift Virtualization or something like that, I can use it. The only fanciness here, if you can see it, is that I have this DV trunk. All I did was create, if we come over here to the networking, a VLAN trunk port on my switch. There are two of these; for whatever reason I was having issues at one point with the standard switch trunk port, so I created a distributed vSwitch trunk port and that works great. I don't know what the difference is, but it fixed my problem, and that's what I use when I want to do things like create VLANs that get passed through into the bigger network. The other side of that is, if we take a quick look here, you'll see I've got a couple of different networks that I use for all of my lab stuff. One is the work network; this is what my work desktops sit on, and it's kind of the management interface for most things. The other one is the lab network; this one is isolated, and it's what I consider the disconnected network, for when I want to deploy things and test them that way. So yeah, my lab is pretty simple, all in all. I'm happy to answer any questions about it; folks can send me messages on social media or send me an email if you have any questions about how I set it up. There are some limitations, right? My two hosts each only have 64 gigs of memory right now, so I can only deploy one or maybe two small OpenShift clusters, depending on what I'm doing inside of there. So let me know if there are any questions on that. And then the last thing that I wanted to talk about today is NM State, so let me find my cluster here again. The NM State Operator is, as the name implies, a stateful configuration engine for NetworkManager. Effectively, anything that you can do via nmcli or nmtui, you can do through NM State; what it's doing is just defining that configuration using YAML.
So if I do a search here for NM State (I don't want New Mexico, I want nmstate), you can see that it comes up and talks about the upstream project for nmstate, and this is something that Red Hat has been using for a while. If you're familiar with Red Hat Virtualization, Red Hat Virtualization Manager 4.4 uses nmstate to apply network configuration to the hosts. It uses it for all hosts, and it's also the thing at work when, in 4.4, you copy a host network configuration and apply it to other hosts; that's all nmstate-driven. Here we can see JSON output, but it also works with YAML, and there are a bunch of different examples of how to do things. Say I want to create a bond: here's the YAML, and all we're doing is defining, in YAML or JSON, exactly the same thing that we would be doing with nmcli. I want to create a bond that is in a state of up, I want to assign this IPv4 address to the bond, I want it to use balance-rr mode, and I want these two interfaces to be members of that bond. That's literally all it is. If I want to create a bridge interface, say an OVS bridge, I want it to be on top of this particular network adapter, and you can define a bond to be a part of that as well. And you can combine these, right? I don't have to have these as separate files or separate implementations; I can combine them all into one, and because it's a state engine, it will ensure that the host matches that particular configuration. Also, the one thing that confuses folks is that if you want to un-configure something, you don't just remove the configuration; you have to set the interface's state to absent. That's how you remove things. A rough sketch of that bond and bridge YAML follows here.
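To make that concrete, here is a minimal sketch of the kind of document nmstate consumes, modeled on the upstream examples being described. The interface names, the address, and the bond members are placeholders rather than anything from this lab, and older nmstate releases named the bond member list slaves instead of port.

interfaces:
- name: bond0
  type: bond
  state: up                      # desired state; setting this to absent later removes it
  ipv4:
    enabled: true
    address:
    - ip: 192.0.2.10             # placeholder static address for the bond
      prefix-length: 24
  link-aggregation:
    mode: balance-rr
    port:                        # bond members (called slaves in older nmstate releases)
    - ens3
    - ens8
- name: br0
  type: ovs-bridge
  state: up
  bridge:
    port:
    - name: bond0                # the bond becomes a port on the OVS bridge

Because it is a state engine, re-applying the same document is effectively idempotent; nmstate only changes what does not already match the desired state.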
So how does all of this fit in with OpenShift? Let me find the documentation; I thought I had a documentation page up. NM State has been a part of OpenShift since 4.6, which was when OpenShift Virtualization was released, and since 4.6 NM State has been fully supported when used with OpenShift Virtualization. If we go to node networking here, down here we have the observed node network state and updating node network configuration. All of this has been available since 4.6 with OpenShift deployed directly to physical servers, when used with OpenShift Virtualization. With 4.10 we're now expanding that to be generally available and fully supported with all bare-metal deployments. That means I can go in and apply that networking configuration day 2; let's come down here and find an example. I can apply that networking configuration to my cluster as I see fit. So let's look at how this works in practice, and I'm going to hope that this works, because I literally deployed the operator just before the stream and I haven't actually tested anything yet, so we'll see what happens. If I search my CRDs here for NM State, you'll see that I get this set of NM State CRDs. The NodeNetworkState represents the current status of the network configuration of a node. A NodeNetworkConfigurationPolicy, or NNCP, represents how we want to configure the network for a particular node; if we jump back over to our documentation example, this is a NodeNetworkConfigurationPolicy. And then a NodeNetworkConfigurationEnactment is the status of the node network configuration on a per-node basis. So let's look at our NodeNetworkState. You'll notice that I have an instance of this for each one of the nodes inside of my cluster, so I'm going to select one of these guys and look at the YAML. Down here in the status we have our current state, and it walks through and shows us everything that we know about the network. Effectively, if you were to use nmcli and show the tree of configuration for our network, this is all the information that we would find. So we have, where is it, here is br0, which is an OVS interface; it's currently down. I have two IP addresses, and I'm not sure why that one has a /32; again, I literally just set this up, so who knows what we're going to find. Then here's our MTU, here's the network adapter that it's using (it is VMware, so ens192), and we can see our speeds down here. We've got another IPv4 address; this one is on ens224. So on and so forth, scrolling down through here: here is the OVS, so this is going to be the SDN, and down here I've got the route config. So it's a breakdown of the current configuration of my host. You'll notice here that I have two network adapters on my host, so let's come over here. I've got two network adapters, and you'll notice that that second adapter also pulled an IP address; that's because it has a native VLAN on it. That would be this interface here, ens224. So let's say that I want to create a bridge on that interface and I want to remove this IP address. Let's go back to our CRD list, because I want to create a NodeNetworkConfigurationPolicy. You'll notice that first I have no instances here, and I need to find the example that I want to use, so I'm digging through my notes here to find it, and let's paste that in. So I have a NodeNetworkConfigurationPolicy, and we'll give it an arbitrary name, like anything else in Kubernetes. I can use a node selector, so I want it to apply to anything that has a worker label or a worker tag associated with it. And then what we're saying here is that I want to create a new interface named br1 that is a Linux bridge, I don't want it to have an IPv4 address, and I want it to use port ens224. Now, let's say that I wanted to keep the IP address that's assigned there: I could leave IPv4 enabled, and actually, because we're creating a new interface, I think it would inherit the default, which is going to be DHCP, or I can expressly configure DHCP or a static IP address on it and it will apply that. But because it's set to enabled: false, it will remove that IP address and it won't reconfigure it. One thing to note: if you're using OpenShift SDN, you can modify the first interface, the quote-unquote primary interface that things like the SDN are using. If you're using OVN-Kubernetes, I think it's still possible, but it's a little more complex; basically you have to set OVN-Kubernetes to use the local gateway mode, if I remember correctly. There's a BZ out there that has the exact details; it's something that was changed in either 4.9 or 4.10, and we can put it in the blog post. So the end result of all of this is that hopefully, ideally, when we're done here, what we're going to see is my nodes configured with a new bridge named br1 that has our network adapter attached to it. The policy itself looks roughly like the sketch just below.
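For reference, here is a minimal sketch of a policy like the one being described, patterned on the kubernetes-nmstate and OpenShift documentation examples. The apiVersion may be nmstate.io/v1beta1 on older operator versions, and the stp setting is just a common default; the worker node selector, the br1 Linux bridge with IPv4 disabled, the ens224 port, and the policy name (which shows up again in the enactment names later) are the pieces from the demo.

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: worker-br1-ens224               # arbitrary policy name
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""  # apply to every node carrying the worker label
  desiredState:
    interfaces:
    - name: br1
      type: linux-bridge
      state: up
      ipv4:
        enabled: false                  # drop the address ens224 had pulled from DHCP
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: ens224                  # the physical adapter becomes a bridge port

To keep an address on the bridge instead, you would set ipv4 to enabled: true with dhcp: true, or list a static address, as mentioned above.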
So we'll hit Create here, and now I can go back to my custom resource definitions, search for NM State, and look for my configuration enactments, and you see that I now have a configuration enactment for each one of my nodes. You see here's the node name, master-0, and then worker-br1-ens224, which is the name of the NNCP that I just created. So we'll click on this guy, and we can see that it has successfully configured, supposedly, right? Here are the status conditions, where it's saying it was successfully configured. If I go back to my custom resource definitions and find my NNCP, what we should see is that this one is also set to successfully configured, with three out of three nodes. So how do we verify that? How do we check that? One thing that we can do here is look at our NodeNetworkState again. I'm going to go back to the same node, and now if we scroll down we should see a new bridge. Let's see, where's it at... here's the start of a new entry. We see it's a bridge, and we have port ens224 assigned to it in trunk mode, but no IP address assigned to it. There's br1, with no IP address assigned to it, so that IP was removed from the host, but we still have our other IP address, the .162 that's down here. So just like that, we added a new configuration; we reconfigured the network on our host. If I look at host two, we should see the exact same thing: here's our bridge, port ens224, with the name of br1. So that's how we can apply day-2 network configuration to our hosts. Now, with 4.10 and bare-metal deployments in particular, this same functionality can basically be applied during installation. I don't know if I have a link to it here or not in the pre-release documentation; so if I find installing, and I want to install to bare metal with network customizations... anyway, at some point the documentation will be updated, and effectively in this networking section you'll be able to add NM State definitions. This is particularly useful when you want to do things like installing to a bare-metal host where I need to configure a bond and a VLAN interface that it's going to use for that primary interface, or whatever that happens to be. I don't see it in the docs here, which means there's probably a PR out there for it somewhere. I'll highlight that once the docs go live, and once we have everything inside of here, hopefully I'll be able to put that into the blog post for this week. So I'll pause there. Johnny, you've been awfully quiet, and I haven't looked at chat in a while, so is there anything that I should know? Yeah, so there's one question; I was answering some questions and I lost it, but anyway, it was about SR-IOV versus, like, a Linux bridge or OVS bridge. So an OVS bridge is software-defined bridging in a Linux networking world, and a Linux bridge, if you look at it, is essentially just like a switch port. I mean, it's just, hey, I need to get from here to here, so it's a layer-2 interaction. Whereas SR-IOV is like passthrough networking; if I understand that correctly, it's a full-on, legit PCI passthrough to the VM or to the machine, and it bypasses any SDN-type stuff. So that was one of the conversations going on. And then Kaiser asked if you could expand more on "fully supported when used with OpenShift Virtualization": can I use the NM State Operator in a supported way for configuring my primary interface just by installing CNV? Yes, as far as I know, yes. I don't know that anybody on the support side goes through and double-checks, like, have you deployed any VMs, before they help you with that. So I would say yes: if you deploy OpenShift Virtualization, it will automatically deploy the NM State Operator as a part of that.
Then you'll be able to do that configuration stuff. It's deploying the NM State Operator without OpenShift Virtualization that historically has not been supported. Gotcha. And then Alan was just asking, essentially, well, how do you do PCI passthrough of a network to a VM in CNV? Yeah, so if we go to, I think it's down here, PCI passthrough is what you want. Effectively you're doing PCI passthrough of the device, and SR-IOV is supported with OpenShift Virtualization; our hope 9 shared a link to the SR-IOV Operator, so that's something to check out as well. I think it's slightly different with a pod using an SR-IOV adapter versus a virtual machine, but I'm not entirely sure. I will fully admit that SR-IOV is outside of my realm of expertise. Oh my gosh, same here. I don't think it matters if it's InfiniBand or not, Alan, but again, I would have to double-check that. You know, so long as it's supported by CoreOS, which means so long as it's supported by RHEL, in theory it will work. Yeah, Alan, and if you have more details that you want to provide, feel free to email us and we'll get the answer for you. Right, I mean, just shoot us an email with the architecture that you're looking at and we'll see if we can get you an answer. Yeah. Any other thoughts, anything to add? No, man, you did really good today. I was expecting total glory, but, you know, you killed it. It's funny, I was telling Johnny before we started here that normally I spend the morning before the stream kind of prepping and making sure that everything is staged and I have the flow in my head, and I lost like an hour and a half this morning to a phone call that was unexpected, so I didn't have as much time to prep. Kind of funny. You did good, you did real good. One more thing, just because we're already way over, so what's another minute or two: I want to see what happens if I go in and try to remove this thing. So let's go in here, and we want to go to, what is it... removing an interface: state absent. So if I take this guy, and I think all I need to do is set the right indentation, I leave everything else the same and just set the state to absent. Yeah, you're at the same level as the other states; you see how you have your state: up? Yep. There's a conflict... I spelled it S-C-E-N-T, like a smell. So now, if we hit Save here, what we should see is this guy go back into a progressing status, two out of three nodes finished, and there, it's been applied. So now if I go back (this is much easier from the CLI, where you can just do an oc get nns), if I go back to our instance YAML and scroll down through here, here's our list of interfaces. Here's the first one, which is ens192, and here's the second one; you can see that ens224 is not doing anything, right? There's no configuration on ens224 anymore. There's our loopback interface, and here is our OVS for the SDN, both of them, but you'll notice that my bridge is no longer there. And now that it has been un-configured, I can simply go back and, if I want to, delete this NNCP, so that it's not constantly trying to apply that. So yeah, that's how to configure and then un-configure networking using NM State.
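For reference, the change being made is just the interface state flipping from up to absent in that same policy. A rough sketch, reusing the hypothetical policy from earlier:

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: worker-br1-ens224
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  desiredState:
    interfaces:
    - name: br1
      type: linux-bridge
      state: absent        # was 'up'; absent tells the nmstate handler to tear br1 down

Once the policy reports that it has been successfully configured again, the bridge is gone from each node's NodeNetworkState, and the policy itself can safely be deleted. From the CLI, that whole loop is roughly oc get nncp, oc get nnce, and oc get nns with a node name and -o yaml, assuming the short names the operator registers for those CRDs.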
Hopefully it's helpful. We'll be working on some collateral and a technical deep-dive presentation that dives into some of this stuff as well, but that's all I got. This is good, I think it's really helpful. I mean, honestly, there's a lot of day-2 stuff that I think people are going to get out of this, so it's really good. Yeah, we get a lot of questions, and I know you see them as well, of hey, I want to add a second adapter for a storage network or for something like that, so how do I configure that? And a lot of folks assume that the answer is to use a MachineConfig. MachineConfig works in some scenarios, mostly where DHCP is available, but if you want to do static IP configuration, you would effectively have to have a machine config pool per node to set that static config. NM State is one way that we can basically work around that, both at install time as well as day 2. So yeah, it's going to be a cool tool. And then the MetalLB stuff is pretty awesome, so I'm looking forward to messing around with that. Yeah, I'm looking forward to getting more familiar with it as well. Like I said, I only scratched the surface of BGP as I was going through and implementing this inside of my lab, and there's just so much that is possible with it. There's a whole unexplored avenue of things, like how it makes the determination of which interface to send the traffic to. I will note that I don't know what algorithm it uses to determine, you know, last time I sent the traffic to this IP address, next time I'm going to send it to this one, and the time after that to this one, whereas with a quote-unquote real load balancer there are a lot of policies you can put in place around that: sessions, least amount of traffic, least recently used, straight vanilla round robin. Excuse me. So yeah, it's one of those things where there's more to learn. I am curious, and I would love it if anybody out there is a networking or BGP expert: what's the impact to the router? If I'm using a MetalLB BGP load balancer and it's pushing tens of gigabits of traffic, is there an impact to the router as it's having to make those decisions with potentially thousands of sessions? I don't know. Hmm. Yeah, good call. Okay, well, I'll quit rambling; we're 37 minutes past our normal stopping point. So thank you to everybody who has joined us today. We'll definitely follow up with a blog post on this one, and we'll link to all of the relevant places inside of the stream here, so that you can find the information that you need. If you have any questions, if there's anything that we missed that we didn't answer, anything that comes to mind as you're watching this after we're live, please don't hesitate to reach out with any of those questions at any time. As is on the screen now, you can reach me on social media at practicalAndrew on Twitter or on Reddit, and you can also send me an email at any point in time: andrew.sullivan at redhat.com. And Johnny is JrockTX1 on Twitter, and Johnny with no H, J-O-N-N-Y, at redhat.com. I don't remember what your Reddit username is. Uh, yeah, it's something stupid; look for the livestream posts on the OpenShift subreddit. Right. But yeah, we are more than happy to help with any of those questions, anything that comes up. So thank you, everybody. A reminder: next week we will be talking about service mesh, and in two weeks we'll be talking about MicroShift. Christian's been doing a lot of really cool stuff with MicroShift; he and I have been chatting on the side, and he got OLM deployed to MicroShift, which I thought was really cool. Oh man, nice. So we'll have a great conversation with those guys when they're on in two weeks. And with all that being said, thank you so much, everyone. Johnny, last words? Yep, thank you. Great episode, Andrew.
Even though you thought you winged it, you did awesome. So, you know, good work. And Stephanie, as always, thank you for everything. All right, see you next week. Yep, see you, guys.