Good morning, good afternoon, good evening. Welcome to a special edition of Ask an OpenShift Admin. I am Chris Short, host of Red Hat Live Streaming, showrunner with the most, I guess. Yeah, I do lots of things for streaming these days. I'm joined by the one and only Andrew Sullivan, and Andrew, we're gonna do a deep dive today into something that's super helpful, I feel like. We are. I like how you call it a special edition. It's like all of them are special, so. They're all special in my mind. They're like children, right? Like this one's special, this one's special, this one's special, right? There you go. Yeah, indeed. So yeah, hello everyone. Welcome to the Ask an OpenShift Admin office hours. Man, words are hard today. I've been in presentation mode since 8 a.m. It's now 11 a.m. and I've had over a pot of coffee, so I'm gonna try not to. The words are getting stuck in traffic, are they? Yeah, you know, time is moving more slowly, you know, if you will, so. Yeah, but yeah, hello everyone. So this is one of our office hours series of live streams, which means that we are just like if you had a teacher that had office hours, or if you've ever had a manager that had office hours, right? We're here to answer your questions, whatever it is that's on your mind, whatever you wanna talk about, whatever you wanna bring up, that's what we're here for. So I will say that this episode is special because, well, it's just, it's me and you, Chris. It's just us. It's just us, we didn't bring on an expert to talk about this super critical part of OpenShift, right? Yeah. Well, I, so, you know, it's funny, right? 'Cause officially we're both technical marketing folks. So one of my responsibilities, yeah, yeah. And one of my responsibilities, 'cause we just went through and did a whole spreadsheet of who's responsible for what, right? Happens to be operators and OLM. 
So this is not necessarily Andrew showing off, but rather just an opportunity for us to talk about something that, as you pointed out, is a core, critical aspect and component of OpenShift itself. So yeah, as the title gives it away, Operator Lifecycle Manager is today's subject, but don't let that limit the things that you want to ask about or talk about with us. Exactly. You're more than welcome to ask us about anything and everything. I will say life advice may not be the best topic, because you've got a lot of bad life advice, Andrew, you're gonna give that out. Yeah, you know, results may vary. We'll put it that way. That's one way of putting it, I guess. So as is tradition here on the Ask an OpenShift Admin live stream, I do have a few topics up front, what I call the top of mind topics to cover. So let me share the screen here. Let's see, I want not Google Chrome with Twitch in it. I want Google Chrome with this guy. Did that share the whole screen? You know, you have your super wide screen monitor right there. I want to, all right, basic. You would think, yep, select share. The whole tab or window sharing thing, right? Like, I think it was either you or Christian where it's like, I wish it would give me that menu full screen no matter what size the Zoom window was. Well, as you pointed out, I have an ultrawide, but anytime the sharing window comes up, it's like this big and then it tries to fit all of the windows in it, and it's hard to tell which one is which. It shows a preview for every window, and you and I both have probably 10 to 20,000 tabs open at any given moment, so who knows. Yeah, yeah, so, you know, it's funny. We've been in this all virtual, all the time mode for 18-plus months now, and you know, I still struggle with how to share my screen. That's, wait a minute, I shouldn't admit that, should I? I mean, it's okay. We're friends here. Don't tell everybody else, but we're friends here. 
You know, so yes, top of mind topics. Let me switch to my other tab here, my other other tab, so I can see my list of links. So the first one that I want to talk about is, if you are paying attention to our releases, so I went to console.redhat.com/openshift, clicked the releases button here, and it tells us quite helpfully what all of the latest releases are and their status. So you'll notice that 4.8 stable is still 4.8.5 despite fast being 4.8.9. Likewise, if you are currently running OpenShift 4.7, you will also notice that there are no updates available. And I thought I had one of those anyways. I thought I had a tab open that had the upgrade path thing that showed that they're all blocked. So they are blocked at the moment due to what feels like a very familiar bug. So essentially the team discovered that there is an issue with OpenShift on vSphere and, well, hardware version 14. So we'll take this bug, and if you haven't already, Chris, we'll paste that into the chat here. You can dig through this. You can see that effectively this is very similar to what we saw back in the early 4.7 days as well, where there is an offload issue causing packet loss. That is very familiar. Yeah, and this one, if I'm being honest, I have and have had a 4.8 cluster since 4.8 RC3, deployed it new with RC3, and then I've updated it to every version since. And you're on vSphere. On vSphere, and I haven't hit this one, knock on wood. So it seems to be a bit intermittent, but before it was very consistent. I couldn't even get a cluster deployed. The SDN basically wouldn't come up because it was losing so many packets. And this time it seems a little more inconsistent, but you can see going through that some folks are having it. I think down here at the bottom are some suggested workarounds. Anyways, if you are curious, if you are trying or attempting to deploy 4.8 into vSphere and you're having issues, definitely give this one a read. 
But this is currently the main blocker for those 4.8 updates. And just keep in mind that as soon as we can, they will release those updates. They'll make the updates available and all that other stuff. I do appreciate that engineering is being, I won't say overly cautious, but they are being appropriately cautious with these. We don't want to accidentally disrupt somebody's production cluster or anything like that. So, seeing that, if we look at fast, let's assume that the fix is in fast, or in 4.8.9. Seeing that we're at a .9 release before we unblock upgrades from 4.7, and I'm not saying we are, I'm saying if we do, to me, that's a good thing. That means that we're relying on that telemetry, we're making those decisions to consciously not disrupt folks. So is it just for upgrades or fresh installs? So the bug applies to both. Yeah, so if you do a fresh install on vSphere, you have the potential of encountering that issue, but it is the reason why upgrades are blocked. Yeah. Ourhope9 says the 4.8 z-streams have been a bumpy ride though; .9 feels more significant and more stable than .5. So I'm curious what problems you were hitting, Ourhope9, and if you're on any VMware hardware. Yeah, and so internally the engineering folks have a blog. So for any Red Hatters that are listening, reach out to Chris or I and we'll shoot you the link to that internal blog that they have that goes through details of releases. So they have update metrics and success and failure rates and all kinds of other stuff that feed into that. It's really good information. Yeah. But like 4.7.25, I think it was, and 4.8.8 were both pulled at the last minute because of things. Or 4.7.27, maybe it was. Anyways, it doesn't matter. You'll see all of that information inside there. To your point, Ourhope9, it has been a little bumpy. 
There's been a few things that they have found, but fortunately most of them have been in the candidate stage. Do remember though that both stable and fast are fully supported. So if there's something that you absolutely need in 4.8 and you want to do that upgrade from 4.7 to 4.8 today, you can use the fast channel. There are known bugs and stuff like that, and if you encounter one of those, I'm sure support will do their best to offer workarounds and all that other stuff, but yeah, you absolutely can do a fully supported upgrade to 4.8 today using the fast channel. Cool. So the second thing that I wanted to talk about here, I actually don't have a tab open on this one, but somebody was asking on the OpenShift SME mailing list, basically saying, hey, my customer wants to know, should they use iSCSI or NFS or Fibre Channel for their OpenShift cluster? And- Always a great question. Yeah, this is a fun question for me. So I came from a storage vendor, right? My previous employer was a storage vendor, and we would get that question pretty frequently, especially because I was one of the VMware specialists, right? My focus point was VMware on storage, and we would get the same sort of thing. And my response for years has been: whichever one you like the best, whichever one meets your needs. And it's funny because, amongst storage folks, protocol is almost like the vi versus Emacs type of thing. There can be a lot of, I'm gonna say, passion involved in those decisions. Yeah, and that's for a number of different reasons. So Andrew's take: it entirely depends on whether or not it is capable of meeting your needs. NFS has good things and bad things, right? Bad thing: there are multiple versions. Does my application work best with v3, v4, v4.1, v4.2? Whereas iSCSI and Fibre Channel, it's block, right? Block is block. NFS has good things. The manageability aspect is pretty nice. I can go in and I can resize that NFS export and it automatically shows up. 
I don't have to do anything. I don't have to resize a file system. I don't have to worry about any of that craziness. So there's definitely good and bad. I will say that with the IP-based solutions, so iSCSI and NFS, and I hope in 2021, after 15 years of virtual machines on NFS and iSCSI, that we've learned these lessons, but the network team needs to participate in those discussions. They need to be aware of latency requirements, throughput requirements. Big packets. Yeah, reliability requirements. I remember a lot of that. Yeah, back in, gosh, it was 2008, 2009, I got into an argument with my network admin, because you know how they used to do this back in the day. Well, we need to apply an OS patch to our router, so we're gonna apply it and we're gonna reboot it. And it's just a couple of packets. That's what TCP is there for. It's designed to retransmit if a packet gets dropped. It'll be fine. No, no, not when there's a bunch of IO blocks that are flowing through there. The operating systems that I remember, RHEL in particular, drop a few blocks and then they mark all the file systems read-only. And then you gotta go in and either reboot all the VMs or log into each one and put it back as read-write. So, yeah, but as always, same story, different topic. Work with your peers in storage, work with your peers in networking, make sure that the requirements are known and understood so that everybody can architect appropriately. We've got a great question in chat. Okay. Rafael says, every three to five days, my OpenShift cluster starts having issues and the OpenShift API operator and authentication operator go down. Is there an easy way to restart them? There's always an easy way to restart them, but I'm more concerned about why they're going down every five days. So when you say operator, do you mean literally the operator or the operand? A pod underneath the operand, yeah. Because that would be my question. So the operator by itself is tangentially related to the operand. 
So this is getting into our topic for the day, but that's okay. An operator is itself just the framework, the automation for deploying, controlling, and managing the operand, which is the instance of the application. So in this case, let's say the OpenShift API server operator goes down. That alone could be harmless. It's if the API application, the OpenShift API server pods, start bouncing around, that's when we should definitely be concerned. Yeah. So Rafael's on 4.6.4. And Christian says it sounds like a storage issue slash etcd. So I'm curious, if you do have a dashboard for etcd built into the cluster, have you looked at that, Rafael? Yeah. Because that does seem like a likely problem. There's a couple of things that come to mind. So one, it could be a network issue, most likely something like dropped packets, or if we're talking, you know, going back to that vSphere issue. So keep an eye out for that. In that instance, I would look at the logs for the API server pods as well as, what was the other one, the authentication operator. So look at the logs for those. Also look at the kubelet and the system logs for the nodes. So if it starts dropping packets, you'll see those show up in the system logs, for example. So he says the operator status shows available false. So that means the whole operator failed, it sounds like. It could be in transition. So if you do like an oc describe co, like openshift-apiserver. Yeah, show it off. Let's see. New share. This guy. All right. So if I do an oc get co and, let's see, if we look at oc describe clusteroperator openshift-apiserver. So up here in the status conditions, it'll show us some things that are going on. So if you start to see issues, if it says, you know, not expected or whatever, it should have some sort of error inside of there. You can also look down here in the related objects. And this is an easy way to find out, for example, which namespaces will have pods related to this service. 
So down here we've got resource: namespaces, and the names openshift-config, openshift-config-managed, openshift-apiserver-operator, openshift-apiserver. So you would want to dig through each one of these namespaces, look for the pods that are in there, and check the logs for any relevant-looking pods. So let's go to openshift-apiserver. Gracefully shutting down and restarting usually fixes it. If you do that to the VM itself, when you're shutting down, exactly. So shut down the whole cluster? That's not how that's supposed to work, no offense. If you haven't yet, Rafael, please open a support ticket. They'll track it down. They'll request a must-gather, they'll track down all that other stuff. 4.6.4 is relatively old, it's early in the 4.6 line. But yeah. So we'll just take a look at the logs for one of these. Oh, I want openshift-apiserver. Yeah. So anyways, this is where I would start digging around if it's an API server issue, checking for issues inside of there. So yeah, he reboots the whole cluster to get it to work again. Okay. So yeah, your issues are gonna be living somewhere in the logs for the operator itself, I think, or the etcd performance thing that Christian said. But I'm trying to think of a storage situation where all of a sudden etcd would get wonky. It could be hundreds of things, right? So many. Yeah, three to five days, did somebody scan your cluster and mess up an open port or something? I encountered one where there was one administrator who was configuring all the backups. And he configured all of them to start at like 1 a.m. Because it's the middle of the night, right? Nobody's doing anything. Which is true, except if 30 servers all back up at the same time, suddenly everybody's doing something. So that would cause a huge surge in traffic across the storage, the network, the compute, the backup system, and all of those other things. Yeah, any number of things can affect that. 
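To make that triage flow concrete, here's a rough sketch of what a degraded cluster operator's status can look like. The condition types are the standard ones, but the messages and counts below are invented for illustration, not from a real cluster:

```yaml
# Sketch of the status section from: oc describe co openshift-apiserver
# (condition types are standard; the messages are made-up examples)
status:
  conditions:
  - type: Degraded          # something is wrong, even if still serving
    status: "True"
    message: "APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable"
  - type: Available         # the "available false" the chat question mentioned
    status: "False"
    message: "APIServicesAvailable: endpoints for the API service are not ready"
  relatedObjects:           # where to go hunting for pod logs
  - group: ""
    resource: namespaces
    name: openshift-apiserver
```

The related objects point you at the namespaces to check, so the next step is something like `oc get pods -n openshift-apiserver` and then `oc logs` on anything that looks unhappy.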
Okay, so, Rafael, please don't hesitate to keep chatting in there. We'll keep trying to address that. What was the next thing? So the next thing, so storage protocols: use whichever one works best for you, and take into account, kind of end-to-end, everything that needs to be inside of there. One of the things, again, going back to performance: do you need like 100 gigabits of throughput between two endpoints? If so, then sure, you can achieve that with 100 gigabit Ethernet, or you could aggregate multiple Fibre Channel links together. On the other hand, if you need 11 gigabits of throughput and you've only got 10 gigabit network links, well, you're never gonna hit 11, because link aggregation, LACP, with networking isn't an addition. A single flow can only achieve the maximum throughput of a single link. With LACP, are you sure about that? Yep, so one point to one point will always go across a single link. Single link, yeah. One point to many, or many to many, will distribute across all of them, so your total throughput is more, but for an individual flow, it maxes out at a single link. Fibre Channel, on the other hand, works the way you would expect with a link aggregate, in that it is an addition. If I aggregate four 16 gigabit Fibre Channel links, I can do 64 gigabits of throughput across that aggregate on a single flow. Okay, so moving on. Choose wisely. Yeah. But so, it goes back to: you need to know your applications to understand what your storage needs are. Exactly. And in that same vein, what happens if I need to add a second network adapter to my cluster? So we see this pretty commonly. It's a relatively expected, again, common pattern with virtualization especially, where my IP-based storage will have a dedicated network. 
Whether it's a separate physical network or whether it's a separate logical network, a separate VLAN, it's pretty common, mostly because, at least in my experience, both iSCSI and NFS are unencrypted traffic. So folks want to have some protection on that. So we have this KCS article, and I will post this in chat. If you haven't already, it doesn't look like it. That's okay. So it explains how to add that second network adapter anytime you're using a machine set. So whether it's IPI or UPI with machine sets, it explains how to do that. With UPI without machine sets, or with a non-integrated install, it's literally: turn off the VM or server, add in the new network adapter, and then turn it back on, and then you configure it in the same way. Regardless of how the network adapter got there, you configure it the same way, which is either DHCP or static network configuration. For static network configuration, if you're doing it at install time, I would recommend either booting to the live ISO and using nmcli or nmtui, or using kernel parameters. So you can use kernel parameters to configure more than one network adapter. Or you can use, especially if you're on physical hardware and you have something like OpenShift Virtualization deployed, the NMState operator, which is fully supported with OpenShift Virtualization. I think it's still tech preview without it, if I recall correctly. Okay, that makes sense, I think. So I guess we could look real quick, right? Oops, I'll save, select all. Right. Let's go to here. Oops, nope. So if we go to networking and we scroll down here to NMState and go to the about section, yes, still tech preview. So not fully supported without OpenShift Virtualization yet. Cool. So the last one that I have comes from our friends at Nutanix. Yeah, so a few weeks ago, if you didn't hear or if you missed it, Red Hat and Nutanix jointly announced a partnership around doing better integration with OpenShift on Nutanix. 
So it works exactly as you'd expect. I think at the time I shared a link to their documentation, which you can find in GitHub here. I shared it. Oh, thank you. So the core GitHub repository here, this is the instructions for deploying OpenShift to AOS. If you look in here, there's a branch for Calm automation. So if you are using Calm on your Nutanix platform, there's a set of blueprints in here. So you can use those to deploy OpenShift clusters pretty easily inside of there. So do note, and this is explained throughout their docs and everything, today it is a non-integrated installation. There is no cloud provider for Nutanix, but there is a CSI provider, right? There's lots of other stuff inside of there. So yeah, it's a good platform. I've been experimenting with it a little bit and it does exactly what they say it does. And that's the best I can hope for from any infrastructure. It's predictable, yeah. Let's see. So yeah, FCoE, as far as I know, it should work, because it's the same protocol, it's just a different standard for what it's riding over. So, and don't quote me on this, I would have to find out, but I think the issue with FCoE is oftentimes the drivers for those adapters are not in the core RHEL, and therefore RHEL CoreOS, distribution. So you would need to add those drivers after the fact, but after that, FCoE looks and behaves exactly the same as regular Fibre Channel from a storage perspective. So I'm waiting for somebody to ask about NVMe. NVMe over Fabrics, NVMe-oF. Because, so remember, NVMe is a protocol. Yeah. Most of the time we think of NVMe as a little device in our computer, the M.2 2280 thing that we slide in and store stuff on, yeah. It can absolutely traverse a network, it's just a protocol. So just like you can put Fibre Channel over Ethernet, or you can do RDMA over Ethernet, you can do NVMe over fabrics. So I don't know whether or not that's supported in OpenShift or Kubernetes, but I do know... I don't know. I used to... 
I had a conversation with a storage vendor who was doing something like single digit microseconds of latency for storage. Wow. Using NVMe-oF. Wow. So NVMe over Ethernet and other networking. Yeah, I don't know the details there. That's an interesting one. I should reach out to some of our storage partners and find out what's going on there. Yeah, I was about to say, I'm curious what's in the works there, because that could be an industry game changer. Well, I know NetApp and EMC both have NVMe over Fabrics products. Those are the only two I'm sure of. Whether or not, again, whether or not they work with OpenShift, that's an unknown to me. I'll track that down because that's interesting. That would be, yeah. I would like to see that demonstrated almost, right? Yeah. Bring some of our friends on. Let's see here. Okay, so that concludes my top of mind topics. We went a bit longer than normal, but that's okay because those were really good questions. Really appreciate you all participating. Yeah, and MadRush21, if you have a question not related to today's topic, feel free to drop it in chat. We can tell you, you know, hey, email us, if we can't answer it. No holds barred. Doesn't matter what the topic is. This is an office hour. So ask your questions. So microseconds of latency. Trading is one of them. Usually that one comes up pretty often. Yeah. So if you're doing it with microseconds of latency, nine times out of 10 it's trading. Yeah, those folks will deliberately, you know, find and use data centers that are as close as possible to Wall Street in order to reduce their network latency to the trading systems even further, and all of that. Oh yeah. I've read all kinds of stories about the early days of New York City and fiber and all that stuff. Yeah. Ourhope9 asks, will NetApp Trident get an operator on OperatorHub? Yes. So we are working with them to do operator certification and all that other stuff. 
So speaking of which, I'll plug it because I can. So if you're a NetApp customer, keep an eye on your inbox or keep an eye on netapp.com. We're actually doing a joint webinar with them in the future for Astra and OpenShift, and we'll also be doing a live stream. Nice. Yes. Yeah, we haven't scheduled a live stream yet, but we are planning on doing that. And of course. Yes, it's definitely in the works. Yeah. I have read the email with, yes. Yep. So as the host with the most, if you will, I'll manage all of that stuff. But yeah. So keep an eye on that. I think the webinar is towards the latter part of September. So yeah, lots of interesting stuff happening there, including operator certification. And at that point, there'll be an operator up there. Anyways: from the OpenShift CLI, how do I obtain the image ID of a running container? Oh, from the OpenShift CLI, the image ID of a running container? I'm not sure what image ID you're referring to. Yeah. Like a podman ps or a docker ps would show, but are you trying to track what version of an image you're using? OC employment? What do you mean, deployment? Deployment? Yeah. Because you might be on mobile. Auto-correct. Gotcha. I mean, just doing a describe of a pod, we can come up and we can find, you know, which image it's using. As soon as I find the right thing here, right? So here's image and imageID. So both of those are in the standard output for oc get. The layer digest hash? That I don't know. Like to get your digest, yeah, so you might have to debug into the host and then use CRI-O in order to find that information. Do you want to go on that real quick? Yeah, so here's the container ID up here. Yeah, I think you can use CRI-O's CLI. So crictl, cree-cuddle, cree-control, or cry-cuddle? Anyways, in order to do basically the same thing as you would with, you know, podman. So crictl, however you pronounce it, inspect on that pod. 
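For that chat question, the image ID is actually in the pod's status, so you can usually avoid debugging into the host at all. A sketch, with hypothetical image names:

```yaml
# Relevant slice of: oc get pod <pod-name> -o yaml
# Or pull just the field with jsonpath:
#   oc get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].imageID}'
status:
  containerStatuses:
  - name: my-app                                      # hypothetical container
    image: quay.io/example/my-app:v1.2                # the tag that was requested
    imageID: quay.io/example/my-app@sha256:4f5c9e...  # digest of what actually ran
```

The `image` field is what the pod spec asked for; `imageID` carries the digest of the image that was actually pulled, which is usually what you want for verification.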
It's a neat way to verify that the containers I sent to the customers match the containers I'm running in production. So I think the answer to that is to use the digest mechanism, so that way you are sure that the digest matches when you do that image deployment, and that it hasn't been modified or otherwise mucked with. Next question: OCP storage classes on VMware vSphere can use only a datastore instead of a datastore cluster, why? So Turgan, apologies if I'm butchering your name. That is a limitation of both the in-tree and the CSI provisioners. They don't work with datastore clusters. For the in-tree provisioner, even if there is an RFE, it won't happen, quite frankly, because we are in the process, and not just we meaning Red Hat, but the entire Kubernetes community, of migrating away from in-tree provisioners and towards CSI. So what you'll see is everything becomes CSI in the future. So whether or not that works with a datastore cluster would be an RFE for VMware, because they maintain the CSI driver, and I honestly have not looked to see whether or not that's there in that case. So I did do some testing on my own as to whether or not it works, and using a datastore cluster does not work. You can sort of get the same behavior using the tag mechanism. So let me switch back to my browser window. While you're doing that, I'm getting the link to the Sigstore show we did yesterday with Luke Hinds, because using Sigstore would probably help you there, as Ourhope9 points out. See, I need Zoom windows to move out of my way. There we go. So I can find my bookmarks. There we go. All right, so let's bring it over to the screen so we can see it. So this is the documentation for the in-tree storage provisioner, which is the one that is deployed and configured by default with an OpenShift deployment. And if I come in here to configuration, one of the things that is an option in here is the ability to use a tag, a provider, blah, blah, blah. 
Am I gonna find it? It doesn't look like it. So one of these in here, default datastore. Anyways, there's a way that you can use a tag in order to have the provisioner say, use any datastore that has this tag associated with it. It technically works, but it behaves a little weird. By that, I mean it will first randomly select one of the datastores, and then all PVCs will be provisioned into that datastore until it fills up, and then it'll randomly select another one, and it'll go in that sequential manner. So you can have unexpected capacity consumption, if you will, when you're doing it that way. And it doesn't stop provisioning until the datastore reaches 100% either. So say you forget to set a limit or a quota or something like that, and you have somebody come in and provision like 60 terabytes. Well, no, that wouldn't fit. I think it's 62 or 64 terabytes minus one byte or something is the maximum datastore size. But if somebody filled up that datastore, especially if it's thin provisioned, you could end up with a lot of potential issues. And just to clarify what I mean there: with vSphere and a thin provisioned datastore, everything is great until the datastore hits 100% full, and then nothing can write any data. So any VMDKs, any VMs that are in there come to a screeching halt at that point. So if you have, let's say, an overzealous Kubernetes administrator who starts provisioning PVCs inside of there and accidentally fills up the datastore, you could impact every virtual machine that has a disk inside of there. Yeah, across the entire fleet of virtual machines that you have. Yeah. I'm gonna trust that there aren't any other questions. Now we're talking about making this a two-hour episode. But we can't do that today. We have a hard stop. Yeah, I saw that on the email you sent out this morning about what's streaming today. That brings up another point. 
I should make that a proper mailing list so that external people can subscribe. Yeah. All right, I'm going to go back to sharing my terminal window here, because that's what I want to use to illustrate today's content. So, Operator Lifecycle Manager. So way back in an episode, gosh, like, oh, wow, now you're just, I want to say it was in the late teens, I think, maybe even earlier than that, we talked about operators and how they relate to OpenShift and OpenShift administrators as a whole. I'm gonna paste the link to that in the chat here, digging it up. Yeah, so operators, as you pointed out when we opened, are a critical component of OpenShift, right? I think most of us would agree that operators were the major change that led from version three to version four of OpenShift. So to say that they are important to OpenShift is a bit of an understatement. And in that... Very important. Yeah. And in that stream, we walked through what is an operator, why is it important, what does it mean, and all of that other stuff. But what we didn't talk about is the underlying stuff, right? What's going on in the cluster? How do operators tick? Right. How do I instantiate operators in a meaningful and consistent way across my fleet? Exactly. So the answer to that is Operator Lifecycle Manager. So anytime I do like an oc get co, short for clusteroperator, we have this big list of things, and each one of these is an operator that has been deployed by OpenShift to function, or to provide a core set of capabilities to the cluster. So we were talking a moment ago about the OpenShift API server, right? But you can see there's all of these that exist inside of here. The core of all of that is managed by OLM itself, Operator Lifecycle Manager. 
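For operators you add on top of the cluster, the ones from OperatorHub as opposed to the core cluster operators in that `oc get co` list, the object that hands an operator over to OLM's management is a Subscription. A minimal sketch, where the operator name and channel are just examples:

```yaml
# A minimal OLM Subscription: tells OLM which operator package to install
# and which update channel to track for it.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: example-operator            # hypothetical operator name
  namespace: openshift-operators
spec:
  name: example-operator            # package name in the catalog
  channel: stable                   # update channel OLM follows
  source: redhat-operators          # which CatalogSource to pull from
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic    # OLM applies updates as they appear
```

Once this exists, OLM resolves dependencies, creates an InstallPlan, and installs the operator's ClusterServiceVersion, which is the update machinery being described here.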
So it's funny, if you were to dig into the install process, what you'll see is, and we talked about this in the installation episode that I don't have a link to, but effectively when you deploy an OpenShift cluster, we deploy CoreOS, and it instantiates a Kubernetes cluster, and then it puts Operator Lifecycle Manager in there, and then we use OLM to deploy all of these other components. So OLM is really what deploys OpenShift itself; each one of these is an operator that's managed through OLM. Operator Lifecycle Manager is also one of the core critical components for things like updates and upgrades, right? When you do an update, the cluster version operator, the CVO, goes in and basically says, hey OLM, go update all of these operators, and then OLM does what it does. So what does that mean? What does doing what it does mean? I feel like I'm switching between windows a lot today. You're okay, you're doing a lot right now. So that's where this particular image comes in. So when we look at OLM, OLM provides all of these functions, all of these features to those operators that are deployed. So for example, this dependency resolution and collision detection are really important when we do, for example, an update. Hey, I need to update the OpenShift API server. Okay, what dependencies does the OpenShift API server have? Okay, what dependencies do they have? And it generates that lovely phrase that, well, I learned from the installer actually: a directed acyclic graph. And say that fast. No, no, I'm pretty sure I butchered it saying it slowly. So it is this giant list of steps, right? It is the order of, you know, here are all the things that we're going to do. Here's what needs to be updated. Here's what needs to do this. Here's what needs to do that. Wait for things to come up and all. Exactly. And it uses feedback from those deployed pods and applications. So that's when it says, you know, hey, I'm in a good status. 
I'm in a functional status. It moves to the next step, so on and so forth. Right. So if you remember, was it last week? I think it was last week when we had Mark Russell on, and Mark did that little aside of like, hey, you see in the release notes how it says that now the machine config operator doesn't report that it's done until it's actually done, because that was causing issues. That's one of these things. Yeah. So yeah, all of that plays an important role in managing the cluster as well as any application that is deployed using the operator paradigm inside of the cluster. So this is my really long-winded way of basically saying, let's dig in, right? Because operator lifecycle manager plays such a critical role, let's look at some of the components, let's look at what all of this means and what we can do with it. And Chris, I see you hopefully posted the docs links. I had posted the docs. It didn't paste how I wanted to, you know, and then it decided to rebroadcast itself all over the channels twice and three times, you know. That's okay. Have docs, will travel. So the first link that you posted in there goes to the operatorframework.io site. Which is a grad, no, not graduated. It's a CNCF project. Yeah. So if I click this documentation link up here and go to OLM documentation, this is the specific link that you posted inside of there. This is really great. This is the upstream OLM documentation. There's a ton of information inside of here, especially if you're developing an operator, about all of the requirements that you have. Like here, here's how to create an index image and an operator bundle. It's a really great source of information, very low-level information that can be extremely helpful if you're troubleshooting issues, and basically required if you're creating your own operators. So if we want a little higher level, we want to rely on the OpenShift docs. So we'll come here to the docs.
And if we scroll down to the handy dandy operators section, and then we go to understanding operators, and finally operator lifecycle manager. We have a series of documentation in here that walks through a bunch of different concepts, a bunch of different things that exist inside of here. So this is where I want to start: to look at some of these things inside of a cluster. How does that relate? What does that look like across different things? So the first thing that I want to talk about here, the first thing that I want to address, is that an operator, and OLM managing that operator, is not the same thing as an operand. I mentioned this before, when we were talking a little bit about today's topic: OLM manages operators. An operator is, you can think of it as being effectively automation that translates an application into Kubernetes objects. So let's say I have an operator for, well, let's not say I have it, we literally have an operator for the registry. So when you say create me a registry instance, that's not OLM that's managing that, that's the operator itself. The operator creates an operand, the registry, and it manages that. When you change the configuration of that operand, the operator changes that instance of the application. If I create new instances of the operand, so effectively new instances of the custom resource, it takes action against each of those, maybe deploying a new instance of the registry, whatever that happens to be. OLM, on the other hand, has the same responsibility for the operator itself. So when the operator needs to be updated, the automation that then controls those operands needs to be updated, that's where OLM comes into play. So I know that that can be a little confusing. It's hard for me to describe; it's taken me several iterations to get to that, and I still consider it a not-great description. So just keep in mind that they are separate things.
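To make the operator/operand split concrete, here is a sketch using that registry example (the resource names are the standard ones on recent OpenShift releases, but treat the specifics as assumptions):

```shell
# The OPERATOR: automation code running as a pod in its own namespace.
oc get pods -n openshift-image-registry   # includes cluster-image-registry-operator-...

# The OPERAND's desired state: a custom resource the operator watches.
oc get configs.imageregistry.operator.openshift.io cluster -o yaml

# Change the operand's config; the operator (not OLM) reconciles the
# actual registry pods to match the new spec.
oc patch configs.imageregistry.operator.openshift.io cluster \
  --type merge -p '{"spec":{"replicas":2}}'
```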
And OLM, operator lifecycle manager, does not control application instances. So when we look at an operator, it consists of several different things. So the first thing that any operator consists of, of course, is that automation around managing applications, the controller, if you will. And that controller is literally a pod, right? It's containerized code. Let's switch over here. And oh, I didn't install the thing today. I should have installed it. The thing? Yeah, the terminal operator. Oh yeah. So if we come back over here: oc projects, am I still in the API server operator project? So let's do oc get pod in openshift-apiserver-operator. So we can see that we have this openshift-apiserver-operator pod. So this pod represents the code, the automation, that is managing the instances of the application up here. So I don't know, but let's say that there is a scale property, right? I can edit the configuration of the API server instance and change the scale from three to four. The API server operator would be the one responsible for that. OLM is responsible for managing this operator and doing its lifecycle management, so replacing it with a new version, or recreating it if it gets deleted. Most likely this is a deployment. So it wouldn't be responsible, yes it is. So OLM wouldn't be responsible for restarting it; rather, that's an intrinsic Kubernetes function with a deployment. Oh, how do you get the hat in your prompt? Yeah, Christian, if you have memo-list, you can find it on memo-list, but effectively you set the Unicode character, the hat, and then you just set your prompt to use the hat. And then it's red because if you use the Overpass Mono font, the hat glyph is a red hat icon. Because remember, Overpass is one of Red Hat's fonts. It's the old Red Hat font. Yeah. I'm going to need that. So yeah, I couldn't sleep last night.
I was up at like 10:30 and was fiddling around with my server. So. Okay. I like it. You know, it's the little things in life. So that is operator versus operand. And where did my other window go? So let's look at what all of this looks like in the background. So let's do oc get co. And then I want to take the API server. We already did a describe on it, and I showed all of those different resources that it's responsible for, right? The namespaces and all that other stuff. So it's an easy way to kind of start the troubleshooting process, just doing that describe and looking at any error messages that show up in the status. So that is almost always the first place that I tell folks to start: a simple oc describe clusteroperator and the operator name, if it's one of these operators. If it's not one of these operators, then the best place to start is probably with the operator pods, but not necessarily. So for example, let's go to the marketplace. So we see, project openshift... So Caledas asks: if that operator goes down, will you still have the API server? Technically, yes. If the operator goes down, yes. If the pods go down, no. And I think I, yeah, I cleared it. Yeah. So let's do oc get pods in openshift-apiserver. So if these pods aren't running, then that's an issue. Yeah. If this operator pod isn't running, then it's still an issue, but it doesn't mean that the API server functionality is down or inoperable. Yeah. Words. I know. That's a great question, Caledas. I think folks often don't understand that the operator instantiates and maintains; it's sometimes like, oh, the operator does everything. Well, if that operator pod is down, does that mean everything else is down? No, it doesn't necessarily. Yeah.
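The troubleshooting order of operations being described might be sketched like this, sticking with the API server example (namespace names as on a standard cluster):

```shell
# 1. Always start with the cluster operator's reported conditions.
oc describe co openshift-apiserver

# 2. Check the operand pods -- the actual API server workload.
oc get pods -n openshift-apiserver

# 3. Check the operator pod separately; if only this one is down, the
#    operand keeps running, but config changes stop being reconciled.
oc get pods -n openshift-apiserver-operator
```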
It could mean that, but it's not a one-to-one kind of ratio there. Yeah. And it would mean things like, so let's say, again, in this instance, the API server operator. Let's say it has a configuration for scale. If I change that config and this operator instance isn't running, then it's not going to enact that configuration change. Right. Exactly. But again, if these API server pods start failing, they're a deployment; their running is not dependent on the operator running in that instance. Their configuration is. So what you'll see when you do updates is OLM will update the operator code, the operator pod, and then the operator will take action to update the application pods. So you'll see kind of a multi-step thing that happens there. This is also if you are deploying applications, if you're using, say, Crunchy Data or, you know, MySQL or the logging operator, any of those. So that has to do with the approval policy. So if we, and I'm going to switch back to my browser here, so if we go to Operators and OperatorHub, let's install the terminal operator. Cool. So we have this. I like the web terminal, to be honest with everybody. Yeah, it's functional. It does what it needs to do. Yes, it is. So it has this update approval. This is whether, when a new version of the operator is available, OLM will apply the update automatically or wait for manual approval. How can you check whether an operator is managed by OLM or not? So that is an excellent question. So one of the questions that I had pre-staged here is, do you need OLM to use operators? And the answer to that is absolutely not. So the best way that I have found to do that is to query the catalog and see which ones are installed. Can you deploy an operator outside of OLM when OLM is also installed? Yes, you can. So it is possible to have both. Yeah. Yeah. We don't necessarily recommend it, but yeah. Yeah.
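For reference, the console install being shown here boils down to creating a Subscription object. A sketch of the CLI equivalent (the channel value is an assumption; the real channels are listed in the operator's package manifest):

```shell
# Subscribe to the web terminal operator with a Manual approval policy,
# so OLM waits for an admin to approve each install plan.
cat <<'EOF' | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: web-terminal
  namespace: openshift-operators
spec:
  name: web-terminal              # package name from the catalog
  source: redhat-operators        # which catalog source to use
  sourceNamespace: openshift-marketplace
  channel: fast                   # assumed; verify with the package manifest
  installPlanApproval: Manual     # "Automatic" applies updates without asking
EOF
```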
We definitely don't recommend it. So let's go back to this guy. So let's do oc get, um, just so I don't embarrass myself here, catalogsource. So first, we have multiple catalog sources. So remember, OLM, in addition to managing the deployments of those operators, also provides that catalog of operators that are available inside of here. So up above in my cluster operators, I see we have this catalog operator. So that works together with the package server. So basically what happens here, the order of operations, is I add a new catalog source. So let's do an oc describe catalogsource, spell it correctly, on certified-operators. So this catalog source is actually an image, right? The certified operator index. So we'll pull down that image. And then, if we scroll back up here, the package server is the one that's responsible for then serving that image, that gRPC endpoint, back to the rest of the cluster. This is so that when you browse the catalog, so if I do like an oc get packagemanifest, when I query this to say, hey, what operators are available to me, it's not reaching out to something external. It's asking the cluster. So this is coming from the package server. So essentially, if you have an instance of one of these deployed, so oc get csv, right? So now I've got two CSVs inside of here, right? This is where we'll see those. So I deployed the web terminal, and that CSV, cluster service version, shows up inside of here. So let's pick on web terminal. So oc get packagemanifest, grep terminal. So oc describe... I cannot type and talk at the same time. Probably not helping by mumbling in the background. No, no, you're fine. If we look inside of here, this is effectively the CSV. So if we come up here, right? Current CSV, CSV description, right? All of these things. So the CSV, the cluster service version, is what describes all of the things that OLM needs to create to deploy the operator.
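The catalog plumbing just walked through, as commands (all standard; openshift-marketplace is the namespace where the default catalogs live):

```shell
# The catalogs OLM knows about, each backed by an index image.
oc get catalogsources -n openshift-marketplace

# Everything those catalogs offer -- served from inside the cluster by
# the package server over gRPC, not fetched externally at query time.
oc get packagemanifests -n openshift-marketplace

# Installed operators show up as cluster service versions (CSVs).
oc get csv -n openshift-operators
```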
So for example, here's the custom resource definitions, the CRDs, that it's responsible for. And if we scroll down here, there will eventually be the images that it needs. Yeah. So this, when we open it up in the web terminal, this is the stuff that we saw inside of there. So the CSV is what describes all of the things that make it up, including things like role-based access controls. Right. If I do, hold on... yeah. Yep. So do an oc describe csv on web terminal. So here are our dependencies, right? We can see exactly which RBAC roles need to be created, which service account it's going to use. So let's take a step back. Let's look at the overall flow. So now I've got, yep, so I've got a catalog that says here are all the operators available to you, right? And in those package manifests, it describes the CSV. When I say I want to deploy that, I create a subscription. So that subscription says, hey, OLM, I want you to deploy this. So going back to your other question, this is how you would see which ones are or aren't managed by OLM: through the subscription. So I've got this web terminal subscription. So oc describe... I don't think I've ever been asked this. Cluster operators are managed by OLM as well, right? Correct. Correct. Yeah. Okay. And there's not a separate OLM for cluster operators; there's only one OLM on the cluster. Correct. Yeah. So this subscription, if we scroll up here to the spec, is basically describing, you know, hey, the redhat-operators catalog has this operator, web terminal; please deploy it, using this CSV version if I want to pin a specific one. That will result in an install plan being generated. So you can see the reference here. So oc get installplan. So I now have this install plan. So the catalog operator will look at that CSV and say, I need to create these RBAC, you know, instances.
I need to create these pods. I need to do these things. It will put all of that into this install plan. So yeah, and then OLM will go through and make sure that all of these things exist. So the catalog operator takes the CSV and turns that into Kubernetes objects, which are then deployed and managed by the OLM operator. Slightly confusing, but that's the way that it works. So we see these are all of the things that make up our operator deployment. If you are having issues where an operator won't deploy, this is where you want to start looking. So if I were to click that, you know, hey, deploy the web terminal operator, and it just refuses to deploy for some reason, check the install plan and see if there's any errors inside of here or associated with any of the objects it's trying to create. Let's say the operator fully deploys, and I go to create a new instance of, in this case, the web terminal, right? Hey, web terminal operator, create me a new web terminal. If that's failing, then you would check the operator logs or the pod logs. So sometimes it's like a CrashLoopBackOff, hey, I can't find this image. That would be reflected in the created object's logs. So I want to say two things here real quick. Coming up next on the channel, we are actually going to dive into writing a Java operator. I think it's important to point out that you don't have to write your operators in Go. You could use Ansible. You could use Java, Bash, Python. There's all kinds of frameworks out there for writing operators in a language of your choosing. So don't be scared of operators. You can make your own. Yeah. And to be clear, right, Red Hat has, you know, we support the Operator Framework, which focuses on Ansible, Helm, and Go. But as you just pointed out, those are not the only three options. Not the only things. So one thing we did not get time to do, I'll include a link in the blog post.
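Pulling the debugging advice together, a rough sketch (names in angle brackets are hypothetical placeholders that depend on your install):

```shell
# Operator refuses to deploy? Walk the subscription -> install plan chain.
oc get subscription -n openshift-operators
oc describe installplan <install-plan-name> -n openshift-operators

# With Manual approval, nothing proceeds until the plan is approved.
oc patch installplan <install-plan-name> -n openshift-operators \
  --type merge -p '{"spec":{"approved":true}}'

# Operator deployed, but creating an operand fails? Check the operator's
# reconcile logs, then the created objects themselves.
oc logs deployment/<operator-deployment> -n <operator-namespace>
oc describe pod <failing-pod> -n <app-namespace>   # CrashLoopBackOff, image pulls, etc.
```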
So just a reminder: every week after the stream, we have a blog post that summarizes everything, with links to where in the stream it happened. So in that blog post, I'll include a link to the comparison between operators and Helm, like when you should use one versus the other, strengths, weaknesses, that type of stuff. There's a couple of blog posts about that, as well as some other information. So we'll be sure to include that. I don't know if you can write operators in Perl, but I bet someone has tried. Yeah, I like Perl. I like to use all the keys on my keyboard. Perl's great. Yeah, exactly. Right. There you go. So yes, real quick, I know we have a hard stop. Thank you everybody for tuning in today. If you have any questions, if there's anything that we can help with, please don't hesitate to reach out. You can find me at practical Andrew on social media, on Twitter, just like my username here on Twitch, and it's rebroadcast everywhere. You can also reach out to me via email, andrew.sullivan at redhat.com. Please don't hesitate to reach out. Thank you for posting the link, Chris. Join us next week, when we will be talking about edge with Mark Schmidt. So fun. Yeah. And with all of that being said, thank you so much everybody for watching today. Thank you so much for your questions and interactivity. It's been great. Yeah. Great. And Chris, you stole my shirt. So I'm sorry. You know, it was next in the line. Yeah, I know how that is. Yeah. All right. Thank you all for tuning in. We really appreciate it. Stick around for the OpenShift Commons briefing that'll be starting as soon as I can log off and log back in, kind of thing. Thank you all. Stay safe out there.