Good morning, good afternoon, good evening. Wherever you're hailing from, welcome to another episode of Ask an OpenShift Admin. I am joined today by Andrew Sullivan, the host with the most, I will say today. How about that? Wow, throwing me a curve there. So yeah, friendly names for everybody. How about that? Well, long time no see. It's been what, two or three weeks, something like that? Yeah, I had a procedure on my neck, and that went well, and then I went on vacation to have some fun and recover at the same time. So that was all very nice. The stream was away for a little bit. We were gone for Summit. So if you missed Summit, it was a free virtual event. If you happened to miss it, be sure to register and check out all the content. There are hundreds of sessions, including an Ask the Experts that I co-hosted, so you can always come and see us there as well. And then last week, our apologies, we had some technical difficulties where the audio just would not cooperate, so after about 20 minutes we decided to pull the plug and canceled last week's episode. But it worked out, because that means our guest, Brian Bodwin from our consulting team, is able to join us this week. He wasn't able to join us last week. So, Brian, if you don't mind introducing yourself. Thank you. I'm Brian Bodwin. I work with the technology practices at Red Hat, specializing in OpenShift, currently as a principal consultant. I've got several clients at the moment, mostly in FSI, the financial services industry, and OpenShift is pretty much what I work, live, breathe, and eat. I try to eat a little better than that. I've got to say, I'm feeling a little jealous today between you and Chris and your rig setups with the cameras and audio and everything else, and your beard is just absolutely majestic.
Yes, you know, I'm just failing on multiple fronts today. As someone who can't grow a beard, I am definitely envying yours, Brian. Anyway, thank you very much for joining us today. Really happy to have you here to talk about, if you have seen the show title, certificates in OpenShift. It's funny to me, because as we were going through and I was doing all the prep for the show, I have a document that I use to keep organized, with the show title and the abstract, and Brian was very helpfully going through and correcting it: no, we don't do this, this is the wrong term. And I'm like, that's exactly what I want. That just means to me that you are the perfect person to have on this show. So don't at all feel bad for saying, no, silly, you're wrong, shut your mouth. That's exactly what we want. Don't feel bad about questioning the host, especially not me. My comment was internal screaming: SSL is a bit of a dated term. It really is, because nothing is really using SSL nowadays, right? But it's the historic name. We went through three versions of SSL and now it's all Transport Layer Security, the TLS certificate. Yeah, exactly. I am historic. And I've got my fair share of stories too, plenty of those "hey, the certs expire in a day and we have no idea how to install them on the load balancer, good luck" situations. So I guess I'll use that as our segue into the show. This is Ask an OpenShift Administrator, one of our office hours series of live streams here on OpenShift.tv. Our goal with these shows is to have a conversation with you, our audience. Please, at any point, ask us questions across any of the platforms you happen to be watching on. Your questions will get rebroadcast to all of the others, and we'll address them as they come up.
In the absence of questions, and also to help focus your questions, we do have a topic every week. As I said, this week it's certificates: how they're used, and some things that you, the administrator, should be aware of inside your OpenShift cluster. And that of course is why we have Brian here, because Andrew obviously does not know what he's talking about on this topic. Before we get started, though, as is customary at this point, we're what, 32 or 33 episodes in, I try to do this every episode. I have some things that are top of mind. Effectively, these are things that I have seen either internally or externally that I feel are relevant to you, our audience, ranging from security advisories that might be important to other things that are going on. The first one I want to talk about is something I was tangentially involved in, mostly as a reviewer, and it's particularly for our VMware customers, OpenShift on VMware. They recently released a new reference architecture. So I'm going to share my screen real quick here, just as soon as I find the right window. I believe in you. This is always an issue, right? There's no consistency when you hit that share screen button; you have no idea what's going to happen. Why can't these show up in the right order, and by the right order I mean the one that I want, logically, in my head. So let me grab this link and I will post it into the chat. Thank you. This is a reference architecture published by VMware for running OpenShift on VMware Cloud Foundation on Dell EMC VxRail hardware. I know that's quite the mouthful. Essentially it walks through all the different aspects of how to deploy, how to use, some considerations, and the testing that they did for OpenShift on that stack of technology.
If you aren't familiar with the whole VMware Cloud Foundation thing, it's the integrated stack that uses, I think they call it SDDC Manager, something like that, to do automation around deploying vSphere, including ESXi, vSAN, and NSX, all of those things, where as the administrator you go in and say, I want to deploy a workload domain, and then you go and deploy OpenShift on top of that. And in combination with that, there is also another reference architecture, if you didn't know about it. Let me dig up that link. They also have an SDDC reference architecture. I'll post that one in the chat too. Yes, Twitch, I know that animated emotes can be disabled; it's been so annoying with that message lately. So you can see this one. I think there's a newer version; I need to update my link. It was updated last year, almost a year ago actually, so I think there is a newer version out there. But just FYI, if you are a VMware customer looking for reference architectures, this is the place to go. Now, bigger picture: you'll notice that Red Hat has not produced any reference architectures for OpenShift on X or Y or Z since roughly the 3.10 timeframe. The rationale is that we rely on our partners to do that work. They know their hardware, their software, their infrastructure, whatever it happens to be, better than we ever could. So we help them with the OpenShift part, they do their underlying part, and then they publish it. And I bring all of that up because, let me see if I can find the right window, there is a blog post that lists all of these. There's a little bit of duplication in there, if I recall. This one I will post down here as well. You can see this blog post is a year old now, but it actually does get updated. For example, last week they added this NetApp reference architecture.
So it does periodically get updated, and it's one place where you can go to find all of those partner reference architectures. Brian, I don't know the precise process associated with it. I know there's a team of folks, Dave happens to be one of those team members, or at least he used to be, who are responsible for working with partners on that joint effort, the review and so on. So I don't know the precise details, but it wouldn't surprise me if there's a little bit of redundancy in there. Yeah, well, that's one of the reasons why we have our partners do it, rather than have us provide one and our partners create a duplicative one outside of Red Hat. Leveraging that relationship and letting our partners inform us of the best way to do it on their platform, rather than a lot of back and forth, makes it easier. Yeah. And I can also only assume, you know, I have three kids, so if I do something for one of my children, the other two are going to want the same thing. And I can only imagine that if we created a reference architecture for one partner, all the others would say, well, why didn't you do that for me? Whereas we can work around those kinds of conflicts of interest, if you will, by asking the partners to create and maintain them. So anyway, if you were curious, that's why you don't see Red Hat publishing reference architectures; you see them coming from our partners. And if you do have questions, like, hey, does this partner have a reference architecture, feel free to reach out to me. You can email andrew.sullivan@redhat.com or find me on social media. All right, the second thing I had here. We talked about this on an earlier stream, gosh, a month ago now, maybe five weeks. Wow. We talked about the Red Hat Developer entitlements. So if you go to developers.redhat.com...
You register for an account there. If you go log in and request an account and all that, you receive a Red Hat Developer entitlement: you can deploy up to 16 nodes of RHEL, and so on. And all of that is free; we don't charge for it, and OpenShift is included in it. However, at that time we discovered that there was an error where it doesn't actually show up. So if you registered for your account there, then went to cloud.redhat.com/openshift and deployed your cluster, and it showed up in the cluster manager there, you wouldn't actually be able to entitle it with anything other than the 60-day eval. I think that's fixed; I haven't actually validated it. If it's not there and you're trying to do that entitlement, just be patient, let me know, and I'm happy to chase down the internal workings of whatever issue you're having, but it is something that is supposed to be there. And if you're curious, hey Andrew, how do you know it's supposed to be there: we actually have a thing called Appendix 1. I won't dig up the link because I don't know where it's at at the moment, but I'll include it in the blog post. Appendix 1 to the entitlement legal documents includes a specific description of what all is included, and OpenShift is explicitly in there. And I know all of this because we got asked internally; one of our field folks said, hey, this isn't working, this isn't here. The last thing I have is something that comes up semi-regularly, back on the VMware side of things. We've been asked a number of times, hey, can I deploy OpenShift and use SRM, VMware Site Recovery Manager, for disaster recovery? Or, and this one was asked yesterday...
Can I, with two sites, span a cluster across those sites and use VMware Fault Tolerance to protect one of the control plane nodes, so that it's effectively at both sites at the same time? And the question was specifically: is this supported? The answer is yes. Using SRM or FT doesn't invalidate support. However, the bigger question is, do you really want to do that? I say that because, not that there's anything wrong with Fault Tolerance, but it comes with a performance penalty. It has specific requirements, specific needs, that are not insignificant. I did a quick search the other day and saw something like an anecdotal 50% performance penalty, just out of the box. So for etcd, which is already very performance sensitive, even if it works, and it probably will work if you can meet the requirements, it's going to severely limit the scalability of the cluster and things like that. And we talked about etcd in depth during that episode; I'll try to remember to include a link when I write this up, which will go out Friday morning. But yes, you can use it; no, it won't invalidate support. Just be ultra aware of, and prepared for, the ramifications of that decision. So, yeah, that's all I got. Chris, Brian, anything? Just following along with the chat here a little bit. I'm glad. I've lost track. That's quite all right. So one of the questions was whether there is a difference between the self-signed certificates in Red Hat OpenShift Container Platform and other certificates you might need to use in a bigger cluster. Functionally, the certificates are no different from standard TLS certificates you might get from a certificate provider.
It's the usage of the certificates, the number and types of certificates in use, the purposes. Essentially, if you go to an external CA, you've got certificates to identify servers, certificates to identify clients, code signing certificates, and other purposes; the list is longer than I can remember off the top of my head. OpenShift does use mutual TLS, mutual authentication between client and server. When the client wants to talk to the server, it checks the server certificate to ensure it's talking to a trusted server, and presents its own certificate so that the server knows it's talking to a trusted client. And that's not any different from the way any other certificates work in practice. But for external consumers working with the API, or with the web console or the CLI, those certificates by default are generated from an internal CA, and we routinely replace them with something signed either by a public CA or an enterprise CA within an organization. That kind of leads into some of the other questions asked here, like whether self-signed certificates are used at all externally. By default, yes, until the cluster has been configured with replacement certificates. So Brian, if you don't mind, I'm going to interrupt you, because I want to reset a little bit. As my 13-year-old reminds me constantly, I'm a dumb guy. So in my understanding of OpenShift, there are effectively three primary things that I know we use certificates for, off the top of my head. The most obvious one is the certificate that the ingress controller uses for external communication. I'm still sharing my window here, so when I browse to the console, the certificate that it uses is signed by, let's look at it, right, this ingress operator right here. So that's one of those instances. Then there are others, where I think it's node to node communication.
So, for example, the control plane talking to worker nodes. And then there are pods talking to pods. If a pod is encrypting its traffic, it's using those internal certificates. And I know some of these things change based on the route policy, right? Is it re-encrypt termination, is it passthrough, that type of stuff. So if you don't mind, can you level set all of these things, and first of all, please correct me and tell me where I'm wrong. You're not wrong at all in any of this. The service serving certificates can help encrypt pod to pod communication, but I'll go ahead and say that's not really automatic. The certificate issuance, the trust, is provided by a CA bundle that's provided to the container, but the application needs to be aware that those secrets have been generated and mounted on the pod, and that there's a trust bundle that's been exposed to the pod by Kubernetes itself. So I'll say it's not transparent unless one is using something like Service Mesh to wrap the pod for encrypted pod to pod communication. Using the service signer certificate to sign the services, or to provide pod certificates automatically within the platform, requires a little bit of user interaction, but I digress. One of the things I like about OpenShift is that one need not concern oneself with a lot of what's going on with the internal certificates and the signing process. There's so much automation in the platform that it's typically sufficient to say that all of the communication is encrypted by default, except the application's own communication.
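As a concrete sketch of the service serving certificate flow Brian is describing, the annotations below are the documented ones in OpenShift 4; the service, ConfigMap, and secret names are illustrative:

```shell
# Ask the OpenShift service CA to generate a serving cert/key pair for a
# service; the pair lands in the named secret, which the deployment then
# mounts (service and secret names here are illustrative).
oc annotate service my-app \
  service.beta.openshift.io/serving-cert-secret-name=my-app-tls

# Inject the service CA bundle into a ConfigMap so client pods can
# verify certificates signed by the service CA.
oc create configmap my-app-ca
oc annotate configmap my-app-ca \
  service.beta.openshift.io/inject-cabundle=true
```

This matches Brian's point that it is not transparent: the application still has to mount the generated secret and trust the injected CA bundle itself.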
That application-level encryption ends up being the customer's responsibility to turn on and manage, and to choose how to manage it: whether they're using the Service Mesh operator, configured with or without enterprise certificates, or whether they're going to use the OpenShift router to terminate, and, if insecure communication within the cluster is acceptable, that's an option as well. I'll go ahead and stop and see if you've got any points needing clarification. So you're talking about Service Mesh there wrapping that traffic, but that's different from, for example, the SDN layer. When we had Marc Curry on, we talked about the IPsec configuration with OVN-Kubernetes. That's all handled a layer below the pod to pod communication, right? Absolutely. And it's interesting you brought up IPsec, because that's not one of the things I have really thought about much recently. In OpenShift 3, we could configure IPsec on the SDN using standard Red Hat Enterprise Linux packages and conventions for encrypting that traffic node to node. The pod to pod traffic in that case is not itself encrypted; we're just encrypting the transport on which the pods are communicating with each other. But with the OVN-Kubernetes integration, that network encryption comes along with the feature, and it's available for use with IPv4 traffic as well. While it does use certificates and a key exchange to establish and maintain the encryption, it's not TLS in the same way we're generally talking about with client browser to server traffic or pod to pod traffic. So just to make sure I'm understanding: IPsec at the SDN layer happens below the pods, and they're basically unaware of it. I'm an old storage admin, so to my brain this is the equivalent of encryption happening at the storage system.
So it's like encryption at rest, versus encryption happening between the client and the storage system, in transit, versus the application itself encrypting the data before sending it. There are different layers in there, and, translated over to the network traffic realm, each one of those is possible; it's just implemented in a slightly different way depending on where you're looking. Absolutely, and there's a nuance to that. If I'm encrypting at the SDN level and not at the pod level, some of that pod communication may be in clear text between pods if they happen to be on the same node. And even before the traffic leaves the node, before it gets encrypted by the SDN, it could be inspected on the node as well. Okay, so that could have ramifications from a security posture or auditing perspective. Absolutely. I'm going to jump over to chat, because I see there are a couple of questions, or comments slash questions, around cert-manager. So, cert-manager, as I understand it, is a third-party project. I don't know that it's an operator so much as a deployment; I think it's a Helm deployment. Yeah. So you're basically creating trusted certificates from, if I remember correctly, Let's Encrypt, automatically. Essentially you create a route, and it takes action to request that certificate from Let's Encrypt, and then it uses that certificate with the route, so that, as you see here in my browser, you don't get that "not secure" warning from a self-signed certificate; it automatically shows up as trusted. Is that accurate? That is really accurate. Last year Raffaele Spazzoli, one of my associates, did a blog post on that, and linked it. Oh, I made too many comments on that document yesterday. That is exactly the article I'm talking about; he's got a YouTube video where he does a step by step walkthrough of it.
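For anyone wanting the shape of that setup, here is a minimal sketch of a Let's Encrypt issuer for cert-manager, assuming cert-manager is already installed in the cluster; the issuer name, email, and solver choice are all illustrative, not from the blog post itself:

```shell
# Sketch of a cert-manager ClusterIssuer pointed at Let's Encrypt's
# production ACME endpoint (assumes cert-manager is already installed;
# names and email are illustrative).
cat <<'EOF' | oc apply -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-account-key
    solvers:
    - http01:
        ingress: {}
EOF
```

Certificate resources (or annotated ingresses) then reference this issuer, and cert-manager handles the ACME challenge and renewal automatically.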
Wow, it's nearly two years ago now. So that's been available for a while, and it's one of those things that not only works with Let's Encrypt but could also work with others, such as an enterprise CA. I've heard of a few people using it to talk to other solutions, but I don't have a full list of everything that's supported. But yeah, this is one approach: using the operator, deployed via Helm chart, to manage the certificates for particular routes. There are possibly other layers at which this could be employed, although typically the route layer is safest; if I start working up toward other components in OpenShift, I introduce potentially more risk. But for application traffic, this is one way of doing it. The other way, not talking about external routes but internal route communication: let's say I've got a certificate that's been issued by a public authority and attached to my ingress controller. I could use what we call re-encrypt, which is really edge termination with re-encryption on the back end, so that I could use the internal service certificate signer within OpenShift, and consumers will only see that front-end certificate. The rest of the certificate management is done with internal certificates inside OpenShift. It's a very similar process, just employed at a different level. If I want passthrough, though, and there are a lot of reasons to use passthrough, doing re-encrypt is going to strip some of the information off of the traffic. I may need mutual TLS if I've got a zero-trust network infrastructure where everything is authenticated using certificates. I may need passthrough in that case; having a certificate signer using cert-manager on the application, rather than using the internal certificate signer, would allow me to get that additional information through the connection.
So it really depends on the trust side of things. With TLS, if I remember correctly, there's SNI; if you need the SNI to reflect that specific application, then you would want to use passthrough as opposed to re-encrypt. Now, there are reasons to use re-encrypt as well: if I'm doing any type of path-based routing, or if I need some information annotated by the OpenShift ingress controller load balancer, I would need the router to modify headers and to read the path in the request. TLS 1.2 has all of that wrapped inside the encrypted payload, but newer versions, and OpenShift supports TLS 1.3 right now, support encrypting the headers separately from the data, so one can possibly use edge termination with re-encrypt on TLS 1.3 without losing any functionality. It depends on which version you're using. Got it. Hopefully you weren't able to hear all the banging of the garbage truck just now. Didn't hear it. Okay, good. Hopefully you didn't hear the lawnmower that just went by my place. Wednesday is a busy day, I guess, for public services. So I'm trying to catch up on chat here; there are a lot of questions. Do we plan recordings, a deep dive into cert-manager? We don't have anything planned at this time, mostly because it's a third-party project and we don't tend to focus on those. As Brian pointed out, Raffaele did do the blog post, and there's a YouTube video associated with it, so if you're interested in more information, reach out to me or Chris, or both of us, shoot us an email, and we'll reach out to the relevant folks and see if they're willing to volunteer to come on a stream, or potentially to update that blog post if needed, since it's almost two years old, as you pointed out, Brian.
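Going back to the termination types from a moment ago, they map to `oc` commands roughly like this; the route, service, and certificate file names are illustrative:

```shell
# Re-encrypt: the router terminates external TLS with the supplied
# certificate, then re-encrypts to the pod, typically trusting the
# internal service CA on the back end (names/paths are illustrative).
oc create route reencrypt my-app --service=my-app-svc \
    --cert=tls.crt --key=tls.key

# Passthrough: TLS flows untouched to the pod, preserving SNI and
# allowing end-to-end mutual TLS, at the cost of path-based routing
# and header manipulation at the router.
oc create route passthrough my-app-mtls --service=my-app-svc
```

The trade-off Brian describes is visible here: re-encrypt lets the router see and annotate the request, passthrough keeps the TLS session opaque all the way to the application.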
From chat: what is the actual use of the kube-apiserver-to-kubelet-signer in OpenShift 4? Is it something like the certificate authority used for mTLS communication between the API server and the kubelet? So, when we're talking about the kube-apiserver-to-kubelet-signer, we are indeed talking about the node signing certificates. There's a little bit of automation when we deploy a cluster that will automatically approve all of the certificates required for the control plane, but typically we admit the nodes manually if we're doing a user-provisioned infrastructure type deployment; installer-provisioned infrastructure will handle that automatically, so you don't have to worry about it, along with the machine set automation. Wow, it does get a little more complicated now that I think about it in context. Once nodes have certificates, those certificates are very short lived, like four hours. There is a constant process of certificate renewal going on in the cluster, and some of these processes, services, operators, handle the evaluation of the requests coming in, the validity of the credentials requesting new certificates, and the certificate signing and reissuance in the background. The node operator factors in here pretty heavily; I don't know how involved the machine config operator is in this. So that raises two questions for me. As you said, when we do either a non-integrated or a UPI install, a human has to approve those certificate signing requests. The first question is, what is the purpose of that CSR? My layman's answer is that it's usually somebody saying, yes, this is the node I expected it to be. And the second, follow-on question is: we see a lot of customers have issues when they shut down their cluster for some period of time, and when it comes back up, the only action that you need to do...
And that's, wink wink, not always the only one, but usually the most visible one: you have to re-approve CSRs for the nodes to come back. So what's actually happening there? Yeah, so that's the four-hour validity time of some certificates that I was talking about. It varies depending on the certificate; I'm not sure if the client and the server certificates have two different life spans, or if they are both issued for four hours. The certificates we have are valid for four hours at the shortest and 30 days at the longest, when we're talking about this level of certificate. Sometimes we can survive the cluster being shut down for a weekend, for example if it's a lab cluster and we don't want to burn too many hours, or if we want to relocate hardware from one part of the data center to another, which requires downtime. But you're absolutely right: if those nodes have been down too long and we're within that window where the certificates need to be renewed, they can't request renewal. A CSR is a certificate signing request, where I say I need a certificate for a particular purpose and send that request over to a certificate authority. If my credentials can't be validated, if my credentials have expired, my CSR won't be automatically approved, which is where an administrator may need to go in after some amount of node downtime and look for certificate signing requests pending in the system. And as administrators, sometimes we can be a little bit lazy. If we've got a whole bunch of old certificate signing requests and we don't expect any malicious requests to be sent to our cluster, one thing we might just go ahead and do is approve all of the certificates at once. That's kind of bad practice; good practice would be to actually open each certificate signing request, validate the source of the request, ensure it's one of the nodes that is actually part of the cluster, and then do the approval.
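Both workflows Brian contrasts look roughly like this in practice; the CSR name is illustrative, and the bulk one-liner is the documented form that usually gets pasted around:

```shell
# Careful, per-request flow: list CSRs, inspect one before trusting it
# (the Requestor and requested node name should match a machine you
# expect; the CSR name here is illustrative), then approve it.
oc get csr
oc describe csr csr-8b2mt
oc adm certificate approve csr-8b2mt

# Bulk flow: approve every pending CSR in one shot. Convenient after a
# cluster restart, but it skips the per-request validation above.
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' \
  | xargs --no-run-if-empty oc adm certificate approve
```

Nodes usually need two rounds of this (a client CSR, then a serving CSR), so it's worth running `oc get csr` again a minute later.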
Once the approval hits, the cluster will handle the issuance of the pending certificate, and the node will pull it down. If one waits too long, and this is something I've seen pretty commonly, the cluster gets shut down for a while for some maintenance activity, and then it's brought back up and all the nodes send their CSRs at once. The system won't hold on to the CSRs forever; they expire after an hour or two, and then I might get a call: hey, my cluster is down. I've seen somebody actually go through 200 requests one by one and approve each of them. Oh my God, paid by the hour. Yeah. There's a long command that will look at all the pending certificate signing requests; it's probably two or three lines in the documentation. I hate that one because it's long and I can't memorize it; I just type oc. I'm going to paste it into the chat. I pasted it into the stream chat. Oh yeah, I'll go ahead and type in my shortcut here. So, there are four or five questions in here, and I'm going to go a little bit out of order. Krishna, I see your continuation, we'll get to that in just a moment, and John, happy to host the topic. Christian, I also see your question there, so let's, and this one, I'm not even going to attempt to say, DMI three. So, let's assume that we install OpenShift for a test environment or a lab. Can we extend the lifetime of those certificates, say to a year or more, basically so you don't have to worry about that whole "it expired, I need to go in and manually do something" process? I don't think so. I don't think so. Yeah, that's what I was typing: can you extend the lifetime of node certs? There's no functional, easy way to do that. One of the things I tell people quite frequently is: if it's not in the documentation, it's not something that is recommended or supported.
The longer answer is you could probably do anything you want if you know which operator and which configuration to modify, or modify the images, for example. But the short answer is no. And why would we want to do that? Administratively, there are purposes for it: I might want to shut down a node over a weekend for maintenance and not have to approve the certificate requests when it comes back up. It's administratively convenient. But the impact on security can be potentially high, and customers are often risk averse, both in terms of unexpected downtime and in terms of security. What happens if a certificate key gets leaked outside of the environment, somebody gets access to the node and uses that certificate elsewhere? What damage could they potentially do? So short certificate life spans factor into some of the security decisions. So I want to take a moment here, because Imprim in chat says that shortly he'll be taking holiday and turning off the home lab cluster, and basically he'll live in fear during that time, because the last time he did this he had to redeploy the cluster. And I think it's important to point out, at least for me, and I do this all the time, that exact scenario: I always have somewhere between one and five clusters deployed in my lab, and I turn them on and off constantly. I'll be going on PTO in a couple of weeks and they'll be turned off the whole time. So you want to make sure that you have your kubeconfig, the one that gets created when you run openshift-install. If you don't have it, you can generate a new one; there's a KCS article for that, and I'll dig it up in a moment. But you want to make sure you have it, and then when the cluster comes back, and by back I mean it's powered on but not actually running properly, you can use that kubeconfig to connect as the system:admin user and re-approve those CSRs.
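Sketching that recovery flow, under the assumption of a default install layout (the install directory path below is illustrative):

```shell
# The installer leaves a system:admin kubeconfig under <install-dir>/auth/.
# Path here is illustrative; point it at your own install directory.
export KUBECONFIG=~/ocp-cluster/auth/kubeconfig

# This certificate-based login works even while the OAuth stack is still
# coming up, so you can check node state and pending CSRs right away.
oc get nodes
oc get csr
```

Keep that `auth/` directory somewhere safe; it is the fallback credential when the cluster's web login is not yet functional after power-on.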
So, Brian, where I'm leading with all of this is that I've also seen, anecdotally, internally, some folks say, well, that doesn't work because the API server isn't running because the certificates aren't valid, so how do I work around that? Personally, I've never experienced that. I've left clusters off for months at a time, and they come back up, I re-approve the CSRs, and, you know, 10 minutes later the cluster is back to functional. So is there anything else? So I've had the same experience myself, where I've shut down a cluster for 30 days, and more than 30 days, and brought it back up, and the control plane came online without a problem. So there is a difference in terms of the certificates that are on the control plane servers and the certificates that are on the workers, in that the control plane is locally privileged. Local to each one of the masters are the certificates that authenticate etcd to etcd. Those are maintained outside of the data store so that the data store can be brought up without any problems. And the services, the pods, running on the control plane nodes do have direct access through that to do their own certificate approval, so that's not really an issue there. The nodes may have to be re-approved. That can be automated; we actually had a node auto-approver pod in an earlier version of OpenShift that I haven't seen in OpenShift 4, but that's essentially where some of our code snippets and some of our administrative shortcuts, the snippets we could remember, come into play. So in practice, I haven't seen a problem with being able to shut down a cluster for a long period of time and bring it back, except for one case. And in that particular case, I messed up on the DNS server. The reason things weren't joining properly was because my DNS server setting was wrong. Okay, so I'm going to jump back up to those earlier questions that I mentioned before.
Let's see, where did they go? They scrolled up. Do you want me to ask them, or do you want to handle it? I can. So, a continuation from Krishna, who asked before: what is the actual use of kube-apiserver-to-kubelet-signer? So, node certs. The server and client .pem files are signed by a different CA, specifically kube-csr-signer_<random numbers>. And for mTLS, we think we need to have the same CA for peers. I couldn't see any cert that is signed by kube-apiserver-to-kubelet-signer. So I think where you're going with that, the question here is more or less: because that specific CA isn't signing all of the certificates, using SNI or TLS 1.3 doesn't work the way one would expect. I will admit that I am speculating there, because this is not my area of expertise. So I believe this may be a good time to bring up the number of CAs that are in OpenShift. Okay, and this ties, I think, directly to a comment, I don't know if you saw that, about where the certificates came from, pointing to the OpenShift blog. So one of the things we might pull up, and I don't know if you want to navigate there: is this the one on considerations for OpenShift PKI and certificates? Yes, that's the one. I've got a comment half-typed that I was going to send and I didn't want to mix that up. So, PKI, public key infrastructure, has the ability to have a CA that is the ultimate trust for everything, and underneath that CA I could have several sub-CAs, or intermediate certificates. And those sub-CAs can also be certificate issuers, if they have the issuance flag enabled. In OpenShift, though, instead of having a single root CA, we actually have four root CAs. And the purposes of those four root CAs are: for the platform, for the aggregator proxy, for etcd, and for the service signing. Now, I think I lost my thread here.
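Before the thread picks back up: the service-signing CA just mentioned is the one application teams touch most often, because annotating a Service asks it to mint a serving certificate. A small sketch (the service and secret names are made up):

```shell
# Ask the service CA to generate a TLS key pair for this Service and store
# it in the named Secret; the service CA rotates it automatically.
oc annotate service myapp \
  service.beta.openshift.io/serving-cert-secret-name=myapp-tls

# The pod can then mount the myapp-tls secret and serve TLS that any
# in-cluster client trusting the service CA bundle will accept.
```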
I've lost what the question was and in which direction, but more or less: is there a conflict because certificates are signed by those different CAs, and does that cause issues for, I think, pod-to-pod communication? No, and the reason I would say no is that there are two bundles provided to the pod. One bundle has all of the relevant certificates that a pod may happen to encounter within the cluster, from pod-to-pod communication signed by the OpenShift service signer, for example, to the pod talking to the API, signed by the platform CA. Those are in one bundle that should be automatically trusted. If it's not automatically trusted, let's say we've got an application that's using its own keystore or trust bundle, that bundle can be chained into the pod either with an entrypoint script or by pointing the application to where that CA bundle happens to be. And there's also a client CA bundle for any other certificates that exist within an enterprise, for example. So there are two bundles provided underneath that mounted secret. None of the information in those CA bundles is actually secret, by the way; it just happens to be a secret object by which we communicate it to the pod. So that's of some note. Where that's located in the pod, I don't remember off the top of my head, but I think it's something like /var/lib/kubernetes. Let's see if we can find a pod; let's pick a random one of these. Yeah, and one won't see that in the pod spec; it's something that's mounted outside of the pod spec, Kubernetes adds it to the containers by default. Maybe /var/service? Maybe not. No, here we've got the service CA, and underneath our service CA we've got the trusted CA. These are all volume mounts that it's pulling in. There you go: /var/run/secrets/kubernetes.io/serviceaccount, the trust bundle is under there. I didn't know it would actually show up in the spec like that, unless, and I'm assuming, that's because it's a running pod.
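As an aside, one way to poke at those bundles from a running pod, assuming the pod image ships openssl (the deployment name is hypothetical):

```shell
# On OpenShift, the service account mount carries both the cluster CA
# bundle (ca.crt) and the service CA (service-ca.crt).
oc rsh deployment/myapp ls /var/run/secrets/kubernetes.io/serviceaccount/

# Print the subject and expiry of the service CA certificate.
oc rsh deployment/myapp \
  openssl x509 -noout -subject -enddate \
  -in /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
```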
Yeah, I'm assuming the deployment had been modified when the pod was instantiated. Yeah. So, I'm going to take a slight detour here, because I was just reading through chat and I saw the command that you used: oc observe. I didn't know about that subcommand, that's a new one to me, and that one is awesome. I'm now going to start using that. So if you're not familiar with oc observe, do an "oc observe --help". And it's exactly as Brian showed it being used: it watches for something and then takes action; it basically passes that through to some command. I did not know that, thank you. Yeah, me either. That's a good one. Let's see, just digging through chat here. I thought I saw a question from Christian up here, if I'm reading it right: do I need to provide the router's certificate in my route config? So we've got a new way of doing a secret reference in the route configuration, so the certificates don't actually need to be in the route object; they can be referred to as a secret. They can? Did we change that? That was one of the big complaints a lot of people had: oh my, if I want to use my own certificate with a route, it has to be defined in the route, I can't put it in a secret. And for a long time my response was: but secrets are no more secure than a route. Well, yeah, and that's kind of interesting too, because I looked at the potential of encrypting the route objects back in my OpenShift 3 days and found that one could specify all the objects one wanted to encrypt, like secrets and config maps, but only secrets and config maps would actually get encrypted; the route definition in the encryption config would be ignored, so the routes were definitely not encrypted. Let me see. The best place to validate that would be in the API documentation for the Route API object. We can look at that, though unfortunately it's not on the screen share. I can dig it up, but I hate doing a dig through the internals on a call.
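For reference, this is what the inline form being discussed looks like in a Route object: the certificate and key live directly in the spec. The names and hostname below are placeholders, and the PEM bodies are elided:

```yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: myapp
spec:
  host: myapp.apps.example.com
  to:
    kind: Service
    name: myapp
  tls:
    termination: edge
    certificate: |-
      -----BEGIN CERTIFICATE-----
      ...
      -----END CERTIFICATE-----
    key: |-
      -----BEGIN PRIVATE KEY-----
      ...
      -----END PRIVATE KEY-----
```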
Ah, no, that's totally acceptable here. So "certificate" provides the certificate contents; it's already the contents. I thought we were able to do a reference to a secret in here. I may be mistaken. That's okay. I'm hopeful that we will change that someday. Yeah. Yeah, because it's a check in the box, I don't know. Andrew's personal opinion is that I don't know that there is significant value to that, but I do understand why some people want to have that separation. Well, and there's another reason too: it's not just separation, it's being able to share what the pod is seeing and the service is seeing with what the route is exposing to the OpenShift router. Now, in fairness, one of the reasons for not having this be a secret, and this is somewhat historical, is that one can't read secrets between two different namespaces. But recently we've had the ability to use shared secrets, somewhat of a projection of secrets, where a secret is visible from other namespaces, so it looks like that just hasn't made its way in. But the OpenShift router needs to have a copy of all the information, so even if it was stored in a secret rather than in the route object itself, it would still need to be communicated to the OpenShift ingress so that the router could present that certificate and authenticate to a client that it is the server the client expects. So we can't hide everything. It's not possible to hide all of it, because we need to be able to actually use it in some other objects. At some point you have to trust that RBAC is doing what RBAC is supposed to do. Right. So I see three questions here. We've got a lot of questions, if you've got time. So: what are the best utilities to use for debugging issues with certificates? OpenSSL is a great tool; curl is a great tool as well.
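Since oc observe came up as a debugging-adjacent tool, a minimal sketch of it in this certificate context. Whether you want blanket auto-approval is a policy decision; this is illustrative, not a recommendation:

```shell
# Watch CertificateSigningRequests and invoke the given command for each
# object observed; oc observe appends the object's name as an argument,
# so every CSR that appears gets passed to the approver.
oc observe csr -- oc adm certificate approve
```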
If it's at the node level, you can oc debug into the node and access those resources there. You can also connect to the pod, so if I jump over here, if I go to the terminal here, I can look at all of the mounts inside of here, run mount inside of here. And you can see, here's exactly what Brian was talking about a moment ago, right, that's where you'll be able to see all of those things. Some pods, of course, won't have those tools inside of them; you can always copy the file out and then do some analysis on it. And as always, all of these things are controlled by operators. So if you come all the way down to Administration, then Cluster Settings, then Cluster Operators, you can always go into the various operators and look at their logs; they'll surface high-level information, and you can then dig into various things. And one of the things that I like here: related objects. This shows the related object definitions, and it shows you the same exact information if you do an "oc get -o yaml". This is a great way to figure out, like, oh, I know it's something with the name server but I don't know where the name server puts all of its stuff; look at this related objects list and it tells you, here's the kube-apiserver, right, oh, it put stuff in this namespace, openshift-config-managed, and so on and so forth, and then you can click inside of there to find out more. So, Brian, anything to add? So, one of the things I've noticed is OpenShift will reject some configurations that it deems to be invalid once they're applied. Some of the troubleshooting on the OpenShift basics isn't always necessary, although OpenShift does allow some certificates to be replaced; like, if I replace the API or the ingress certificate, it will require those to be valid in and of themselves. But if I do them out of order with the documentation,
and my login expires in between, I may not be able to authenticate back into the cluster, because now I've got a mismatch between the trusted OAuth server, yeah, and the API server. There are ways around that, so it's not necessarily a fatal operation, but order of operations definitely matters there. So, troubleshooting certificates: often this is done on a client-server basis, and openssl s_client is a very good starting point. It can be used with -showcerts to show the certificates of the client and/or the server, and -connect to specify the host and port. Among other options, one of the things to note is that OpenShift is very heavily based on Server Name Indication. Essentially, the host name needs to be outside of the encrypted packet so that other services know which certificate to send back to the client, and know which service in the back end to route that traffic to. If there's a mismatch in SNI, there may be an inability to connect to the application in the back end. I've seen this quite frequently, especially with older clients that don't pass SNI information properly. Or even some misconfigurations: I've seen an SNI packet coming into OpenShift, and I had to use tcpdump to see this, where the SNI header was actually requesting a host name somewhere else out on the Internet. I don't even know how that was being directed to the cluster, but the host name did not match anything that should have been in that environment. So there are levels where things can go wrong that aren't necessarily the fault of the certificate, the way it's issued, the way it's signed, but just in the way the client is communicating with the server and how Server Name Indication works. Then there's order dependency: TLS 1.2 specifies that the server's certificate must come first, followed by the intermediates, each certifying the one before it. A lot of clients are very lax on this; they'll take the certificates in any order and work out how they're related.
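A sketch of that client-side check, pointed at a hypothetical cluster API endpoint; -servername is what sets the SNI value, so varying it shows which certificate each name gets back:

```shell
# Dump the presented chain, then summarize the leaf certificate.
openssl s_client -connect api.cluster.example.com:6443 \
  -servername api.cluster.example.com -showcerts </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates
```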
But some clients are strict, and if the chain is out of order, say an intermediate comes before the server certificate, it'll be rejected, and this can often be a bit of a head-scratcher when every other troubleshooting command succeeds but the connection still fails. So the list of things that could potentially go wrong with certificates is pretty much never-ending. Most often the problems I see are related to application communication. Platform certificates are pretty solid, pretty well automated, and one's task is to make sure that any enterprise or external certificates provided for the API and the ingress controller are not expired. Got it. The SAN, subject alternative names, is something not to be ignored. Sometimes it's good to have multiple entries specified in the SAN for the wildcard certificate. If I've got a host that's serving, say, the US and Japan, and I want to host applications that can respond to either .com or .co.jp, I may want my SAN to have multiple top-level domains listed. That way, if I get to a point where a pod's down, an application's down, I don't get a TLS certificate error because suddenly the default wildcard certificate is answering instead of the intended route. So I think we've got maybe two minutes. Chris, do we have a hard stop today? We have 30 seconds, and yes, we do have a hard stop. Okay, so I won't ask either of these last two questions; instead we'll address those in the blog post. So I'll take these last few seconds to thank Brian very much for joining today. Your expertise has been immensely helpful, so thank you so much for joining. To our audience: thank you for participating. These questions have been phenomenal today, so I will be sure to follow up in the blog post. It will be out Friday morning with all of that information, so keep an eye on openshift.com/blog. And I will see everybody next week. Thank you so much for joining us today. Thank you so much.
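That multi-TLD SAN setup can be tried locally with nothing but openssl (the domains here are invented); this needs OpenSSL 1.1.1 or later for -addext:

```shell
# Create a self-signed wildcard cert whose SAN covers two top-level domains.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout tls.key -out tls.crt \
  -subj "/CN=*.apps.example.com" \
  -addext "subjectAltName=DNS:*.apps.example.com,DNS:*.apps.example.co.jp"

# Confirm both names landed in the SAN extension.
openssl x509 -in tls.crt -noout -text | grep -A1 "Subject Alternative Name"
```

A client validating a connection checks the requested host name against the SAN list, so both wildcard entries must be present for either domain to verify cleanly.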