Good morning, good afternoon, good evening, and welcome to another special edition of DevSecOps is the way here on Red Hat Live Streaming, which is what we've changed our name to, Dave. Sorry, the memo has not gone out about it though. So yes, the thing formerly known as OpenShift TV is now called Red Hat Live Streaming. I am the host and showrunner of that thing called Red Hat Live Streaming and often the producer, but today we're using intern Bobby as producer. So thank you, Bobby, for doing that. Oh, look at this, look at this. This is what I call DevOps, folks, come on. Red Hat Live Streaming? All one word, yeah, live streaming. Awesome. Yeah, there's two ways you can do it, but yeah, that's one of them. Lowercase l, uppercase S. Look at that. Thank you, look at that, so fast, so awesome. So speaking of the man behind the curtain, Dave, what's going on, buddy? How you been? Doing great. We were just talking about, you know, losing power in these summer storms. Obviously you're in Michigan, I'm down here in Florida, and we're hoping, knocking on wood, that storms don't come by and knock us out, but we should be good. But yeah, doing great. So I'm excited to talk to everybody today in this month's episode of DevSecOps is the way. We're talking about data controls, and I'm really excited about the guest we have, Matt Rogers, who's a senior software engineer and has a lot of good and interesting history at Red Hat. We're gonna talk about data controls, like encryption, and we might even get to some other items like the compliance operator and maybe the file integrity operator. Nice. So yeah, should be a good show, excited to get it going. Before I do though, I just wanna remind everybody that this is a monthly series that we're doing. So if you're really interested in security, tune in every month. We actually have two OpenShift TV shows that we have under this umbrella, DevSecOps is the way.
This one is primarily around thought leadership. And so we bring really smart folks like Matt in to help us navigate through some of these security topics. Our other show is a little bit more dedicated to our partner ecosystem, which is very, very important to Red Hat. Yes. And it focuses on our security ISVs. This month's episode was, I'm trying to think, yeah, it was last week, and it was with Zettaset, one of our good partners in data encryption. And you can see on the right-hand side, we've done certain security categories in different months. We started in March with vulnerabilities. And we're gonna end here in December for this year with platform security, which will be very interesting. We also like to put out up to three podcasts. So you'll find those on Red Hat's Buzzsprout site. And those are really well done, very interesting, about 20, 30 minutes of good security content. And then just a ton of other publications and things around the topic of that month. So I just wrote a blog about data controls. It's a very short blog that you can find on redhat.com. And then we do webinars and other things like that. And you can find more information at red.ht/DevSecOps, with the D, the S, and the O capitalized. By the way, this whole thing isn't just a show with pretty faces and, I would have to say, elegant beards. It's also a practical framework. I know this is a little bit smaller font, but we use these categories with our joint customers as we go to market with solutions that solve DevSecOps problems. So this is just a very high-level view of it. We go down into deeper levels to say, hey, as you're making your journey into DevSecOps, you might wanna think about this security category and this security function, because there are many functions underneath these categories at this specific point in the DevOps pipeline. And we can map out, hey, here's what Red Hat can do.
Here's what our partner ecosystem does pretty well. And that would provide you with a whole joint solution as you start your DevSecOps journey, which, by the way, DevOps and DevSecOps are kind of synonymous. Yes, and Dave, you might not have realized, we just had a nice DevSecOps conversation in the clouds with Andrew Clay Shafer, Kirsten Newcomer and Jamie Scott. So that was a nice and eye-opening episode. As always, you can check our archive for all the videos that we've put out so far. Yes, those three are very knowledgeable in DevSecOps. I work with all three of them all the time. So yeah, we're gonna be doing a lot around DevSecOps. Actually, you'll see even more stuff in Q3 from Red Hat around our point of view and things like that. So we're excited to do that. Awesome. Cool, but so for this episode, let me just stop sharing. I will let our distinguished guest introduce himself, Matt Rogers. If you please, let everybody know who you are and what you do at Red Hat. Hey, so yeah, my name is Matt Rogers. I'm a senior software engineer here at Red Hat. I'm on the OpenShift Infrastructure Security and Compliance team. That's a small engineering team tasked with getting products like OpenShift able to meet compliance standards. So we're doing some cool stuff there. And I've been with Red Hat since 2010. I started off in the support department, worked there for a while, and made my way over to engineering. Nice. So you've seen it from both sides. Yeah, definitely. Yeah, we were talking earlier. You had a pretty interesting start. Why don't you tell us a little bit about how you navigated through Red Hat and got to where you are now. Great point. Yeah, so when I started, it was RHEL support. The latest version of RHEL was, I think, 5.3 or something like that, 5.2 when I started.
And so I came in kind of as a generalist to work on base OS components and support those. There were support cases for sendmail, Postfix and things like that, but also some of the security components like Samba and Red Hat IdM. Towards the end of my time in support, I was pretty much just supporting the IdM suite of products, Samba and MIT Kerberos, those kinds of things. I also spent a lot of time supporting VPN software for RHEL, which is Libreswan. It came out of the Openswan project. I spent a lot of time supporting those cases and also kind of swinging into development through that project. As part of the support engineer group at the time, submitting patches and fixing bugs for customers was highly encouraged. So of the projects that I worked on, I probably spent the most time debugging and just fixing stuff on Libreswan. So that was a great opportunity. Yeah, sounds like it. Yeah, and it was cool because, coming into Red Hat support, if you were to do software support at Microsoft or something, you'd probably have a script that you would go through for cases. You're not gonna be in the code. But for Red Hat support, that was kind of the boost. If you could go in and look at the code, then you were really able to help the customers out, as opposed to just relying on documentation or scripts or anything like that. So in the support organization at the time, among the teams, there was a lot of good competition, in a way of saying, hey, if you could go into your case and just blow your customer out of the water, fix this issue, get them out of the way, and especially if you wrote a patch and fixed a bug for them, then that was highly encouraged. And we, the support techs, played off of each other and supported each other, saying like, hey, you can go check out this part of the code.
So for doing software support, it was a great environment to learn as well. I mean, I pretty much got my start in software development just working on support-related things. Yeah, which you rarely hear, to your point. I've been involved in support organizations, I ran one, and we never touched code. It was all about, let me talk to the engineer, and then I'll get back to you. But I could see where, because everything's open and you can go right into the open source and figure it out, that's pretty cool. Yeah, there's not many places like that, right? I think I've worked at one place where the support team was literally in the cube farm right next to ours, and the support lead knew, hey, can you guys help me with XYZ thing? Right, like for my team of, quote, DevOps engineers. And oftentimes we could, which was kind of nice. So yeah, the escalation path was quite simple then, yell over the wall, hey, can you fix blah, blah, blah? Nope, that's not my thing. Yes, let me fix that real quick. Those kinds of things, yeah. Yeah, the interaction between support and development, at least in my experience, has always been one of those things that Red Hat has tried to foster really well, even with maintenance engineers and folks that are just doing packaging of upstream things. There was a lot of collaboration with engineering, even just as a level one support tech. And that definitely improved over time. Back in the day, occasionally, I remember having to ask Ulrich Drepper about a problem and he didn't get back to me. And Ulrich is the glibc guy and pretty notorious for closing bugs and just saying go away. So there were definitely some where you're like, I just can't get in contact with this guy or get a deep answer from him. But over time, that worked out.
But as an engineer, we always keep an eye out for customer stuff, yeah. So when you started that Libreswan work, was that your first sort of dabbling into encryption and security-type coding? Yeah, before that, I had done some basic security stuff on Linux, just securing a system or whatever. But in terms of Libreswan, one thing that surprised me about that project was, on the first case I got for it, I had the customer turn on the debug logs, and they were very verbose, but what they did was give a lot of pointers into the code. Libreswan is an implementation of a protocol called IKE, IKE version 2, which is a stateful protocol for IPsec VPN establishment. You had two endpoints and they each ran through the same debug log, but at different states, and you could go and reference the RFC drafts and the RFCs that are out there and follow along the state machine. So even though I had these very big verbose logs that just tried to describe everything, I found them really useful for going through the code, and that was kind of the first time that clicked, in terms of like, hey, I just found this log in the code. Well, it's right there. The issue is right there in front of me. So that was kind of an epiphany of getting started. And this was jumping into a case on a product I had never heard of before. I had heard of VPNs, but I had no idea about IPsec or anything like that. That is definitely a different world for the standard engineer, right? Having to dive into all the fun stuff of IPsec and all the underlying protocols and ways to do it. So there's more than one way to skin the VPN cat, really. Right. Yeah, and that was kind of the way it went with support, you'd get obscure stuff, especially since we were supporting all of the components, all the packages in RHEL or most of them. Some were unsupported, but the large majority of them were.
So you'd get cases on things that had never had a case opened before. Right, like you were the first one to work on something. There was one for ipmitool or something. It's like, no one, who's used that? So you get cases coming in and you go look for the guy, like, hey, who knows this? It's like, sorry, you gotta take a shot at it. And so being thrown into the fire is daunting at first, but it was a good crash course in pretty much everything open source development related. Yeah, that's pretty cool. How did the work that you did sort of manifest itself into OpenShift then? Yeah, how did you make that hop? So I worked on Libreswan mostly when I was in the support organization. When I joined engineering, I joined on to work on MIT Kerberos, adding features to Kerberos in support of FreeIPA and Red Hat IdM. And one of the themes that I kind of took along with working on different projects has been X.509 certificates. I worked on certificate-related stuff for Libreswan, since you can actually authenticate a VPN tunnel using a cert, and I also worked on PKINIT for MIT Kerberos. And for OpenShift, when I joined OpenShift on the auth team, working on certificate stuff was, I was like, okay, I guess I'll be working on this, you know, being the guy on the auth team that had the most experience with it. So that's what I'll be covering today in my presentation, just some of the experience and some of the stuff over time, on multiple products, working on certificate-related things. Nice, very cool. Yeah, so that was one of the main things. I've also liked being a generalist and being able to kind of move around different projects. So yeah. Nice. Well, cool, yeah. If we want to get into the meat of what you were planning to do, I think we're going to do a case study on OpenShift certificates. That's right.
I will, all right, let me share. Share away. All right. All righty, nice. You got it. There, I'm sharing. How do I go back? Okay, yeah, so this is X.509 in OpenShift. It's a PKI case study. So my hope with this presentation is to share some of the decisions and some of the history behind the certificate management and handling that OpenShift does, and sort of Kubernetes as well, to some extent. And yeah, I'll get into it. Awesome. So yeah, I've already done the introduction on Matt Rogers. And yep, those are some of the projects that I've worked on. Now I'm on OpenShift Container Platform Compliance, where we do the compliance assessments of products. And lately I've been working on the compliance operator and the file integrity operator. And this has kind of been a general area of operator design for OpenShift, for secure operators, ones that have some kind of security function in our case. I have a development blog at mrogers950.gitlab.io and my personal site is cryptotheatre.info. That's a painting of mine. I've got my art up on cryptotheatre.info if you're interested in that. Nice. So this talk is kind of inspired by Peter Gutmann and his research into PKI. It's actually about 20 years old now, but a lot of the stuff is still pretty relevant. He's got a great paper called PKI: It's Not Dead, Just Resting. And he's got some other good stuff. Everything You Never Wanted to Know About PKI but Were Forced to Find Out, that's a really good one as well. Sounds like a good one. Yeah, those are really good. And I cover some of the themes that he goes through in broader detail, but his work has kind of shaped a lot of the ways I think about PKI at a high level. So for PKI, for X.509, what are we talking about? Well, it describes a system of public key infrastructure. So, I have a note here, let me make sure.
Yeah, so it defines a system and data formats that are supposed to answer certain authenticity questions about public-private key pairs. Namely, who does a public-private key pair belong to? Specifically, it ties the pair to a name. So it's supposed to answer this question: does public key K belong to foo? It also describes certificate authorities. A certificate authority is a certificate as well, but one that is allowed to sign other public keys for a name, to attest to or verify that name using the certificate authority. And it also outlines methods of certificate revocation. So this is the question, has foo's certificate been revoked? And I'll get more into that as we go along. So yeah, this is just one of the secrets, actually a secret from OpenShift. It's the etcd client secret, or the etcd client certificate, I mean. And it's signed by one of the etcd CAs. So there's a bunch of links, a bunch of TLS links there, and this is one of the certs. This is just to give an idea of the general outline of the format. You have an issuer name, you have a subject name, information about the public key or the public key itself, and you have some other informative data here with extensions. This one is marked as a TLS web client, so this is a cert that's supposed to be used for client authentication, for a server to authenticate a client. And it has the signature that was calculated and added to the cert by the CA. So yeah, a little bit of history behind X.509. It comes out of a bunch of historical standards going back through the 80s, after RSA began to catch on. Its primary goal was to avoid the man-in-the-middle problem with RSA. Because the core drawback with just plain RSA is you set up a session with Bob and you ask for Bob's public key. You get a public key, you encrypt your secret material.
You end up doing it with Eve's public key, and then you send it to Eve and you think it's Bob. So how do you make sure that that is actually Bob's public key and Bob is who he's claiming to be? That's what it essentially tried to address. There was a directory standard that was supposed to come along with X.509, called X.500. The closest thing to X.500 today is OpenLDAP. So if you're familiar with LDAP, you have the distinguished name format, O equals, OU equals, that stuff. That was designed in concert with X.509. They were supposed to go together, but over time, of course, that didn't happen. That was supposed to help with the distribution problem: it was assumed you had an X.500 directory and you'd just get your keys and CAs and stuff from there. I'll get to more of those ramifications. Yeah, and it has this name hierarchy concept of distinguished names, made out of relative distinguished names. That's one of those things that when people see it in a cert, it just makes sense. Like, oh, you have this hierarchy of things, you have your organization and then your servers under it. So it would make sense that the name would be broken up that way. But that is not how it's used today, really, for most cases. So over time, the implementations varied. There were some that were implemented loosely, some that were implemented very strictly. So what kind of ended up happening was, at the IETF, there were RFCs published that just ended up being broad profile recommendations. Like, what do you do with this? How should you think about PKI? How should you lay yours out? What should you consider? What extensions do you use? All that stuff.
So one thing that kind of ended up happening was you could get players in the PKI space, like Microsoft, where they would make decisions about what attributes and things would go in your certs, and just by virtue of them being there and being the people doing it the most, their stuff would make it in, in the sense that everyone else has to deal with certain attributes and work around bugs with this kind of stuff. So when it came to PKI interoperability, that was really not in the cards. It's one of those things that, over time, people want to do. They want to be able to use certs from one PKI in other stuff. And as we'll see, that's tough to do. It came along with a data format called ASN.1, which I won't go too far into, but this is DevSecOpsCat and he's wondering if ASN.1 and Microsoft certificate profiles were done on purpose. So he thinks that was done specifically for him. Don't let him know otherwise. Yeah, I know. He's got his tinfoil hat on. He's a little bit mad at Bill Gates, but oh well. So yeah, interoperability stuff ends up being to where you have these certificates that have all this stuff in them that your parsers don't know about, that your implementations may not handle well. I remember a bug in Openswan, where Openswan had a custom ASN.1 parser, just because it was written in 1992 or whatever, and there wasn't a general-purpose TLS library or certificate library. So it had its own parsers, written by Henry Spencer, actually, if anyone recognizes that name. But anyway, it was some old code, and there were instances where customers would plug in their certs from Active Directory or whatever else, and as soon as it would hit the parser, boom, it would crash. Crash the Pluto daemon. So there are tricky things around there. So in this case, we have, it says feed me a stray cert. So where we're at today is, of course, certs are used everywhere.
SSL/TLS, of course, which is the common use in the web PKI, is where people end up interfacing with it on a day-to-day basis. The little padlock icon in your browser is supposed to tell you whether you were able to verify the server's certificate with a CA in your browser's bundle. It's not telling you if it's using military-grade encryption or anything like that, but that's what most people see. And of course, if you don't have a trusted CA in there, then you get the warning and you have to add the exception. Then there are smart cards. A lot of government agencies and public sector organizations use smart card systems. Everyone deals with this stuff. I say this because if you search Stack Overflow for PKI or TLS related questions, you'll see answers with almost a million views and things like that. So at some point, if you're doing any DevSecOps or any kind of DevOps or security work, you're minting certs. You're going to be using whatever tool is out there to create certs and move them around, and you're going to get familiar with it. There's some progress in terms of new protocols and stuff that's been coming along to make things easier. But now we're in a RESTful, microservices world. So it's cert explosion now. If you have multiple TLS endpoints, you have to secure them all. You're not going to get away with having unencrypted endpoints, especially from a standards-compliance standpoint. If you're using RESTful services, then you're going to have to secure them. So it's not a good excuse to use insecure skip-verify options or just not plug in your cert options. So this is just an outline of a very brief TLS exchange. The verify-server-certificate step is where the client checks the server's certificate against the CAs in its trust store.
So now enter Kubernetes. The problem just gets deeper here. You have tons of self-signed CAs and certificates to manage. You have a whole cluster control plane with API servers, kubelets, etcd. Not to mention there's client authentication. There's a cluster admin cert that you can use to just talk directly to the API, and it's used by some of the other components that have direct cert authentication. So that's even outside of just the TLS itself. You also have your internal apps, things on your service network. And mTLS everywhere, mTLS adds another headache to it. If you remember what I said about the DN structure of the certs, you may have that for your organizational CA and it's all dandy, but then here's Kubernetes, and it's got custom profiles, meaning that it's going to use CN for the user and O for the groups. It has this system colon prefix, and one of the authenticators, the API server authenticator, will do some checks on this, like look and make sure it's a system user. So there are some special certificate profiles, a few of them in there. There's some stuff that is a necessity, I believe, but it's sort of questionably secure. There are some certs that have both the client and server extended key usages on them. So it's dual use: you could use it on the client side or the server side, at least as far as this attribute is concerned. I think it's a technical limitation with etcd for some reason, but I don't remember. I just remember it being one of those things that we looked at early on. So yeah, if you were going to do Kubernetes just from scratch, and I know I've met people at KubeCon that run bare Kubernetes, I ask them about this when I meet them. I say, what do you do for your certs? And they're like, I don't know if I should tell you, but man...
So yeah, without a strategy around that, your clusters are just gonna die. There are some little convenience mechanisms in there from Kube, like automatic worker kubelet bootstrapping and rotation, but it doesn't cover the full range of certs you need. You have various things like certificate and key files just sitting around on nodes. The PKI that you plug into Kube is totally up to you, so you can do ridiculous things with it, like sharing CA keys and stuff. So just use OpenShift already is what I would tell them. OpenShift 3 already has improvements over vanilla Kube. It automates the PKI setup, so it takes care of all those profiles and all the pain there, set up by Ansible playbooks. And those playbooks do the rotation and regeneration of the certs and stuff. 3 does have the automatic service CA, which means you can get automatic TLS certs for services that are in your service network. It's very common, you know, if you have a single workload, you just need a server cert for that workload. You can annotate a service and OpenShift will give you a serving cert for that automatically. And it can distribute the CA, the service CA, too. It's got default app route and IDP certificate support, et cetera. OpenShift 4, when we went to 4, has even more improvements over 3: self-rotating control plane certificates. A lot of them are short-lived, and a lot of them are managed as Kube secrets by the control plane operators. So rather than have all the certs sitting on the master, they'll be in etcd. So if you use etcd encryption, then you get some benefit there. Nice. And there's some other convenient stuff, like auto scale-up. And for the service network, the service CA, there is a CA key rollover trick that we do, that I'll get into.
So yeah, PKI was just resting and now it's awake. So some of the approaches that we took. I don't know if I'm going a little bit over time, but that's okay. One of them, this one is really important, and it's kind of not, at least I don't think it is, totally obvious at first. There's a type of trust domain separation that we try to achieve with different CAs. For the known infrastructure links, you load the client's CA store with only the CAs for the server workload that you know it needs to connect to. A little bit of a comparison, maybe this helps illustrate it. For your browser, you'd have the browser CA bundle that might have VeriSign, DigiCert, Let's Encrypt, whatever. You can connect to a server that has a VeriSign cert or a DigiCert cert, but as soon as you get to an unknown one, you get the exception warning in your browser. But if you have a headless TLS client, then whatever you put in the CA store, that's going to restrict it there. When it tries to connect to a server whose cert it can't verify, because it has no CA for it, the connection is going to stop there. There's nobody there to accept an exception. I mean, you would have to go in and program the exception into your code or whatever. So this is really important, actually, for the infrastructure pieces. The same goes for server-side CAs that you load for mutual TLS, because you want to verify the client cert. You want to avoid that big wad of CAs. It would be easier if you had all your CAs in one config map and everyone goes and gets that config map and loads their clients, but then you've broken up the trust model there.
So it becomes a little tougher to manage, but it's better for security when you're explicit about your CA trust. An example I give is if you need a temporary mutual TLS connection. We use one of these in the compliance operator, where we just explicitly load one cert. We create ephemeral, really short-lived certs, set up the connection, it sends over its workload, and then that's it. And we only load the CA that we need on each side. There's a thing you can do with a PKI, which is use intermediate CAs, which is delegating: you delegate signing of certs to another CA. In our case, for these micro-certs, you really shouldn't do this. It doesn't add anything to security, because you've broken everything up into this complicated hierarchy. It just adds complexity. There's a theoretical discipline with X.509 called path construction. It's like one of the great rivals of continental philosophy, path construction. So for example, here you can do single inter-domain cross-certification. But I don't suggest you go look at this. Anyways, if we did it like that, some of the effects would be that, on compromise of a Kube CA, you have to redo everything. And we ran into issues with clients that don't require the full chain. Clients don't agree on this. A Go TLS client, if you load the intermediate cert, like here for cert two, if you load the intermediate, it doesn't need to go and check its signature against the Kube CA. So the distribution of CAs here becomes a real issue. If you provide the whole bundle, even servers will snitch on themselves, they'll send over their intermediate.
So everyone can just have the Kube CA loaded, and the server will just send you the service CA, kubelet CA, whatever. And then our model of trust here ends up being broken. So we did it like the flat model. CAs can be handled independently and there are explicit trust domains. So yeah, we did a fancy self cross-signing trick, actually, on the service CA. If you wanna check it out, it's in RFC 4210. I'm not gonna try to explain it here, but it's a cross-signing trick that lets you gracefully roll over a root CA's key, a self-signed CA key. We actually use this successfully in service CA. Bring-your-own-CA is this thing that customers have wanted to do, but it's ridiculous. If you're, say, GoodCorp and you have a CA you use for other systems, it just makes sense to branch your OpenShift PKI off of it, but then you run into these situations. There's really not a feasible way to do this. In OpenShift 3, you could give an intermediate CA. If you plugged in the private key, the installer would issue the certs based on this intermediate that you just retrofitted in. But what you're saying there is, hey, someone, go mint certs with this key. I don't think people would do this or want to. So yeah, PKI dependence: you've put your face into the glue if you do that. So revocation, and the revocation stuff is mostly the last part, I'm getting towards the end. Revocation is kind of the elephant in the room. What if a certificate's key is stolen or it needs to be replaced? There have been methods to do this. CRLs are the oldest one. The CA issues a dated and signed list of certificate serial numbers and distributes it to clients. The verifying client consults the CA's CRL to see if the cert is revoked. And then there's also the Online Certificate Status Protocol, OCSP, where instead a client consults a server to ask if a certificate has been revoked.
So the OpenShift PKI doesn't do revocation now. First of all, revocation by itself is pretty flawed. You could just Google "certificate revocation is broken"; plenty of people have written about its flaws. Asking "is this revoked?" is not the same as "is this valid?", right? Because you get into this position where the server doesn't have a reply to "is this revoked?". So there are issues with the failure modes of OCSP, where you can't contact the OCSP server. What do you do in that case? If you just kill the connection, well, then you could break automated services. Or if you just let it go by, then it's like, "this cert is not revoked unless I've heard otherwise," which is a weird situation. So yeah, it's DoS-able: just blow up the OCSP responder and no one can check. There are some extensions to help with that, like OCSP Must-Staple, which Google uses, I think, for Chrome, for web PKI stuff. But revocation is only half of the solution. You have to replace the cert that you just revoked, especially in a microservice infrastructure situation. And how do you know when to revoke at all? How do you know your key got stolen? Yeah, like scan the internet for a key. Good luck. Right, right. And the reality of something like Heartbleed was that people just got key material for free without you knowing. So that's a tough thing to determine, when to revoke. So yeah, revocation has big snags in OpenShift and Kubernetes. A CRL can be huge: there's a 1 MB limit on a ConfigMap, and CRLs can get to 500 MB. Clients having to fetch a 500 MB CRL to verify a single cert, that's crazy. There's some stuff with delta CRLs that can help with that. But it's this balance of which option is less bad when it comes to picking between CRL and OCSP.
So overall we opted for this self-rotating approach. So this is the future me telling the past me there'll be no order, only chaos. It would be good to see limited revocation support in the future. One thing about revocation, especially from the compliance standpoint, is that depending on the compliance standard you're trying to adhere to, you'll get some of this demanded: you have to have some kind of revocation. If you use certs, then you've got to be able to revoke them in some way. So in some ways it becomes this thing where, well, revocation is so bad that if we can just limit the number of certs that the platform uses, then we don't have a target where a compliance standard will say, if there's any kind of cert foolishness in your stuff, you have to be able to revoke a certificate. So for things like external endpoints, I think the best approach for OpenShift would be to support revocation, or CRLs or something, for one of these domains: the external routes and externally exposed services, I think that's the best case for supporting some CRLs. But now, this future stuff is not me saying this is going to be in OpenShift. This is the direction I would like to see it go in, so don't take this as future product enhancements. If we can leverage ACME, which is the protocol used to automatically request certificates, used by the Let's Encrypt project... For my personal site, I actually chose the host because, through Let's Encrypt, they have a little box in your control panel where you just click "give me a Let's Encrypt SSL cert." This is great. It takes a lot of the hurdles away from obtaining SSL certs from CAs, and it is a good option.
Yeah, so leveraging ACME where we can for internal applications, kind of like how we have our service CA for internal applications. There are some options, projects like cert-manager, and there's a Let's Encrypt plugin, I think it's called openshift-acme, where you can get Let's Encrypt certificates for external stuff, and that is something I think would be good to leverage. Now, I don't think a lot of closed infrastructure is even prepared to do this for non-Let's-Encrypt usage; you should be able to use ACME to get certificates from any ACME server, but I think Let's Encrypt is really the primary use case. So it hasn't caught on in infrastructure, but I think the ACME protocol is a good point to move forward from for future certificate generation methods. And kind of offloading the trust where possible: there's an example in one of the components of OpenShift, the cluster-machine-approver, where it's basically asking the machine API if the name for a node cert is good. When you've done that, you've offloaded the trust; you've said, okay, I already trust this machine API to tell me I even have nodes available. So offloading some of those trust decisions to official APIs like that would be cool. Kubernetes itself has a CSR API, but it's limited to only one case. So if there were generalized CSR APIs, that would be useful. And being able to support CRLs for compliance reasons would be great, not using the CRL as it is, but abstracting away some of this stuff, saying maybe we take in a CRL, but we just turn it into a flat list that we refer to.
So there's some innovation that we can do in that space to just try to get rid of the need for people to think too hard about these certs. That's my job. But yeah, that's about it. I hope it was informative. Yeah, absolutely, I learned a ton. Yeah, and this is all great info to have as someone who works with certificates on a regular basis, because I know I've built tools to help you check the chain, right? To make sure that the certs were in the right order, because a lot of times you get, here's your intermediate, here's your private key, here's your public key. And sometimes that all needs to be in one file, not the private key, obviously, but that chain has to be in the right order sometimes, right? And that's not always given to you that way. Yeah, in fact, with privacy-enhanced mail, the PEM format, I believe you have to have the cert first, then the intermediates, and then the root CA. If you have them in the wrong order, then it doesn't parse correctly. The nice thing about the Go standard library is that it can detect that they're in the wrong order and will just bail out, right? And that, I think, was one of the most beautiful parts of Go that saved my butt a few times. Yep, yep, that's right. Awesome. Yeah, and Chris, you mentioned we could probably go a little bit over, because yeah, if you wanted to show the things you're doing right now. Yeah, I do want to show off the compliance operator at least. Yeah, that'd be awesome. Go for it. One of our few opportunities to go long here. You might want to increase the font size if you can. Oh, yeah, yeah. Sorry, there you go. That's good. Looks good, yeah. I just installed this; I built the images and installed it from our upstream source. I'm gonna see if, just deleting the pods, I mean, let's see, we actually get a, ah, okay.
Yeah, so the compliance operator, to try to summarize it shortly: it is an OpenShift operator, so a bunch of OpenShift controllers, and it's got a compliance-scanning API that you interact with. What it does is you pick a compliance profile that has a bunch of associated rules; these usually align to certain standards. Okay, so the profiles have not parsed yet. Okay, yeah, so there are pods there that are actually parsing the profile content. So yeah, you apply a profile to the cluster, and what it does is it goes through and launches OpenSCAP under the hood. It launches OpenSCAP pods that mount the host file systems read-only, with just the right privileges via an SCC. And it uses the OpenSCAP scanner to scan the files based on the profile and give you the results: are you compliant, are you non-compliant, et cetera? And then what it can do from that is generate remediations that get applied. So if there's something, if we can auto-tune a flag or something to bring the cluster into compliance, it can do that automatically. Nice. See if this, okay, so we should have profiles now. Let's see. Okay, so we have a bunch here. Like, I'll pick from the ocp4 ones, or I might do rhcos4-moderate. So, we actually have a, let's see. Okay, so what I'm doing here is we have a kubectl plugin tool called oc-compliance. It's basically a little helper that will create some of the objects that you need to kick off the scan. You basically pick a set of scan settings, you bind the profile to that, and it runs the scan. So I'll do this for rhcos4-moderate. So this scan setting, oh wait, this is a ScanSettingBinding. So I've bound it to default. Nice. So what the scan will do is it will run on a schedule. I think this is like the first Monday of the week or whatever.
So this is just an example. Yeah, never mind, it doesn't matter. It's like 1 a.m. every day or something, yeah. Yeah, I always have to look it up. It's gonna apply to worker and master roles. You can set some options, some raw result storage if you need to. And this is one of the things about this one, oh, I don't know, it doesn't say, okay. Yeah, there's a scan setting for auto-apply, which will auto-apply the remediations that get generated. So our compliance suite is still running. So all the scans together, from the profiles that you select, get aggregated together as a scan suite. So you might have a bunch of different checks; some of them fail, some of them pass, some of them are info or warning or whatever. It'll aggregate a total status. So if all your rules pass, then you'll get "compliant" here when it finishes. This is broken out into rules: when the profiles are parsed, we parse them from a big image-stream file that we ship along with the operator. That's our default content set, the profiles that we've worked on and the content that we've written. So here's a bunch of stuff from FedRAMP Moderate. You can see it's things like, don't use a no-password IdP. Kind of simple things in some places, but not obvious in all places. Yes, right. Yeah, and they are written to adhere to the particular standard. So a lot of our work is taking the standards and the requirements and seeing what checks we need to have. Oh, so this is done. So let's see if we can check the views. So. Oh, many fails. Yeah. See if I can, okay. Oh yeah, take it back down. So we can look at, yeah, so we got a bunch of fails, a bunch of passes.
You know, a lot of these are like enabling or disabling kernel modules and stuff. So a lot of these can get remediated automatically. Sometimes there are some that are just manual; they may have other dependencies and things like that, so you'll have to go and fix them yourself. We give recommendations on how to address the controls, but then ultimately you have to go and address it manually. But you can check out the individual results, and you can also look at what would be remediated. So here are the remediations that get generated and their application state, whether they were applied or not. So if I look at this one, let me see. This one. You can see this is the remediation. It takes the form of a MachineConfig. We just post this as a generic Kubernetes API unstructured object, so this could be a MachineConfig, this could be a ConfigMap, could be something else. That's how the remediation is applied. So if I applied these remediations, they would all get kicked over to the machine config operator and then the nodes would start to roll, just as if you were to go and configure a machine config pool as you normally would. So this takes away the need to have a privileged OpenSCAP pod, because I believe OpenSCAP on RHEL can run a remediation script: it goes through and fixes stuff with a shell script. We don't want to do that here; we want the right privilege separation. So we get to the point where we're just using OpenSCAP read-only to do the scans, and then we go through the native mode of applying changes to nodes on your cluster, which is through the machine config API. And then there's also the ability to do the same with generic kube objects, like how some of the other checks worked.
It's not that those were running OpenSCAP on the host file system. Instead we fetch a kube object: in the content we say, hey, we want to look at this ConfigMap. We designate that in the content, and the rule ends up causing the compliance operator to go fetch and stage that object. And then we do an offline scan just on that object with OpenSCAP. So we still use OpenSCAP even to look at native OpenShift or kube objects. So that's cool. And there is also the ability to remediate some of these. You can post a ConfigMap, but also, if you write a rule and you need to do something that requires extra permissions, you can run the check, address the permissions, like go add the role binding to be able to change whatever object the check needs, and then run the check again and it'll remediate it. So we approach it on both fronts: you can assess the compliance of the RHCOS nodes, and, are we still on, still there? We are, yeah, totally. Everyone froze, okay. But then, yeah, we can do the same for the kube objects. So you can assess the compliance state of the cluster, but also of the nodes themselves. And our focus for the compliance state of the nodes is on getting RHCOS to be compliant with various standards. Cool. Very nice, yeah, thanks. Awesome. So if folks want to learn more about the compliance operator, where would you send them? So I would send them to GitHub: under the OpenShift org, there's the compliance-operator repo, and that's there for the code. We also have a section in the official OpenShift documentation for using the compliance operator; it's under the security and compliance section. Check that out. Good stuff. And yeah, the code is open, it's another OpenShift repo, so you can go check it out, see how things are being done under the hood, see if it meets your needs.
If not, you can customize your own operator, I guess, if you really feel like you need to, but we try to do the work for you, at least the hard stuff. Yeah, and the compliance operator is cool because it's innovative for our space right now. The feedback that we have gotten on it has been fantastic. Usually we'll present it and people have immediate questions like, oh, well, what happens if OpenSCAP goes and modifies something on the node? And these are things that we've fleshed out, but we also try to keep an eye on that. If you're going to run a security operator, an operator that's supposed to do something security-wise, you're going to want that thing to be secure itself. We apply a lot of careful thought to the privilege level in everything we do. So that would be the tough part: sometimes, especially with operator design, you feel like you're reinventing the wheel, but yeah, I think we're leading in this space of cloud compliance, if I may be so bold. Yes, you can be. Good. Awesome stuff. Thanks for going through the demo and going along with us here. I appreciate that. For sure. So Dave, anything you wanna wrap up with? Yeah, I'd absolutely like to thank Matt for a great session. Appreciate it, and thank you, Chris. I'd like to tell everyone, be on the lookout for a couple of podcasts that we're gonna drop here in the next week or so around some more data controls, encryption, and protection topics. Next month is network controls month, so really excited about that; all things network security is August. So stay tuned. The walls come tumbling down when you say service mesh. Yes. It's just not tumbling down per se, just different. Right. So thank you all. Another layer of complexity. Yeah, exactly. But that's it for me. I appreciate it, Chris. No, thank you, Dave. Thank you, Matt.
Thank you, Bobby, for producing, and we'll see y'all soon here on the GitOps Guide to the Galaxy, which is at 3 p.m. Eastern, 1900 UTC. So yeah, stick around.