Good morning, good afternoon, good evening, wherever you're hailing from: welcome to another edition of the Ask an OpenShift Admin show here on OpenShift TV. I'm Chris Short, executive producer of OpenShift TV, and I'm here with the one and only Andrew Sullivan. What are we going to talk about today, Andrew?

Etcd, or "et-see-dee" as it sometimes gets pronounced. But first, I want to say I'm super excited about that intro animation. I'm really happy with the way it came out, and I can't say enough nice and great things about the team that put it together for us. If they happen to be watching: thank you.

Yeah, seriously, it's awesome. Thank you very much. So, thank you, Chris. This is the Ask an OpenShift Admin office hours live stream. The goal is to give you, our audience, the opportunity to ask us quite literally anything that's on the top of your mind. We're here to help you work through your issues and any outstanding questions you might have, whether we have an answer on the spot or need to go back and find the answer for you. So at any point in time, feel free to post into the chat on any of the platforms we're streaming across and ask us any questions.

However, in the absence of those questions (hello, Ali), we do have a topic, and today's topic is one I've been super excited about since the day I put it on the agenda. It pushes all of those good, happy buttons for me because it's in the weeds and super geeky in how it works, and there's a lot of different stuff that goes into it. With that in mind, I am very happy to introduce our guest, Anand. And Anand, please introduce yourself; I'm not going to butcher your last name, I'm sorry.

No problem, Andrew. Thanks for the wonderful introduction.
My name is Anand Chandramohan, and I'm glad to be on OpenShift TV again. Chris is an awesome host; a couple of months back we did a few sessions on Windows containers, and I'm glad to be working with Andrew and Chris again, this time for etcd. Just as background, I'm the product manager for etcd, so I'm responsible for the roadmap, planning, and future direction of the product. Any questions you may have about the future of the roadmap, I'm here to answer them, or to find out the answers and come back to you.

Yeah, and as you pointed out, you were here before to talk about the Windows container side of things, and like basically all of us you have multiple roles and multiple responsibilities. So for anybody who caught those earlier episodes, don't be alarmed or surprised: Anand hasn't abandoned any of those. Do we ever get to abandon things? I don't think so. We only get more. We deprecate them, maybe. It's just another hat that he happens to wear.

In somewhat unusual fashion: usually I have a loose agenda for these, and everybody knows I like my sticky notes. This time we have an actual Google Doc, and it's two and a half pages, so it's practically ready for a book to be written. It honestly is. I really want to devote as much time as possible to today's topic, because it's a big, complex topic, there are a lot of questions about it, and etcd is a little bit of a black box to a lot of folks. But I do want to quickly cover, in traditional Ask an OpenShift Admin tradition (I used "traditional" twice there, didn't I?), just a couple of things that are top of mind, things I think you all should be aware of that have come up either internally or externally.
The first one is very short, very sweet, very simple: OpenShift 4.6.21 is on the cusp of being the first 4.6 release with a stable-channel upgrade to 4.7. I'm going to go ahead and share my screen here, and move your lovely pictures over so that when I look at you, I'm looking at the camera. If we go to github.com (I don't think I'm typing that right, but I can't see because your pictures are there), to the Cincinnati graph data repo: I'm in github.com/openshift/cincinnati-graph-data. Remember, this is the repo that the cluster pulls all of its update and upgrade information from. You can see there's an open PR to enable 4.6.21 in the stable channels. It usually takes a couple of days after these are submitted, so like I said, it's on the cusp of being available. That doesn't mean it's guaranteed; some late-breaking thing could still cause a delay. But for anybody who's on 4.6 and ready to go to 4.7 in the stable channel, we're almost there.

I think all three of us were here back in the 4.4 days, when the upgrade from 4.3 to 4.4 in stable took a long, long time. The engineering and product management teams have done a lot of work to make sure that doesn't happen again and that those updates happen a lot faster. Yeah, and it just gets better and better with each release, I feel like. Agreed.

The second thing I wanted to bring up real quick has caused a little bit of recurring drama. If I go to the 4.7 docs, to Installing, and I want to install to vSphere: if I look at the requirements for installing to vSphere, it's not a stretch to interpret them as saying there's a minimum supported vSphere version when using NSX-T, right?
It's easy to misinterpret that and say, oh, in order to deploy OpenShift to VMware I need NSX-T. We've got a couple of BZs open with the docs team to basically clarify: NSX-T is not required if you want to deploy OpenShift to vSphere. But if you do choose to use NSX-T, then there are some version requirements associated with it. Just a quick one; this comes up every week or two, and I get an email asking, is NSX-T really required? No. So just be aware of that.

And last but not least, I had a question come into my inbox about the default storage classes created with IPI and UPI. If you deploy to, for example, vSphere using IPI or UPI, the installer will automatically configure a storage provisioner (the in-tree vSphere storage provisioner, in that case) and will automatically configure a storage class. It's named "thin," it's configured as the default class, and it points to whichever datastore the VMs sit in. You cannot delete that storage class; if you do, it will get recreated, and that's done by the cluster-storage-operator. Let's go back to GitHub and search for "storage" here, as soon as I spell "storage" right. Still can't spell "storage." Here we go: cluster-storage-operator. This operator is responsible for making sure there is always at least the one defined storage class for each platform in the cluster. You can change that storage class: if I want to keep the "thin" storage class as the default, I can go in and modify it to point to a different datastore, for example. I can also create a new storage class and mark it as the default. But that "thin" storage class (it's different for each platform, but in the case of vSphere it's "thin") has to be there, and if I delete it, the operator will recreate it and re-mark it as the default. So just be careful if you're doing that; make sure you're aware of that behavior.
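As a sketch of that workflow, this is roughly what a replacement default StorageClass could look like on vSphere with the in-tree provisioner. The annotation is the standard Kubernetes marker for the default class; the name and the datastore value here are made up for illustration, not taken from the stream:

```
# Illustrative only: a second vSphere StorageClass marked as the default.
# "thin-alt" and "datastore2" are hypothetical values.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: thin-alt
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/vsphere-volume
parameters:
  datastore: datastore2
  diskformat: thin
```

Remember the behavior described above: creating and defaulting a class like this is fine, but deleting the installer-created "thin" class will just cause the operator to recreate it.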
If it seems weird, like, oh, I can't delete it, or, every time I try to set a new default it goes back: make sure you're not setting a new default and then deleting the original, which will cause it to be recreated. Yes.

Okay. So I'm going to ask you a question, Anand, and then I'm going to catch up on chat while you answer. I'll also be listening to you, of course. So the first question: coming back to our topic of the day, etcd. What is etcd? What purpose does it serve, and how does it function, not just inside OpenShift or Kubernetes, but in general?

Yeah, etcd, to keep it simple: etcd is basically a key-value store for storing the state of your cluster. Your config maps, your secrets, and other protected resources are stored in etcd, and like the title of this session says, it's really the heart of the control plane. That's essentially what it is. It's an open source project and very much a critical part of the control plane. We have built an etcd operator that watches for changes to the cluster and reacts accordingly, and over the course of this session we'll talk about how you can back up the store, how you can restore it, and how you can encrypt and decrypt it. But at its core, it's a key-value store for storing the state of your control plane.

And I think that's important, because it's kind of a database, but kind of not a database, right? You specifically used the term key-value store. How is that different from a database?
Yeah, it's different from a database in the sense that it's not your traditional Oracle RDBMS where you store inventory data or customer or fleet management data. It's meant purely for storing the internal state of the cluster, and it's not meant for user-level workloads, like storing the inventory information for your shipping application. That's how it differs from a traditional database.

And etcd was originally created by the CoreOS team, before they were a part of Red Hat. That is right.

Okay, so: etcd is a key-value store, and it's used as the persistence layer inside of Kubernetes, or not all Kubernetes distributions but many of them, and most relevant for us today, OpenShift. So what types of things do OpenShift and Kubernetes store inside of it?

Yeah, like I said: config maps, secrets, routes, all the access tokens, all the authorization tokens. Those are a bunch of the things stored inside etcd. Okay.

The way I usually think about it is that anything that's YAML is inside of etcd. Any time I create some sort of object, it's being stored inside of there, and that's important. Say I create a new pod: I submit it to the API as that YAML definition, and the API server then modifies etcd, adding in that object. Then the scheduler does some work. The scheduler says, hey, there's a new pod definition, I need to schedule it, and it updates that object with the chosen node, for example, and notifies that node that it needs to go start executing this pod. Then the node instantiates it and makes updates to that object as well, adding status information.
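The flow just described can be sketched as a handful of operations against one key in a toy key-value store. This is purely illustrative: the key layout and field names are stand-ins, not etcd's actual schema, but it shows the important point that the API server, scheduler, and kubelet all read and update the same stored object:

```python
# Toy model of the pod-creation flow: several components mutate the
# same object, addressed by a single key, in a shared key-value space.
store = {}  # stand-in for etcd's key-value space

key = "/registry/pods/default/my-pod"  # key layout is illustrative

store[key] = {"spec": {"containers": ["app"]}}   # API server: object created
store[key]["spec"]["nodeName"] = "worker-1"      # scheduler: node chosen
store[key]["status"] = {"phase": "Running"}      # node/kubelet: status added

print(store[key])
```

Each of those three mutations corresponds to a separate write that etcd has to commit, which is why object churn, not node count, is what drives etcd load.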
It's adding in all of these other pieces. So it's not a database in the sense of rows and values being stored, but it's critically important to the functionality of OpenShift and Kubernetes. Every object has many different operations happening around it, with it, and against it. So, yeah, I think that was accurate; I'm looking to you for a yes or no. Yes. Okay.

So, usually these conversations about etcd turn into a performance conversation, and I say that because performance is the issue people most frequently have with etcd, and it ripples out into all the different components and aspects of the cluster. As I said, if I create an object, say a pod, and it takes too long for that to be committed into the database, that can cause other issues: it takes longer to schedule, once the scheduling happens it takes longer to actually be instantiated, and once it's been instantiated it takes longer for each one of those subsequent operations. And there are hundreds or thousands of operations, depending on the size of your cluster, that could be going on at any point in time. So slow performance affects basically everything.

I want to start the performance conversation, and kind of build into it, by asking how etcd actually works. For me this usually starts with the Raft protocol, and what that actually means. I have a very convenient graphic here that I'll drop into the chat: thesecretlivesofdata.com. It's a pretty simplistic explanation, Chris.
I see you smiling. This one is pretty well known for breaking down the Raft protocol and how it works for data persistence, which is what etcd uses. I'm just going to click Continue here to show, more or less, what it looks like, and actually I think I want to jump ahead to the protocol overview.

Essentially, distributed consensus is the important part. One of the nodes will be elected as a leader, and the others will be followers. In this instance we have three nodes: they all start as followers, they elect a leader amongst themselves based on whatever criteria happen to be defined, and eventually we end up with that defined, elected leader.

Now from a change perspective: let's say this green dot here is the API server. The API server communicates with the leader and says, set some sort of value. That value goes to the leader, the leader logs it, and then it sends the data to the followers. The followers log that data and reply back: hey, we've got it. Then the leader says, okay, enough of us have agreed that we have this data, let's commit it. And at that point the leader returns to the requester (the API server, in the case of Kubernetes): hey, your data has been saved.

The really important part is what you noticed along the way. Once the data was sent to the etcd leader, it had to get written, it had to go across the network to the followers, who then had to write it, who then had to come back to the leader and say, hey, we got it, who then had to commit it and come back and say, okay, I've actually stored the data. So notice there are a bunch of different operations happening there, right?
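The write path that animation walks through can be condensed into a small sketch. This is a toy model, not the real etcd implementation: there are no terms, elections, failures, or real persistence here, just the ack-counting that decides when an entry counts as committed:

```python
# Minimal sketch of the Raft write path described above: the leader
# appends locally, replicates to followers, and commits only once a
# majority (quorum) of the cluster has acknowledged the entry.

class RaftCluster:
    def __init__(self, size=3):
        self.size = size
        self.logs = [[] for _ in range(size)]  # logs[0] is the leader's log

    def propose(self, key, value):
        """Return True if the entry reaches quorum and is committed."""
        entry = (key, value)
        self.logs[0].append(entry)      # 1. leader writes to its own log
        acks = 1                        #    the leader counts as one ack
        for follower in range(1, self.size):
            self.logs[follower].append(entry)  # 2. follower writes, then acks
            acks += 1
        quorum = self.size // 2 + 1     # majority: 2 of 3, 3 of 5, ...
        return acks >= quorum           # 3. committed only with a majority

cluster = RaftCluster(size=3)
print(cluster.propose("foo", "bar"))  # True: 3 acks meets the quorum of 2
```

The quorum arithmetic is also why an etcd cluster of three members tolerates losing exactly one: two surviving members are still a majority.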
There are at least six writes and something like eight network traversals happening across the three nodes in order for that data to be committed. This is why, when we talk in just a moment about the actual requirements, the latency values we suggest for storage and network seem super low: there's a lot of that traffic, and it stacks. One request to write data becomes a bunch of additional requests on the back end, at the etcd level.

All right, I'll pause here. Chris, some questions, please.

How does etcd work? Well, we're going to talk about that in depth. And: what are the dependent services or objects for etcd? Which I think is an interesting question, because etcd is the heart, but there are some dependencies out there, like storage.

Yeah. Anand, I'll throw out what I think my answer is, and please add on or correct me if needed. Etcd itself doesn't have other dependencies in Kubernetes, with an asterisk on that, but it does have infrastructure requirements. Of course you need the storage available, you need the network available, that type of stuff. The deployment inside of OpenShift really has a dependency on the etcd cluster operator; that's the link I'll paste into the chat. I say that because the operator is what actually configures the pods in the cluster. So let me switch over to my terminal. If I do an oc get pod in the openshift-etcd namespace, we can see we have these three pods up here at the top, one on each of the control plane nodes. If I do a describe against that pod (oh yes, I know, I have to supply the object type), so if I describe that pod and come up here... let's make this a little bigger.
So hopefully it doesn't wrap quite as badly. If I find the etcd pod, right here, you can see that it has a lot of values, a lot of data being passed to it. All of this configuration data comes from the operator. The operator is the one that says, hey, this is where you can find your certificates, and importantly, once we get down here into the actual startup script, we can see it saying: this is where all of your peers are, this is where you can find the other nodes inside the cluster. So while the concept, the implementation of etcd itself, doesn't require something else, in Kubernetes and OpenShift it does. And this is why we have the whole bootstrap process: when we instantiate a cluster, we stand up the bootstrap node, which creates a single-node etcd cluster, which then instantiates the control plane and hands off to the now three-node cluster. That's how we work around the chicken and egg. So I'll be quiet now and let Anand talk.

Yes. Yeah, can I share my screen for a second? Absolutely.

So I think, Andrew, you were on the money with all those statements. Etcd has no prereqs; it's a day-one operator, like you mentioned, installed with the cluster, a key part of the control plane. One of the prerequisites for good etcd performance, though, is a good storage backend. And as you know, OpenShift is supported on a wide variety of platforms: private clouds, public clouds, edge networks, and whatnot. So one of the key things for getting good performance from your etcd database is to make sure the storage backend backing the cluster is performant enough. And here is where this utility called fio comes in, right?
And so one of the things you might want to do before you install your OpenShift cluster, if you're worried you might run into etcd performance degradation issues later on, is to run fio. If you run fio, it will give you a status of whether your storage backend is good enough or not. And even on a cloud platform, say AWS or Azure or GCP, you have access to a wide variety of disks: SSDs, HDDs, ultra SSDs, NVMe, ephemeral storage. There are so many types of backend storage, and each of them offers a different type of performance. So one of the things you can do is run this fio command. I'm going to copy it and put it up here.

When you run this command, it usually takes a few seconds, and you want to watch for one metric: fdatasync. You want to make sure that the 99th percentile of that metric is less than 10 milliseconds. So we let this run for a few seconds and observe. In my case, my cluster is on a cloud platform, and we'll see whether this installation is going to be good enough to support good etcd performance.

So that's the output, and these are the different percentile metrics for fdatasync. You can see the 99th percentile is 465 microseconds, so that's around 0.4 milliseconds, way below the target. So we're in pretty good shape here. You're in real good shape. Yeah, actually, the fdatasync numbers are the second block. Oh, sorry. It's still really good at five milliseconds. That's right, five milliseconds is still well under the 10 milliseconds, so I think we're still good in this case. Sorry to interrupt you.
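As an aside, the fdatasync measurement fio reports can be approximated in plain Python, which helps build intuition about what is actually being timed: the cost of forcing a small write to stable storage. This is a rough stand-in, not a replacement for fio (os.fsync is used here for portability; fio issues fdatasync, which os.fdatasync exposes on Linux):

```python
# Rough illustration of fio's fdatasync check: time a series of small
# synchronous writes and report the 99th-percentile latency.
import os
import tempfile
import time

def fsync_p99(samples=100, size=2300):
    """Return the p99 latency, in seconds, of small write+fsync cycles."""
    latencies = []
    fd, path = tempfile.mkstemp()
    try:
        payload = b"x" * size                  # a small, WAL-entry-ish write
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)                       # force the write to the disk
            latencies.append(time.perf_counter() - start)
        latencies.sort()
        return latencies[int(samples * 0.99) - 1]
    finally:
        os.close(fd)
        os.remove(path)

p99 = fsync_p99()
print(f"p99 sync latency: {p99 * 1000:.2f} ms (etcd target: < 10 ms)")
```

For a real pre-install check, use fio (or the container image from the recommended-practices docs) against the actual device that will back /var/lib/etcd, since a scratch file elsewhere may land on a completely different disk.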
I think we should have you on more, because you're literally doing what was my next step. So thank you. You can read our minds. This is good. I don't want to interrupt you if you're going to lose that thought, but we do have a couple of questions. That's okay, let's take the questions.

We say that etcd goes across the network. Does it use its own configuration for contacting the other members of the etcd cluster, or does it rely on things like Kubernetes Services to find and discover those members? Good question; I may need to find the authoritative answer. I think this is one of those things the operator does: it basically points each member to the other cluster members automatically. Prior to the operator, in OpenShift 4.3 and earlier, we used DNS SRV records. If you happen to go back in the docs and look, you'll see that during install we required all of these additional SRV records for etcd; that's how peer discovery was done. But the operator takes care of that now, and I thought that was one of the things we saw over here. Yeah, we've gotten rid of the DNS requirements, but I'm not sure whether it goes through Services or Ingress, or what network path it uses to communicate with the other etcd pods.

Yes, so if you look at that command I ran earlier, the oc describe pod on one of the etcd pods in the openshift-etcd namespace, you'll see there are environment variables like ALL_ETCD_ENDPOINTS, plus per-node values for the etcd name and etcd URL host.
Those are the values it uses to find the other members. And importantly, I think this is specifically why we can only tolerate losing, or recovering, a single etcd node at a time: the operator has to be there to reconfigure the surviving members to point to the new node so that they can all recover.

Let's see, there was another question up here: the revision pruner. What is the purpose of the revision pruner? I don't know that authoritatively; very much to your point in chat, I believe it removes older versions. Essentially, if there's a configuration update, it's responsible for pruning the old revision pods that happen to be left behind.

So I'm going to share my screen again real quick to help answer your question about running fio on CoreOS there, Sam. And the reason I'm going to share it is that if we switch to the documentation here, we have this recommended etcd practices page.
Let me paste this into the chat. If we scroll down in this documentation section, you can see that we have some podman commands we can use to run the check, and it would be pretty easy to translate that into running inside a pod, inside OpenShift. That being said, let me see if I have the link handy: a while ago I created a very simple gist that shows how to run the etcd performance check inside an OpenShift cluster. But I think it's important to understand that it just schedules it as a regular pod. Let me scroll down here. It's not going to be running on a control plane node, which is really the important part, because etcd actually runs on the control plane nodes. Most of the time the performance will be very similar, but it is not always going to be the same. In particular, the operator and the installer are designed, created, and tested by engineering to take into account a lot of different platform quirks. The biggest, or arguably the most well-known, one is Azure. With Azure, we deliberately tune some of the etcd settings (and I want to get to those in a few seconds), and we also do things like request, I think, a one terabyte disk for the control plane nodes, to account for all of the IOPS and everything else that's needed. So the control plane nodes may be substantially different from a worker node, in addition to the fact that worker nodes often come from machine sets. It's close, but not always the same; just keep that in mind. And Anand, I think you had something to add.

Yeah, and there were a few pre-planned questions in chat too. So, a couple of things, going back to the fio example. From a product perspective, one of the things we want to do is integrate fio into the install process.
That way you don't have to run it separately and then run the install. If we find that most of our customers actually have a need to run this, we want to make it part of the day-one install: maybe when you generate the manifests, or generate the install config, or when you run cluster create, the fio utility pops up, checks the status of your disks, and gives you a report on whether your backends are good enough or not. If they're not good enough, it will splash a warning. We don't want to block the install at this point, but we want you to at least be aware: your fsync is at this percentile level, so you can expect good or bad performance from etcd. So that's one thing we're looking at: baking fio into the OpenShift install so you're aware, ahead of time, of what kind of performance you're going to get. That's point number one.

Point number two, and let me share my screen again: say you've now determined that your etcd is not going to give you good performance. What options do you have? One is obviously to upgrade to better storage: you can upgrade to NVMe, or to ultra disks, and whatnot. The second option is that you can always mount /var/lib/etcd on a separate disk. We're going to provide documentation; I believe it's already there for most of the cloud providers, AWS, Azure, and vSphere, and I think even bare metal, to show you how you can mount an external volume for /var/lib/etcd on a separate, high-performance disk and attach that to your cluster.
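For reference, the day-two version of what Anand is describing is typically done with a MachineConfig that adds a systemd mount unit for /var/lib/etcd on the control plane nodes. The sketch below is illustrative only: the device path, filesystem type, unit name, and Ignition version are assumptions written from memory, so follow the official documentation or the KCS article mentioned in the stream for the supported procedure:

```
# Illustrative sketch, not the supported procedure: mount a dedicated
# disk (here assumed to be /dev/sdb, formatted xfs) at /var/lib/etcd
# on master nodes via a systemd mount unit.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 98-var-lib-etcd
  labels:
    machineconfiguration.openshift.io/role: master
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - name: var-lib-etcd.mount
          enabled: true
          contents: |
            [Unit]
            Before=local-fs.target
            [Mount]
            What=/dev/sdb
            Where=/var/lib/etcd
            Type=xfs
            [Install]
            WantedBy=local-fs.target
```

The data in /var/lib/etcd also has to be preserved or restored across the remount, which is exactly the kind of sharp edge the documented steps walk through.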
And I'm going to dig that out, so give me a few seconds. As Andrew walks through the presentation, I'll find the documentation on how to mount /var/lib/etcd on a separate secondary disk. And one of the things we're looking at doing from the product side is how you can bake this in as part of the day-one install: if you want to use an external secondary disk for etcd, we want to provide you the option of doing that as part of your day-one install.

Yeah, I know that's something that has been asked about somewhat frequently since OpenShift 4 was released. OpenShift 4 with CoreOS has been a much more rigid configuration, and a lot of people with OpenShift 3 would dedicate a disk to etcd. So I know that is something that is very much being looked forward to.

So I'm going to share my screen again as well, because I want to share this KCS. This KCS kind of summarizes all of the things we've been talking about thus far and breaks them down into component pieces. For example, this section here is arguably one of the most important: applying a request should normally take fewer than 50 milliseconds, where a request refers to the amount of time it takes for the etcd cluster to commit or return a piece of data. That 50 milliseconds is the aggregate of all of those storage and network latency operations, and that is why we make kind of stringent requirements around performance. We can see here: slow disk, the p99 fdatasync duration should be less than 10 milliseconds. Then there are database-size-related issues; we'll talk in a little bit about what defragging and compaction are and how they can impact performance as well. Overall latency comes from network latency as well as storage
latency, right? So here you can see that we don't provide a specific network latency requirement; instead it's wrapped up in that 50 millisecond number. If your storage is slower, then maybe you need a faster network to be able to stay within those 50 milliseconds, that type of thing. So it's very much a balancing act. There are a lot of knobs as well, I was about to say, that you can dial in and out to get where you need to be on a large infrastructure setup.

Yeah. And the other one I wanted to share is this KCS, along with that super simple gist I created. This one is very simple: they recommend doing an oc debug into the master node to run that same podman command. Nice, I'll post that KCS in the chat as well; I'm going to save that one because it's a handy one. And let's see what other links I have out here. The cluster-storage-operator, we already talked about that.

So let me check my notes, and Anand, please interrupt me at any time, whenever you're ready. One of the questions we get a lot is basically: what happens if my storage performance is right on the cusp? What if it's right at that 10 millisecond range? What if, when I measure operations... oh, one thing I should talk about here is etcdctl. If we go to the official etcd documentation, there's a command-line tool, etcdctl, and this is how we interact with etcd. From a control plane node, you can use etcdctl to interact with the cluster. And specifically, if we come down here to the operations guide and look for performance, there's going to be a set of commands we can run against the cluster. And the one that I'm looking for, that I'm not seeing at the moment...
So there's the benchmark CLI tool, and there's also etcdctl check perf with a load flag of small, medium, large, or extra large. Effectively, what that does is run a workload: etcdctl performs a bunch of queries and other operations against the cluster and then returns some data, some values. And, where did I put that gist? I think one of the things I was showcasing in there is the output of some of those etcdctl commands. Essentially, this is a good way to determine or identify (let me go back over here to our KCS) what that request latency, that overall request time, happens to be, and to gauge whether or not your cluster is capable of meeting those requirements.

So let me catch up on chat real quick. "Fastest disk you've got, to etcd." Yes, Chris, I see that's you. "Do you want to simulate disk events?" Thank you for the link; I'll have to check that one out. Oh, Anand, I see you're answering the question about how the pods communicate with the others: they use the host network. Thank you for saying that, because I was about to type it when you hit enter. Christian, yeah, I'm an equal opportunity offender: I try to switch back and forth between all the different pronunciations of all the different cuddles and ctls, whether it's for Kubernetes, OpenShift, etcd, or Trident (Trident is the NetApp CSI provisioner).

Okay, so I think I'm caught up on chat. Just a reminder: anybody, feel free to submit questions at any time. Yeah.
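To tie the numbers together: the 50 millisecond request budget from the KCS is an aggregate of the fsync and network costs incurred along the Raft write path covered earlier. A back-of-the-envelope illustration, with entirely made-up inputs rather than measurements:

```python
# Rough decomposition of one etcd write against the 50 ms request budget.
# These inputs are placeholders chosen to show the arithmetic, not data.
disk_p99_ms = 8.0   # per-member fsync latency (KCS target: < 10 ms)
net_rtt_ms = 2.0    # leader <-> follower round trip

# One committed write costs roughly: the leader's fsync, plus the slowest
# follower's fsync, plus two network round trips (replicate/ack, commit).
estimate_ms = disk_p99_ms + disk_p99_ms + 2 * net_rtt_ms
budget_ms = 50.0

print(f"rough per-request cost: {estimate_ms:.0f} ms of a {budget_ms:.0f} ms budget")
```

The takeaway matches what was said on stream: disks right at the 10 ms edge leave very little of the budget for network latency, retries, or contention, which is why borderline storage shows up as cluster-wide slowness rather than an isolated etcd problem.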
So I mean, any, you know, file system in OpenShift, you know, could be /var/lib/containers. So this is that generic article, and specifically for etcd, and specifically for platforms like, you know, bare metal and vSphere, there is actually detailed instruction on how you can mount /var/lib/containers, /var/lib/etcd, and for that matter just /var, on a separate, you know, file system. Let me paste a link to that as well. And the key point I want to make here is, as you can see, these are day-two tasks, right? After the cluster has been created, if you want to mount /var/lib/etcd to a separate secondary disk that is, you know, highly performant, here are the steps to do it as a part of day two. But we're looking at taking some of these steps and making them a part of the day-one operation, so you can do it right as the cluster is being installed, and you don't have to worry about doing it day two. So that's something we're looking at from a product side.

Nice. Okay, so in the interest of time, I think I'm going to move ahead a little bit. So we've beaten the performance side of the house, we've beaten that into shape, I think. If anybody has any questions, please let me know. We have a bunch more resources that I'm going to include in the show summary blog post. Those go out Friday mornings on openshift.com/blog, so I'll have just a huge amount of links. Literally, Chris and I were talking, like two and a half pages of notes leading into this session. I'll include all of the links and stuff there.

So, troubleshooting. How do we identify, how do we know that there are issues happening with our cluster? So let's go to my cluster over here. And the first way, as always, will be you'll see logs about it, you'll start seeing some events. What's going on here, cluster? Oh, I've got to log back in, apparently it logged me out. Oh, there you go, that's my token expired. Um, so you'll start seeing some events.
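The day-two procedure being referenced uses a MachineConfig with a systemd mount unit. This is only a sketch of the shape that takes, assuming the secondary disk appears as /dev/sdb; the actual documented procedure also includes units to create the filesystem and sync the existing etcd data before mounting:

```yaml
# Sketch only: mount a secondary disk at /var/lib/etcd on control plane
# nodes. The device name (/dev/sdb) is an assumption; follow the KCS /
# official docs for the complete, data-preserving version.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-var-lib-etcd
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        # systemd requires the unit name to match the mount path
        - name: var-lib-etcd.mount
          enabled: true
          contents: |
            [Unit]
            Description=Mount secondary disk at /var/lib/etcd
            [Mount]
            What=/dev/sdb
            Where=/var/lib/etcd
            Type=xfs
            [Install]
            WantedBy=local-fs.target
```

Because it targets the `master` role, the Machine Config Operator rolls this out one control plane node at a time, rebooting each, which is why it is safe to do on a running cluster.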
You'll start seeing some alerts happening inside of here, right? There'll be all kinds of things that are basically saying something is really, really wrong, and usually it'll also ripple out into other things: other operators, other pods that'll all start complaining as well. So usually, when it becomes a problem, it's very obvious. Now, how do I actually determine how bad the problem is? Fortunately, in the performance dashboards here, we have one that's dedicated to etcd.

Thank you. So yeah, and this one has been here, I think, since day one. Yeah, I don't recall an OpenShift release that didn't have this. Yeah, and we sometimes add and remove charts and stuff like that that are here. And there's also a tremendous amount of additional information available in Prometheus that isn't in the dashboard. So if I jump back over here to this recommended host practices documentation page, you'll see that it suggests some additional values that you can check for, that you can monitor, inside of, or via, Prometheus. So the dashboard is just meant to be a quick and easy way.

For me personally, the one that I pay most attention to, or the ones that I pay most attention to: RPC rate is generally how busy it is overall, right? So you would expect this to go up as the cluster increases in busyness. And I specifically use the term busyness here and not scale, because I can have three nodes that are absolute monster nodes that have 500 pods deployed to them, and those pods are, you know, super busy and generating a lot of activity. Or I can have a cluster of 100 nodes that have one pod each that are not doing a whole lot. So when we talk about scale for etcd, and how it affects the scale of OpenShift, it is more about the number of objects in the database, in etcd, rather than the number or the size of nodes, right? That type of thing.

So, arguably the most important single metric is this disk sync duration. Let me turn that off so we can
make it a little bigger. So this disk sync duration is literally the latency for writing values to disk for each one of the nodes. So pay special attention to that one. This is a good indicator if you're having issues with that storage latency problem.

If we look down here at the various peer traffic and client traffic graphs, these can sometimes be an indicator of network issues, right? If you start seeing one node or two nodes that are having a surge in traffic, or one node that's way below the others. That doesn't always count, though; you can kind of always expect one node to have more peer traffic, because that's going to be the leader, and the leader is going to be handling additional traffic. But yeah, if you see one that's really an outlier or something like that, then that can be an indication.

And then, of course, the total leader elections per day. This one will be obvious. So remember, when I showed the Raft image, the first thing it did was elect a leader, and then that leader is the one that handles all of the read/write operations for the API server. So if it's having trouble, or if the other nodes are having trouble with it, they can basically say, "I don't trust the leader, I need to elect a new leader." When that happens, it basically pauses everything. It doesn't do any reads, it doesn't do any writes, it just stuns the whole cluster, and that will very much ripple out across the rest of OpenShift very quickly. Oh, and that means the API server stops responding, right? All kinds of other stuff. So if you have, you know, something as simple as doing an `oc logs -f`, oftentimes it'll cause the API servers to drop those connections and stuff like that. So you'll start to see those leader elections happening. That is a strong indicator that there is something bad happening inside of there.

And then, of course, okay, we've identified something, right? We think that it's, you know, X or Y or Z. We see it in the events.
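The dashboard panels discussed here map onto etcd's standard Prometheus metrics, so you can also query them directly. A few illustrative queries (metric names are from etcd's built-in instrumentation; the 10 ms figure is the commonly cited guidance, not a hard limit):

```promql
# 99th percentile WAL fsync latency per member -- the "disk sync duration"
# panel; guidance is to keep this below roughly 10ms
histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))

# 99th percentile backend commit latency, another disk-health signal
histogram_quantile(0.99, rate(etcd_disk_backend_commit_duration_seconds_bucket[5m]))

# Leader elections observed over the last day -- should normally be zero
changes(etcd_server_leader_changes_seen_total[1d])
```

Querying these in the Prometheus UI lets you look further back in time, and at finer granularity, than the fixed dashboard panels.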
You can also, of course, check the pod logs. I showed how to do that a moment ago on the CLI: just `oc logs` against the pods that are in the openshift-etcd namespace.

Okay, I said before that we were going to talk about some maintenance tasks. So, Anand, I think you probably have a better method of describing this than I do, and I'll catch up on chat while you're running through it. Can you tell us what compaction and defragmentation are within etcd? And if not, I do have an answer, and I may just need to rely on you to say "yes," and, or, "you're right," or, "no, Andrew, you're wrong."

No, why don't you go ahead and do an answer, I'll correct you. Okay. So, my two cents, or my layman's version, is that compaction removes key history. etcd is versioned, right? As I add new data, it basically creates new versions of those keys. So when I compact the database, and we can see here in the metrics we have this DB size, when I compact the database, it just goes in and it removes all of those old versions. But when it does that, it leaves holes in that capacity, in that space utilized. And this is not necessarily a bad thing, you know, particularly if we're talking flash media; it's not like we have to wait for seek time or something like that. But at the upper end, when we're talking about a very busy cluster, or a very, you know, scaled-up cluster with lots of pods and lots of other things stored in etcd, this database can be pretty substantial. So after doing a compaction, it's usually recommended to do a defrag, which takes all of that data and brings it down, just like those of us who've been, you know, administering systems for a while did with spinning media, right? It brings all of that data
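Checking those logs is a two-step affair, since there is one etcd static pod per control plane node (pod names here are examples, they follow the node names in your cluster):

```shell
# List the etcd pods -- one per control plane node, named etcd-<node-name>
oc get pods -n openshift-etcd

# Tail the etcd container's logs for one member (pod name is an example)
oc logs -f -n openshift-etcd -c etcd etcd-master-0
```

The `-c etcd` flag matters because the pod runs several containers; the `etcd` container is the one with the actual server logs.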
so it's contiguous on the disk. And effectively, at that point, it frees, it returns to the system, all of that now-unused capacity, so it reduces the size of the database. This is why it can impact performance, right? Lots of other things.

So, a couple of important things here. If you look at the etcd documentation, what happened to the documentation, here we go, here, frequently asked questions, and I lost where I had it inside of here. Anyway, inside of here, inside of the FAQ, it'll make recommendations, and keep in mind this is the upstream, the official etcd documentation, not specific to OpenShift. Here: recover from the low space quota, compact, defragment, and then disarm the alarm, stuff like that. It'll make these recommendations. It is technically possible to do this manually inside of OpenShift; however, we don't recommend doing that. And the reason is because the system does it for you.

So, compaction, and Anand, please correct me if I'm wrong, I believe compaction happens every five minutes automatically, and defragmentation happens whenever the nodes are rebooted. The rationale being: compaction is happening frequently, and it's a relatively low-I/O workload when it happens. Defrag, on the other hand, is a relatively high-I/O workload. So if we were to do that against a running node in the cluster, the expectation is it would impact disk performance, which would ripple out into, you know, other aspects of etcd performance, right? So we automatically do it when the node is rebooted, before it has rejoined the etcd cluster. That way it's not impacting other things inside of there. Go on, Zoom banner, get out of my way.

Yeah, that is right, Andrew. Just to, you know, summarize: compaction is, you know, deleting key history, and defrag is, you know, reclaiming the empty space and returning it to the OS. Compaction, to confirm, occurs every five minutes by default, and defrag needs to be initiated by the admin, right?
So the etcdctl commands, Andrew, that you were showing, you know, have a flag for defrag, right? So you could just say `etcdctl defrag`. It's something that can be initiated by the cluster admin, and it takes a few minutes, and that should complete the defrag. The compaction happens every five minutes.

Yeah, so generally speaking, unless something else has gone adverse, has gone sideways, we wouldn't expect those to need to be done manually, even in a very busy cluster. Assuming, and I know this is a big assumption, as I have learned the hard way, assuming you are rebooting the nodes regularly, which means you need to be updating your nodes regularly. Yeah. So effectively, you know, we release z-streams every two weeks-ish, so we can kind of reasonably expect most nodes to be rebooted at least once every two weeks. And of course, if you're configuring, or changing other configuration with the MCO, that would result in a node reboot, that may be more frequent. So just be aware of that.

Yep. Well, yeah, every one to three weeks is usually kind of the average, I would think, anecdotally, between node reboots.

So the last thing that I wanted to talk about, just very quickly, and then, Anand, I want to hand over to you with at least five minutes to go to talk about roadmap, is backups. A lot of people talk about backups, and I think it's really important that we understand that, for etcd, backups are not necessarily great for disaster recovery. And you may be thinking, "that's literally why we create backups, Andrew. Why are you telling me that I can't use that for disaster recovery?" So the backups are literally a snapshot of the database, of the etcd data. And remember, all of those objects represent the literal configuration of Kubernetes, of OpenShift. So if I were to suffer a catastrophic event at my site, right, with my database, I can, at that same site, with that same OpenShift, right?
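For reference, the manual sequence from the upstream FAQ looks roughly like this with etcdctl. Since OpenShift normally handles this automatically, treat it as reference material rather than a recommendation:

```shell
# Check database size per member, and whether any alarm (e.g. NOSPACE)
# has been raised
etcdctl endpoint status -w table
etcdctl alarm list

# Defragment the member etcdctl is pointed at (repeat per member);
# this is I/O heavy and briefly blocks that member
etcdctl defrag

# If a NOSPACE alarm had been raised, clear it once space is reclaimed
etcdctl alarm disarm
```

Defragmenting one member at a time, rather than all at once, is what keeps the cluster responsive while the high-I/O work happens, which is the same reasoning behind OpenShift doing it around node reboots.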
The nodes are still up and running, just something happened to etcd. I can use that backup to restore it, and it'll come back relatively quickly, right? So the support folks are super good at figuring that out and getting things up and running again. But let's say that it's a disaster recovery scenario, right? Something happened at the primary site, and it's not coming back, not for a good long while. I need to recover at the DR site. That DR site is almost guaranteed to be different in some way, and if you recover that backed-up etcd database, it doesn't know that. It's going to come back up, and it's going to think that everything is exactly the same as it was. So my personal opinion, and I can't see Christian, but I'm pretty sure he's going to be nodding his approval, is to take kind of a GitOps approach to DR. Right? Don't literally move everything; instead, reinstantiate it on the destination side. I know that's probably a little bit controversial. Please feel free to beat me up in the comments, or, you know, publicly on social media, etc. I'm happy to defend my position on that. But sometimes we have to think a little bit differently about how things are done in Kubernetes.

So with that, Anand, I want to hand over to you. We've got seven minutes left: one, correct me, tell me how I'm wrong, and two, talk about the roadmap.

Sure. I just want to complete your conversation on backup, if you don't mind, and then I'll wrap this up. So on backup, I just want to, you know, show a couple of ways you can back up the etcd cluster. So here is, you know, some good documentation around it. This is for 4.7; I'm going to paste this link here. But essentially, if you, you know, do `oc`, let's go back here, `oc get nodes`. So now I would say `oc debug node`, let's pick one of the masters, let's pick the first master. And then you would, you know, navigate to the host directory, and then you can run this command, right?
This is `/usr/local/bin/cluster-backup.sh`, and then specify where you want the backup to go, in this case /home/core/assets/backup. Once you do that, it, again, you know, takes a few minutes. It backs up the etcd database, and it also backs up all the etcd static pod resources. So if, let's say, now we go to this directory, /home/core/assets/backup, you will see two resources, or at least two resources. These are two backups. So you see four here, but the point is you should see two resources: one is the backup of the database itself, and then the backup of the static pod resources, which also has the encryption key. So make sure that if you're backing these two things up, you don't leave them in the same place. It's kind of like, you know, locking your house and leaving the key, you know, right outside the doorstep. Yeah, just right there in the lock. So yeah, make sure that, you know, those two separate files are, you know, stored in separate places. But to restore, you will need this file which contains the static pod resources, because it contains the encryption key. So that's, you know, a basic example of a backup. Let me see if I missed anything.

Yeah, the other thing to note is that the encryption covers only the values and not the keys. So if you have resource types, namespaces, and other object names, they will not be encrypted. So make sure you don't put, you know, confidential information in namespaces and, you know, other resource names. So that is around backing up etcd. There is, again, a, you know, procedure for restoring from the backup. I'm not going to go through the sequence here, but I will paste the instructions, and you can... I just did it. No, you did it too. Yeah, you can easily, you know, restore from the backup. And let's see what else.
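The walkthrough above can be sketched end to end like this (node name is an example; the backup script and output file names follow the OpenShift 4.7 documentation):

```shell
# From a workstation with cluster-admin access: find the control plane nodes
oc get nodes

# Open a debug shell on one of them and enter the host filesystem
oc debug node/master-0
chroot /host

# Run the backup script; it writes two files into the target directory:
#   snapshot_<timestamp>.db                 - the etcd database snapshot
#   static_kuberesources_<timestamp>.tar.gz - static pod resources,
#                                             including the encryption keys
/usr/local/bin/cluster-backup.sh /home/core/assets/backup
```

Both files are needed for a restore, which is exactly why they should not be stored together: the tarball effectively holds the keys to the snapshot.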
Yeah, last but not least, you know, encrypting etcd. Right, as you know, a lot of the important, you know, state information of the cluster is stored in etcd, and, if my browser cooperates here, I'll show you how you can actually encrypt etcd. Let me actually, uh, so one of the things you can do is you can exit out of my master node and say `oc edit apiserver`.

You've got a dollar sign beforehand. Yeah, of course, copy-paste magic.

Yeah, so if you see, we'll say `oc edit apiserver`. This contains the configuration information of the API server, and you can see that the encryption has been set to aescbc, which means, you know, all the etcd resources, which includes the OpenShift API server, the kube-apiserver, and the OAuth API server, all of them are encrypted. If you want to turn this off, you will just change the type to identity. So you'll just change the encryption type to identity, and if you want to turn it back on, you will just, again, specify the encryption type. Once you specify the encryption type, you, you know, save this file out, and it takes a few minutes, you know, for the API server to recycle and, you know, apply encryption.

Now, if you want to check if all the resources are encrypted, you know, starting with the OpenShift API server, you'll punch in this command. You can see encryption is complete: all resources are encrypted, including the routes. Next, let's check the kube-apiserver. Again, you can see that secrets and config maps are encrypted as a part of the kube-apiserver. And last but not least, the authentication operator, or the OAuth API server. As you can again see, encryption is completed for OAuth access tokens and OAuth authorization tokens. So pretty much, all across the board, all, you know, resources have been encrypted. And again, like I said, if you wanted to, you know, go back and unencrypt, you just go back and, you know, flip the encryption type back to identity, and again, you know, wait for a few minutes, and the encryption should be
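The same toggle can be done non-interactively with a patch instead of `oc edit`, and the "punch in this command" checks are condition lookups on each API server resource. A sketch, following the shape of the OpenShift etcd-encryption documentation:

```shell
# Enable etcd encryption by setting the encryption type on the cluster
# APIServer resource (use '{"type":"identity"}' to turn it back off)
oc patch apiserver cluster --type merge \
    -p '{"spec":{"encryption":{"type":"aescbc"}}}'

# Check the Encrypted condition on the OpenShift API server...
oc get openshiftapiserver \
    -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}{end}'

# ...the kube-apiserver...
oc get kubeapiserver \
    -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}{end}'

# ...and the authentication operator (OAuth tokens)
oc get authentication.operator.openshift.io \
    -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}{end}'
```

Each check reports an `EncryptionCompleted` reason once the rollout finishes; until then, the message describes which resources are still being migrated.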
turned off. So again, you know, it's pretty straightforward. I'm going to, you know, paste this link. Let's see, did I miss anything else in encryption? That's pretty much it. So now we know, you know, how to do a backup, how to do a restore, how to encrypt and unencrypt etcd, and all the resources that are encrypted as a part of the process. And I want to wrap, I think we have two minutes, I want to wrap with the roadmap. Yeah, you know, what's coming up next.

So etcd has been bumped to 3.4.9 is what the slide says, but I believe the latest update is 3.4.14. I believe that's the latest version. Um, we want to, you know, improve disaster recovery and backup, right? So you saw me doing manual backups, you saw me doing manual restores, but where we want to get to in the future is to have a more automated way of doing backups and restores, right? So what we want to do is, you know, provide a config file, like a YAML file, where you can specify how often you want to take these backups, where you want to store these backups, you know, how you want to name these backups. For instance, if you want to take periodic backups every 24 hours and then, you know, send those backups to an S3 bucket on AWS, you know, we want to provide a config file where you can, you know, specify the automation around that, right?
So we're doing a lot of, you know, work around improving the backup and disaster recovery, you know, scenario: taking snapshots, restoring from snapshots, storing those snapshots, specifying how recurring those snapshots should be, and, you know, things like that. So that's one of the major things we're working on. The other thing we're working on, like I said, is moving secondary disks for etcd to be a part of the day-one install, so you don't have to, you know, worry about setting it up as a day-two operation, and with that you can get access to, you know, faster disks as you're setting up the cluster.

The other key thing I want to point out is we are looking at, you know, scaling up and scaling down with the etcd operator. So as you know, you know, the default install of OpenShift comes with three masters, but let's say you want to scale down to a single-node OpenShift, you know, maybe for, you know, CodeReady Containers, or, you know, maybe for edge deployments. You know, whatever the reason you want a single-node, you know, deployment of OpenShift, we want to make sure that, you know, etcd supports, you know, those single-node deployments. So those are some of the things we're looking at, you know, in the near future that we are, you know, actively working on. And also, you know, always improving performance, always trying to, you know, gather more metrics from Prometheus, and, you know, always trying to make sure that the cluster is, you know, reliable. Those, you know, those things will never go away.

All right, well, thank you, Anand. And I know we are right at time. I know we are backed up against the OpenShift Commons session that is coming up.
So thank you very much, Anand. We greatly appreciate you coming on and sharing with us today. Thank you very much to our audience, we greatly appreciate you sticking with us and asking all of the phenomenal questions. Yeah, if you have additional questions, please don't hesitate to reach out to either Chris or I. So, I am on Twitter at @practicalAndrew, just like my username here on Twitch, and Chris is @ChrisShort on Twitter. And you can also reach out to me via email, andrew.sullivan at redhat.com. We're happy to answer any and all questions that you have, whether they're about etcd or not. Don't hesitate to reach out, and thank you very much. Thank you all, see you in a few seconds.