Good afternoon, everyone. My name is Shane Gibson, with ZeroStack. I just found out I've got a 10-minute block in a lightning talk here for a 40-minute presentation, so a slight snafu in scheduling. So I'm going to jump straight into the meat of my presentation and skip a lot of the run-up. Some of the run-up is kind of boring in the first place anyway: availability metrics and some of the basics you would do in an organization in terms of determining your needs, analysis, and requirements. So let's skip all that boring stuff and get right into the technical depth of the presentation. I'm a cloud infrastructure architect at a company called ZeroStack. We do a private cloud solution with a bunch of hybrid cloud stuff layered on top of it. In the past I worked at Symantec on the cloud platform engineering team as an infrastructure architect, doing very large scale cluster designs for OpenStack. I've been around technology for a long time, from mainframes in the United States Marine Corps that predated my existence by 20 years, all the way through cloud infrastructure architecture, so I have a broad range of experience and expertise. Today's talk is on OpenStack control plane architectures and the various types of high availability, or non-high-availability, solutions you might apply to an OpenStack infrastructure architecture. There are four primary design patterns we're going to talk about today: standalone, active-passive, clustered (redundant) systems, and distributed systems. So we're jumping right into the middle of all this. We're going to talk a little bit about the first design solution, which is a single system. A lot of people, when I say maybe a single server is a sufficient control plane platform solution for you and your architecture, say: are you crazy, Shane? Were you on crack or something?
No, the reality is Linux has proven for decades now that it is a very reliable platform, one that, if well tended, runs for years and years with zero downtime. So if you have a small enough OpenStack environment, you might seriously consider a single standalone platform for your control plane solution. That doesn't mean there isn't redundancy you would bake into the hardware: dual power supplies, multiple NICs, maybe RAID on the drives, redundancy in the storage subsystem. So you still have some redundancy baked into the hardware platform itself. You would also consider broader aspects like power distribution in your data center, making sure you have A/B power from two clean sources, battery backup; all of those things contribute. But ultimately it gives you a simple, easy-to-manage platform. And in a complex system like OpenStack, simplicity rules. One of the slides we skipped over is a discussion of Kyle Kingsbury's Jepsen tests and articles. Have any of you heard of the Jepsen tests? Anybody? Nobody? Okay, so go look them up, please. Kyle essentially runs torture tests on distributed systems and writes very long, comprehensive articles on how distributed systems break down and fail, and why. Smart people take his test results and learn how to better architect their distributed systems. We're talking about fundamental solutions you might use: MongoDB, Cassandra, how Corosync and Pacemaker behave under duress, how a lot of these distributed quorum protocol solutions that we all rely on to be rock solid can fail. And if that doesn't chill your blood, seeing how some of your favorite pieces of software can fail, I'm not sure what will. Go check out his work.
And I only bring that up as a warning, because if you start architecting a distributed system for high availability, you have to understand how things can fail and what can get you in trouble. Complex systems are, by the very nature of the word, complex, and they often break down. So consider a single-system solution; it's a very good solution, and you can easily achieve around 99.9 percent or better uptime, which corresponds to less than about 90 seconds of downtime a day, roughly 10 minutes a week. One of the additional solutions you can look at if you want to scale up a little more, if you want just a little more peace of mind, is an active-passive system. In an active-passive system you essentially have one service that is active at a time, and another service, what I call a shadow service, running ready to take over. There are a number of protocols and solutions that enable you to run active-passive, and you have pretty much two patterns. First, you have applications capable of replicating their data, information, and state on their own. An example is MySQL, which has built-in active-passive replication. So if you're looking for an active-passive solution, I encourage you to use the service's in-built replication capabilities and mechanisms, because they're designed to handle that service's data most effectively and to recover in a failure scenario. Second, you have services that don't have the ability to replicate their own state or data, and you have to do that via external mechanisms. You might use something like replicated data storage: DRBD, the Distributed Replicated Block Device, allows you to do synchronous or asynchronous block-level replication of data underneath the hood of a service. So you can take one service, service A, that's active, and replicate all of its data, state, and information to another node.
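The active-passive pattern above can be sketched in a few lines. This is a toy illustration, not a real cluster manager: the `Node` class and `failover_check` function are hypothetical names, and in production this decision would be made by something like Pacemaker or Keepalived, with the failed master fenced before the standby is promoted.

```python
class Node:
    """Toy stand-in for a service instance with replicated state."""
    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.role = "passive"

def failover_check(active, passive):
    """If the active node fails its health check, promote the standby.

    A real deployment would fence (power off) the old master before
    promotion, so it cannot come back and cause a split-brain.
    """
    if not active.healthy:
        passive.role = "active"   # promote the shadow service
        active.role = "failed"    # real systems would fence it here
        return passive
    return active

# Usage: simulate a failure of the active node.
a, b = Node("db-a"), Node("db-b")
a.role = "active"
a.healthy = False                 # the master stops responding
current = failover_check(a, b)
print(current.name)               # the shadow service has taken over
```

The point of the sketch is the ordering: the standby is already running with replicated state, so takeover is a role change, not a cold start.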
Then you would use some orchestration over the top, typically something like Corosync, Pacemaker, or Keepalived; there are a number of solutions that enable you to do STONITH, "shoot the other node in the head", which allows you to do a takeover, essentially a hostile takeover. You kill your master because it's no longer performing the way it should, your services are already running in a standby capacity on your passive node, and you switch over. So sometimes you use the in-built mechanism of the service itself; sometimes you need to provide that via external mechanisms. One of the other design patterns is a clustered solution. Now, this is a generic discussion of clustered solutions. Clustered solutions typically have a mechanism by which they have a leader and followers; that's the common terminology. One leader is elected by the quorum, and it is responsible for orchestrating all of the services and for making decisions. Now, an important thing in quorum-based clustered solutions is that you always want to operate on the principle of odd numbers. Remember, in the real world, network partitions and other partition events occur. If a network partition splits a four-node cluster into two nodes and two nodes, neither side can form quorum and elect a leader, because two out of four is not a majority. So always remember odd numbers, three, five, seven, so you can still ensure quorum if you have an even network partition event. And this is where a lot of the Jepsen articles start coming into play when you talk about solutions like this. It's very important to understand how some of these solutions can break down and fail on you. Again, I'm not trying to scare you away from them.
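The odd-numbers rule is just majority arithmetic, and it's easy to check for yourself. A minimal sketch (the function name is illustrative):

```python
def has_quorum(partition_size, cluster_size):
    """A partition can elect a leader only if it holds a strict majority."""
    return partition_size > cluster_size // 2

# Even cluster: a clean 2/2 network partition strands BOTH halves.
print(has_quorum(2, 4))   # False - neither half can form quorum
# Odd cluster: any even split leaves one side with a majority.
print(has_quorum(2, 3))   # True  - the 2-node side of a 2/1 split wins
print(has_quorum(3, 5))   # True  - the 3-node side of a 3/2 split wins
```

Note that a 4-node cluster tolerates the same single-node failure as a 3-node cluster but adds a new failure mode, the even split, which is why growing from 3 to 4 nodes buys you nothing.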
You want to understand how all of those things will affect the service you're trying to provide. Continuing on, and again my slides are a little out of the flow I'm expecting for 10 minutes here, one of the other things you can do is apply virtualization or containerization as an orchestration mechanism by which you manage the services in a distributed environment. If you do that, there are a number of things you need to understand. If you're doing virtualization, you pretty much have to do the work yourself. You have to use something like etcd, Consul, or some other distributed key-value consensus mechanism for doing state distribution in your model, to be able to orchestrate who is the leader and who is a follower, and to share data amongst your cluster. If you use a pre-baked containerized solution, they're doing all of that heavy lifting for you. We've all probably seen 652 presentations this week on Kubernetes and Helm and Kolla and all of these other ways to do OpenStack on top of containerization; they're doing the heavy lifting in that solution. The fourth and last model we're going to talk about today is a distributed service solution, where you might actually distribute your control plane amongst your clusters. One of the typical scenarios in an OpenStack environment is that you have some very dedicated control plane servers, and those control plane servers are very important to you. Hopefully most of you understand the pets-versus-cattle analogy. Your control plane servers end up being pets: you name them Fluffy and Fido and whatever, you care about them, and it's very important to you that their health is solid and good. And you treat your data plane servers, your compute nodes, as cattle.
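The "who is the leader" part that etcd or Consul would handle for you boils down to an atomic compare-and-swap on a well-known key. The sketch below uses a toy in-memory store to show the shape of the idea; `ToyKV` and `campaign` are illustrative names, and real systems add leases/TTLs so a dead leader's claim eventually expires.

```python
class ToyKV:
    """In-memory stand-in for a consensus-backed store like etcd or Consul."""
    def __init__(self):
        self.data = {}

    def compare_and_set(self, key, expected, new):
        """Atomically set key to new only if it currently equals expected."""
        if self.data.get(key) == expected:
            self.data[key] = new
            return True
        return False

def campaign(kv, node_name):
    """Try to become leader by claiming the (currently unset) leader key."""
    return kv.compare_and_set("leader", None, node_name)

kv = ToyKV()
print(campaign(kv, "node-1"))   # True: node-1 wins the election
print(campaign(kv, "node-2"))   # False: the key is already claimed
print(kv.data["leader"])        # node-1
```

The hard part, which the sketch hides entirely, is making that compare-and-set atomic across machines in the presence of partitions; that's what the quorum protocol underneath etcd or Consul provides.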
Well, that's potentially not a good idea, or maybe you have a small cluster and you want to do distributed scaling. If you're using some additional tools to do placement of your control plane services, you might actually run them in a VM or container that sits on libvirt underneath the OpenStack VM workloads. If that's the case, you want to apply some very important QoS mechanisms through namespaces and cgroups to ensure the performance, reliability, and throughput requirements of your control plane solution. I have about 30 seconds here, so we're not going to talk about the detailed solutions. Oh, we've got to see this one, it's my favorite slide: an early Clint Eastwood shooting the other node in the head. All right, I just had to get that one in there. This whole presentation is posted on SlideShare.net; just search for the presentation title and you can get the entire presentation. My email address is there, along with all of the additional information related to this presentation. If you have any questions about anything in there, let me know. There's my 40-minute presentation in 10 minutes. Thank you very much, everybody. If there are any questions, I'll take them offline over here on this side.
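As a rough sketch of the cgroup QoS idea mentioned above: with the cgroup v2 unified hierarchy you can cap and weight the control-plane processes so a noisy tenant workload can't starve them. The paths and values below are examples for illustration, not a recommendation for any specific deployment.

```shell
# Create a cgroup for the control-plane services (cgroup v2 hierarchy).
mkdir -p /sys/fs/cgroup/controlplane

# Cap CPU: allow 200ms of CPU time per 100ms period (i.e. up to 2 cores).
echo "200000 100000" > /sys/fs/cgroup/controlplane/cpu.max

# Give the group extra CPU weight relative to sibling cgroups under
# contention (the default weight is 100).
echo 500 > /sys/fs/cgroup/controlplane/cpu.weight

# Cap memory so a leaking service can't starve the host.
echo 16G > /sys/fs/cgroup/controlplane/memory.max

# Move a control-plane process into the group ($CONTROL_PLANE_PID is a
# placeholder for the PID of, e.g., the libvirt-hosted control-plane VM).
echo "$CONTROL_PLANE_PID" > /sys/fs/cgroup/controlplane/cgroup.procs
```

In practice you would usually let systemd or your container runtime manage these knobs (CPUWeight=, MemoryMax=, and friends) rather than writing to sysfs by hand.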