I want to start with a little story from my life. The story is set in 2012, more precisely in April of 2012. Now, 2012 was a little bit weird. The world was a very different place compared to right now. Among smartphones, iPhones had a majority market share, kind of unthinkable today. There was a cloud platform called Eucalyptus that somehow was still a thing. And there was a distributed storage platform called Ceph that was new and that nobody knew about. All of which is kind of unthinkable right now.

And I, at the time, was in San Francisco. In April 2012 in San Francisco, we had the Folsom Design Summit. It was the first OpenStack Summit that I attended. Who here in this room was at the Folsom Summit in San Francisco? Could you raise your hands, please? Okay, great. Leave your hands up, please. Everyone else, look around. The people with their hands up now, those are the old people.

So I was in San Francisco along with a handful of people in the back of the room. And at the time, we were having our first summit sessions where we were actually talking about high availability. And at the time, I was very confused. I ran a main conference talk session and I ran a design summit session in which we discussed what OpenStack needed in terms of high availability. And I thought this was going to be relatively smooth sailing: OpenStack didn't have any high availability features, or not many, at the time, so we were going to discuss this, and I was going to come up with a plan, and wonderful.

And here I was, very, very confused, because as it turned out, I was essentially having to argue the merits of high availability to a group of OpenStack developers who were basically the best and brightest cloud developers in the world. We were actually arguing over whether or not OpenStack needed high availability in the first place. And there was an opposing argument, and it was along the lines of: well, actually, that's not the cloud infrastructure's job to do; rather, it's the application developer's job. They have to build their application in such a way that it can deal with the failure of any component of your cloud at any given time, and everything should still be happy and wonderful. And then the discussion evolved to the point where we got to: well, actually, making the cloud infrastructure really, really highly available and resilient and whatnot is a real challenge. So basically the main argument against OpenStack needing high availability at the time was: it's hard. And then I was a little like this. And as it turned out, luckily, the majority of the developers in the room were like this as well. I was like, oh my God, no, we can't argue about this. We actually need to make this happen, because it's ultimately what people need and what people want.

And I will mention the original discrepancy there: there are two ways that you can look at a cloud platform, or two motivations from which you can approach a cloud platform. One is you're essentially building the next Twitter or Etsy or whatever you'd like: some application that needs to scale massively and whatnot, but over which you, and this is important, have full engineering control. That is to say, if you need to meet different availability requirements or different performance indicators, then you have an engineering team that can make that happen.
The other motivation from which you can deploy a cloud platform is essentially re-architecting your data center along cloud lines. And this is classically what we do in private cloud. And there the situation is quite different: you have dozens or maybe even hundreds of applications that you need to maintain, and you don't have full engineering control over them. Good luck rewriting SAP on your own if you figure out that it doesn't match your cloud platform.

So we quickly realized that something had to be done here, and thankfully things happened. Things actually did happen, so we are in a much, much improved state now. Things have truly changed for the better in OpenStack.

So let's fast-forward to today. Who in here has children and occasionally takes them on road trips? I'm sure the old people that raised their hands earlier do. Okay, and there's more of those. So whenever you take your kids on a road trip, like 20 minutes after departure you get the question: are we there yet? And that is the question that I want to posit here for OpenStack high availability today. And actually it looks pretty good at this point.

So what do we need for building a highly available OpenStack cloud? Let's start with the minimum viable OpenStack deployment that you could possibly consider. And by viable I mean that, yes, of course, we all know we can deploy DevStack onto a single node and essentially have a one-node OpenStack environment; that's not really a viable cloud. The minimum viable cloud looks roughly like this, okay? You have an API node or endpoint node that your clients' API requests come in on. And this API or endpoint node will typically also run your OpenStack dashboard. The incoming requests then get handed off to what we call a controller node. And that controller node then handles, for example, scheduling, and hands off to, for example, Neutron to create your virtual networks and so forth. And then ultimately you're firing up a virtual machine, and that virtual machine runs on one of your compute nodes, your hypervisor nodes. In a minimum viable deployment that would typically be a small number: it could be a single compute node, it could be two, three, four, something like that. And then you also have a separate network node, and that's the thing that takes care of both your east-west routing, that is to say the routing between individual tenant networks inside your cloud, and also the north-south routing, which is the routing between your tenant networks and the outside world, such as the internet.

And the reason why I've colored all of this in different colors is that these node types have very, very different high availability requirements that we need to consider. So ideally what we want to do, of course, is take this very, very simple minimum viable cloud and eliminate all the single points of failure and all the bottlenecks that we have in there. And yes, they are inherent in this kind of setup. So ideally we want to get from this to this. We typically want any number of API nodes: we need more than one in order to achieve some degree of high availability, and then we typically need more if we're actually bottlenecking on our API calls.
We definitely need more than one control node, because if we don't have a Nova scheduler, for example, then we're not going to be able to schedule any new VMs. If we don't have a Neutron server or a Cinder volume service or a Cinder scheduler, for that matter, then we're not going to be able to create networks or persistent volumes and so forth. Then we of course want as many workhorses as we can possibly get, so compute nodes, maybe dozens, maybe hundreds, who knows. And then also for the network nodes, we want them to be highly available, which means we definitely need more than one. And since they actually handle most of the north-south traffic and also a fair amount of the traffic inside our cloud, namely the traffic that goes from one virtual network to another, we want those to be highly available, because without them we don't have any network connectivity at all; and because they also tend to be a scalability bottleneck, we want several of them. So we want to get from a minimum viable deployment to one that we can actually call highly available, and that would be a wonderful thing to have.

Now, as I said, different considerations apply to these node types in terms of high availability. For the API nodes, it's actually relatively simple. We always need at least one, so we have to have more than one node that actually runs these services, but the services themselves are inherently stateless. So the only thing that we really need to care about is having multiple instances of these services and making them accessible on a TCP/IP basis from the outside. That's essentially it, and therefore for the API nodes it's a relatively simple challenge to make them highly available.

For the controller nodes, things look a bit different. There are still some services where we really only want one of them active at any given time. That doesn't necessarily mean that only one instance of, say, the Cinder scheduler or the Nova scheduler can run; but for some of these services, it's good if only one of them is active at any given point, which means that, for example, we redirect everyone to one of those instances that sit behind a load balancer.

Then we have our compute nodes. Of those, of course, we have many, many, many. They have their own availability requirements. Whether or not they are stateful depends largely on how we have configured our virtual machines and also how we've configured our block storage backends.

And then we have our network nodes. Now these are really pretty tricky, because we always want at least one of them: without one, none of our VMs, none of our guests, have any outside connectivity, and none of them have any connectivity to any VM on a different tenant network in the same cloud, either. But these nodes also bear the brunt of the incoming and outgoing network traffic, so it actually happens fairly quickly, as we scale a cloud, that they hit their scalability and throughput limits. So here we really want not only active-passive high availability but also active-active high availability.
And then there's another set of nodes, and those are the nodes that run services that are absolutely crucial and critical to OpenStack but are not part of OpenStack as in the OpenStack code base proper. These we call infrastructure nodes: for example, the node that runs our relational database management system, or the node that runs our message queuing server. And these again have different HA requirements depending on which service we're talking about: whether we only need the service to be available at any given time and don't have to worry about replication, or the service is actually stateful and we do have to worry about making the data available, which usually means we need some form of replication.

Okay, so what are the conventions and best practices that have emerged over these roughly two and a half years? There's actually been a lot of very exciting work happening here between the Essex and Juno releases. What are these conventions, these best practices, for building highly available OpenStack clouds? It's actually very encouraging to see a fair amount of convergence between vendors, which is a healthy departure from the wheel reinvention and NIH that we frequently see in, well, I was about to say other projects, but in fact we're guilty of that ourselves as well. But in the HA world it's actually pretty nice to see this convergence.

So let's talk about the infrastructure first. And I want to get started with the high availability that we build for our relational database management services. As you're all aware, pretty much every single OpenStack service in some shape or form puts some data into a relational database back end. And there are several of those back ends that are supported. Most vendors have essentially standardized on MySQL as their relational database back end. And when we're talking about how to make this data highly available, all major vendors with one exception have standardized on Galera, which is a synchronous multi-master replication facility based on write-set replication for MySQL. And this is pretty challenging to get right, in the sense that this is a service where we absolutely need at least one instance available, because every other OpenStack service depends on a relational database. But the database service is of course entirely stateful, and we need to make sure that if we have several instances of it, their data is in sync and available to each other; the preferred way of doing that in the MySQL context for OpenStack happens to be Galera.

For RabbitMQ high availability, and again, pretty much all the vendors have now standardized on RabbitMQ as their messaging bus, things are a little more complex, in the sense that most OpenStack services always need an AMQP service and basically need at least one RabbitMQ that they can talk to. But the messages that are being put on the message bus are essentially volatile, and all of the OpenStack services are expected to resend them if they're not processed in time.
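To make that resend behavior concrete, here is a minimal sketch, and I stress sketch: this is not any vendor's actual code. It uses the pika library to publish to a RabbitMQ sitting behind a hypothetical virtual IP. If the broker behind the VIP dies mid-publish, the client just reconnects, to wherever the VIP now points, and resends. The address, queue name and retry policy are all made up for illustration.

```python
# Hedged illustration only: an AMQP client that treats messages as volatile
# and simply resends after a broker failover behind a virtual IP.
import time
import pika

VIP = "192.0.2.10"  # placeholder: the virtual IP the HA manager moves around

def publish_with_retry(message, queue="demo", attempts=5):
    for attempt in range(attempts):
        try:
            conn = pika.BlockingConnection(pika.ConnectionParameters(host=VIP))
            channel = conn.channel()
            channel.queue_declare(queue=queue)
            channel.basic_publish(exchange="", routing_key=queue, body=message)
            conn.close()
            return
        except pika.exceptions.AMQPConnectionError:
            # The VIP is presumably mid-failover; back off, then resend.
            time.sleep(2 ** attempt)
    raise RuntimeError("broker did not come back within %d attempts" % attempts)

publish_with_retry("hello from a failover-tolerant client")
```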
So therefore, as long as you're not bottlenecking on your RabbitMQ throughput itself, it's generally fine to just have multiple RabbitMQ instances and one virtual IP that you flip from one to another if necessary. Everyone just talks to one of them, and you have a high availability manager that flips over the virtual IP, which works really, really fast. Only if you're actually bottlenecking on the throughput of your message bus should you worry about replicating the queues themselves. Then it becomes a little more complex, but generally that's relatively straightforward as well.

When we talk about API endpoint load balancing, that is to say the load balancing of the RESTful HTTP/HTTPS services that your OpenStack APIs are exposed as, again there's a very clear winner, at least for the time being, and that's HAProxy. Every vendor basically does this kind of load balancing with HAProxy. It's not the only available option in the open source community; there are several others. There's currently no real reason to move off of HAProxy, but I assume there will be at some point as OpenStack clouds grow bigger and bigger. HAProxy is what it says on the tin: a proxy. That means requests actually go back and forth through it, and eventually, in really, really massive OpenStack clouds, HAProxy is going to become a bottleneck, and then there are other options that I'm guessing we'll see emerging. For example, ldirectord with LVS can do this in a slightly more efficient manner. You could arguably also push this off to the clients altogether by using host names and DNS round robin, which then requires that you do some dynamic DNS zone management. But for the time being it's HAProxy, it's what everyone uses, and it's fine and it works.

When we talk about HA service management, yes, it is absolutely Corosync and Pacemaker that finally everyone has understood they should be using. This stack is the default high availability stack in Linux and in open source, and it has been for like a decade. It has one major issue, and that major issue has never been stability or reliability or fencing or anything like that; the major issue with Pacemaker has always been usability. It's been hard to get right. It's been hard to configure in such a way that things will not break at the most inconvenient time. And this is something that all of the vendors have basically solved by completely automating the Pacemaker configuration. In other words, if you're a cloud operator, you will only be interacting with this thing by checking its status, and you're never going to need to configure it anymore. That was actually Freudian, sorry. But you're never going to need to configure this manually; the deployment facility will do that for you.

And as far as cluster storage is concerned, we're seeing a very clear contender here. Pretty much every vendor under the sun, again with one exception, supports Ceph as a storage backend. Of course, there's also support for legacy SAN environments and things like that. But when it comes to distributed, software-defined storage that is inherently reliable and inherently highly available, Ceph is, for most vendors and for most users, the way to go.
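And since I just said that checking status is the only interaction with Pacemaker an operator should need: here is a minimal sketch, under the assumption of a controller that runs both Pacemaker and a Galera-replicated MySQL, of what such a status check might look like from Python. crm_mon -1 is Pacemaker's one-shot status dump; the database host and credentials are placeholders, not anything from a real deployment.

```python
# Hedged operator-side sketch: one-shot Pacemaker status plus Galera's
# own view of how many nodes are currently in the replication cluster.
import subprocess
import pymysql  # assumes the PyMySQL driver is installed

def pacemaker_status():
    # "crm_mon -1" prints the cluster state once and exits.
    result = subprocess.run(["crm_mon", "-1"], capture_output=True,
                            text=True, check=True)
    return result.stdout

def galera_cluster_size(host="192.0.2.20", user="monitor", password="secret"):
    # Placeholder credentials; wsrep_cluster_size is Galera's node count.
    conn = pymysql.connect(host=host, user=user, password=password)
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'")
            _, value = cur.fetchone()
            return int(value)
    finally:
        conn.close()

print(pacemaker_status())
print("Galera nodes in sync:", galera_cluster_size())
```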
Now I want to talk about deployment automation, and since I'm not affiliated with any of these vendors, I can basically speak my mind freely here on all of them. Deployment automation is how vendors actually differentiate from each other. It's a major differentiator that vendors use to distinguish their OpenStack product from the OpenStack product of the competition, so there's a clearly defined boundary there. And vendors each prefer certain deployment automation facilities. Red Hat and Mirantis both ultimately prefer Puppet, although they use different management facilities on top of Puppet. SUSE prefers Chef via Crowbar, Ubuntu prefers Juju. And this is an important aspect to consider here for high availability.

So what does the aforementioned vendor support for OpenStack HA look like? I want to go through some of the major vendors here, and in rehearsing this talk I found out that I only have time for four. I'll have to leave all the others aside, but I think those four are the most important ones. And I'm doing this in completely random order. I'm certainly not doing it in order of quality or in order of my personal endorsement or anything like that.

I'm starting with what we've already seen in the previous talk, if you were in this room earlier, and that's the HA solution in Mirantis OpenStack, which is deployed with their Fuel deployment facility, which in turn is based on Puppet, or uses Puppet heavily. I have a slide here that's pulled straight from the Mirantis documentation, and I've put it up because it's relatively indicative of what everyone else does as well; of course, if you talk to people from the individual vendors, they will obviously tell you, well, we're doing it better than the others, but generally speaking this is the general arrangement. And I'm just talking about control nodes here, because that's where most of the action happens. So you've got control nodes, more than one: Mirantis supports two; there are other vendors that basically say you will have to deploy three or we're not taking you seriously. On those controller nodes we have a Corosync/Pacemaker cluster that then manages the rest of the services. Some of the services being managed are, for example, the Neutron agents, MySQL/Galera, the various API services, also RabbitMQ; and another important service being managed is HAProxy. And there's a public and external IP address, that's the E over here, that is in turn managed by Pacemaker and that HAProxy can then bind to. So you've got all these back end services configured as back ends to HAProxy, and Pacemaker shifts the virtual IP that HAProxy binds to around as needed. And that's essentially it. That's at the core of what every other vendor is doing as well. So that's the Mirantis approach, and this is from the Mirantis 5.1 documentation; if there are any Mirantis folks in the room and you want to chide me for not using something newer, then please point me to it and I'll be happy to update the slides.

Next up I want to talk about Ubuntu. Now, for Ubuntu the default or preferred deployment facility is of course Juju, and their bare metal deployment service is called MAAS.
And again, this is a slide that's pulled directly from the Ubuntu documentation. So what you have here is the admin or deployment nodes that run MAAS and Juju, and then you have a controller node of which you typically have three instances, so a three-node cluster. This they call the Nova Cloud Controller; it runs all the API services, can optionally also act as a Ceph RADOS Gateway, and also runs the OpenStack dashboard. You have something that in their Juju charm names they still call the Quantum Gateway; for some reason they never renamed the charm, but that's the thing that basically acts as your Neutron network node. Then you've got RabbitMQ and MySQL, and you can also use Juju, obviously, to deploy Ceph nodes and compute nodes.

What's kind of nice about Juju is the fact that it distinguishes between what it calls a service and a relation, and the idea is that you can essentially deploy a service anywhere in the cluster, and only when you define a relation between one service and another do they actually reconfigure themselves automatically to talk to each other. So for example, you would deploy MySQL anywhere in your cluster, you would deploy RabbitMQ anywhere in your cluster, you would deploy Ceph somewhere in your cluster, and then you also deploy Glance, and only when you define a relation from Glance to the others can they actually talk to each other. What's not in this slide, because I guess Ubuntu product management doesn't consider it super important, is the fact that yes, this too uses Pacemaker; yes, this too deploys and configures Pacemaker automatically. And the way they do this is, in my humble opinion, kind of elegant: they basically define that if a service has a relation to its own service type, then that means it's a cluster. Simple as that. I think that's pretty cool.

Next up, I want to talk about SUSE Cloud. For those of you who were in the tutorial that myself and the intrepid SUSE engineer Adam Spiers did on Monday, this will not be news to you. SUSE Cloud is based on Crowbar, and this again is a slide from the SUSE documentation. By now you will see a pattern emerging here. Again, we have an admin node, which in SUSE's case runs Crowbar, runs a Chef server, can potentially also act as a software mirror, and runs a DHCP and TFTP service so you can PXE-boot your other bare metal nodes. Then you've got a control node, or ideally two, which runs your database, your RabbitMQ message queue, your OpenStack APIs, your dashboard, your schedulers, your Keystone, your Glance and so forth. One thing that's somewhat special about SUSE is that they're the only vendor that does not standardize on MySQL as their database backend. Instead they favor Postgres, and for some reason or another they're using DRBD for Postgres database replication, which is a bit weird because it doesn't scale out beyond two nodes, and so I'm hoping that they're eventually going to add some Postgres 9.2 style synchronous replication to that. And then, further on, you've got your compute nodes and your storage nodes: compute nodes running Nova, and storage nodes running Ceph.

And then finally, the team that I think was latest to the party, although I'm not entirely sure, and that's what we find at Red Hat.
As you're all well aware, those of you who are Red Hat customers or Red Hat or CentOS users, for Red Hat products there's always a kind of cool and exciting name that all the developers use, and then there's something that product marketing comes up with. So in Red Hat's case, there's this thing that's called StayPuft, which is kind of cute, and then there's this other thing that's called the Red Hat Enterprise Linux OpenStack Platform. Sorry. So StayPuft is actually a Foreman plugin. For those of you familiar with Foreman, that's essentially a deployment automation facility that adds a nice little GUI and some orchestration to Puppet. And StayPuft is a Foreman plugin that you can use to deploy the Red Hat Enterprise Linux OpenStack Platform.

And their documentation gets credit for having the most detailed overview of this whole thing. But again, you will see a pattern emerging here, although the admin node is actually out of the slide. So you would have an admin and deployment server that runs Foreman with StayPuft. And then you've got your Pacemaker-managed, clustered load balancer, again using HAProxy. Red Hat uses a unique approach here: they configure a separate clustered virtual IP address for every single API service that they're managing. And that has the added benefit that those can fail over and recover independently of all other services, which is kind of neat. Then you've got, again, Pacemaker-managed services in an active-active cluster configuration, with your Horizon, your Glance, your Nova, and this also uses Galera for active-active database clustering. The slide is a bit of a cop-out because it doesn't say which Galera it is, but Red Hat seems to be standardizing on MariaDB Galera Cluster now. Generally, for all intents and purposes, it's going to work exactly the same for you, no matter whether it's MySQL with Galera or MariaDB Galera Cluster or Percona XtraDB Cluster; they all do the same thing. They do write-set replication in a multi-master fashion for MySQL.

Another thing that is not in RHEL OSP at this time is support for Ceph. There is some support for GlusterFS, so you can achieve some of the same functionality, for example Nova and Cinder backing, with GlusterFS. But as of this time, there is no support for Ceph in there. Considering they just dumped 175 million dollars on the company that makes it, I guess it's fair to assume we're going to see that relatively soon. And again, the Red Hat documentation also says that if you're not deploying at least three nodes, we're not taking you seriously, which is entirely reasonable in an HA configuration, because then you can actually get decent quorum, you can implement fencing properly, and so forth.

This is important. I work with a lot of customers, and I frequently get the question: well, why can't we just forego everything the vendor does for us and just deploy OpenStack packages and configure them manually? You do not want to do that. Really, you don't want to do that. And I want to show you something here. Let me do this real quick; this is going to take a few seconds to load, provided the Wi-Fi is not failing us. What you're going to see here, hopefully, in a little bit, is a Puppet run of a configuration that is being applied to a highly available Pacemaker control node.
And it's quite possible that the Wi-Fi is basically saying: hell no, Florian, I'm not going to allow you to do that. But I gave you the slides at the top of the talk; the QR code that you saw there is a direct link to the slides, and you can totally replay this by yourself. Let me just reload this really quickly and see if that helps. Because if it does, then that would be great, and if not, then I'm going to chalk that up to the Wi-Fi and I do apologize for that. But what you're going to see here, when you actually look at this from the comfort of your home or office or hotel room or wherever you'd like, is that this thing runs for about three minutes and produces, I think, something like 7,000 lines of debug output. Now the debug output itself doesn't do a whole lot; it's basically Puppet checking for things that are happening, or checking for things that are configured or not configured. But the point is, if you're trying to do this manually, then you're going to have to make all these checks manually. It's about 50 different configuration options that are being set. And like I said, it takes about three minutes to run, and this recording runs at double speed, and the kicker is: that's a no-op. That's after having configured the whole thing and then just running Puppet again. It's doing nothing at all, and it's still working for like three minutes. So please, don't try this at home. Don't do this manually. Whatever you want to do, we generally recommend that people use the deployment facility that their vendor makes available to them. Just use that.

If you want to build a heterogeneous cloud, meaning you don't want to put all your eggs in one basket, which is entirely reasonable, and you want to support more than one OpenStack vendor in your private or public cloud, which is also perfectly reasonable, then build separate OpenStack regions, which basically means that you have separate OpenStack clouds that are backed by different sets of technology, and then federate them. And that's it. That's how you do that.

Okay, so let's move on here, because, oh, now you're arriving. There's Adam, and I'm sure he didn't bring his sacrificial chicken, which would totally explain... oh, you've got the chicken. Well, now it's too late. You know this is not going to work. Okay, and then again, we have another one of those. Let me try that, maybe that works. There we go, that's much better. And this is basically a crm_mon view of a failover. This is on a Red Hat platform. You see all the different IP addresses here for the various services. What we're going to see in a moment is one of those nodes failing, and then essentially automatic failover happening. So there, boom, that node is offline and has now been fenced. All the services fail over, and within about 30 seconds we're back in business. And that's it. This is actually a really nice and reliable HA configuration, something that you can actually use.

And here is, there we go, hang on, that's it: the same thing on SUSE. Like I said, a slightly different approach, where there's a single IP address, a single VIP, for all the API services, another one for the database, and another one for AMQP. Again, same thing: node failing, node is unclean now, is being fenced, and then we see failover, and the whole thing completes inside of 30 seconds and everything is fine and wonderful again.
So this is something where we can actually say: yes, I call that HA, that's actually pretty cool.

Okay, so not everything is cinnamon rolls and sunshine in HA in OpenStack yet. I know, you're surprised. So what are the open issues that we still have in OpenStack? A classic pain point that we've had for a long time in OpenStack HA is the Neutron L3 agents. Because, as I mentioned previously, a Neutron L3 agent basically maintains all of our virtual routers, and so it is responsible not only for upstream routing, or north-south routing as we call it, but also for east-west routing. And finally, if you don't have a working router, then if you're working with Neutron and the Nova metadata API proxy, you're also not going to be able to fire up new VMs, because your cloud-init is just going to go nowhere. So we want this thing to be highly available, but we also don't want it to be a bottleneck: if we have only one of them, which basically means active-passive high availability, then, well, we can make sure that we always have L3 agents and their routers available, but they might still be overwhelmed by the traffic that's passing through them. So that doesn't work too well. Instead, what we want for the L3 agents is active-active high availability.

And there we got something back in Grizzly, which at the time was called the quantum scheduler and is now called the neutron scheduler. What this does is allow us to have multiple instances of the Neutron L3 agent; prior to the advent of the neutron scheduler, we could only ever have one. With the neutron scheduler, we can have several. And then, as we create a new virtual router, we can either have Neutron randomly assign it to one of the existing L3 agents, or we can manually shift it to a different agent. The problem with that is that this thing, at least up until Icehouse, knew nothing about whether the other agent that is also hosting routers was actually still alive. And the assignment of virtual routers to L3 agents was static. So if one of your network nodes went down, you would need administrator intervention: manually go in, enumerate the virtual routers that were assigned to the agent that had just gone down, and reassign them. Which is LA, low availability. Not something that you would typically want to have.

The first fix that we had for that was already available pre-Juno and actually came out of SUSE. It was a thing called neutron-ha-tool.py, and what it did is, when invoked, it would use the Neutron Python API to enumerate those routers and automatically switch them over to another agent. That would then be invoked from the SUSE HA infrastructure, so from within the Pacemaker cluster, with notifications, and that would shift those routers over. Which worked, but, I mean, this is functionality that really should be in OpenStack itself.

So now, in Juno, we have automatic agent rescheduling. This was already mentioned in the previous talk, and this is the configuration option that you need to set for it. The problem with that is that agent down detection isn't exactly fast in Neutron, because Neutron itself doesn't use anything like Pacemaker, so it has to be much dumber about doing agent down detection.
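Before we get to the speed problem, here is a minimal sketch of the kind of evacuation that neutron-ha-tool.py, and now Neutron itself, performs, written against the python-neutronclient API. To be clear: this is not the actual tool's code, the credentials and endpoint are placeholders, and the round-robin assignment is deliberately naive. (For the record, the Juno neutron.conf option I just referred to is, as far as I know, allow_automatic_l3agent_failover.)

```python
# Hedged sketch: move virtual routers off dead L3 agents onto live ones.
# Not neutron-ha-tool.py's actual code; auth details are placeholders.
from neutronclient.v2_0 import client

neutron = client.Client(username="admin", password="secret",
                        tenant_name="admin",
                        auth_url="http://192.0.2.1:5000/v2.0")

agents = neutron.list_agents(agent_type="L3 agent")["agents"]
dead = [a for a in agents if not a["alive"]]
live = [a for a in agents if a["alive"]]

for agent in dead:
    routers = neutron.list_routers_on_l3_agent(agent["id"])["routers"]
    for i, router in enumerate(routers):
        target = live[i % len(live)]  # naive round-robin over live agents
        neutron.remove_router_from_l3_agent(agent["id"], router["id"])
        neutron.add_router_to_l3_agent(target["id"],
                                       {"router_id": router["id"]})
```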
Now, that agent down detection can take a really, really long time. Really, really long; as in, it could still be well inside of one minute, but that's something your users might still hate. And then what happens is that the virtual router is simply kicked back to the neutron scheduler and rescheduled on an agent that is still alive. So this is in Juno, it works, but it's slow.

Then we have something that, when I first read about it, I thought was really, really cool, and that is HA virtual routers. That is the ability to have more than one L3 agent and then just flag a virtual router as highly available. The way this works is that on the network nodes, inside the appropriate qrouter namespaces, we effectively have keepalived running. And keepalived makes sure that the inside gateway IP address of that virtual router is kept available with VRRP. And that's really kind of cool, because now if one of your agent instances dies, then the other node can simply fail over using keepalived and VRRP and whatnot, and you're not even losing a ping; you can literally ping through the failure. The problem is that, as of now at least, this does not replicate connection state. Including conntrackd in this solution simply didn't make it into Juno. So while you're not going to be losing a ping, you will have to re-initiate anything that is, for example, a TCP connection.

And then we have another thing that's called distributed virtual routers, or DVR, also brand new in Juno, also marked experimental for this release and expected to be fully supported in Kilo. This is the Neutron equivalent of Nova network multi-host, where you're actually running router instances on your compute nodes. This requires a change to your network topology, because your compute nodes now need to be connected to your external network. It only, as of now, works for virtual machines that actually have floating IP addresses assigned. It doesn't work for default SNAT; that still goes through the regular network node, basically a fallback L3 agent on a network node. And unfortunately, and this is a bit of a downer right now, we can't combine DVR and HA for a single virtual router. So sadly, at least as of Juno, you can pick and choose whether you want to do away with your single point of failure or with your bottleneck, but you can't kill both at the same time. We're hoping for that to improve in Kilo.

So those were the open issues in Neutron. And then there's this other thing, which I consider sort of the holy grail. I would love for OpenStack to be able to do nova boot --ha, or nova boot --keep-me-running-and-don't-bother-me-again. There's been a lot of discussion about this topic over the last two years. It has repeatedly been rejected, because people basically said, well, this shouldn't be in Nova. As of about three weeks ago, we have a really, really nice proposal from ex-Nova-PTL Russell Bryant which involves, again, using Pacemaker, specifically Pacemaker Remote, for virtual machine high availability. There was a design summit session on this on Monday, and I'm really, really hoping that we're going to see that in either Kilo or a subsequent release. You can already build highly available virtual machines using various clutches and crutches, but it's not very elegant. This will be the last thing that we are still waiting for.
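Coming back to that HA-versus-DVR choice for a moment: in API terms it boils down to one flag or the other on the router. Here is a small sketch, again with placeholder credentials, of what creating each Juno router flavor looks like through python-neutronclient; both attributes are admin-facing extensions.

```python
# Hedged sketch of creating the two Juno router flavors; as of Juno you
# set "ha" or "distributed" on any one router, not both.
from neutronclient.v2_0 import client

neutron = client.Client(username="admin", password="secret",
                        tenant_name="admin",
                        auth_url="http://192.0.2.1:5000/v2.0")

# VRRP-backed HA router: keepalived keeps the gateway IP alive across
# two or more L3 agents, so a ping survives an agent failure.
neutron.create_router({"router": {"name": "ha-router", "ha": True}})

# Distributed router: router instances run on the compute nodes themselves,
# which removes the bottleneck but (in Juno) cannot be combined with ha=True.
neutron.create_router({"router": {"name": "dvr-router", "distributed": True}})
```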
I am almost out of time. And the "almost there" was not about me being almost out of time; the "almost there" was about us being almost there in terms of high availability. Recall what I said about kids on road trips earlier, and recall how often you have told your kids "we're almost there," and under what circumstances. But it's looking a lot better, a lot, lot better than it did two and a half years ago. You can already build a highly available OpenStack cloud; there are some things that you should be aware of, and please don't try to do it manually.

Before I get to your questions, two things. One, if you would like to reuse these slides for whatever you'd like: all of my slides are under CC BY-SA. This is the link, and feel free to share, copy, adapt and remix those. And finally, one shameless plug. My company, hastexo, actually does OpenStack training, among many other things, and this will get you to a link on our website that will let you know how to score a 15% discount on the trainings that we have from now until the end of the year. Of course, you're already here, so you don't need that anymore, because you're all experts; but maybe you have a friend back home at the office that couldn't come. Let them know and we'll be happy to help them out.

And with that, I think I'm actually nominally over time, but I guess I can take one or two questions, and then I'll take all other questions outside in the hallway. Or is everyone already rushing to get to the next session? That's perfectly fine too. If that's the case: thank you very much for your attention, enjoy the rest of the conference, have a great week. Thank you.